Developing algorithms capable of generating realistic, high-quality 3D content at scale has been a long-standing problem in Computer Vision and Graphics. We anticipate that generative models that can reliably synthesize meaningful 3D content will revolutionize the workflow of artists and content creators, and will also enable new levels of creativity through "generative art". Although there has recently been considerable success in generating photorealistic images, the quality and generality of 3D generative models have lagged behind their 2D counterparts. Additionally, efficiently controlling what is generated and scaling these approaches to complex scenes with several static and dynamic objects remain open challenges.
In this workshop, we seek to bring together researchers working on generative models for 3D shapes, humans, and scenes to discuss the latest advances, existing limitations, and next steps towards developing generative pipelines capable of producing fully controllable 3D environments with multiple humans interacting with each other or with objects in the scene. In the last few years, there has been significant progress in generating 3D objects, humans, and scenes independently, but only recently has the research community shifted its attention towards generating meaningful dynamics and interactions between humans, or between humans and other scene elements. To this end, our workshop aims to cover the following topics:
- Representations: What is the most appropriate representation for generating meaningful high-quality 3D textured objects? What is the best representation that can enable intuitive control over the generated objects? How can we effectively represent interactions between humans and objects?
- Modelling: How can we construct powerful models that can reliably generate humans performing plausible real-life activities? How can we construct an accurate 3D model of our world that allows us to manipulate both the appearance of the scene elements and their spatial composition?
- Common Sense Reasoning and Knowledge: How can we incorporate common sense knowledge about 3D objects and scenes, such as part structures and object arrangements, from Large Language Models and Vision Language Models to enable training with less data?
- Applications: How can we leverage 2D priors to enable photo-realistic 3D content creation? What creative tools do designers and product designers really need to make their workflows more efficient? Are there new fields that could benefit from generative AI, such as construction and agriculture?
- Risks and Ethical Considerations: What ethical implications arise from artificially generated 3D content, and how can we address them?