A project page for procedural, controllable, and scalable 3D city generation.
The automated generation of interactive 3D cities is a critical challenge with broad applications in autonomous driving, virtual reality, and embodied intelligence. While recent advances in generative models and procedural techniques have improved the realism and scalability of city generation, existing methods still struggle with high-fidelity asset creation, controllability, and manipulation. In this work, we present CityGenAgent, a natural-language-driven framework built on large language models (LLMs) for hierarchical procedural generation of high-quality 3D cities. Our approach introduces two core program types, the Block Program and the Building Program, which decompose city generation into interpretable and editable components; two dedicated models, BlockGen and BuildingGen, are trained to generate and execute these programs. We design a Spatial Alignment Reward to enhance spatial reasoning and a Visual Consistency Reward to bridge the gap between textual program descriptions and their 3D visual realizations. Moreover, because generation is driven by explicit programs and the models generalize well, our framework lets users manipulate the results through natural language. Comprehensive evaluations show that CityGenAgent achieves stronger semantic alignment and higher visual quality than existing methods, establishing a stronger foundation for broad downstream applications.
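To make the Block Program / Building Program hierarchy concrete, the sketch below shows one plausible way such a two-level program could be organized in Python. The class names, fields, units, and `execute()` contract are illustrative assumptions for this page, not the framework's actual program format.

```python
"""Illustrative sketch of a two-level city program hierarchy.

Everything here (class names, fields, the execute() contract) is an
assumption made for illustration; the actual Block/Building program
format used by CityGenAgent may differ.
"""
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BuildingProgram:
    """Hypothetical per-building program: a parameterized asset description."""
    footprint: Tuple[float, float]  # width, depth in meters (assumed units)
    num_floors: int
    style: str                      # e.g. "brick_residential"

    def execute(self) -> dict:
        # A real system would emit geometry or a scene-graph node here;
        # this sketch just returns a structured description.
        return {
            "footprint": self.footprint,
            "height": self.num_floors * 3.0,  # assumed 3 m per floor
            "style": self.style,
        }


@dataclass
class BlockProgram:
    """Hypothetical block-level program: lays out lots and delegates each lot
    to a BuildingProgram, mirroring the hierarchical decomposition above."""
    origin: Tuple[float, float]
    lot_positions: List[Tuple[float, float]]
    buildings: List[BuildingProgram] = field(default_factory=list)

    def execute(self) -> List[dict]:
        scene = []
        for (dx, dy), bldg in zip(self.lot_positions, self.buildings):
            node = bldg.execute()
            node["position"] = (self.origin[0] + dx, self.origin[1] + dy)
            scene.append(node)
        return scene


if __name__ == "__main__":
    block = BlockProgram(
        origin=(0.0, 0.0),
        lot_positions=[(0.0, 0.0), (20.0, 0.0)],
        buildings=[
            BuildingProgram(footprint=(12.0, 10.0), num_floors=5, style="brick_residential"),
            BuildingProgram(footprint=(15.0, 12.0), num_floors=8, style="glass_office"),
        ],
    )
    print(block.execute())
```

Because the block and building levels are separate, editable programs, a natural-language edit (e.g. "make the second building taller") can in principle be mapped to a small change in a single program rather than a full regeneration.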