
A new frontier in artificial intelligence is taking shape around “world models,” systems designed to simulate and understand three-dimensional environments rather than just process language. Chinese tech giants and leading researchers, including Fei-Fei Li, are moving quickly to establish leadership in this emerging space, signaling a shift beyond large language models toward more physically grounded AI, the South China Morning Post reports.
Companies such as Alibaba Group are investing heavily in these capabilities. Alibaba recently introduced a system called Happy Oyster, described as a “world model” capable of generating and interacting with virtual environments in real time. The technology is intended to support applications ranging from gaming and digital content creation to robotics and autonomous systems, where understanding physical space is essential.
At the same time, Li’s startup, World Labs, is advancing its own models, including Spark 2.0, aimed at building “spatial intelligence.” These systems seek to interpret and simulate real-world physics and environments, a capability widely seen as critical for the next generation of AI.
The race reflects a broader strategic shift within the AI industry. While language-based systems have dominated recent progress, world models are viewed as a key step toward more general-purpose intelligence. They could enable machines to reason about physical interactions, train robots in simulated environments, and create immersive digital worlds with minimal human input.
China’s ecosystem may offer advantages in this domain. The country’s strong industrial base and access to large volumes of real-world data provide a foundation for training models that simulate physical processes. Faster deployment cycles could also allow Chinese companies to test and refine systems more quickly than competitors elsewhere.
As competition intensifies, the development of world models is emerging as a defining battleground in AI. The outcome could shape not only virtual environments but also the future of robotics, automation, and human–machine interaction.