A concept in artificial intelligence referring to internal representations that allow AI systems to make sense of the complex, unpredictable physical or virtual environments in which they are put to work. The idea dates back to a 1943 book by Kenneth Craik, a Scottish psychologist, who suggested that organisms carry a "small-scale model" of the world inside their heads, on which they can try out actions before carrying them out in reality.
Giving that ability to AI systems was a promising area of research as far back as the 1990s, before large language models sucked away the world's attention. Interest has since returned.
Three approaches are being explored. The first starts from AI video generators: generating a coherent video depends on simulating a coherent world, and such rudimentary world models can fill in details beyond what they have been fed. Google's Project Genie, released in January 2026, is the culmination of this approach. It is an experimental model that generates interactive worlds from prompts that can include images and text. Its simulations, however, run for a maximum of 60 seconds before fraying at the edges.
The second approach seeks to create full 3D environments rather than 2D simulations. Fei-Fei Li, a computer scientist at Stanford University, calls this "spatial intelligence". Her startup, World Labs, has built a world model called Marble that creates internally consistent, complete digital 3D worlds.
The third, pursued by Yann LeCun, formerly Meta's chief AI scientist, argues that focusing on real spaces is a distraction. His Joint-Embedding Predictive Architecture (JEPA) would allow an AI to simulate complex features of the real world and plan ahead—gauging the weather before deciding on an umbrella, say—without needing to visualise every single second of the day.
Some researchers believe existing generative AI systems already contain world models within them. Ilya Sutskever, an OpenAI co-founder, has said that training a large language model is "no more than learning a world model". In 2023 a language model trained only on lists of moves in the game Othello was shown to represent the state of the board within its own neural network, even though it had neither seen a board nor been taught the rules. Anthropic has found clusters of artificial neurons in its Claude models that correspond to anything from feelings of guilt to the Golden Gate Bridge; reaching in and changing them causes corresponding changes to the models' behaviour, suggesting a consistent internal understanding of physical features. Dr Li disagrees, arguing that language alone cannot give a grounded understanding of the world.