what, how, or if a Language Model 'thinks' is a gnarly and very much open question. regardless of your position, there is what seems to be an interesting convergence of ideas.
kudos to swyx and Alessio, who have been cranking out some of the most interesting conversations over the past month.
all three of the following research directions/ideas are from their Latent Space podcast (https://www.youtube.com/@LatentSpacePod).

idea 1. LMs probably have some form of 'world model' that improves with scale.
two weeks ago, Noam Brown (OpenAI) said:
"I think it's pretty clear that as these models get bigger they have a world model and that world model becomes better"
idea 2. LMs are doing some form of intermediate reasoning and mapping of learned features/concepts beyond myopic next-token prediction.
about a month back, Emmanuel Ameisen (mechanistic interpretability at Anthropic) said:
"in my opinion that's like yeah like oh god stochastic parrots is not something that I think is is like appropriate here and I think there's just like a lot of different things going on and there's like pretty complex behavior at the same time I think it's in the eye of the beholder"
idea 3. LMs might be converging on a 'universal representation' of the world
in a recent conversation, Jack Morris (PhD, AI researcher at Meta) shared some thoughts on 'platonic representation' and 'universal geometry':
"all models are trained on data from the world and there's only one world and so as the models get better by scaling data and scaling model size they're sort of converging to learn the exact same thing"
so these three directions of research:

1. emergent world models improving with scale (OpenAI, reasoning)
2. reasoning via learned features and concepts, ft. the superposition hypothesis
3. convergence toward a universal representation, ft. the platonic representation hypothesis
they all seem to be circling the same suggestion:
through their training on huge amounts of human language/internet data, Language Models are constructing a statistical representation of the world that seems to be increasingly aligned with our view of 'reality'.
what this means, i really don't know - nobody really knows.
at the very least, it implies that the world can be understood through statistical patterns which are somehow embedded in the training data - allowing models to piece together an increasingly coherent representation of our human world.
of course, this all comes down to how you wish to interpret the findings.
as Emmanuel said - "it's in the eye of the beholder".
maybe you're thinking - 'nonsense, LMs are approximations that don't truly understand language or the world.'
maybe you're thinking - 'of course they have world models - it's what any complex system does when it tries to emulate the statistical structure of language, which is a proxy for the world.'
maybe you're thinking - 'my god, they're alive'.
and maybe it's all these things, or none of them, or any variation in between.
this does bring to mind several potential philosophical rabbit hole explorations - computational irreducibility, consciousness, simulation hypothesis.
for another time.
references
podcast interviews
Latent Space Podcast - https://www.youtube.com/@LatentSpacePod
Hosted by swyx and Alessio Fanelli
- Noam Brown (OpenAI) - Discussion on emergent world models and reasoning capabilities in large language models
- Emmanuel Ameisen (Anthropic) - Mechanistic interpretability research and the "stochastic parrots" debate
- Jack Morris (Meta AI Research) - Platonic representation hypothesis and universal geometry in language models
research papers
Huh, M., Cheung, B., Wang, T., & Isola, P. (2024). The Platonic Representation Hypothesis. arXiv preprint arXiv:2405.07987. https://arxiv.org/pdf/2405.07987
Jha, R., Zhang, C., Shmatikov, V., & Morris, J. X. (2025). Harnessing the Universal Geometry of Embeddings. arXiv preprint arXiv:2505.12540. https://arxiv.org/pdf/2505.12540
key concepts explored
- World Models: Internal representations that language models develop to understand and predict real-world dynamics
- Superposition Hypothesis: The hypothesis that neural networks can represent more features than they have neurons by encoding features as overlapping, non-orthogonal directions in activation space
- Platonic Representation Hypothesis: The idea that sufficiently trained models converge toward universal, shared representations of reality
- Stochastic Parrots: The critique that language models merely recombine training data without true understanding
- Computational Irreducibility: Wolfram's concept that some systems cannot be simplified and must be computed step-by-step
- Simulation Hypothesis: The philosophical proposition that reality might be an artificial simulation