the world model paradox

on a recent latent space pod, noam brown said:

"I think it's pretty clear that as these models get bigger they have a world model and that world model becomes better uh with scale so they are implicitly developing a world model and I don't think it's something that you need to explicitly model".

the 'world model' thing has been gnawing at the core of my being since listening to this conversation last week.

what does it actually mean for a language model to have a 'world model'?

this is me attempting to make sense of the world model thing.

"the cat sat on the ... "


language models like chatGPT are designed to predict and generate the next token/word in a sentence. when a language model is first trained, it starts with zilch.

ie, it has no idea 'mat' should follow 'cat'.

during training - it's fed many sequences of tokens (numerical representations of pieces of text), it predicts which token comes next, and it adjusts its internal weights to make better predictions over time.
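to make that loop concrete, here's a minimal sketch in pytorch. everything in it is made up for illustration - a five-word vocabulary, a single training sentence, and a deliberately tiny model (an embedding plus a linear layer, nothing like a real transformer) - but the 'predict, compare, adjust' cycle is the same idea:

```python
# a toy sketch of next-token prediction - not how gpt is actually built.
# the model starts with random weights (zilch), sees a sentence, predicts
# each next token, and nudges its weights to reduce the error.

import torch
import torch.nn as nn

# made-up micro vocabulary, just for this example
vocab = ["the", "cat", "sat", "on", "mat"]
tok_id = {w: i for i, w in enumerate(vocab)}

# one training sentence: "the cat sat on the mat"
tokens = torch.tensor([tok_id[w] for w in "the cat sat on the mat".split()])
inputs, targets = tokens[:-1], tokens[1:]   # at each position, predict the next token

# a deliberately tiny model: embedding + linear layer (real llms use transformers)
model = nn.Sequential(nn.Embedding(len(vocab), 16), nn.Linear(16, len(vocab)))
opt = torch.optim.Adam(model.parameters(), lr=0.1)

for step in range(200):
    logits = model(inputs)                   # scores over the vocab at each position
    loss = nn.functional.cross_entropy(logits, targets)
    opt.zero_grad()
    loss.backward()                          # adjust weights to predict better next time
    opt.step()

# after training: which token does the model expect after "the"?
probs = torch.softmax(model(torch.tensor([tok_id["the"]]))[0], dim=-1)
print({w: round(probs[i].item(), 2) for i, w in enumerate(vocab)})
```

after a couple hundred steps, the toy model roughly splits its bet between 'cat' and 'mat' as the word most likely to follow 'the' - it has captured the statistics of its (tiny) training data, nothing more.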

if you ask chatGPT -

"what type of mats do cats like best?"

it's going to give you a unique and thoughtful response. a response which 'feels' as though the model has certainly made sense of the world - it's going to tell you that cats like soft, warm, furry mats - wool or fleece - and that cats don't like cold, smooth, or crinkly surfaces like plastic or foil.


with more data and more training, the model is able to capture statistical patterns - grammatical structure and semantic relationships - and (almost) magically produce nuanced and thoughtful responses. there's some additional training (fine-tuning) to ensure the models behave like human-friendly chatbots.

unfortunately, the scale of the training and the data makes it (almost) impossible to reverse engineer these models and really understand how they're predicting the next word in the sentence.

to make things a little more confusing - newer 'reasoning' models are trained to generate a 'chain of thought' before answering -

let me think about this question... cats are known for seeking warmth and comfort... they prefer soft surfaces that retain heat...

that kind of thing.

from my understanding, it's still the same process of sequential token generation - the model is just generating longer, internal 'chain of thought' sequences which are hidden from the final output.
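here's a sketch of that idea - not any particular model's implementation. `sample_next_token` is a hypothetical stand-in for a real model's forward pass, and the '</think>' marker is an assumption about how the thinking gets separated from the answer (the details vary between models):

```python
# a sketch of the idea only - the sampler below is a hypothetical stub,
# and the "</think>" marker is an assumed convention for splitting the
# hidden reasoning from the visible answer.

def sample_next_token(tokens):
    # stand-in: a real model would run a forward pass here and sample
    # from its predicted next-token distribution.
    canned = ["cats", "seek", "warmth", "...", "</think>",
              "cats", "prefer", "soft,", "warm", "mats.", "<end>"]
    return canned[len(tokens)]

def generate():
    tokens = []
    while True:
        nxt = sample_next_token(tokens)   # the same next-token loop as always
        tokens.append(nxt)
        if nxt == "<end>":
            break
    # everything up to </think> is the hidden 'chain of thought';
    # only the tokens after it are shown to the user.
    cut = tokens.index("</think>") + 1 if "</think>" in tokens else 0
    return " ".join(tokens[cut:-1])

print(generate())   # -> cats prefer soft, warm mats.
```

the thinking tokens are generated exactly like any others - the only difference is where the curtain falls before you see the answer.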

and so, the trillion-dollar question.

do llms like GPT form and store a 'deeper understanding' beyond their ability to statistically predict the next token?

in other words, when i ask my question about cats and mats, either:

a) the llm draws from an 'inner-map' which includes the concepts of 'cats' and 'mats' and 'warmth' - and how they all relate to each other?

or

b) the model is essentially a very sophisticated autocomplete machine - it has learned from the training data that cats + prefer + mats → output tokens that statistically follow this pattern.
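to make option b concrete, here's the crudest possible version of a 'statistical autocomplete' - a toy bigram model that just counts which word follows which in its training text and replays the most common continuation. (the training text and everything else here is made up; real llms are vastly more sophisticated than this, but the 'statistics in, statistics out' flavour is the point.)

```python
# a toy bigram autocomplete: pure counting, no 'inner map' of anything.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the warm mat . "
    "the cat curled up on the fleece mat . "
    "the cat avoided the cold plastic mat ."
)

# count which word follows which
follows = defaultdict(Counter)
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    follows[prev][nxt] += 1

def autocomplete(word, n=4):
    out = [word]
    for _ in range(n):
        nxt, _count = follows[out[-1]].most_common(1)[0]  # most frequent follower
        out.append(nxt)
    return " ".join(out)

print(autocomplete("the"))   # -> the cat sat on the
```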

the paradox here is that both might be happening.

llms obviously aren't going to have a 'world model' identical to yours and mine.

... how do you know cats like fleecy mats?

likely because you've seen many cozy cats curled up on fleece mats. you've experienced 'what it feels like' yourself. in fact, there are probably millions or billions of connected data points from your own 'training process' which have enabled you to generate a coherent response.

so from one perspective, your world model is fundamentally different - it emerges from rich, multi-sensory interaction with the real world. you have embodied experiences of the warmth of fleece, you've watched cats choose cozy spots - these symbols and concepts exist who knows where, maybe distributed across your brain. an llm, meanwhile, has only seen text.

from another perspective, however, we might both be doing the same thing: sophisticated pattern recognition over our training data. you've processed (millions?) of sensory experiences and built relationships between cats and mats and warmth. the llm has processed (billions?) of relationships between these same things from internet text.

in this sense, a 'world model' might just be an abstract and implicit representation that emerges when pattern matching gets sophisticated enough - the llm isn't explicitly building a map of reality, but through the process of predicting the next token, those representations naturally emerge.
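one (very rough) way to poke at this is to look at where a model places words internally. the sketch below assumes the hugging face transformers library and GPT-2, and compares hidden representations of a few words via cosine similarity - a blunt instrument, nowhere near serious interpretability work, but if relationships between 'cat', 'mat' and 'warmth' have been absorbed, you'd hope to see a hint of it in how close their representations sit.

```python
# a rough probe, assuming the hugging face `transformers` library and GPT-2.
# single words and mean-pooled hidden states are a crude measure - this is
# just a way to peek at where a model places concepts internally.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")
model.eval()

def embed(text):
    # mean of the final hidden states for the text's tokens
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state
    return hidden.mean(dim=1).squeeze(0)

def similarity(a, b):
    return torch.nn.functional.cosine_similarity(embed(a), embed(b), dim=0).item()

for pair in [("cat", "mat"), ("cat", "warmth"), ("cat", "spreadsheet")]:
    print(pair, round(similarity(*pair), 3))
```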

just as cats and mats and warmth and comfiness exist (somewhere?) for us, they might exist (somewhere else?) for llms.