last week, arc-agi-3 dropped. tldr - arc is a set of simple, never-before-seen, interactive computer games - you look at a basic visual puzzle, identify the hidden rule, then apply that rule to solve a new puzzle.
interestingly, while most humans can solve the games with little difficulty, ai systems struggle.
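to make the setup concrete, here's a minimal sketch of an arc-style task in python - a hypothetical toy, not a real arc-agi-3 puzzle: infer a hidden colour-mapping rule from one demonstration pair, then apply it to an unseen grid.

```python
# toy arc-style task (hypothetical example for illustration only):
# the hidden rule is a cell-by-cell colour mapping, learned from a
# single demonstration pair and applied to a new test grid.

def infer_rule(demo_in, demo_out):
    """learn a colour -> colour mapping from one input/output grid pair."""
    mapping = {}
    for row_in, row_out in zip(demo_in, demo_out):
        for a, b in zip(row_in, row_out):
            mapping[a] = b
    return mapping

def apply_rule(mapping, grid):
    """apply the inferred colour mapping to every cell of a new grid."""
    return [[mapping.get(cell, cell) for cell in row] for row in grid]

demo_in = [[1, 0], [0, 1]]
demo_out = [[2, 0], [0, 2]]  # hidden rule: colour 1 becomes colour 2

rule = infer_rule(demo_in, demo_out)
print(apply_rule(rule, [[1, 1], [0, 1]]))  # [[2, 2], [0, 2]]
```

real arc tasks are far harder than this - the rules involve shapes, symmetry, counting and spatial relations rather than simple colour swaps - but the loop is the same: generalise from a handful of examples to an unseen case.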
it's an innovative approach and a beneficial project, for many reasons:
1. continually pushes the frontier of ai research, incentivising new ideas like test-time adaptation/fine-tuning
2. reveals over-fitting, where models rely on patterns in their training data rather than truly understanding context
3. provides a genuine test and reality check: can ai reason and think abstractly like a human?
4. evaluates efficiency: how much computational power is required to solve problems?
perhaps the most important insight to come from arc - we might need fundamentally new ideas to scale to human-like general intelligence.
according to chollet, intelligence is 'skill acquisition under resource constraint' - adaptability and learning efficiency. more simply: how well and how fast can you adapt to new situations on the fly and solve problems you've never seen before?
this is directly from the abstract of his 2019 paper 'On the Measure of Intelligence':
"To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI."
here's the kicker.
disentangling 'human-like intelligence' from 'intelligence'
it's important to disentangle 'human-like intelligence' from the broader term 'intelligence'.
if you were to attempt the puzzle, you would rely on a complex (and poorly understood) cocktail of cognitive processes - visual processing, spatial reasoning, logical thinking, intuitive insight - to help you identify the required rules and patterns to win.
this makes arc a great test for efficient human-like reasoning and abstraction, but a poor measure of diverse intelligence in non-human entities.
imagine being dropped in the middle of the atlantic with a pod of dolphins and tested on your ability to detect whether something miles away is prey or predator.
or imagine being shrunk to a microscopic organism and tested on your ability to navigate chemical gradients and find nutrients while avoiding toxins.
conversely, you wouldn't expect a monkey to write a book or complete the arc challenge.
recognising diverse intelligence
nobody articulates the ideas of 'diverse intelligence' better than michael levin.
so when we think of 'intelligence' as navigating our familiar 3d human world - with physical objects, cause and effect, and so on - current ai systems are extremely limited.
but if you step back and consider intelligence more broadly, current ai systems - even non-reasoning, non-tool-using language models - might already be far more 'generally intelligent' than humans in ways we haven't yet recognised or appreciated.
references
ARC-AGI-3 video explanation: https://youtu.be/3vFu79ccDcc?si=INByjq5Z07XSROST&t=663
Deep Learning for ARC research paper: https://github.com/MohamedOsman1998/deep-learning-for-arc/blob/main/deep_learning_for_arc.pdf
Chollet on intelligence: https://youtu.be/JTU8Ha4Jyfc?si=e7K0ZJ6DzMjJZPcH
On the Measure of Intelligence paper: https://arxiv.org/abs/1911.01547
Michael Levin on diverse intelligence: https://youtu.be/un9yp7MQFlo?si=GLDBwyFAnsBQbni2&t=2534