Andrej Karpathy's LLM Year in Review: 6 Paradigm Shifts That Changed AI in 2025
By Github Awesome
Summary
Topics Covered
- RLVR Drives LLM Reasoning
- LLMs Evolve Jagged Intelligence
- LLM Apps Turn Generalists Professional
- Claude Code Lives on Your Computer
- Vibe Coding Frees Imagination
- Visual Interfaces Replace Chat
Full Transcript
Today we're diving into something special. Andrej Karpathy, AI researcher, just dropped his 2025 year in review of LLMs, what might be the clearest summary of where LLMs actually are right now. Let's dive in.
One: RLVR, the new engine of LLM progress. For years, the LLM training recipe was stable: pre-training, supervised fine-tuning, RLHF. In 2025, a new stage became dominant: reinforcement learning from verifiable rewards (RLVR). Instead of humans judging outputs, models train against objective, automatically checkable rewards: things like math problems, coding tasks, and puzzles. The result: models start to develop behaviors that look like reasoning. They break problems into steps. They backtrack. They explore strategies, not because humans told them how, but because it worked. This shift unlocked longer training runs, better capability per dollar, and a brand-new control knob, test-time compute: the ability to trade time for intelligence by letting the model think longer. OpenAI's o1 hinted at this. o3 made it obvious. And most of 2025's progress came not from bigger models, but from longer, deeper RL runs.
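To make "automatically checkable" concrete, here is a minimal sketch of what verifiable reward functions can look like. The names and the answer format are illustrative assumptions, not any lab's actual pipeline.

```python
# Minimal sketch of verifiable rewards (illustrative, not a real training
# framework). The key property: the grader is a program, not a human
# judge or a learned reward model.

import re

def math_reward(model_output: str, ground_truth: str) -> float:
    """1.0 if the final answer matches the known answer, else 0.0.
    Assumes the model marks its result like 'Answer: 42'."""
    match = re.search(r"Answer:\s*(-?\d+(?:\.\d+)?)", model_output)
    if match is None:
        return 0.0
    return 1.0 if match.group(1) == ground_truth else 0.0

def code_reward(candidate_fn, test_cases) -> float:
    """Fraction of unit tests a generated function passes."""
    passed = 0
    for args, expected in test_cases:
        try:
            if candidate_fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash simply earns no reward
    return passed / len(test_cases) if test_cases else 0.0

# An RL loop samples many attempts per problem and reinforces the
# trajectories that score well, with no human in the loop.
```

Because the grader is a program rather than a person, rollouts can be scored at whatever scale compute allows, which is what made those longer, deeper RL runs economical.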
Two: animals versus ghosts, jagged intelligence. One of Karpathy's most important insights this year is this: LLMs are not evolving animals. They are summoned ghosts. Their intelligence is optimized for completely different pressures than ours: not survival, but imitation, rewards, and benchmarks. And that creates jagged intelligence. LLMs can be genius-level at math or code and immediately fall apart on trivial reasoning or social traps. This also explains why benchmarks stopped meaning much in 2025. Benchmarks are verifiable environments, which makes them easy targets for RLVR. Labs didn't just beat benchmarks; they grew spikes precisely around them. You can crush every benchmark and still be nowhere near AGI. This year forced the industry to finally internalize that.
Three: Cursor and the rise of the LLM app layer. Cursor wasn't just another coding tool. It revealed an entirely new application layer. People started saying "Cursor for X." Why? Because these apps don't just call an LLM. They engineer context, orchestrate multiple model calls, balance cost and performance, provide GUIs for humans, and expose an autonomy slider. Karpathy's take is clear: LLM labs will graduate generalists, like capable college students, but LLM apps will turn them into professionals by adding tools, data, feedback loops, and structure. This layer is thicker and more valuable than many people expected.
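As a rough illustration of how thick that layer is, here is a sketch of one orchestration path, with placeholder function and model names throughout, showing context engineering, multi-model cost routing, and an autonomy slider in miniature.

```python
# Hypothetical sketch of an LLM-app orchestration layer. The function
# names, model names, and thresholds are placeholders, not a real API.

def call_llm(model: str, prompt: str) -> str:
    """Stand-in for a provider SDK call; a real app would hit an actual API."""
    return f"[{model} response to {len(prompt)}-char prompt]"

def build_context(task: str, files: list[str]) -> str:
    """Context engineering: pack only the relevant material into the prompt."""
    snippets = [open(path).read()[:2000] for path in files]
    return f"Task: {task}\n\nRelevant files:\n" + "\n---\n".join(snippets)

def run_task(task: str, files: list[str], autonomy: float) -> str:
    """autonomy in [0, 1]: 0 = confirm every step, 1 = fully agentic."""
    prompt = build_context(task, files)
    # Cost/performance balancing: a cheap model drafts, a strong model reviews.
    draft = call_llm("cheap-fast-model", prompt)
    review = call_llm("strong-slow-model", "Critique this draft:\n" + draft)
    if autonomy < 0.5:
        # Low autonomy: surface the proposal in the GUI and wait for sign-off.
        return "PROPOSED (needs human approval):\n" + draft + "\n\n" + review
    # High autonomy: ship the draft directly.
    return draft
```

The point isn't these particular choices; it's that each item in that list corresponds to real engineering that lives outside the model itself.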
Four: Claude Code, AI that lives on your computer. Claude Code marked another big shift. For the first time, an AI agent felt native: not a remote cloud container, but something that lives on your machine. It has access to your files, your environment, your context. This matters more than raw compute. In a world of jagged intelligence, low latency and deep integration beat massive cloud orchestration. Claude Code changed the mental model. AI isn't just a service anymore. It's a resident spirit on your computer.
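To see why "lives on your machine" is more than a slogan, here is a toy sketch of the kind of tools a resident agent can wire up, assuming a hypothetical ask_model call; none of this is Claude Code's actual internals.

```python
# Toy sketch of a local agent's tools (ask_model is hypothetical and not
# shown; this is not Claude Code's real implementation). The point: the
# tools operate directly on your machine, with no upload step.

import os
import subprocess

def list_files(root: str = ".") -> list[str]:
    """Tool: enumerate project files straight from the local filesystem."""
    return [os.path.join(d, name) for d, _, names in os.walk(root) for name in names]

def read_file(path: str) -> str:
    """Tool: read a local file from disk."""
    with open(path, "r", errors="replace") as fh:
        return fh.read()

def run_tests() -> str:
    """Tool: run the project's test suite in the user's own environment."""
    result = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return result.stdout + result.stderr

# A resident agent interleaves model calls with these tools, e.g.:
#   plan = ask_model(task, context=list_files())
#   ...edit files, call run_tests(), feed the output back, repeat.
# Low latency and full context come from being local, not remote.
```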
Five: vibe coding, when code becomes free. 2025 is the year programming crossed a threshold. You can now build serious software using natural language alone, often forgetting the code even exists. Karpathy calls this vibe coding. This isn't just about beginners. It supercharges professionals. Code becomes cheap, ephemeral, disposable, and abundant. You write software to test ideas, to find bugs, to prototype tools that never existed before. Programming is no longer the bottleneck. Imagination is. This will fundamentally reshape how software and software jobs work.
Six: Nano Banana and the future of LLM interfaces. Finally, one of the most underrated shifts of 2025: the LLM UI. Chat is the command line of AI: useful, but not human-friendly. People think visually, spatially, graphically. Google's Gemini Nano Banana hints at what comes next: LLMs that communicate via images, diagrams, slides, whiteboards, and interactive apps, not walls of text. The real breakthrough isn't image generation alone. It's text, vision, and world knowledge fused together. This is the GUI moment for AI.
Conclusion: Karpathy's verdict on 2025 is perfectly paradoxical. LLMs are smarter than we expected and dumber than we expected. They're incredibly useful, and we've probably explored less than 10% of their potential. Progress will keep accelerating, but there's still a massive amount of work to do. The field feels wide open again. So, yeah, strap in.