
Andrej Karpathy's LLM Year in Review: 6 Paradigm Shifts That Changed AI in 2025

By Github Awesome

Summary

Topics Covered

  • RLVR Drives LLM Reasoning
  • LLMs Evolve Jagged Intelligence
  • LLM Apps Turn Generalists Professional
  • Claude Code Brings AI to Your Computer
  • Vibe Coding Frees Imagination
  • Visual Interfaces Replace Chat

Full Transcript

Today we're diving into something special. Andrej Karpathy, AI researcher, just published his 2025 year in review of LLMs, what might be the clearest summary of where LLMs actually are right now. Let's dive in.

One: RLVR, the new engine of LLM progress. For years, the LLM training recipe was stable: pre-training, supervised fine-tuning, RLHF. In 2025, a new stage became dominant: reinforcement learning from verifiable rewards (RLVR).

Instead of humans judging outputs, models train against objective, automatically checkable rewards: things like math problems, coding tasks, and puzzles. The result: models start to develop behaviors that look like reasoning. They break problems into steps. They backtrack. They explore strategies, not because humans told them how, but because it worked.
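
To make "verifiable reward" concrete, here is a minimal sketch in Python of how a math answer could be scored automatically. The transcript shows no code, so everything here, including the "Answer: <value>" convention, is an illustrative assumption rather than any lab's actual reward function.

    import re

    def verifiable_reward(model_output: str, ground_truth: str) -> float:
        """Return 1.0 if the model's final answer matches the known
        correct answer, else 0.0. No human judgment is involved."""
        # Assumed convention: the model ends with "Answer: <value>".
        match = re.search(r"Answer:\s*(\S+)", model_output)
        if match is None:
            return 0.0
        return 1.0 if match.group(1) == ground_truth else 0.0

    # An RL loop samples many solutions, scores each one like this,
    # and reinforces whatever reasoning led to reward.
    print(verifiable_reward("6 * 7... let me check. Answer: 42", "42"))  # 1.0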

This shift unlocked longer training runs, better capability per dollar, and a brand new control knob: test-time compute, the ability to trade time for intelligence by letting the model think longer. OpenAI's o1 hinted at this. o3 made it obvious. And most of 2025's progress came not from bigger models, but from longer, deeper RL runs.
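
One simple way to spend test-time compute is self-consistency: sample several independent solutions and keep the majority answer. A rough sketch, where sample_solution is a stand-in for a stochastic model call, not any real API:

    import random
    from collections import Counter

    def sample_solution(problem: str) -> str:
        """Stand-in for one noisy model call (purely hypothetical)."""
        return random.choice(["42", "42", "42", "41"])

    def think_longer(problem: str, n_samples: int) -> str:
        """Trade time for intelligence: more samples, then majority vote."""
        answers = [sample_solution(problem) for _ in range(n_samples)]
        return Counter(answers).most_common(1)[0][0]

    # n_samples is the control knob: raising it buys accuracy with
    # compute instead of with a bigger model.
    print(think_longer("What is 6 * 7?", n_samples=16))  # usually "42"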

Two: animals versus ghosts, jagged intelligence. One of Karpathy's most important insights this year is this: LLMs are not evolving animals. They are summoned ghosts. Their intelligence is optimized for completely different pressures than humans. Not survival, but imitation, rewards, and benchmarks. And that creates jagged intelligence. LLMs can be genius level at math or code and immediately fall apart on trivial reasoning or social traps.

This also explains why benchmarks stopped meaning much in 2025. Benchmarks are verifiable environments, which makes them easy targets for RLVR. Labs didn't just beat benchmarks, they grew spikes precisely around them. You can crush every benchmark and still be nowhere near AGI. This year forced the industry to finally internalize that.

Three: Cursor and the rise of the LLM app layer. Cursor wasn't just another coding tool. It revealed an entirely new application layer. People started saying "Cursor for X." Why? Because these apps don't just call an LLM. They engineer context, orchestrate multiple model calls, balance cost and performance, provide GUIs for humans, and expose an autonomy slider.

Karpathy's take is clear: LLM labs will graduate generalists, like capable college students, but LLM apps will turn them into professionals by adding tools, data, feedback loops, and structure. This layer is thicker and more valuable than many people expected.
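
As a toy illustration of what that app layer adds on top of a raw model call, here is a hedged sketch. Every name in it (cheap_model, strong_model, the 0-to-1 autonomy value) is invented for this example; real apps like Cursor are far more elaborate.

    # Toy LLM-app layer: context engineering, model routing, and an
    # autonomy slider. Both "models" are fake stand-ins.

    def cheap_model(prompt: str) -> str:
        return f"[cheap draft, {len(prompt)} chars of context]"

    def strong_model(prompt: str) -> str:
        return f"[strong answer, {len(prompt)} chars of context]"

    def run_task(task: str, context: list[str], autonomy: float) -> str:
        # Context engineering: pack only the most relevant recent items.
        prompt = "\n".join(context[-20:] + [task])
        # Cost/performance balance: cheap model first, escalate on risk.
        draft = cheap_model(prompt)
        if "refactor" in task.lower():
            draft = strong_model(prompt)
        # Autonomy slider: low settings keep a human in the loop.
        if autonomy < 0.5:
            print(f"Needs approval: {draft}")  # the GUI for the human
        return draft

    print(run_task("Refactor the parser", ["parser.py: ..."], autonomy=0.3))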

Four: Claude Code, AI that lives on your computer. Claude Code marked another big shift. For the first time, an AI agent felt native. Not a web app in a cloud container, but something that lives on your machine. It has access to your files, your environment, your context. This matters more than raw compute. In a world of jagged intelligence, low latency and deep integration beat massive cloud orchestration. Claude Code changed the mental model. AI isn't just a service anymore. It's a resident spirit on your computer.
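
The transcript doesn't describe Claude Code's internals, so the following is only a sketch of the general shape of a "resident" agent: a loop that reads real local files and hands them to a model. call_model is a placeholder, not an actual API.

    from pathlib import Path

    def call_model(prompt: str) -> str:
        """Placeholder for any LLM API call."""
        return f"[model response to {len(prompt)} chars of context]"

    def run_local_agent(request: str, workdir: str = ".") -> str:
        # Deep integration: the agent sees your actual files and
        # environment, not a sandboxed copy in a remote container.
        files = sorted(p.name for p in Path(workdir).iterdir() if p.is_file())
        context = f"Files here: {files}\nUser request: {request}"
        # Low latency: one local hop to the model, no cloud orchestration.
        return call_model(context)

    print(run_local_agent("Find the failing test"))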

Five: vibe coding, when code becomes free. 2025 is the year programming crossed a threshold. You can now build serious software using natural language alone, often forgetting the code even exists. Karpathy calls this vibe coding. This isn't just about beginners. It supercharges professionals. Code becomes cheap, ephemeral, disposable, and abundant. You write software to test ideas, to find bugs, to prototype tools that never existed before. Programming is no longer the bottleneck. Imagination is. This will fundamentally reshape how software and software jobs work.

Six: Nano Banana and the future of LLM interfaces. Finally, one of the most underrated shifts of 2025: LLM UI. Chat is the command line of AI. Useful, but not human friendly. People think visually, spatially, graphically. Google's Gemini Nano Banana hints at what comes next: LLMs that communicate via images, diagrams, slides, whiteboards, and interactive apps, not walls of text. The real breakthrough isn't image generation alone. It's text, vision, and world knowledge fused together. This is the GUI moment for AI.

Conclusion: Karpathy's verdict on 2025 is perfectly paradoxical. LLMs are smarter than we expected and dumber than we expected. They're incredibly useful, and we've probably explored less than 10% of their potential. Progress will keep accelerating, but there's still a massive amount of work to do. The field feels wide open again. So, yeah, strap in.
