The Post-Human Research Era: Karpathy’s Loop, Sakana’s Swarm, and Self-Revising AI. Frontier Models.
By Byte Goose AI.
Summary
Topics Covered
- Discovery is knocking down walls, not rearranging furniture
- AI scientists delete complexity to find elegant truths
- Five minutes forces algorithmic innovation over brute force
- Agent isolation prevents orchestration collapse
- Better organization beats bigger brains
Full Transcript
Imagine going to sleep tonight and while you're dreaming an AI running on your laptop just decides to write a completely new piece of software.
Right, completely autonomously.
Yeah, autonomously. It tests the code, realizes it's, you know, slightly inefficient, deletes it, rewrites its own core architecture, and by the time your morning alarm goes off it has
mathematically discovered a completely new physical law.
It sounds like a movie plot, but we are actually talking about research and systems that have just been published and deployed.
Exactly. I mean, usually when we think about artificial intelligence doing research we picture a sort of ultra-fast library assistant. You give
it a prompt and it zips through billions of documents to retrieve the exact fact you need.
Yeah, it's incredibly fast, but it's fundamentally playing a game where humans wrote all the rules.
Right, but today we are stepping way past that.
Welcome to a deep dive that looks at the absolute bleeding edge of artificial intelligence.
We are exploring systems that actively rewrite their own code, orchestrate teams of specialized AI agents, and like mathematically redefine the very concept of scientific discovery.
It is a massive paradigm shift.
It really is. Our mission today is to unpack and connect three groundbreaking technical concepts that basically outline the future of scientific research. First, a new category-theoretic mathematical framework for how AI systems can self-revise.
And actually verify they've made a true scientific discovery, which is key.
Yes. Second, we'll look at Andrej Karpathy's auto research setup, where an AI autonomously loops to improve its own neural network. And third, Sakana AI's Fugu system. This is a single model acting as an invisible conductor to orchestrate an entire swarm of frontier models.
It's basically an AI managing other AIs.
Exactly. Okay, well, let's unpack this.
We're moving from AI as a helpful tool to AI as a self-optimizing scientist.
And to do that we really have to start with the math. What exactly is a discovery?
That is the perfect place to start. So,
this comes from a fascinating new paper on self-revising discovery systems for science.
The fundamental premise they introduce is that true scientific discovery isn't just about generating new answers. It's
not just better search, and it's not faster retrieval. Discovery, mathematically speaking, is the revision of the representational regime itself.
Okay, revision of the representational regime, that sounds a bit heavy. Let's break that down for the listener. How are the authors defining the difference between retrieval, search, and actual discovery?
Think of it like this. Retrieval is just pulling a specific artifact out of an existing filing cabinet, you know, like finding a known protein sequence. Search
is finding a new path within that same filing system, like testing different combinations of known variables to find the most efficient one. But discovery,
discovery is changing the filing system itself.
Oh, wow.
Yeah, it creates entirely new types of data, new tools to measure them, or entirely new ways to verify them.
Wait, so if I understand this, search is kind of like rearranging the furniture in a room to find the best layout, but discovery is knocking down a wall and building a completely new wing of the house?
That's a great analogy. You are changing the fundamental structure of the space you are working in.
And the paper uses something called category theory to formalize this whole process.
Category theory, okay.
Yeah, they describe the current state of any scientific system in a fixed regime as a copresheaf.
Uh, I am definitely going to need you to translate copresheaf into plain English.
Fair enough.
Imagine a master map. This master map connects an abstract scientific category, let's say the concept of material flexibility, to the actual concrete physical data in your database.
Okay, so mapping concepts to data.
Exactly. That map is the copresheaf.
It assigns real artifacts to abstract types.
They also rigorously track the provenance, which is the step-by-step history of how every single piece of data was created or measured.
Okay, I'm with you. We have a master map of our current scientific reality. But
here's my question. Neural networks are, you know, famous for hallucinating or finding sneaky shortcuts that look right but are totally wrong.
Oh, absolutely.
So, if the AI decides to knock down a wall and propose a totally new scientific concept, how does it mathematically know it actually built something new and didn't just forget where it put the old data?
This is where the math gets incredibly elegant. When the AI proposes a transition to a new regime, a new map, basically, the system uses a mathematical operation called a left Kan extension.
Another big term. What is the left Kan extension actually doing under the hood?
Think of the left Kan extension like translating a highly complex poem from French to English. The math attempts to take all the old evidence, all the old physical data from the previous regime,
and map it perfectly into the new vocabulary the AI just invented.
Forcing the old data into the new structure to see if it fits.
Exactly. And the goal is to see if anything gets lost in translation. True
discovery is identified by looking for what they call the residual content.
Residual content?
Yeah, this is the stuff that lies entirely outside the mathematical transport of the old evidence.
If the left Kan extension can explain the new scientific state perfectly using only the old data, then the AI hasn't discovered anything.
It just renamed things.
Ah, it just gave it a new coat of paint.
Right. But if there is a residual, a mechanism or a concept left over that simply cannot be mapped from the old regime, that mathematical check proves the AI learned something genuinely new.
That is wild.
It completely separates objective scientific discovery from just subjective novelty.
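A toy, set-level illustration of that residual check, not the paper's categorical construction. It assumes each regime is just a dictionary from type names to sets of artifacts, with no relationships between types, and that a hypothetical `type_map` says which new type each old type is renamed to. In that stripped-down setting, transporting the old evidence amounts to pushing each artifact forward along the renaming, and the residual is whatever the new regime holds that the pushed-forward evidence does not cover. All names and data here are invented for illustration.

```python
def transport(old_regime, type_map):
    """Push every old artifact forward into the new vocabulary."""
    transported = {}
    for old_type, artifacts in old_regime.items():
        new_type = type_map[old_type]
        transported.setdefault(new_type, set()).update(artifacts)
    return transported


def residual(new_regime, transported):
    """Return the new-regime artifacts that the transported evidence cannot account for."""
    leftover = {}
    for new_type, artifacts in new_regime.items():
        uncovered = artifacts - transported.get(new_type, set())
        if uncovered:
            leftover[new_type] = uncovered
    return leftover


# Hypothetical regimes: one pure renaming, one that admits a new type.
old = {"flexibility": {"a1", "a2"}}
type_map = {"flexibility": "compliance"}
renamed = {"compliance": {"a1", "a2"}}
extended = {"compliance": {"a1", "a2"}, "mode_coupling": {"m1"}}

moved = transport(old, type_map)
print(residual(renamed, moved))   # empty: just a new coat of paint
print(residual(extended, moved))  # non-empty: something the old regime could not express
```

An empty residual corresponds to the "it just renamed things" case; a non-empty one corresponds to content that lies outside the transported old evidence.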
Okay, so we now have a strict mathematical definition for knocking down the wall.
But it's one thing to have a theory on a whiteboard.
How does an AI actually execute one of these regime transitions in practice?
To understand that, the sources dive into a real-world system called the builder-breaker protein mechanics model.
Right. How does this builder-breaker system work?
It grounds all that abstract math in a concrete scientific task. You have a multi-agent system trying to figure out the mechanical flexibility of proteins.
And it uses two distinct adversarial agents, a breaker and a builder.
A breaker and a builder?
Yeah. The breaker agent's entire job is to be a relentless skeptic. It actively
hunts through data for proteins that expose flaws, anomalies, or edge cases in the AI's current working model.
So, it's basically the ultimate stress tester trying to break the rules. And
when it succeeds?
When the breaker finds an anomaly that the current model just can't explain, the builder agent steps in. The builder
proposes symbolic DAG edits, that is, directed acyclic graph edits, to the underlying mathematical equations to account for the new evidence.
But wait, if the builder's only job is to fix the error, couldn't it just add a massive complex addendum to the equation for every single anomaly? Like, you
know, the law of gravity is X except on Tuesdays when it rains, then it's Y.
Which is exactly what you don't want.
Right. That's what we usually expect from machine learning building these massive, incomprehensible black boxes.
And that is exactly what they had to prevent. To stop the AI from just infinitely expanding the equation, they force the builder to pass its ideas through an MDL gate. MDL stands for minimum description length.
Okay, so what is the MDL gate actually calculating?
It acts as a mathematical Occam's razor.
The gate measures two things.
The size and complexity of the new proposed model itself, plus the size of the errors left over when you apply that model to the data.
So, balancing complexity and accuracy.
Exactly. If the builder proposes a massively complex new rule just to explain one anomaly, the description length of the rule explodes and the gate simply rejects it. The only way an edit
gets accepted is if it compresses the total information better than the old law did.
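A minimal sketch of an MDL-style gate, assuming a two-part code: bits to describe the model plus bits to describe the errors it leaves behind. The encoding choices are assumptions made for illustration, not details from the source: a flat `bits_per_term` cost per equation term, and a Gaussian code for residuals quantized to a fixed `precision`.

```python
import math


def description_length(num_terms, residuals, bits_per_term=16.0, precision=1e-3):
    """Two-part code length in bits: model cost plus cost of the leftover errors."""
    model_bits = num_terms * bits_per_term
    n = len(residuals)
    # Floor the variance so a perfect fit does not produce log(0).
    variance = max(sum(r * r for r in residuals) / n, precision ** 2)
    # Gaussian code length for n residuals quantized to `precision`.
    data_bits = n * (0.5 * math.log2(2 * math.pi * math.e * variance) - math.log2(precision))
    return model_bits + data_bits


def mdl_gate(current, proposed):
    """Accept a proposed law only if it compresses the evidence better than the current one."""
    return description_length(*proposed) < description_length(*current)


# Hypothetical laws, written as (number of terms, residuals on the evidence).
current = (3, [0.5, -0.4, 0.6, -0.5])
simpler = (2, [0.5, -0.45, 0.55, -0.5])   # drops a term, fits about as well
bloated = (10, [0.4, -0.4, 0.5, -0.4])    # seven extra terms for a tiny accuracy gain

print(mdl_gate(current, simpler))  # True
print(mdl_gate(current, bloated))  # False
```

The bloated proposal shaves less than one bit off the residual cost while adding 112 bits of model cost, so the gate rejects it. The simpler proposal wins on both parts of the sum.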
So, out of all the ideas the builder threw at the wall to fix these protein anomalies, how many actually made it through this Occam's razor gate?
Out of 388 completely new laws proposed by the builder, the gate only accepted 25. And it heavily favored edits that actually shed model complexity.
Wait, I really want to highlight this for the listener. Here's where it gets really interesting.
The AI isn't just endlessly making the equation longer. It's actually retracting terms.
Yes, exactly.
It's deleting complexity to make the physical laws simpler and more elegant, just like a human physicist would.
Yes. And in the final transition, it made a massive breakthrough. By
aggressively simplifying, it discovered what the researchers called mode conditioned compliance.
Mode conditioned compliance, what does that mean physically?
The AI realized that the local flexibility of a protein wasn't just an isolated equation. It was conditioned by slow, large-scale movements of the entire protein structure. This wasn't just a new variable. Categorically, it was a newly admitted multi-input interaction type.
A completely new physical relationship that didn't exist in the old schema, and it passed the left Kan extension test.
The math verified it as a true discovery. And this new, simpler, more elegant law gave a massive 54.3-bit compression gain on the accumulated evidence.
Oh, wow.
It rigorously proved that AI can discover physical laws that are simpler and truer than the ones it started with.
Okay, so we have the math to verify a true discovery, and we have an MDL gate to force the AI to seek elegant truths, rather than just memorizing data. That
is huge.
Yeah.
But right now, humans still have to meticulously set up that builder-breaker game.
Right, it's still a sandbox we build.
Yeah.
If we want an AI to be a true, tireless scientist, we have to cut the leash. We
have to set up an environment where the AI is constantly running these loops, testing hypotheses on its own, and rewriting its own code while we sleep.
And that brings us to Andrej Karpathy's auto research setup.
Karpathy paints a very vivid picture of where this is going. He notes that the era of the meat computer human researcher is basically ending.
The meat computer?
Yeah, you know the old cycle. A human
writes some code, runs a test, goes to sleep, comes back to the lab to check the results, and repeats.
With auto research, you give an AI a single GPU setup, and you just let it run overnight.
And the sandbox he built for the AI is beautifully simple. There are really only three files that matter here.
There's a file called prepare.py,
which handles the fixed data and the evaluation metrics. The AI is strictly locked out of this.
Right, it can't cheat by changing the test.
Exactly. Then there's train.py.
This contains the neural network architecture, the optimizer, the training loop itself. This is the AI's playground. The agent edits this file over and over. And finally, there is program.md, which is the prompt.
Yes, the prompt, the instruction manual.
And this is the only thing the human actually writes.
The paradigm shift here is that the human stops programming the Python code and starts programming the agent's behavioral goals.
But the genius of Karpathy's specific auto research setup is the bizarre constraint he puts on the AI.
Good, what constraint?
The agent gets a strict 5-minute wall clock time limit to train whatever new code it just wrote.
Hold on, I have to stop you there. 5
minutes? Is 5 minutes really enough time for an AI to know if it just made a fundamental breakthrough in neural network architecture?
OpenAI spends months training a model.
I know, it sounds counterintuitive, but what's fascinating here is it's actually a brilliant scientific constraint.
Because it normalizes the compute.
Normalizes it how?
Think about the mechanics. If the AI agent decides to rewrite its code to double the model size, it's going to run much slower, right? Which means it will process half as much data in those 5
minutes. If it changes the optimizer to something highly complex, the step time increases, so it takes fewer learning steps before the buzzer sounds.
Ah, I see. By fixing the clock at exactly 5 minutes, every single experiment is forced onto a level playing field.
Yeah.
The AI can't just win by training longer. It has to find a genuinely more compute-efficient mathematical path.
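The arithmetic behind that level playing field can be sketched in a few lines. The step times and batch size below are made-up numbers chosen only to show the trade-off, under the simplifying assumption that doubling the model roughly doubles the time per step.

```python
TIME_BUDGET_S = 5 * 60  # the fixed five-minute wall-clock budget


def tokens_seen(step_time_s, tokens_per_step, budget_s=TIME_BUDGET_S):
    """How much data a run processes before the clock runs out."""
    steps = int(budget_s // step_time_s)
    return steps * tokens_per_step


# Hypothetical baseline versus a model twice as large (assumed twice as slow per step).
baseline = tokens_seen(step_time_s=0.5, tokens_per_step=65_536)
doubled = tokens_seen(step_time_s=1.0, tokens_per_step=65_536)

print(baseline, doubled, doubled / baseline)  # the larger model sees half the data
```

Under a fixed clock, a bigger or slower design has to earn back the data it gives up, which is what makes the comparison about compute efficiency rather than training time.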
Exactly. It forces true algorithmic innovation over raw brute force. And
because the test is only 5 minutes long, the agent can run around 100 autonomous complete scientific experiments overnight.
Wow, 100 experiments while you sleep.
Yeah, it tries an idea, trains for 5 minutes, and checks its score.
Specifically, it's evaluated purely on validation bits per byte, where lower is better. If the new code lowers the score, it commits the change. If it
fails, it deletes the code and tries something else.
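A minimal sketch of that keep-or-revert loop. This is not the actual auto research code: `propose` stands in for the agent editing train.py, and `train_and_eval` stands in for a five-minute training run that returns validation bits per byte. Both are hypothetical toy functions here so the example runs instantly.

```python
import math
import random


def keep_or_revert_loop(propose, train_and_eval, initial, num_experiments):
    """Try one change at a time; keep it only if the validation score improves."""
    best = initial
    best_bpb = train_and_eval(best)
    history = [best_bpb]
    for _ in range(num_experiments):
        candidate = propose(best)
        bpb = train_and_eval(candidate)
        if bpb < best_bpb:
            best, best_bpb = candidate, bpb  # lower is better: commit the change
        # Otherwise the candidate is simply discarded and `best` stays as it was.
        history.append(best_bpb)
    return best, best_bpb, history


rng = random.Random(0)


def propose(config):
    """Toy stand-in for the agent's edit: rescale the learning rate."""
    return {"lr": config["lr"] * rng.uniform(0.5, 2.0)}


def train_and_eval(config):
    """Toy stand-in for a timed training run; the score is best near lr = 0.01."""
    return 1.0 + (math.log10(config["lr"]) + 2.0) ** 2


best, best_bpb, history = keep_or_revert_loop(propose, train_and_eval, {"lr": 1.0}, 100)
print(best, best_bpb)
```

Because a change is kept only when the score drops, the recorded best score can never get worse from one experiment to the next.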
So, by the time you wake up, it has systematically evolved the neural network through 100 generations of ideas.
Yeah.
That is incredible for a single AI agent iterating on its own code.
It is, but there's a limit to what one agent can do.
Exactly. Let's look at the real world.
The problems we face in bleeding-edge science are often too complex for one AI, no matter how smart it is.
You don't just need a coder, you need a specialized swarm. A mathematician, a coder, a debugger. And if you have a swarm, you need a system to manage them.
Enter Sakana AI's Fugu.
Yes. Sakana Fugu completely flips the script on how we use models. Instead of
treating a single giant model as the end-all be-all, Fugu is a family of orchestrator models.
An orchestrator.
Right. It acts as a single, invisible interface that dynamically scaffolds an entire team of frontier models. We're
talking about orchestrating top-tier models like Claude Opus 4.8, Gemini 3.1 Pro, and GPT 5.5.
Fugu doesn't do the work itself. It
decides who to call, when to call them, and how they should talk to each other.
Sakana released two variants of this, and the engineering behind both is fascinating. Let's look at the first one, which is simply called Fugu. This one is latency optimized. It's meant for rapid-fire use, where you can't wait for the AI to decide who to route your question to. How do they solve the speed problem?
They solved it by bypassing text generation entirely. Usually, an orchestrator has to slowly spit out words token by token to declare, you know, "I will route this to the math model." Sakana didn't want that lag. So, they attached a lightweight prediction head that operates in parallel to the main language model head.
So, how does the prediction head know where to send the query without reading the text?
It looks at the math behind the text.
It takes the hidden state of the orchestrator, which is basically the dense internal mathematical representation of the user's prompt, and it uses that raw numerical feeling to
directly output logits to select a worker model in milliseconds.
So it skips the latency of autoregressive text generation.
Exactly. They use singular value fine-tuning on just a fraction of the model's layers. So, it learns to instantly route the query without losing any of its underlying intelligence.
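A minimal sketch of the routing idea, assuming the prediction head is a single linear layer over the hidden state. That shape is an assumption, as are the worker names, the four-dimensional hidden state, and the hand-set weights. In a real system the hidden state would come from the orchestrator model and the weights would be learned.

```python
WORKERS = ["math_model", "code_model", "debug_model"]  # hypothetical worker pool


def route(hidden_state, weights, bias):
    """Map a hidden state to one logit per worker and pick the highest, with no text generated."""
    logits = [
        sum(w * h for w, h in zip(row, hidden_state)) + b
        for row, b in zip(weights, bias)
    ]
    best_index = max(range(len(logits)), key=logits.__getitem__)
    return WORKERS[best_index], logits


# Hand-set toy parameters: each worker's logit reads one coordinate of the hidden state.
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
bias = [0.0, 0.0, 0.0]
hidden_state = [0.1, 2.0, -0.3, 0.5]

worker, logits = route(hidden_state, weights, bias)
print(worker)  # code_model
```

The routing decision is one matrix-vector product and an argmax, which is why it can finish in a fraction of the time it takes to generate even a short sentence token by token.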
That is slick.
Right.
But, then there's the big sibling, Fugu Ultra. This is the performance-optimized variant. It doesn't just route a single question. It builds complex agentic workflows up to five steps deep. And the way they trained it is very specific.
Yeah, they use reinforcement learning.
Right. With a GRPO objective without a KL penalty. What does that actually mean for the AI's behavior?
Well, GRPO stands for group relative policy optimization. Basically, they reward the system heavily when it orchestrates a team that gets the final answer right. But, the crucial part is removing the KL penalty.
What does the KL penalty normally do?
In most AI training, the KL penalty is a mathematical leash. It forces the new AI's behavior to stay relatively close to the original slower model it was based on, so it doesn't go completely off the rails.
So, they took the leash off.
They took the leash completely off. By
removing the KL penalty, the AI is free to invent completely bizarre, highly counterintuitive workflow strategies that human engineers would never think of, just as long as they get the job done
efficiently.
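A simplified sketch of what "group relative" and "no KL penalty" mean numerically. It deliberately leaves out parts of the full GRPO objective, such as the clipped probability ratio, and every number in it is invented. Rewards for a group of sampled workflows are normalized against the group's own mean and spread, and the KL term is multiplied by a coefficient that is set to zero.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sample relative to its own group: above-average samples get positive advantage."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def simplified_objective(advantages, logprobs, kl_estimates, kl_coef=0.0):
    """Advantage-weighted log-probability, minus an optional KL leash (off when kl_coef is 0)."""
    policy_term = statistics.fmean([a * lp for a, lp in zip(advantages, logprobs)])
    kl_term = kl_coef * statistics.fmean(kl_estimates)
    return policy_term - kl_term


# Hypothetical group of four sampled workflows: 1.0 means the final answer was right.
rewards = [1.0, 0.0, 0.0, 1.0]
logprobs = [-1.2, -0.7, -0.9, -1.5]
advantages = group_relative_advantages(rewards)

far_from_reference = [5.0, 5.0, 5.0, 5.0]
close_to_reference = [0.0, 0.0, 0.0, 0.0]
print(advantages)
print(simplified_objective(advantages, logprobs, far_from_reference, kl_coef=0.0))
print(simplified_objective(advantages, logprobs, close_to_reference, kl_coef=0.0))
```

With the coefficient at zero, the last two lines print the same value: drifting far from the reference model costs nothing, so only the final reward shapes the strategy.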
But managing a five-step workflow with multiple autonomous agents running around without a leash introduces a massive engineering hurdle. The sources talk a lot about memory management.
Specifically, this tension between intra-workflow agent isolation and persistent shared memory.
Yes, that balance is critical.
Why isolate them at all? Why not just let all the agents in the swarm see everything the other agents are doing?
Because of a very dangerous phenomenon called orchestration collapse.
Orchestration collapse?
Yeah, let's say Fugu Ultra deploys a coding agent first and that agent makes a subtle logical error.
If Fugu then brings in a brilliant math agent and forces it to read the flawed reasoning of the coder, the math agent might just blindly anchor onto that mistake. It trusts its teammate too much and the whole system collapses into error.
Oh, I see.
So, Fugu Ultra strictly isolates the agents' scratch pads during a specific task so they are forced to reason from first principles.
But wait, if they are totally isolated, couldn't two agents sequentially query the exact same database and learn nothing new?
Precisely, which is why Fugu Ultra balances that isolation with a persistent shared memory across the whole conversation so agents don't redundantly call the same tools.
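One way to picture that balance, as a sketch under stated assumptions rather than Sakana's implementation: each agent gets a fresh private scratch pad, while tool results go into a cache that every agent shares. The `Workflow` class, the agents, and the `lookup` tool are all hypothetical.

```python
class Workflow:
    """Private scratch pads per agent, one shared cache of tool results."""

    def __init__(self):
        self.shared_tool_cache = {}  # visible to every agent in the conversation
        self.scratchpads = {}        # stored per agent, never passed to other agents
        self.tool_calls = 0

    def call_tool(self, tool, query):
        """Run a tool only if no agent has already asked the same question."""
        key = (tool.__name__, query)
        if key not in self.shared_tool_cache:
            self.tool_calls += 1
            self.shared_tool_cache[key] = tool(query)
        return self.shared_tool_cache[key]

    def run_agent(self, name, agent_fn, task):
        """Give the agent an empty scratch pad so it cannot anchor on a teammate's reasoning."""
        scratchpad = []
        result = agent_fn(task, scratchpad, self.call_tool)
        self.scratchpads[name] = scratchpad
        return result


def lookup(query):
    """Hypothetical tool standing in for a database query."""
    return f"result for {query}"


def coder(task, scratchpad, call_tool):
    scratchpad.append("coder: draft reasoning, possibly flawed")
    return call_tool(lookup, task)


def mathematician(task, scratchpad, call_tool):
    scratchpad.append("mathematician: independent derivation")
    return call_tool(lookup, task)


workflow = Workflow()
workflow.run_agent("coder", coder, "protein stiffness")
workflow.run_agent("mathematician", mathematician, "protein stiffness")
print(workflow.tool_calls)  # 1: the second agent reuses the cached result
```

The mathematician never sees the coder's notes, yet the database is only queried once.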
Oh, okay. Let me see if I can picture this. So, for you listening, what does this all mean?
It's kind of like having a brilliant hospital administrator coordinating a complex surgery.
Okay, I like that.
Fugu doesn't hold the scalpel itself.
It looks at the patient's overall chart, that hidden state we talked about, and it instantly pages the exact right specialist, the mathematician, the coder, or the debugger. It routes them into the operating room and it lets them
see the high-level patient history so they don't order the same blood test twice.
But it keeps their private notes strictly isolated, so the overly confident surgeon doesn't accidentally bias the careful anesthesiologist.
That is exactly how it works. It
coordinates their actions while fiercely protecting their independent reasoning.
Which brings us to the ultimate convergence. We have our self-verifying math to prove discovery. We have our builder-breaker loops forcing elegance.
We have our auto research sandbox.
And we have our Fugu Ultra Orchestrator.
The pieces are all there.
What happens when we put it all together?
What happens when Sakana turns Fugu Ultra loose on Andrej Karpathy's auto research benchmark?
We get to see collective intelligence fundamentally outperform single models.
The Sakana team gave Fugu Ultra the exact auto research setup. They let it run for 123 autonomous experiments, which took about 14 hours of wall clock time.
And what were the results of the swarm compared to a single model acting alone?
Fugu Ultra achieved a validation bits per byte of 0.9748.
That score beat every individual frontier model acting by itself.
Right.
GPT 5.5, Claude Opus 4.8, Gemini 3.1 Pro.
None of them working as a solo agent could optimize the neural network as well as Fugu orchestrating them together.
How did Fugu actually deploy the swarm to achieve that?
Mhm.
We know it had the leash taken off during training, so what kind of weird emergent strategy did it invent to win?
It used a technique called Sep-CMA-ES, which is a form of evolutionary optimization to maximize the expected terminal reward. You can think of it as natural selection for problem-solving strategies.
Okay.
Over those 14 hours, Fugu discovered that the most successful strategy was a highly rigid build and debug cycle. It
would specifically deploy GPT 5.5 as a builder to write the complex code because GPT 5.5 is exceptionally fluent at structural coding.
And once GPT wrote the code?
Instead of just trusting the code and running the 5-minute training loop, Fugu Ultra would dynamically pull in Claude Opus 4.8 to act strictly as a debugger.
Ah.
Claude would be brought in completely cold, totally isolated from GPT's initial struggles, to aggressively hunt for vulnerabilities, expose logic errors, and re-verify the mathematical
quality of the code before committing to the compute cycle.
It's literally peer review.
Mhm.
Fully automated, rigorous peer review happening iteratively at the speed of compute.
Exactly. And this collective approach didn't just win on the Auto Research benchmark. Because Fugu had learned how to perfectly orchestrate these models, the system achieved state-of-the-art results across the board.
Like what else?
Well, it swept SWE Bench Pro, which is a brutal test of real-world software engineering. It dominated Terminal Bench 2.1. It even successfully orchestrated a swarm to write a solver from scratch that solved 300 out of 300 Rubik's Cubes with near-optimal half-turn metric lengths.
That is staggering.
And it did all of this without training a massive new foundation model.
Which brings up a really profound point about efficiency for you listening.
Right now, tech giants are spending billions of dollars, consuming small countries worth of electricity, to train one massive new model in the hopes that it's, you know, 5% smarter than the last one.
Right, the brute-force approach.
Yeah.
But Sakana AI just proved that you can take the models we already have sitting on servers today, and by simply teaching an orchestrator how to make them collaborate properly, you achieve next-generation
capabilities. Efficiency isn't just about making the brain bigger, it's about organizing the minds you already have.
It really is a perfect synthesis of mathematics, engineering, and collective intelligence.
We have covered some immense ground today. We started with the abstract mathematics of scientific discovery, understanding how a representational regime transitions.
Right, tracking provenance with copresheaves and using left Kan extensions.
Exactly, which allows an AI to mathematically prove it has found something genuinely new, rather than just recombining old data.
We then looked at how that math is grounded in reality using an MDL gate to force parsimony.
Showing us that AI can act like a true physicist, actively deleting complex data to discover elegant compressed truths like mode conditioned compliance.
Which naturally led us to Andrej Karpathy's auto research showing how a brilliantly simple 5-minute compute constraint normalizes the playing field, allowing an AI to iteratively evolve its
own code base overnight.
And finally, we saw how Sakana AI's Fugu scales that entire concept.
Right. It acts as an invisible hospital administrator orchestrating an entire team of AI models, protecting their reasoning by isolating their biases, sharing their high-level memory, and conducting bleeding-edge research faster
and better than any single system could on its own.
It's clear that the bottleneck is no longer the AI's ability to process data, but our ability to design the automated workflows that let it experiment freely.
Which leaves us with one final kind of dizzying thought for you to ponder.
If AI systems can now autonomously orchestrate their own specialized research teams, if they can mathematically verify their own paradigm-shifting discoveries, and if they can actively delete and rewrite
their own core code while we sleep, does the role of the human scientist permanently shift?
Do we go from being the ones who peer into the microscope and make the discovery to simply being the ones who write the initial program.md file for the universe? Think about that ultrafast library assistant from the beginning of our deep dive. The AI isn't just fetching the books anymore. It's
actively rewriting the library.