What Did Ilya See?
By Run The Numbers
Summary
Topics Covered
- Neural Nets Died from Minsky's Kill Shot
- GPUs Unlock Neural Net Scaling
- AlexNet Crushes ImageNet on Bedroom PC
- Scale Predicts Intelligence Linearly
- Sam's Lies Trigger Board Coup
Full Transcript
It's summer 2023.
This is Ilya Sutskever, OpenAI's chief scientist and basically the mastermind behind ChatGPT.
In the last 9 months, ChatGPT became the fastest growing consumer product in history.
Experts value the company at $29 billion and Sutskever has become an accidental celebrity.
But we find him here in his office at OpenAI reviewing blueprints for a bunker. A
bunker to hide in if the AI he built becomes uncontrollable.
He does not sleep much.
In 4 months, he will gather the board of OpenAI to fire his CEO, Sam Altman, because he is terrified of Sam being in charge if this thing loses control.
This all started with one petty phone call his professor made 11 years earlier.
This is Geoffrey Hinton.
It's a summer weekend and he's hunched over his computer coding.
He runs the AI lab at the University of Toronto, but he's not fielding million-dollar job offers from Google.
In fact, the lab is barely funded.
He hasn't made a noteworthy breakthrough since the '80s, but then, nobody has.
I said that Hinton studies AI, but he actually studies neural networks, the field of AI that tries to make computers learn in the way humans do. To your
average AI researcher back then, conflating neural networks with serious AI research was like comparing astrology to physics.
But it wasn't always this bad. Not long
ago, it was seen as the path to the singularity.
Then some really bad stuff happened. But
Hinton is a believer. Decades of disappointment, and his research still makes him so excited that he forgets to eat. And he also doesn't want to be remembered as a man who spent his time toiling away on some toy idea. He's nearing 60 years old, and his back problems are bad enough that on some days he wakes up paralyzed. He's fighting the clock.
Who could that possibly be on a summer weekend visiting Hinton's deserted AI lab?
>> I am Ilya. I want to study neural networks.
>> Little did anyone know, least of all Hinton, but that knock would change the course of human history.
The year is 1956.
America won the war and its next frontier was space and robots.
>> Discoveries that were miracles a few short years ago are accepted as commonplace today.
>> Is a concern of a new field of science called space medicine.
>> Thank you, Garo.
>> Within a year, we'd be in a space race with Russia. The future was now. We're in Tomorrowland.
That summer in New Hampshire, 10 men gathered to ask the question.
>> Well, now seriously, professor, do you think that one day machines will really be able to think?
>> Well, I think so, but people still disagree about it.
>> And the field of artificial intelligence was born. But unlike math or physics, AI had no inherited wisdom or proven frameworks. It's just 10 men arguing in a room. From that, two camps emerged who thought they had the way forward, and they hated each other.
Leading the first camp was Marvin Minsky, an MIT professor who would become known as the godfather of AI. His approach was called symbolic AI. It involved taking human knowledge, rules, logic, decision trees, and encoding them into machines. To teach a computer chess, you'd program in the rules and an algorithm for selecting the best move.
>> That man isn't playing checkers against the computer, is he?
>> Sure. And it plays pretty well.
>> It was all very clean and logical. And Minsky had the entire AI establishment behind him. The opposition studied what were called neural networks, and they were the underdogs.
While Minsky and the symbolists wanted to hardcode computers with our knowledge, the neural nets people wanted to raise a baby computer. They didn't program in rules or knowledge. They believed real intelligence had to be learned, not given. The human brain is made up of 100 trillion tiny dials, each position finely tuned through years of life experience and learning. Neural nets were a crude model of this with far fewer dials. But the core principle was that intelligence isn't programmed. It's iteratively tuned.
Take a simple challenge: distinguish between pictures of cats and dogs. The Minsky-led symbolist approach might encode specific features of dogs: floppy ears, a tail, a wet nose. But edge cases could fool this system, requiring humans to continually update the rules as things go wrong. A neural net, on the other hand, would be shown thousands of labeled pictures of dogs and cats, attempting to guess at each one. For each wrong guess, it would make slight adjustments to its dials to get it right the next time. Repeated thousands of times, the system will have learned a concept of cats and dogs, the same way a human baby gains an intuition for the family dog: not through a checklist of rules, but through the experience of seeing the dog over and over again. So this humanity that neural nets exhibited, learning from experience, caught the public's imagination far quicker than the more grounded symbolist approach.
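That tuning loop can be sketched in a few lines. This is a toy perceptron on made-up 2-D "cat"/"dog" feature points (none of it from the story itself): each wrong guess nudges the dials slightly, exactly the cycle described above.

```python
# A toy version of "tuning dials": a perceptron learning to separate
# two clusters of 2-D points, stand-ins for "cat" vs "dog" features.
# All data here is made up for illustration.

# Labeled examples: (feature_vector, label), label +1 ("dog") or -1 ("cat").
examples = [
    ((2.0, 3.0), 1), ((3.0, 2.5), 1), ((2.5, 3.5), 1),           # "dogs"
    ((-2.0, -1.0), -1), ((-3.0, -2.0), -1), ((-1.5, -2.5), -1),  # "cats"
]

w = [0.0, 0.0]  # the "dials", initially untuned
b = 0.0
lr = 0.1        # how far each wrong guess nudges the dials

for _ in range(10):                      # repeat over the data
    for (x1, x2), label in examples:
        guess = 1 if w[0] * x1 + w[1] * x2 + b > 0 else -1
        if guess != label:               # wrong guess: nudge each dial
            w[0] += lr * label * x1
            w[1] += lr * label * x2
            b += lr * label

# After training, the dials encode the concept: count correct classifications.
correct = sum(
    (1 if w[0] * x1 + w[1] * x2 + b > 0 else -1) == label
    for (x1, x2), label in examples
)
print(correct, "of", len(examples))  # → 6 of 6
```

No rules were programmed in; the separating line emerges purely from repeated small corrections, which is the whole philosophical bet of the neural nets camp.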
In 1958, the day after the first prototype of a neural network was showcased, the so-called perceptron, the New York Times wrote this.
The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself, and be conscious of its existence.
The creator of the perceptron and the leader of the neural net movement, Frank Rosenblatt, went on to tell the Times: "Later perceptrons," Dr. Rosenblatt said, "will be able to recognize people and call out their names. Printed pages, longhand letters, and even speech commands are within its reach."
"Only one more step of development, a difficult step," he said, "is needed for the device to hear speech in one language and instantly translate it to speech or writing in another language."
But in reality, these early perceptron prototypes were primitive. They solved toy problems like distinguishing between a square and a circle. But the proof of concept was key: a computer had learned in a humanlike way. The promise lay in what it could do when it, quote unquote, grew up. In 1958, claims like Rosenblatt's sounded insane. To that doubt, the neural nets people might retort: explain how you immediately recognize your mom. Do you go through a checklist of features, hair color, nose shape? No, you just have an intuition. It was that intuition, that fuzzy pattern recognition, that the neural nets people wanted to encode in their systems. But as time went on, it started to dawn on Rosenblatt and the rest of the field that neural nets were further off than one difficult step from human-level intelligence. What we began to see is that the things that people think are hard are actually rather easy, and the things that people think are easy are very hard.
>> The techno-optimism of the 1950s started to give way to realism in the '60s. Subsequent neural nets made only marginal progress over the prototypes. They were still just solving toy problems, whereas the Minsky-led symbolist approach was still gaining ground: a checkers engine that beat a human expert, the first general-purpose robot, and even a proto-chatbot that, while not quite ChatGPT, fooled many into believing that a human was on the other side. So for the first time since the field's genesis, neural net funding was under threat.
With no business use case yet, nearly all of AI research was funded by the government. And that pool of funds had to be split between the symbolists and the neural nets researchers. This drove Minsky mad. These systems that could barely tell shapes apart were still getting funding. His funding, in his eyes. Already extremely vocal about his distaste for neural nets, he started devising a kill shot. And who better than him, the legendary Marvin Minsky, the godfather of AI, to deliver it?
He and a colleague penned a book titled Perceptrons that basically claimed creating strong neural nets was impossible, that there was a mathematical ceiling on their capabilities.
Such a definitive and convicted claim, coming from the de facto leader of the field, hit hard.
DARPA, the primary funder of AI, quickly started cutting funding to neural net projects. And pretty soon, no academic journal or conference would accept research papers on neural nets. And just two years after Minsky's book came out, Frank Rosenblatt would die in a boating accident. The field had lost its funding, its reputation, and then its leader.
Neural nets were officially dead. They
lost the battle to the symbolists. And
even decades later, Minsky's book echoed for the few remaining neural nets researchers like Hinton.
Everyone knew neural networks didn't work and forgot about them. Everyone
except Hinton who started his PhD just a year later in 1972.
Back to the modern day, 2003, in Hinton's lab. This 17-year-old, overconfident Russian kid insists on meeting with Hinton and quickly makes himself useful.
Hinton gave Ilya some research to look over, just to see if he'd show up again.
A few days later, he came back with a veteran level insight about training neural nets.
>> And I gave him a paper to read, which was the Nature paper on backpropagation. And he came back and he said, "I didn't understand it." And I was very disappointed. I thought he seemed like a bright guy, but it's only the chain rule. It's not that hard to understand. And he said, "Oh, no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer," which took us quite a few years to think about.
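Backpropagation really is "only the chain rule": multiply local derivatives backward through the network. A minimal numeric sketch (a made-up two-weight network, not the setup from the Nature paper), checked against a finite difference:

```python
# Backpropagation in miniature: "only the chain rule".
# Forward pass: y = f(g(x)) with g(x) = w1 * x and f(u) = w2 * u.
# Loss: L = (y - target)^2. We recover dL/dw1 and dL/dw2 by chaining
# local derivatives backward, then sanity-check with a finite difference.

def forward(x, w1, w2):
    u = w1 * x      # first "layer"
    y = w2 * u      # second "layer"
    return u, y

x, target = 2.0, 1.0
w1, w2 = 0.5, 0.3

u, y = forward(x, w1, w2)
loss = (y - target) ** 2

# Backward pass: one local derivative at a time.
dL_dy = 2 * (y - target)   # d(y - t)^2 / dy
dL_dw2 = dL_dy * u         # y = w2 * u  =>  dy/dw2 = u
dL_du = dL_dy * w2         # dy/du = w2
dL_dw1 = dL_du * x         # u = w1 * x  =>  du/dw1 = x

# Finite-difference check on w1.
eps = 1e-6
_, y_eps = forward(x, w1 + eps, w2)
numeric = ((y_eps - target) ** 2 - loss) / eps
print(round(dL_dw1, 4), round(numeric, 4))  # both ≈ -0.84
```

Ilya's point was the step after this: once you have the gradient, hand it to a sensible optimizer instead of anything fancier.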
His raw intuitions about things were always very good. Hinton had misjudged Ilya as a wannabe. In university at just 17 years old, brand new to the study of neural nets, he was easily picking up concepts even Hinton's brightest students struggled with. And over the next few years, Ilya would become Hinton's most valued pupil, taking the lead on difficult research questions and making real contributions to Hinton's most pressing problems. Lately, they'd been working on Hinton's career-defining obsession to this point:
Deep belief networks.
Hinton's biggest problem is that it seems impossible to make a large neural network, one capable of impressive feats. At a certain network size, the model just fails to learn more stuff. And deep belief nets were Hinton's solution to this: you can't train a giant model, so you just train a bunch of tiny models and duct-tape them together.
It sounds almost too simple to work. I mean, someone must have tried this before, right? But no, nobody's tried it in this specific fashion. There are only a few dozen people on Earth seriously studying this stuff, and they collectively had less funding than
Google's annual cafeteria budget.
In the 34 years since Minsky published Perceptrons, neural nets had actually graduated from solving toy problems like recognizing shapes to doing practical yet unremarkable things that had small-scale commercial appeal. At Bell Labs in the 1980s, Yann LeCun created a neural net that AT&T used in ATMs. Hinton worked with a Wall Street firm to predict stock prices, which actually made money for a bit, until the signals got overcrowded.
>> I was actually once the technical guy on a little mutual fund. There was a neural net that decided what sort of phase of the market you were in. And there's another neural net that told you which stocks would do better than the market in six months' time. It actually performed extremely well.
>> Then there was a quiet neural net revolution in Japan during the 1980s, which was largely invisible to Westerners. They were putting neural nets in camcorders for stabilization, using them in industrial applications like detecting flaws in welding jobs, and even sticking tiny neural nets in rice cookers to learn how sticky you like your rice. The problem for Hinton was that neural nets were invisible and unremarkable. They were doing little background tasks that, while useful, were just things we took for granted. It was a massive difference from Rosenblatt's predictions of reaching human-level intelligence within a few years back in the 1950s.
>> I confidently expect that within a matter of 10 or 15 years, something will emerge from the laboratories which is not too far from the robot of science fiction fame.
And despite decades of marginal gains, Hinton still believed in that Rosenblatt-esque vision of neural nets that reach or even exceed human capabilities. But you weren't going to do that with these baby models used for recognizing digits or learning how you like your rice. The models of the day were equivalent to maybe the brain of a fruit fly. Back-of-the-napkin math says you'd need models about half a million times larger just to be equivalent to the size of the human brain. So these deep belief nets were the first step to climbing that mountain.
Hinton, Ilia, and the rest of the lab at the University of Toronto spent the last few years working on deep belief nets.
And it was this year that Hinton published the first research on the subject, called "A Fast Learning Algorithm for Deep Belief Nets." Three weeks of training the model on a university supercomputer, and they made a model significantly larger than the current state of the art.
The experiment to make a large neural net worked but just barely.
Its performance, benchmarked by its accuracy in digit recognition, did improve compared to the smaller models. So in that sense, it was a worthy proof of concept, but it wasn't groundbreaking. It was marginal, not exponential, improvement.
So in a sense, Hinton's hypothesis was correct, that a bigger model equals a bigger brain, and this paper provided a blueprint to successfully train a bigger model. But it was just better at solving the same unremarkable sort of background problems. So the question is, what's next? How do we get to this Rosenblatt vision? Fulfilling Rosenblatt's visions of machines that could see, speak, and think would require a miracle.
And that miracle was sitting on the shelf of Best Buy for $499.
According to Moore's law, we'd need decades before computers could train a model approaching house cat intelligence.
Hinton was nearing 60 years old. He
didn't have decades, so there had to be another way.
But somewhere along the way, somebody made the relatively mundane realization that GPUs, mostly used for PC games like Grand Theft Auto and Counter-Strike, were accidentally perfect for the exact type of math that neural nets run on: matrix multiplication.
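Why matrix multiplication is such a good fit for GPUs: each output cell depends only on one row and one column of the inputs, so every cell can be computed at the same time. A toy sketch in plain Python of the operation itself (the numbers are made up; on a GPU, the per-cell work runs in parallel instead of in a loop):

```python
# The math a neural net layer runs on is matrix multiplication.
# Each output cell C[i][j] depends only on row i of A and column j of B,
# so every cell can be computed independently of the others -- exactly
# the kind of work a GPU's thousands of cores can do all at once.

def matmul(A, B):
    rows, inner, cols = len(A), len(B), len(B[0])
    # On a GPU, each (i, j) pair below would run as its own parallel thread.
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

# One layer's forward pass: a batch of activations (1x3) times weights (3x2).
activations = [[1.0, 2.0, 3.0]]
weights = [[0.1, 0.4],
           [0.2, 0.5],
           [0.3, 0.6]]
out = matmul(activations, weights)
print(out)  # one row of two output neurons
```

A neural net's forward and backward passes are essentially stacks of these multiplications, which is why a chip built to do them in parallel changes everything.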
Theoretically, using a $500 GPU would be like having a supercomputer from 10 or 20 years in the future. And just as this became clear, Nvidia made their GPUs programmable. So suddenly they weren't just for graphics anymore.
The only problem was that nobody in Hinton's lab knew how to program a GPU.
That was John Carmack type of stuff.
AI researchers wrote math proofs and waxed philosophical about robots taking over. They didn't do low-level systems programming. So they needed an extremely rare type of person, with both the Carmack-esque skills and expert neural net knowledge.
In 2007, that person did not exist until another strange Soviet kid wandered into Hinton's office.
His name was Alex Krizhevsky, and he was nothing like Ilya or Hinton.
Alex didn't dream about what AI could be one day. He just liked programming and solving problems. And some of the bedroom experiments he was running, training neural nets on GPUs, were already more impressive than some of Hinton's models, all on his $1,000 gaming computer at his mom's house.
If this works, it's a huge jump in computing power: going from wood to iron tools in Minecraft, from Glocks to AKs in Counter-Strike. But it's still an open question if this will work at all.
Companies like Nvidia and Silicon Graphics were making GPUs way back in the '90s.
If they were some magical key to AI, surely someone would have noticed by now.
Yet, there was no theoretical reason it shouldn't work.
The dark years of neural nets were full of these moments: breakthroughs that are obvious in hindsight that sat untouched for decades. But with maybe a few dozen people seriously studying neural nets worldwide, and tiny budgets, plenty slipped through the cracks.
Hinton learned one thing from deep belief nets: scaling the size of neural networks kind of works. The bottleneck was how much you could spend on server hardware, not necessarily your research savvy. The theoretical research tests proved this out. And now, with the realization that GPUs fast-forwarded Moore's law by a couple of decades, they finally had the pieces in place to actually do useful things. But they had to do it fast, because time was running out. Hinton was getting older, 64 years old now. Ilya and Alex were wrapping up their PhDs and would be leaving Hinton's lab soon. Google was quietly experimenting with neural nets for image recognition, with unlimited money.
Meanwhile, other researchers in academia were figuring out GPUs.
If they didn't make a breakthrough, another lab would.
So, the only thing they needed now was a target, a proof of concept to train neural nets on GPUs.
And thank God that Hinton is so petty.
When a friendly adversary, a neural net skeptic, was publicly doubting neural net technology, Hinton called him up to trash talk and accidentally blurted out that he and his students could solve one of the toughest problems in AI: image recognition.
Ever since the '50s, researchers have been trying to train neural nets and other AI models to identify objects within images. Show it a picture of a dog, and it will tell you, "Yeah, that's a dog." Cracking the problem was a potential billion-dollar solution. It could power self-driving cars, medical imagery like X-rays, and automated surveillance. But 60 years later, the current solutions are still very brittle. They make obvious errors, and they can only recognize a small number of objects.
A true killer app for image recognition would be a general purpose, highly accurate system that you didn't need to hire several people to babysit.
And that didn't exist yet, and there was little public hope that it would soon.
So, Jitendra Malik, the skeptic Hinton called to trash talk, has spent his entire career trying to make this image recognition thing happen, and he's only made marginal progress.
He's tried all the methods, and for him, neural nets have shown no promise.
Meanwhile, Hinton, mostly unfamiliar with the field of computer vision, was basically telling Malik that he could solve his life's work in just a few months, as a side project. From the outside looking in, Hinton probably rubbed many as arrogant and bitter, constantly claiming he could solve life-changing problems while neural nets were still incapable of mundane utility. But neither Malik nor the rest of the AI community knew about Hinton's GPU breakthrough. They didn't know that the problem wasn't neural nets. It was computers. And that problem, at least they hoped, was now taken care of.
So, if you wanted to prove that you could give a computer a set of eyes back in the early 2010s, you would train it on ImageNet, the largest data set of images ever collected: millions of images in thousands of different categories. If an AI model could learn this data set, it would basically solve that billion-dollar question.
But the problem was, few even bothered to try. It was so many times bigger than the standard data sets of the time. For reference, Pascal, the most popular image data set of the time, was just 20,000 images across 20 classes, with classes being things like dog, cat, truck, or person.
ImageNet was 14 million images across 20,000 different classes. You weren't training a computer to recognize a dog. You were training it to recognize a Norfolk Terrier. And you had to make sure it didn't mix that up with the Norwich Terrier. Here's those two side by side. Can you even recognize the difference with your eyes?
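To make the scale gap concrete, here's the arithmetic on the figures just cited (Pascal: 20,000 images, 20 classes; ImageNet: 14 million images, 20,000 classes):

```python
# The scale gap between the two data sets, using the figures above.
pascal_images, pascal_classes = 20_000, 20
imagenet_images, imagenet_classes = 14_000_000, 20_000

print(imagenet_images // pascal_images)    # → 700   (700x the images)
print(imagenet_classes // pascal_classes)  # → 1000  (1000x the classes)
```

Seven hundred times the images and a thousand times the categories: a completely different order of problem, not just a bigger one.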
The way the AI community gauged the state of the art in computer vision was to hold annual contests. Everyone takes the same data set, like Pascal or ImageNet, and trains an AI model on the images. The model that can most accurately guess which class a picture belongs to wins the contest.
The contest for Pascal, the small data set, regularly had a few dozen teams participating, but only four teams entered the ImageNet
contest in 2011.
And the winning model had just 74.3% accuracy, not even close to good enough to be useful in the real world.
So, this is what Hinton was signing them up for: a challenge so tough that most didn't even bother trying. And they had just a few months to turn in a working model.
And what's more, their hardware budget was basically just Alex's $1,000 gaming computer at his mom's house. Their competitors, meanwhile, had six-figure server racks.
There's a USB stick in Hinton's pocket.
On it is either a path to AGI or professional suicide.
In a conference of 500 computer vision researchers, maybe a dozen believed that neural networks could work. The rest
were here to watch them fail.
From the outside, failure looked likely.
They were one of just seven teams brave enough to take on ImageNet that year.
The Pascal contest had 24 entrants.
Rumors circulated around the conference that one of the six teams in the ImageNet competition was using a neural net, which was confusing, because that had never happened before.
In the six years Pascal has been running and the two years for ImageNet, not one of the 100-plus teams has entered a neural net.
And the first time it does happen, it's Jeff Hinton, the biggest name in the niche world of neural nets.
So, to put that in perspective, the last time most of these researchers heard about neural nets in vision was probably Yann LeCun in the '90s, the digit-reading stuff that AT&T used.
And yet Hinton is entering the competition as if he's basically fast-forwarded a decade-plus of progress, and he's entering the ImageNet contest, a challenge so hard that basically everyone was shying away from it. He's not even starting with the far easier Pascal contest.
So, naturally, people were speculating.
Hinton and his crew either found a time machine or they're going to embarrass themselves hard.
It's Friday, the last real day of the conference. After a week of presentations, today is the day for the two big competitions: Pascal first, and then ImageNet. Throughout the Pascal competition, the auditorium is mostly half empty, the usual. This contest's been running since 2006. There's not much left to discover.
But once it came time for the ImageNet competition, there wasn't an empty seat in the house. People are sitting on the floor, standing in the doorway, peeking over one another's shoulders to get a look.
Vision contests never draw crowds like this.
Everyone knew something was coming.
The question was what?
The host calls up team SuperVision, led by Alex Krizhevsky. And out comes Alex Krizhevsky, who'd clearly rather be coding than presenting.
Alex introduces himself. His voice
cracks. He's got zero stage presence.
>> My name is Alex Krizhevsky. Um, I'll tell you about some of this work that we did with, um, some of my awesome collaborators, Ilya Sutskever and Geoff Hinton.
And yet nobody's checking email. Everyone's locked in, like they know something's coming. He details the architecture of the neural net. He talks about the GPUs they used. All pretty run-of-the-mill.
>> Uh, so the model is pretty simple. It's completely supervised. It's a deep convolutional neural net that has five convolutional layers, um, two fully connected layers. It's, uh, what else? Yes, it's trained with very ordinary SGD. It's trained on two Nvidia GPUs. It was trained, actually, um, in my bedroom.
But then he gets to slide three and there's one bolded number, 84.7%.
The previous best was 74.3%.
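Going from 74.3% to 84.7% accuracy understates the leap; in error-rate terms, the arithmetic looks like this:

```python
# The jump in error-rate terms, using the two accuracies above.
old_acc, new_acc = 0.743, 0.847
old_err, new_err = 1 - old_acc, 1 - new_acc
drop = (old_err - new_err) / old_err
print(f"error: {old_err:.1%} -> {new_err:.1%}, a {drop:.0%} relative reduction")
```

Roughly 40% of the remaining error, the thing every team had been grinding away at for years, gone in one entry.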
They had just obliterated the state of the art on their first try, and the room erupted in chaos. Arguments started breaking out.
Researchers are saying the data set was flawed, that the code won't replicate, that this approach won't scale. Even Jitendra Malik, the professor Hinton made the bet with originally, was still skeptical. He wanted to see it work on more data sets. But within a year, he'd be writing neural nets papers himself.
The next team was supposed to present after them, but nobody remembers what they showed.
20 years of computer vision research were just rendered obsolete with a single PowerPoint slide.
And from here on out, nearly all computer vision research would be neural nets. The entire field would now have to learn Hinton's language.
That's 40 years in the wilderness, decades of mockery, funding cuts, dismissal.
People still referencing Minsky's book anytime Hinton brought up neural nets.
At 64 years old, with back problems so severe he sometimes woke up paralyzed, Hinton made history in the final hour.
So while Hinton was arguing back and forth with the skeptical audience, Ilya was probably just in his head doing math.
Every experiment they did leading up to this made one thing clear.
The more compute you throw at training, the smarter the model gets. It's almost
linear.
And they just changed the entire field with $1,000 worth of GPUs.
What could they do with $100,000 or $10 million?
Google's revenue last year was $38 billion.
Multiplying AlexNet by 10,000 would be a rounding error for them. So, the implications were obvious.
Within hours of Alex's presentation, at a conference of maybe 100 people, every venture capitalist in Silicon Valley knew the names Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton. From irrelevant to influencing the flow of billions of dollars, with no time to prepare for that.
Within weeks, millions of dollars are changing hands, corporate buyouts, GPU purchase orders, and every tech company is doing it.
The recipe was so simple: more GPUs and more data. No governor, no discipline, just the race to scale the fastest.
These were academics in a field that's been dead for 30 years. They were
suddenly getting million-dollar offers and blank check budgets to train models.
They could build whatever they wanted.
They weren't trained to be skeptical of the industry because before they had never had to be.
And Google moved by far the fastest.
Within two months of AlexNet, they bought Hinton's team for $44 million. They had no products. They had no assets. Google was just buying the brains of Alex, Ilya, and Hinton for $44 million. And in 2014, they bought the AI lab DeepMind for $500 million.
Larry Page gave one of his lieutenants a simple mandate: fly around the world and acquire any promising AI team, with basically no budget limit. If you published an interesting AI paper around that time, you were getting calls from Google recruiters offering to pay you like an NFL player.
And the same recipe that AlexNet provided continued to work: throw more GPUs and data at things until it does something scary.
For the first time, neural nets were being deployed at scale: image recognition in Google Photos, or language models in Google Search. The stuff they were building wasn't staying in university labs anymore. It was no longer theoretical. So a realization started to dawn on some of these AI researchers: that for decades, the idea of AI safety was basically just a thought experiment. It was fun speculation about the idea of robots maybe taking over one day.
Now, it was a question of when a dangerous model might ship to billions of people through Google.
These researchers spent their careers just trying to make the math work, just trying to get something to work. Now they had to consider: what were they building?
Is it dangerous?
And who should control it? Should Google
control it?
But the question was already answered.
It was the company that removed "don't be evil" from their motto, who helped the NSA with mass surveillance, the company that was a surveillance machine itself.
Google had acquired the entire field before anyone thought to ask if that was a good idea for society.
And Ilya, a Google employee himself, was starting to feel this the most. He'd helped create the breakthrough that started this AI race. And now he was working for the winner, watching them scale it faster than anyone could think through the implications.
It's summer 2015. In a private room at the Rosewood Hotel in Silicon Valley, Ilya sits across from Sam Altman, president of Y Combinator, the most powerful startup incubator in Silicon Valley, and Greg Brockman, former CTO of Stripe. It's three of the sharpest guys in tech, just waiting.
And finally, the guest of honor, Elon Musk, shows up an hour late. The entire restaurant is put on notice when he comes in. Elon's already talking before he takes a seat.
He was there for Ilya, and his message was basically: Ilya, superintelligence is coming.
>> I mean, with artificial intelligence, we are summoning the demon.
Probably within the decade. Google's building it, and you're helping, and nobody's going to stop this. So someone else needs to act as a counterweight.
Elon had already ruined his friendship with Larry Page over this topic.
>> Larry Page turned to me. He goes, "You know, if I were to get hit by a bus today, I should leave all of it to Elon Musk."
>> Really? Yeah.
>> He said that.
>> Yeah.
>> So, like he's like he's a good friend of mine. I met Larry before he got venture
mine. I met Larry before he got venture funding. Paige called Elon a specist for
funding. Paige called Elon a specist for caring about human extinction from AGI.
Paige thought it was small-minded and tribalist.
That's who's building this stuff.
So his pitch was this.
Leave your multi-million dollar Google salary, leave your unlimited compute budget, and come work at a nonprofit that will pay you like a recent college
grad. And you're going to work out of an old chocolate factory in San Francisco.
Your new boss is this guy, Sam Altman, who has zero AI experience, but is a killer tech investor, which admittedly is pretty useless at a nonprofit.
On the other hand, we have Greg Brockman, also zero AI experience, but is brilliant at optimizing huge mature code bases, which is really exactly what
you don't need at a scrappy research lab that has no code.
It was a bad and messy pitch that made no sense at all except for two things.
One, it was Elon Musk asking you to help him save the world. And two, he was the only one asking.
December 2015, at the Neural Information Processing Systems conference in Montreal, Canada, Ilya said yes. He was the credibility.
Him joining inspired other researchers to join. Without Ilya, it was just Elon Musk and two guys with zero AI experience asking researchers to
take massive pay cuts.
They announced at the biggest AI conference of the year with a blog post introducing OpenAI, but it almost didn't happen. Google's DeepMind was literally cornering people at the conference, making counteroffers on the spot. But enough researchers signed on.
The combination of Elon's vision, mounting safety concerns, and Ilia's presence was enough to overcome entry-level salaries and minimal perks other than making the world a better
place.
Their mission statement read: "OpenAI is a nonprofit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact."
The idea was to emulate the great research labs of the 20th century, Bell Labs and Xerox PARC. Get the world's smartest people in a room, let them
build the future unconstrained by profit motives or project managers.
The idealism was kind of compromised from the start. Months before they even existed, Sam Altman was emailing Musk about having an ongoing conversation
about what work should be open sourced and what shouldn't. And even Ilia from early on was fully aware of appealing to open source ideals to recruit researchers.
In a January 2016 email to Musk, he wrote: "As we get closer to building AI, it will make sense to start being less open. The open in OpenAI means that everyone should benefit from the fruits of AI after it's built, but it's totally okay to not share the science, even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes."
Open source was a recruiting tool from the start.
After the initial luster of the Elon Musk-funded AI nonprofit wore off, OpenAI was rudderless.
They had some of the top researchers in AI, but no vision, no leadership.
Everyone was off doing their own personal projects.
They read about Bell Labs and assumed you could just copy and paste the formula of hiring smart people, give them freedom, and wait for breakthroughs.
That's not how it was working out.
Instead, researcher focus was fragmented across several projects, ripping off the latest DeepMind paper or doing gimmicks like making a robot hand solve a Rubik's cube. When asked what OpenAI's goal was, Greg Brockman said, "Our goal right now is to do the best thing there is to do. It's a little vague."
The luster was clearly wearing off. In March 2016, just 4 months after OpenAI's launch, Google's DeepMind beat the world champion at Go, a game most viewed as impossible for computers because it required intuition, not just
calculation.
Over 60 million people watched AlphaGo demolish Lee Sedol.
OpenAI was just building Python libraries.
And Ilya had a realization.
Back in 2012, he had made the biggest neural net breakthrough since the 1980s because of scale. Alex Krizhevsky's GPUs were multiple times more powerful than the CPU rigs they were used to training on, allowing them to train a bigger model on more data for longer.
And the neural net revolution that soon followed was built on that very idea.
That's why Google and Facebook were building giant GPU data centers. But
despite being one of the three people who made this breakthrough, the core lesson had escaped him. At the precipice of this change, he left Google's
world-class GPU farm for a scrappy startup with a few dozen MacBooks.
It's becoming abundantly clear that to hold a candle to Google, they need billions of dollars worth of compute.
The nonprofit open-source structure that got them here, that yanked top researchers away from millions of dollars, is now the very roadblock to scale.
Elon's funding won't be enough. Begging
every tech billionaire for donations won't be enough.
And basically, everybody knew it, too.
Leadership was already internally planning to shed the nonprofit structure by the end of 2017.
Internal emails show Greg Brockman planning this very thing. They drew in idealistic researchers with appeals to open source, standing up to Google, and Bell Labs nostalgia.
But now they needed to become Google just to have a chance at fighting them.
September 20th, 2017, Ilya and Greg are given an impossible choice.
Elon put a deal on the table. Fold OpenAI into Tesla. It'll give them unlimited compute budget, leadership from the greatest entrepreneur of their generation, and everything they needed to compete with Google. But OpenAI would
become a subsidiary of Tesla. Elon would
get sole control, CEO and supermajority board power, the very same concentration of power they founded OpenAI to prevent.
The alternative is Elon walks. Their
only real source of funding and recruiting power disappears.
Elon was ready to make Tesla the cash cow for OpenAI's research, but the price was total control.
In the final hour, Ilya just couldn't do it. He and Greg drafted an email to Elon that was vulnerable and honest and detailed every concern they had with the structure:
This process has been the highest-stakes conversation that Greg and I have ever participated in. And if the project succeeds, it'll turn out to have been the highest-stakes conversation the world has seen.
We did not speak our full truth during a negotiation.
We have our excuses, but it was damaging to the process, and we may lose both Sam and Elon as a result.
Elon's response came the same day.
Guys, I've had enough. This is the final straw. Either go do something on your own or continue with OpenAI as a nonprofit. I will no longer fund OpenAI until you have made a firm commitment to stay, or I'm just being a fool who is essentially providing free funding for you to create a startup. Discussions are over. To be clear, this is not an ultimatum to accept what was discussed before. That is no longer on the table.
Elon made the impossible choice for them.
By January of 2018, they'd gone from Elon's checkbook to considering an initial coin offering, essentially hoping to fund themselves with a memecoin. Reid Hoffman stepped in to cover salaries as a stopgap, but they can't compete with Google on donations.
The irony once again was that they recruited researchers with the promise of open-source and nonprofit purity. But those principles were now the roadblock to building what they needed to build. To underline it once again: they needed to become Google to fight Google.
It's May 2018.
Internal OpenAI research confirms what Ilya already knew intuitively. Since AlexNet, the compute used in breakthrough AI research has multiplied by 300,000, about 10x per year. Scaling not only works, it's a requirement. But what to scale? ImageNet was curated: 14 million labeled images, hand-organized into categories. There was no equivalent data set at the scale they needed. And then a Google paper started circulating, called "Attention Is All You Need."
The core idea was this. You could train a neural net to predict the next word in a sequence of text to improve translation. You know, like English to Chinese.
But Ilia and others at OpenAI saw something bigger.
Language is universal. It could represent anything: Python code, scientific papers, forum arguments, instruction manuals.
If you can express it in text, you can train on it. This theoretically means you don't need a curated data set. You
could train on anything. The entire
internet perhaps.
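The mechanics of that core idea can be sketched with a toy counting model. This is purely an illustration, not how GPT works internally (GPT learns next-word statistics with a transformer neural network over billions of documents, not a lookup table), but it shows what "predict the next word" means:

```python
# Toy illustration of next-word prediction: a bigram model that
# "trains" on raw text by counting which word follows which.
from collections import Counter, defaultdict

def train(text):
    """Count next-word frequencies for every word in the text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequently seen next word, or None if unseen."""
    candidates = follows.get(word.lower())
    return candidates.most_common(1)[0][0] if candidates else None

corpus = "the cat sat on the mat and the cat slept on the mat"
model = train(corpus)
print(predict_next(model, "on"))  # -> "the"
```

The point of the transcript's argument is that nothing in this setup cares what the text is about, which is why "train on anything expressible as text" follows.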
Google invented it and they had infinite compute to scale it. And they basically had a copy of the internet from Google search.
But Google also had a hundred billion dollar advertising business that a general purpose AI could destroy.
OpenAI had no such conflict of interest. So Ilya saw, for the first time perhaps, a path to artificial general intelligence.
In June 2018, they published their first proof-of-concept model based on this architecture. They called it GPT: generative pre-trained transformer.
Tiny by modern standards, it's barely twice the size of AlexNet, and it was trained on a few thousand self-published books. It could sometimes form coherent sentences and occasionally correctly answer questions, but the AI community mostly ignored it.
Internally, however, the results were proving the same old lesson. Performance
scaled with compute. Make it bigger, feed it more data, and it got better.
Their blog post was explicit about this.
"This suggests there is significant room for improvement using the well-validated approach of more compute and data." The translation was: we know how to make this work. We just need billions of dollars to scale.
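What "performance scales with compute" looks like numerically is a power law: loss falls by a roughly constant factor for every fixed multiple of compute. A minimal sketch with invented constants (these are not OpenAI's fitted values, which came later in their scaling-laws work):

```python
# Illustrative power law: loss(C) = a * C^(-alpha).
# The constants a and alpha are made up for illustration only.
def loss(compute, a=10.0, alpha=0.05):
    return a * compute ** -alpha

# Every 1000x of compute cuts loss by the same constant factor (~0.71 here),
# which is what made "just scale it" a predictable bet rather than a gamble.
for c in (1e3, 1e6, 1e9, 1e12):
    print(f"compute {c:.0e} -> loss {loss(c):.2f}")
```

The predictability is the whole point: if the curve is smooth, you can forecast how good the next model will be before you spend the money to train it.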
And months earlier, they just turned down Elon's billions of dollars to preserve their nonprofit principles.
And now they had a breakthrough potentially bigger than AlexNet. Those billions were exactly what they needed to realize it. And those nonprofit principles were the main obstacle they needed to shed.
July 2019, Sam found OpenAI billions, but it came with baggage.
They struck a deal with Microsoft, giving them a billion dollars with potentially much more to come. But they
had to let Microsoft commercialize their models, take a cut of revenue, and become their exclusive compute provider.
Every compromise Ilya blocked from Elon, Sam just handed to Microsoft.
And the difference was that Elon wanted control to ensure safety, or at least he claimed so. Nobody was under any illusion that Microsoft cared about safety. They had shareholders to satisfy.
Fast forward to May 2020, and they use Microsoft's money to train GPT-3.
Over 100 times bigger than GPT-2 and a thousand times bigger than the original GPT-1, it proved that naive scaling worked. Feed it more compute, more data, and intelligence emerged. It could write coherent essays, debug code, pass high school exams. It's the breakthrough they'd been chasing.
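For scale, here are the published parameter counts behind those multiples, as reported in the respective GPT papers (the transcript's "100 times" and "a thousand times" are rounded):

```python
# Published parameter counts for the GPT series:
# GPT-1 (2018): ~117 million, GPT-2 (2019): ~1.5 billion, GPT-3 (2020): 175 billion.
params = {"GPT-1": 117e6, "GPT-2": 1.5e9, "GPT-3": 175e9}

print(params["GPT-3"] / params["GPT-2"])  # ~117x bigger than GPT-2
print(params["GPT-3"] / params["GPT-1"])  # ~1,500x bigger than GPT-1
```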
But most of the team that built it didn't want to release it. The world
wasn't ready and they weren't ready.
This wasn't a gradual process.
Intelligent AI arrived basically overnight.
But Sam wanted to productize it quickly, sell API access and get some revenue and also maybe put it into some Microsoft products.
Dario Amodei led the GPT-3 team.
He later said he felt psychologically abused by Sam's pressure to ship. And by
late 2020, Dario left OpenAI to start Anthropic, a rival AI lab. And most of his team that worked on GPT-3 followed him.
OpenAI now had the money, and they just lost the people who cared the most about using it responsibly.
Except for Ilya. Ilya stayed. He still
believed.
It's November 30th, 2022, at an OpenAI recruiting party at the NeurIPS conference in New Orleans, Louisiana.
Most of the OpenAI team is networking, drinking, just having a good time. But
one researcher won't leave his laptop.
An OpenAI recruiter tells him to have a drink and be normal and social. The
researcher says, "No, all the GPUs are melting. Everything is crashing."
Earlier that day, OpenAI launched a small demo. They took GPT-3.5, an 8-month-old model, and wrapped it in a chat interface. Previously, their models had only been available via the API, so developers only. This was just a little preview for regular people to play with GPT models. Nothing new. This was an old model with just a website interface.
They called it ChatGPT, and internal expectations were so low that the wildest scenario engineers prepped for was 100,000 users. Sam Altman
didn't even tell the board that they were launching it.
>> When ChatGPT came out, November 2022, the board was not informed in advance about that. We learned about ChatGPT on Twitter.
But then Japanese Twitter woke up, and it turns out ChatGPT could write keigo, the formal, polite thank-you emails that are hugely culturally important in Japan. And ChatGPT was really good at it. So word spread, and not just in Japan but everywhere.
Within hours the servers were overloaded.
Within 2 days they had a million users.
Within 2 months they had 100 million users. ChatGPT became the fastest-growing tech product in history, by accident.
So back at the party the researcher watching the GPUs melt knew that everything had just changed possibly forever.
Overnight, OpenAI went from a research lab to the biggest startup in Silicon Valley. ChatGPT was growing faster than Uber at its peak. So, there was no going back. Investors would not allow it.
There's no putting the toothpaste back in the tube.
Internally, Google declared a code red to make some sort of response to ChatGPT. Meta rushed out an open-source chatbot, Llama. And when Google did launch their chatbot, Bard, it was a complete disaster.
>> We're missing the phone. We will have to... We have no... Okay, we're going to move on. We can't find the phone.
>> Meanwhile, Wall Street was punishing Apple for moving too slow on AI.
Trillions of dollars in market value now move based on Sam Altman's tweets and companies' perceived relationships with OpenAI. And Sam is suddenly everywhere. Magazine covers, congressional hearings...
>> I'm doing this cuz I love it.
Dinner with presidents, multi-billion dollar fundraising meetings. Resources were getting routed from research to supporting ChatGPT.
They had their lead on Google and Sam wasn't going to give it up.
Safety kept sinking lower and lower on the priority list. The rapid growth forced them to hire aggressively and they were hiring a different type of employee.
These new hires hadn't spent a decade thinking about AI alignment. They didn't have p(doom) estimates. They didn't read LessWrong.
They had equity packages and they wanted to see them vest.
So the mission was being diluted with every hire. The culture that attracted the original team, the reason they took pay cuts to work there originally, was now evaporating.
And Ilya started to notice Sam's behavior shifting pretty dramatically as well. You could no longer rely on his verbal agreements. He'd tell different people what they wanted to hear, and he would set up these internal factions against each other.
He told a board member that he got safety approval to release GPT-4 Turbo, a smaller version of GPT-4, but he never did. He completely lied.
He also launched the OpenAI startup fund, a venture capital fund that invested in OpenAI partners entirely in his own name with no financial relationship to OpenAI.
Then Ilya felt gaslit when Sam told him he'd be leading the research direction of the company, meanwhile telling another colleague the same exact thing. So these two teams ended up working in parallel on the same projects, wasting months of time and compute.
The pattern was clear. Sam was starting to treat OpenAI more like a move-fast-and-break-things traditional Y Combinator startup than the mission-driven nonprofit that they had originally
built.
Ilya was losing faith in Sam while becoming convinced AGI was imminent.
So much so that in summer of 2023, he tabled most of his AI development research to focus full-time on AI safety.
The thinking was simple. If we don't solve alignment before AGI comes, that would probably be really bad for humanity.
So, he announced Superalignment, a massive project to ensure that superintelligent AI could be controlled, with a 4-year timeline.
Sam committed 20% of OpenAI's compute to the project. Pretty massive public promise while ChatGPT is taking over the world. In the blog post announcement, they say: "We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we're starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we've secured to date to this effort."
But the compute never came, at least not in the promised quantities.
And as months continued to pass, ChatGPT continued to take priority. Enterprise deals and maintaining the lead over Google were much more important to Sam.
Safety kept getting pushed to the next quarter.
In October 2023, OpenAI board member Helen Toner published an AI safety paper.
It was pretty boring academic stuff buried in a niche journal. But one line from this paper would trigger everything
that's about to happen. It goes,
"By delaying the release of Claude, Anthropic was showing its willingness to avoid exactly the kind of frantic corner-cutting that the release of ChatGPT appeared to spur."
It was a pretty subtle dig at OpenAI, but not crazy for a safety researcher who's supposed to be holding OpenAI accountable as a board member. But Sam wasn't happy. So, he called Toner, they talked it out, and the issue was resolved. Or so it seemed. A few weeks later, Sam told Ilya that board member Tasha McCauley thought Toner's paper was a fireable offense, that Toner should be removed from the board for this.
Ilya and Tasha barely spoke, but just by chance, Ilya had been voicing concerns to the board about Sam's leadership that week. When Ilya mentioned Sam's claim to Tasha, she was very confused. She had
never said anything like that.
So Sam lied. He was trying to manipulate the board composition and replace a safety focused member with someone friendlier to him. And he'd been caught.
So, Ilya started tallying everything.
Sam broke his Superalignment compute promise: 20% promised and not delivered.
He lied about getting safety approval before releasing some models. He
launched a personal VC fund using OpenAI's name with no disclosure to the company. He told Ilya he'd lead research and told another colleague the same thing, setting them against each other. He regularly routed resources away from safety to scale ChatGPT. And now he was lying to manipulate the board composition.
Pretty much any one of these would be grounds for firing at any company.
At a company building potentially the most powerful technology in history, it was pretty obvious what needed to be done. So Ilya opened up his phone and started making calls.
It was time.
It's November 17th, 2023.
Sam joins a video call. Helen Toner, Adam D'Angelo, Tasha McCauley, and Ilya are all on the call, waiting for him.
It was an ambush. The board was removing him as CEO, and the reason, "not consistently being candid with the board," was corporate speak to say: you're a liar.
After the video call with the board ended, Sam immediately started working the phones. Within hours, Silicon Valley heavyweights were closing ranks around Sam: Brian Chesky, Reid Hoffman, Paul Graham, Eric Schmidt. Even Elon Musk said that, given the risk and power of advanced AI, the public should be informed of why the board felt they needed to take such a drastic action.
And then, inside OpenAI, the all-hands meeting was brutal. An employee asked Ilya at the meeting, "Will we ever find out why Sam was fired?"
Ilya's answer was a simple no, just the one word. The board had the documentation, and they were just refusing to share it. So there was an immediate mutiny. The company president, Greg Brockman, quit. The head of research quit. Top executives were threatening to leave.
And then the rest of the employees learned that their tender offer was in jeopardy.
A VC firm called Thrive Capital was about to buy their equity at a $90 billion valuation. Life-changing money was now at risk because the board fired Sam without explanation.
Then Microsoft announced they were hiring Sam and Greg to start a new AI lab with an offer to every single OpenAI
employee to join and keep their equity.
At this point, even Mira Murati flipped. Despite supplying evidence for Sam's firing, she was now backtracking, claiming it was just constructive criticism and that she never actually supported removing him. The final blow: over 90% of OpenAI employees signed an open letter demanding to reinstate Sam and Greg, or they would all join Microsoft.
Eventually, even Ilya signed the letter.
He planned for a month and built the perfect case.
And just 5 days later, he was signing a letter begging Sam to come back.
Within 5 days of the firing, Sam and Brockman were back, more powerful than ever.
The board that tried to fire Sam was gone. Toner and McCauley were removed.
With only D'Angelo remaining, Ilya tried to stop Sam from having too much control.
And somehow the result was Sam in total control.
Sam, Greg, and Mira reached out to Ilya. They were offering him a chance to come back to OpenAI, and he was considering it, too. OpenAI was still his life's work.
Then, at the last second, Greg Brockman called and rescinded the offer. Ilya would never set foot in OpenAI's offices again.
After the board coup, Ilya went dark.
>> Is he being held hostage in a secret nuclear facility?
>> No.
>> What about a regular secret facility?
No.
>> What about a nuclear, non-secret facility?
>> Not that either.
>> After months of radio silence, Ilya finally emerged in May 2024 with the announcement that he was starting a new AI lab, Safe Superintelligence.
In the press release, they said that they had one goal and one product: a safe superintelligence. And it went on to say that their singular focus means no distraction by management overhead or product cycles, and their business model means safety, security, and progress are all insulated from short-term commercial pressures.
Today, SSI sits at a $32 billion valuation.
They've released no products and no research.
Exactly one year later, SSI CEO Daniel Gross took a massive buyout offer to join Meta's AI lab.
Ilya took over as CEO, and he rejected Meta's offer to buy the whole company.
About 20 years earlier, a 17-year-old Ilya knocked on Geoffrey Hinton's door on a summer weekend.
That knock changed everything.
Hinton, years later, when asked about his life's work: It makes me sad. I don't feel particularly guilty about developing AI
like 40 years ago because at that time we had no idea that this stuff was going to happen this fast. We
thought we had plenty of time to worry about things like that. It's not like I knowingly did something thinking this might wipe us all out, but I'm going to do it anyway.
>> Mhm.
But it is a bit sad that it's not just going to be something for good.