
What Did Ilya See?

By Run The Numbers

Summary

Topics Covered

  • Neural Nets Died from Minsky's Kill Shot
  • GPUs Unlock Neural Net Scaling
  • AlexNet Crushes ImageNet on Bedroom PC
  • Scale Predicts Intelligence Linearly
  • Sam's Lies Trigger Board Coup

Full Transcript

It's summer 2023.

This is Ilya Sutskever, OpenAI's chief scientist and basically the mastermind behind ChatGPT.

In the last 9 months, ChatGPT became the fastest-growing consumer product in history.

Experts value the company at $29 billion, and Sutskever has become an accidental celebrity.

But we find him here in his office at OpenAI reviewing blueprints for a bunker. A bunker to hide in if the AI he built becomes uncontrollable.

He does not sleep much.

In 4 months, he will gather the board of OpenAI to fire his CEO, Sam Altman, because he is terrified of Sam being in charge if this thing loses control.

This all started with one petty phone call his professor made 11 years earlier.

This is Geoffrey Hinton.

It's a summer weekend and he's hunched over his computer coding.

He runs the AI lab at the University of Toronto, but he's not fielding million-dollar job offers from Google.

In fact, the lab is barely funded.

He hasn't made a noteworthy breakthrough since the '80s, but then nobody has.

I said that Hinton studies AI, but he actually studies neural networks, the field of AI that tries to make computers learn in the way humans do. To your average AI researcher back then, conflating neural networks with serious AI research was like comparing astrology to physics.

But it wasn't always this bad. Not long ago, it was seen as the path to the singularity.

Then some really bad stuff happened. But Hinton is a believer. Decades of disappointment, and his research still makes him so excited that he forgets to eat. And he also doesn't want to be remembered as a man who spends his time toiling away on some toy idea. He's nearing 60 years old, and his back problems are bad enough that on some days he wakes up paralyzed. He's fighting the clock.

Who could that possibly be on a summer weekend visiting Hinton's deserted AI lab?

>> I am Ilya. I want to study neural networks.

>> Little did anyone know, least of all Hinton, but that knock would change the course of human history.

The year is 1956.

America won the war and its next frontier was space and robots.

>> Discoveries that were miracles a few short years ago are accepted as commonplace today.

>> ...is a concern of a new field of science called space medicine.

>> Thank you, Garo.

Within a year, we'd be in a space race with Russia. The future was now. We're in Tomorrowland.

That summer in New Hampshire, 10 men gathered to ask the question.

>> Well, now seriously, professor, do you think that one day machines will really be able to think?

>> Well, I think so, but people still disagree about it.

>> And the field of artificial intelligence was born. But unlike math or physics, AI had no inherited wisdom or proven frameworks. It was just 10 men arguing in a room. From that, two camps emerged, each convinced it had the way forward, and they hated each other.

Leading the first camp was Marvin Minsky, an MIT professor who would become known as the godfather of AI. His approach was called symbolic AI. It involved taking human knowledge, rules, logic, decision trees, and encoding them into machines. To teach a computer chess, you'd program in the rules and an algorithm for selecting the best move.

>> That man isn't playing checkers against the computer, is he?

>> Sure. And it plays pretty well.

>> It was all very clean and logical. And Minsky had the entire AI establishment behind him. The opposition studied what were called neural networks, and they were the underdogs.

While Minsky and the symbolists wanted to hardcode computers with our knowledge, neural nets people wanted to raise a baby computer. They didn't program in rules or knowledge. They believed real intelligence had to be learned, not given. The human brain is made up of 100 trillion tiny dials, each one finely tuned through years of life experience and learning. Neural nets were a crude model of this with far fewer dials. But the core principle was that intelligence isn't programmed. It's iteratively tuned.

Take a simple challenge: distinguish between pictures of cats and dogs. The Minsky-led symbolist approach might encode specific features of dogs: floppy ears, a tail, a wet nose. But edge cases could fool this system, requiring humans to continually update the rules as things go wrong. Neural nets, on the other hand, would be shown thousands of labeled pictures of dogs and cats, attempting a guess at each one. For each wrong guess, the system would make slight adjustments to its brain-inspired dials to get it right the next time. Repeated thousands of times, the system will have learned a concept of cats and dogs, the same way a human baby gains an intuition for the family dog: not through a checklist of rules, but by the experience of seeing the dog over and over again. So this humanity that neural nets exhibited, learning from experience, caught the public's imagination far quicker than the more grounded symbolist approach.
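To make the dial-tuning loop concrete, here is a minimal sketch in Python. The randomly generated "images" and the single layer of dials (a logistic unit) are stand-ins for illustration, not any historical system:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: each "image" is a feature vector; label 0 = cat, 1 = dog.
X = rng.normal(size=(1000, 64))
y = rng.integers(0, 2, size=1000)

w = np.zeros(64)  # the "dials", all untuned to start
b = 0.0
lr = 0.01         # how far to nudge the dials after each guess

for epoch in range(20):
    for xi, yi in zip(X, y):
        p = 1 / (1 + np.exp(-(xi @ w + b)))  # guess: probability of "dog"
        err = p - yi                          # how wrong the guess was
        w -= lr * err * xi                    # nudge each dial against the error
        b -= lr * err
```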

In 1958, the day after the first prototype of a neural network, the so-called perceptron, was showcased, the New York Times wrote this.

The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself, and be conscious of its existence.

The creator of the perceptron and the leader of the neural net movement, Frank Rosenblatt, went on to tell the Times: "Later perceptrons, Dr. Rosenblatt said, will be able to recognize people and call out their names. Printed pages, longhand letters, and even speech commands are within its reach."

Only one more step of development, a difficult step, he said, is needed for the device to hear speech in one language and instantly translate it to speech or writing in another language.

But in reality, these early prototypes of perceptrons were primitive. They solved toy problems like distinguishing between a square and a circle. But the proof of concept was key: a computer had learned in a humanlike way. The promise lay in what it could do when it, quote unquote, grows up. In 1958, they sounded insane. To that doubt, the neural nets people might retort: explain how you immediately recognize your mom. Do you go through a checklist of features, hair color, nose shape? No, you just have an intuition. It was that intuition and fuzzy pattern recognition that neural nets people wanted to encode in their systems. But as time went on, it started to dawn on Rosenblatt and the rest of the field that neural nets were further off than one difficult step from human-level intelligence. What we began to see is that the things that people think are hard are actually rather easy, and the things that people think are easy are very hard.

>> The techno-optimism of the 1950s started to give way to realism in the '60s.

Subsequent neural nets only made marginal progress over the prototypes. They were still just solving toy problems, whereas the Minsky-led symbolist approach was still gaining ground: a checkers engine that beat a human expert, the first general-purpose robot, and even a proto-chatbot that, while not quite ChatGPT, fooled many into believing that a human was on the other side. So for the first time since the field's genesis, neural net funding was under threat.

With no business use case yet, nearly all of AI research was funded by the government. And that pool of funds had to be split between the symbolists and the neural nets researchers. This drove Minsky mad. The systems that could barely tell shapes apart were still getting funding. His funding, in his eyes. Already extremely vocal about his distaste for neural nets, he started devising a kill shot. And who better than him, the legendary Marvin Minsky, the godfather of AI, to deliver it?

He and a colleague penned a book titled Perceptrons that basically claimed creating strong neural nets was impossible, that there was a mathematical ceiling on their capabilities.

Such a definitive and convicted claim, coming from the de facto leader of the field, hit hard.

DARPA, the primary funder of AI, quickly started cutting funding to neural net projects. And pretty soon, no academic journal or conference would accept research papers on neural nets. And just two years after Minsky's book came out, Frank Rosenblatt would die in a boating accident. The field had lost its funding, its reputation, and then its leader.

Neural nets were officially dead. They lost the battle to the symbolists. And even decades later, Minsky's book echoed for the few remaining neural nets researchers like Hinton.

Everyone knew neural networks didn't work and forgot about them. Everyone except Hinton, who started his PhD just a year later in 1972.

Back to the modern day, 2003, in Hinton's lab. This 17-year-old, overconfident Russian kid insists on meeting with Hinton and quickly makes himself useful.

Hinton gave Ilya some research to look over, just to see if he'd show up again. A few days later, he came back with a veteran-level insight about training neural nets.

>> And I gave him a paper to read, which was the Nature paper on backpropagation. And he came back and he said, "I didn't understand it." And I was very disappointed. I thought he seemed like a bright guy, but it's only the chain rule. It's not that hard to understand. And he said, "Oh, no, no, I understood that. I just don't understand why you don't give the gradient to a sensible function optimizer." Which took us quite a few years to think about.
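For context on that exchange: backpropagation is just the chain rule applied repeatedly to work out how much each weight contributed to the error, and Ilya's question was why that gradient wasn't simply handed off to an off-the-shelf optimizer. A one-weight toy version, with made-up numbers:

```python
# Toy chain rule: loss L = (w * x - target)^2, a one-dial "network".
x, target = 2.0, 10.0
w = 0.0

def loss_and_grad(w):
    pred = w * x
    loss = (pred - target) ** 2
    # Chain rule: dL/dw = dL/dpred * dpred/dw = 2 * (pred - target) * x
    grad = 2 * (pred - target) * x
    return loss, grad

# "Give the gradient to a sensible function optimizer": here, plain gradient descent.
lr = 0.01
for step in range(100):
    loss, grad = loss_and_grad(w)
    w -= lr * grad

print(w)  # converges toward target / x = 5.0
```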

His raw intuitions about things were always very good. Hinton had misjudged Ilya as a wannabe. In university at just 17 years old, brand new to the study of neural nets, he was easily picking up concepts even Hinton's brightest students struggled with. And over the next few years, Ilya would become Hinton's most valued pupil, taking the lead on difficult research questions and making real contributions to Hinton's most pressing problems. Lately, they'd been working on Hinton's career-defining obsession to this point.

Deep belief networks.

Hinton's biggest problem is that it seems impossible to make a large neural network, one capable of impressive feats. At a certain network size, the model just fails to learn more stuff. And deep belief nets were Hinton's solution to this: you can't train a giant model, so you just train a bunch of tiny models and duct-tape them together.

It sounds almost too simple to work. I mean, someone must have tried this before, right? But no, nobody had tried it in this specific fashion. There were only a few dozen people on Earth seriously studying this stuff, and they collectively had less funding than Google's annual cafeteria budget.
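The "duct tape" here is greedy layer-wise training: fit one small unsupervised model, freeze it, then fit the next on its outputs. A rough sketch of the stacking idea, using a PCA-style linear compressor as a stand-in for the restricted Boltzmann machines Hinton actually used:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 100))  # hypothetical input data

def train_tiny_model(data, hidden):
    """Stand-in for one small unsupervised layer: a PCA-style linear compressor."""
    centered = data - data.mean(axis=0)
    # Top `hidden` principal directions serve as this layer's learned features.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return Vt[:hidden].T  # projection matrix (n_features x hidden)

# "Duct-tape tiny models together": train each layer on the previous layer's output.
layer_sizes = [64, 32, 16]
layers, features = [], X
for size in layer_sizes:
    W = train_tiny_model(features, size)
    layers.append(W)
    features = np.tanh(features @ W)  # this layer's output feeds the next

# `features` is now a deep representation, built without ever training a big net end to end.
print(features.shape)  # (500, 16)
```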

In the 34 years since Minsky published Perceptrons, neural nets had actually graduated from solving toy problems like recognizing shapes to doing practical yet unremarkable things that had small-scale commercial appeal. At Bell Labs in the 1980s, Yann LeCun created a neural net that AT&T used in ATM machines. Hinton worked with a Wall Street firm to predict stock prices, which actually made money for a bit until the signals got overcrowded. I was actually once the technical guy on a little mutual fund. There was a neural net that decided what sort of phase of the market you were in. And there's another neural net that told you which stocks would do better than the market in six months' time. It actually performed extremely well.

>> Then there was a quiet neural net revolution in Japan during the 1980s, which was largely invisible to Westerners. They were putting neural nets in camcorders for stabilization, using them in industrial applications like detecting flaws in welding jobs, and even sticking tiny neural nets in rice cookers to learn how sticky you like your rice. The problem for Hinton was that neural nets were invisible and unremarkable. They were doing little background tasks that, while useful, were just things we took for granted. It was a massive difference from Rosenblatt's predictions of reaching human-level intelligence within a few years back in the 1950s.

>> I confidently expect that within a matter of 10 or 15 years, something will emerge from the laboratories which is not too far from the robot of science fiction fame.

And despite decades of marginal gains, Hinton still believed in that Rosenblatt-esque vision of neural nets that reach or even exceed human capabilities. But you weren't going to do that with these baby models used for recognizing digits or learning how you like your rice. The models of the day were equivalent to maybe the brain of a fruit fly. Back-of-the-napkin math says you need models about a half a million times larger just to be equivalent to the size of the human brain. So these deep belief nets were the first step to climbing that mountain.
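Running that back-of-the-napkin math with the two figures quoted in the video (the implied model size is an inference from those figures, not something stated):

```python
brain_synapses = 100e12   # the "100 trillion tiny dials" quoted earlier
scale_factor = 500_000    # "about a half a million times larger"

implied_model_size = brain_synapses / scale_factor
print(f"{implied_model_size:,.0f}")  # 200,000,000: models of ~2e8 dials vs. a 1e14-dial brain
```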

Hinton, Ilya, and the rest of the lab at the University of Toronto spent the last few years working on deep belief nets.

And it was this year that Hinton published the first research on the subject, called "A Fast Learning Algorithm for Deep Belief Nets." Three weeks of training the model on a university supercomputer, and they made a model significantly larger than the current state of the art.

The experiment to make a large neural net worked but just barely.

Its performance, benchmarked by its accuracy in digit recognition, did improve compared to the smaller models.

So in that sense, it was a worthy proof of concept, but it wasn't groundbreaking. It was marginal, not exponential, improvement.

So in a sense, Hinton's hypothesis was correct: a bigger model equals a bigger brain, and this paper provided a blueprint to successfully train a bigger model. But it was just better at solving the same unremarkable sort of background problems. So the question is: what's next? How do we get to the Rosenblatt vision? Fulfilling Rosenblatt's visions of machines that could see, speak, and think would require a miracle.

And that miracle was sitting on the shelf of Best Buy for $499.

According to Moore's law, we'd need decades before computers could train a model approaching that of house cat intelligence.

Hinton was nearing 60 years old. He didn't have decades, so there had to be another way.

But somewhere along the way, somebody made the relatively mundane realization that GPUs, mostly used for PC games like Grand Theft Auto and Counter-Strike, were accidentally perfect for the exact type of math that neural nets run on: matrix multiplication.
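Why matrix multiplication is the whole story: one layer of a neural net is a single big matrix multiply, the same embarrassingly parallel arithmetic GPUs already did for game graphics. A sketch (NumPy here; the actual speedup requires running the same line through a GPU library such as PyTorch or CuPy):

```python
import numpy as np

rng = np.random.default_rng(0)

# One neural net layer: 4096 inputs -> 4096 outputs, for a batch of 256 examples.
batch = rng.normal(size=(256, 4096)).astype(np.float32)
weights = rng.normal(size=(4096, 4096)).astype(np.float32)

# The entire layer is one matrix multiplication plus a nonlinearity.
activations = np.maximum(batch @ weights, 0)

# On a GPU, every output element is an independent dot product, exactly the
# workload GPUs have thousands of cores for. With PyTorch, the same layer is
# torch.relu(batch_gpu @ weights_gpu) and runs orders of magnitude faster.
```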

Theoretically, using a $500 GPU would be like having a supercomputer from 10 or 20 years in the future. And just as this became clear, Nvidia made their GPUs programmable. So suddenly, they weren't just for graphics anymore.

The only problem was that nobody in Hinton's lab knew how to program a GPU.

That was John Carmack-type stuff. AI researchers wrote math proofs and waxed philosophical about robots taking over. They didn't do low-level systems programming.

So they needed an extremely rare type of person, with both the Carmack-esque skills and expert neural net knowledge.

In 2007, that person did not exist until another strange Soviet kid wandered into Hinton's office.

His name was Alex Krizhevsky, and he was nothing like Ilya or Hinton.

Alex didn't dream about what AI could be one day. He just liked programming and solving problems. And some of the bedroom experiments he was running, training neural nets on GPUs, were already more impressive than some of Hinton's models, all on his $1,000 gaming computer at his mom's house.

If this works, it's a huge jump in computing power: going from wood to iron tools in Minecraft, from Glocks to AKs in Counter-Strike. But it's still an open question if this will work at all.

Companies like Nvidia and Silicon Graphics were making GPUs way back in the '90s.

If they were some magical key to AI, surely someone would have noticed by now.

Yet, there was no theoretical reason it shouldn't work.

The dark years of neural nets were full of these moments: breakthroughs that are obvious in hindsight that sat untouched for decades. But with maybe a few dozen people seriously studying neural nets worldwide, and tiny budgets, plenty slipped through the cracks.

Hinton learned one thing from deep belief nets: scaling the size of neural networks kind of works. The bottleneck was how much you could spend on server hardware, not necessarily your research savvy. The theoretical research tests proved this out. And now, with the realization that GPUs fast-forwarded Moore's law by a couple of decades, they finally had the pieces in place to actually do useful things. But they had to do it fast, because time was running out. Hinton was getting older, 64 years old now. Ilya and Alex were wrapping up their PhDs and would be leaving Hinton's lab soon. Google was quietly experimenting with neural nets for image recognition, with unlimited money.

Meanwhile, other researchers in academia were figuring out GPUs.

If they didn't make a breakthrough, another lab would.

So, the only thing they needed now was a target, a proof of concept to train neural nets on GPUs.

And thank God that Hinton is so petty.

When a friendly adversary, a neural net skeptic, was publicly doubting neural net technology, Hinton called him up to trash talk and accidentally blurted out that he and his students could solve one of the toughest problems in AI: image recognition.

Ever since the '50s, researchers have been trying to train neural nets and other AI models to identify objects within images. Show it a picture of a dog, and it will tell you, "Yeah, that's a dog." Cracking the problem was a potential billion-dollar solution. It could power self-driving cars, medical imagery like X-rays, and automated surveillance. But 60 years later, the current solutions are still very brittle. They make obvious errors, and they can only recognize a small number of objects.

A true killer app for image recognition would be a general-purpose, highly accurate system that you didn't need to hire several people to babysit. And that didn't exist yet, and there was little public hope that it would soon.

So, Jitendra Malik, the skeptic Hinton called to trash talk, has spent his entire career trying to make this image recognition thing happen, and he's only made marginal progress. He's tried all the methods, and for him, neural nets have shown no promise.

Meanwhile, Hinton, mostly unfamiliar with the field of computer vision, was basically telling Malik that he could solve his life's work in just a few months, as a side project. From the outside looking in, Hinton probably rubbed many as arrogant and bitter, constantly claiming he could solve life-changing problems while neural nets were still incapable of mundane utility. But neither Malik nor the rest of the AI community knew about Hinton's GPU breakthrough. They didn't know that the problem wasn't neural nets. It was computers. And that problem, at least they hoped, was now taken care of.

So, if you wanted to prove that you could give a computer a set of eyes back in the early 2010s, you would train it on ImageNet, the largest data set of images ever collected. It's millions of images in thousands of different categories. If an AI model could learn this data set, it would basically solve that billion-dollar question.

But the problem was, few even bothered to try. It was so many times bigger than the standard data sets of the time. For reference, Pascal, the most popular image data set of the time, was just 20,000 images across 20 classes, with classes being things like dog, cat, truck, or person.

ImageNet was 14 million images across 20,000 different classes. You weren't training a computer to recognize a dog. You were training it to recognize a Norfolk Terrier. And you had to make sure it didn't mix that up with the Norwich Terrier. Here's those two side by side. Can you even recognize the difference with your eyes?

The way the AI community gauged the state of the art in computer vision was to hold annual contests. Everyone takes the same data set, like Pascal or ImageNet, and trains an AI model on the images. The model that can most accurately guess which class a picture belongs to wins the contest.

The contest for Pascal, the small data set, regularly had a few dozen teams participating, but only four teams entered the ImageNet contest in 2011. And the winning model had just 74.3% accuracy, not even close to good enough to be useful in the real world.

So, this is what Hinton was signing them up for: a challenge so tough that most didn't even bother trying. And they had just a few months to turn in a working model.

And what's more, their hardware budget was basically just Alex's $1,000 gaming computer at his mom's house. Their competitors, meanwhile, had six-figure server racks.

There's a USB stick in Hinton's pocket.

On it is either a path to AGI or professional suicide.

In a conference of 500 computer vision researchers, maybe a dozen believed that neural networks could work. The rest were here to watch them fail.

From the outside, failure looked likely.

They were one of just seven teams brave enough to take on ImageNet that year. The Pascal contest had 24 entrants.

Rumors circulated around the conference that one of the teams in the ImageNet competition was using a neural net, which was confusing, because that had never happened before.

In the six years Pascal's been running and the two years for ImageNet, not one of the 100-plus teams has entered a neural net.

And the first time it does happen, it's Geoff Hinton, the biggest name in the niche world of neural nets.

So to put that in perspective, the last time most of these researchers heard about neural nets in vision was probably Yann LeCun's work in the '90s, the digit-reading stuff that AT&T used.

And yet Hinton is entering the competition as if he's basically fast-forwarded a decade-plus of progress, and he's entering the ImageNet contest, a challenge so hard that basically everyone was shying away from it. He's not even starting with the far easier Pascal contest.

So, naturally, people were speculating.

Hinton and his crew either found a time machine or they're going to embarrass themselves hard.

It's Friday, the last real day of the conference. After a week of presentations, today is the day for the two big competitions: Pascal first, and then ImageNet. Throughout the Pascal competition, the auditorium is mostly half empty, the usual. This contest's been running since 2006. There's not much left to discover.

But once it came time for the ImageNet competition, there wasn't an empty seat in the house. People are sitting on the floor, standing in the doorway, peeking over one another's shoulders to get a look.

Vision contests never draw crowds like this.

Everyone knew something was coming.

The question was what?

The host calls up team SuperVision, led by Alex Krizhevsky.

And out comes Alex Krizhevsky, who'd clearly rather be coding than presenting.

Alex introduces himself. His voice cracks. He's got zero stage presence.

>> My name is Alex Krizhevsky. Um, I'll tell you about... I'll tell you about some of this work that we did with, um, some of my awesome collaborators, Ilya Sutskever and Geoff Hinton.

And yet nobody's checking email. Everyone's locked in like they know something's coming. He details the architecture of the neural net. He talks about the GPUs they used. All pretty run-of-the-mill.

>> Uh, so the model is pretty simple. It's... it's completely supervised. It's... it's a deep convolutional neural net that has five convolutional layers, um, two fully connected layers. It's, uh... what else? Yes, it's trained with very ordinary SGD. It's trained on two Nvidia GPUs. It was trained, actually, um, in my bedroom.
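For the curious, here is roughly what "five convolutional layers, two fully connected layers, trained with very ordinary SGD" looks like in a modern framework. A simplified PyTorch sketch, not Krizhevsky's actual code (the real network also used tricks like local response normalization and was split across the two GPUs):

```python
import torch
import torch.nn as nn

# Simplified AlexNet-style model: five conv layers, then fully connected layers.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),  # one output per ImageNet class
)

# "Very ordinary SGD", as Alex put it.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# One training step on a dummy batch (224x224 RGB images).
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 1000, (8,))
loss = loss_fn(model(images), labels)
loss.backward()
optimizer.step()
```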

But then he gets to slide three and there's one bolded number, 84.7%.

The previous best was 74.3%.

They had just obliterated the state of the art on their first try, and the room erupted in chaos. Arguments started breaking out.

Researchers are saying the data set was flawed, that the code won't replicate, that this approach won't scale. Even Jitendra Malik, the professor Hinton made the bet with originally, was still skeptical. He wanted to see it work on more data sets. But within a year, he'd be writing neural nets papers himself.

The next team was supposed to present after them, but nobody remembers what they showed.

20 years of computer vision research were just rendered obsolete with a single PowerPoint slide.

And from here on out, nearly all computer vision research would be neural nets. The entire field would now have to learn Hinton's language.

That's 40 years in the wilderness: decades of mockery, funding cuts, dismissal.

People still referencing Minsky's book anytime Hinton brought up neural nets.

At 64 years old, with back problems so severe he sometimes woke up paralyzed, Hinton made history in the final hour.

So while Hinton was arguing back and forth with the skeptical audience, Ilya was probably just in his head doing math.

Every experiment they did leading up to this made one thing clear.

The more compute you throw at training, the smarter the model gets. It's almost linear.

And they just changed the entire field with $1,000 worth of GPUs.

What could they do with $100,000 or $10 million?

Google's revenue last year was $38 billion.

Multiplying AlexNet by 10,000 would be a rounding error for them. So, the implications were obvious.

Within hours of Alex's presentation at a conference of maybe 100 people, every venture capitalist in Silicon Valley knew the names Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton. From irrelevant to influencing the flow of billions of dollars, with no time to prepare for that.

Within weeks, millions of dollars are changing hands, corporate buyouts, GPU purchase orders, and every tech company is doing it.

The recipe was so simple: more GPUs and more data. No governor, no discipline, just the race to scale the fastest.

These were academics in a field that had been dead for 30 years. They were suddenly getting million-dollar offers and blank-check budgets to train models.

They could build whatever they wanted.

They weren't trained to be skeptical of the industry, because before, they had never had to be.

And Google moved by far the fastest.

Within 2 months of AlexNet, they bought Hinton's team for $44 million. They had no products. They had no assets. Google was just buying the brains of Alex, Ilya, and Hinton for $44 million. And in 2014, they bought the AI lab DeepMind for $500 million.

Larry Page gave one of his lieutenants a simple mandate: fly around the world and acquire any promising AI team, with basically no budget limit. If you published an interesting AI paper around that time, you were getting calls from Google recruiters offering to pay you like an NFL player.

And the same recipe that AlexNet provided continued to work: throw more GPUs and data at things until it does something scary.

For the first time, neural nets were being deployed at scale: image recognition in Google Photos, or language models in Google Search. The stuff they were building wasn't staying in university labs anymore. It was no longer theoretical. So, a realization started to dawn on some of these AI researchers: that for decades, the idea of AI safety was basically just a thought experiment. It was fun speculation about the idea of robots maybe taking over one day.

Now, it was a question of when a dangerous model might ship to billions of people through Google.

These researchers spent their careers just trying to make the math work, just trying to get something to work. Now, they had to consider: what were they building?

Is it dangerous?

And who should control it? Should Google control it?

But the question was already answered.

It was the company that removed "don't be evil" from their motto, who helped the NSA with mass surveillance. The company that was a surveillance machine themselves.

Google had acquired the entire field before anyone thought to ask if that was a good idea for society.

And Ilya, a Google employee himself, was starting to feel this the most. He'd helped create the breakthrough that started this AI race. And now he was working for the winner, watching them scale it faster than anyone could think through the implications.

It's summer 2015. In a private room at the Rosewood Hotel in Silicon Valley, Ilya is sat across from Sam Altman, president of Y Combinator, which is the most powerful startup incubator in Silicon Valley, and Greg Brockman, former CTO of Stripe. It's three of the sharpest guys in tech, just waiting.

And finally, the guest of honor, Elon Musk, shows up an hour late. The entire restaurant is put on notice when he comes in. Elon's already talking before he takes a seat.

He was there for Ilya, and his message was basically: Ilya, superintelligence is coming.

>> I mean, with artificial intelligence, we are summoning the demon.

>> Probably within the decade. Google's building it, and you're helping, and nobody's going to stop this. So, someone else needs to act as a counterweight.

Elon had already ruined his friendship with Larry Page over this topic.

>> Larry Page turned to me. He goes, "You know, if I were to get hit by a bus today, I should leave all of it to Elon Musk."

>> Really?

>> Yeah.

>> He said that.

>> Yeah. So, like, he's a good friend of mine. I met Larry before he got venture funding.

Page called Elon a speciesist for caring about human extinction from AGI. Page thought it was small-minded and tribalist.

That's who's building this stuff.

So his pitch was this.

Leave your multi-million-dollar Google salary, leave your unlimited compute budget, and come work at a nonprofit that will pay you like a recent college grad. And you're going to work out of an old chocolate factory in San Francisco.

Your new boss is this guy, Sam Altman, who has zero AI experience but is a killer tech investor, which admittedly is pretty useless at a nonprofit.

On the other hand, we have Greg Brockman, also zero AI experience, but brilliant at optimizing huge, mature codebases, which is really exactly what you don't need at a scrappy research lab that has no code.

It was a bad and messy pitch that made no sense at all except for two things.

One, it was Elon Musk asking you to help him save the world. And two, he was the only one asking.

December 2015, at the Neural Information Processing Systems conference in Montreal, Canada, Ilya said yes. He was the credibility.

Him joining inspired other researchers to join. Without Ilya, it was just Elon Musk and two guys with zero AI experience asking researchers to take massive pay cuts.

They announced at the biggest AI conference of the year with a blog post introducing OpenAI, but it almost didn't happen. Google's DeepMind was literally cornering people at the conference, making counteroffers on the spot. But enough researchers signed on.

The combination of Elon's vision, mounting safety concerns, and Ilya's presence was enough to overcome entry-level salaries and minimal perks other than making the world a better place.

Their mission statement read: "OpenAI is a nonprofit artificial intelligence research company. Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return. Since our research is free from financial obligations, we can better focus on a positive human impact."

The idea was to emulate the great research labs of the 20th century, Bell Labs and Xerox PARC: get the world's smartest people in a room, let them build the future, unconstrained by profit motives or project managers.

The idealism was kind of compromised from the start. Months before they even existed, Sam Altman was emailing Musk about having an ongoing conversation about what work should be open-sourced and what shouldn't. And even Ilya, from early on, was fully aware of appealing to open-source ideals to recruit researchers.

In a January 2016 email to Musk, he wrote: "As we get closer to building AI, it will make sense to start being less open. The 'open' in OpenAI means that everyone should benefit from the fruits of AI after it's built, but it's totally okay to not share the science, even though sharing everything is definitely the right strategy in the short and possibly medium term for recruitment purposes."

Open source was a recruiting tool from the start.

After the initial luster of the Elon Musk-funded AI nonprofit wore off, OpenAI was rootless.

They had some of the top researchers in AI, but no vision, no leadership.

Everyone was off doing their own personal projects.

They read about Bell Labs and assumed you could just copy and paste the formula: hire smart people, give them freedom, and wait for breakthroughs.

That's not how it was working out.

Instead, researcher focus was fragmented across several projects: ripping off the latest DeepMind paper, or doing gimmicks like making a robot hand solve a Rubik's cube. When asked what OpenAI's goal was, Greg Brockman said, "Our goal right now is to do the best thing there is to do. It's a little vague."

The luster was clearly wearing off. In March 2016, just 4 months into OpenAI's launch, Google's DeepMind beat the world champion at Go, a game most viewed as impossible for computers because it required intuition, not just calculation.

Over 60 million people watched AlphaGo demolish Lee Sedol.

OpenAI was just building Python libraries.

And Ilia had a realization.

In 2012, he had made the biggest neural net breakthrough since the 1980s because of scale. Alex's GPUs were multiple times more powerful than the CPU rigs they were used to training on, allowing them to train a bigger model on more data for longer.

And the neural net revolution that soon followed was built on that very idea.

That's why Google and Facebook were building giant GPU data centers. But despite being one of the three people who made this breakthrough, the core lesson had escaped him. At the precipice of this change, he left Google's world-class GPU farm for a scrappy startup with a few dozen MacBooks.

It's becoming abundantly clear that to hold a candle to Google, they need billions of dollars worth of compute.

The nonprofit open-source structure that got them here, that yanked top researchers away from millions of dollars, is now the very roadblock to scale.

Elon's funding won't be enough. Begging every tech billionaire for donations won't be enough.

And basically, everybody knew it, too.

Leadership was already internally planning to shed the nonprofit structure by the end of 2017.

Internal emails show Greg Brockman planning this very thing. They drew in idealistic researchers with appeals to open source, standing up to Google, and Bell Labs nostalgia.

But now they needed to become Google just to have a chance at fighting them.

September 20th, 2017: Ilya and Greg are given an impossible choice.

Elon put a deal on the table: fold OpenAI into Tesla. It'll give them unlimited compute budget, leadership from the greatest entrepreneur of their generation, and everything they needed to compete with Google. But OpenAI would become a subsidiary of Tesla. Elon would get sole control, CEO and supermajority board power, the very same concentration of power they founded OpenAI to prevent.

The alternative is Elon walks. Their only real source of funding and recruiting power disappears.

Elon was ready to make Tesla the cash cow for OpenAI's research, but the price was total control.

In the final hour, Ilya just couldn't do it. He and Greg drafted an email to Elon that was vulnerable and honest and detailed every concern they had with the structure.

This process has been the highest-stakes conversation that Greg and I have ever participated in. And if the project succeeds, it'll turn out to have been the highest-stakes conversation the world has seen.

We did not speak our full truth during a negotiation.

We have our excuses, but it was damaging to the process, and we may lose both Sam and Elon as a result.

Elon's response came the same day.

Guys, I've had enough. This is the final straw. Either go do something on your own or continue with OpenAI as a nonprofit. I will no longer fund OpenAI until you have made a firm commitment to stay, or I'm just being a fool who is essentially providing free funding for you to create a startup. Discussions are over.

To be clear, this is not an ultimatum to accept what was discussed before. That is no longer on the table.

Elon made the impossible choice for them.

By January of 2018, they're considering an initial coin offering, going from Elon's checkbook to hoping to become a memecoin.

And Reid Hoffman stepped in to cover salaries as a stopgap, but they can't compete with Google on donations.

The irony once again was that they recruited researchers with the promise of open-source and nonprofit purity. But those principles were now the roadblock to building what they needed to build. To underline it once again: they needed to become Google to fight Google.

It's May 2018.

Internal OpenAI research confirms what Ilya already knew intuitively. Since AlexNet, the compute used in breakthrough AI research has multiplied by 300,000, about 10x per year. Scaling not only works, it's a requirement. But what to scale? ImageNet was curated: 14 million labeled images, hand-organized into categories. There was no equivalent data set at the scale they needed. And then a Google paper started circulating, called "Attention Is All You Need."
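A quick check of those figures (taking "since AlexNet" to mean the 2012-2018 window, which is our reading, not stated in the transcript):

```python
# The quoted figure: compute for breakthrough AI results grew ~300,000x since AlexNet.
growth = 300_000
years = 2018 - 2012  # AlexNet (2012) to this research (2018), our assumption

per_year = growth ** (1 / years)
print(round(per_year, 1))  # ~8.2x per year, i.e. roughly the "about 10x" quoted
```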

The core idea was this: you could train a neural net to predict the next word in a sequence of text to improve translation. You know, like English to Chinese.

But Ilya and others at OpenAI saw something bigger.

Language is universal. It could represent anything: Python code, scientific papers, forum arguments, instruction manuals.

If you can express it in text, you can train on it. This theoretically means you don't need a curated data set. You could train on anything. The entire internet, perhaps.
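The objective itself is almost embarrassingly simple. A toy sketch of next-word prediction, with bigram counts standing in for the transformer (a real model learns the same next-word distribution, just conditioned on far more context and with billions of parameters):

```python
from collections import Counter, defaultdict

# Any text at all is training data; no labels or curation needed.
text = "the cat sat on the mat the cat ate".split()

# Toy stand-in for a language model: count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Predict the most likely next word given the previous one."""
    return counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat"
print(predict_next("cat"))  # -> "sat" (ties broken by first-seen order)
```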

Google invented it and they had infinite compute to scale it. And they basically had a copy of the internet from Google search.

But Google also had a hundred billion dollar advertising business that a general purpose AI could destroy.

OpenAI had no such conflict of interest. So Ilya saw, for the first time perhaps, a path to artificial general intelligence.

In June 2018, they published their first proof-of-concept model based on this architecture. They called it GPT: generative pre-trained transformer.

Tiny by modern standards, it was barely twice the size of AlexNet, and it was trained on a few thousand self-published books.

It could sometimes form coherent sentences and occasionally correctly answer questions, but the AI community mostly ignored it.

Internally, however, the results were proving the same old lesson: performance scaled with compute. Make it bigger, feed it more data, and it got better.

Their blog post was explicit about this:

"This suggests there is significant room for improvement using the well-validated approach of more compute and data."

The translation was: "We know how to make this work. We just need billions of dollars to scale."

And just months earlier, they had turned down Elon's billions of dollars to preserve their nonprofit principles.

And now they had a breakthrough potentially bigger than AlexNet. Those billions were exactly what they needed to realize it. And those nonprofit principles were the main obstacle, the thing they needed to shed.

July 2019: Sam found OpenAI its billions, but they came with baggage.

They struck a deal with Microsoft, giving them a billion dollars, with potentially much more to come. But they had to let Microsoft commercialize their models, take a cut of revenue, and become their exclusive compute provider.

Every compromise Ilia blocked from Elon, Sam just handed to Microsoft.

And the difference was that Elon wanted control to ensure safety, or at least he claimed so. Nobody was under any illusion that Microsoft cared about safety. They had shareholders to satisfy.

Fast forward to May 2020, and they use Microsoft's money to train GPT-3.

Over 100 times bigger than GPT-2 and a thousand times bigger than the original GPT-1, it proved everything: naive scaling worked.
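The published parameter counts line up with those multipliers (the counts come from the GPT papers, not the video):

```python
gpt1 = 117e6   # GPT-1: ~117 million parameters
gpt2 = 1.5e9   # GPT-2: ~1.5 billion parameters
gpt3 = 175e9   # GPT-3: ~175 billion parameters

print(round(gpt3 / gpt2))  # ~117:  "over 100 times bigger" than GPT-2
print(round(gpt3 / gpt1))  # ~1496: roughly "a thousand times bigger" than GPT-1
```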

Feed it more compute, more data, and intelligence emerged. It could write coherent essays, debug code, pass high school exams. It was the breakthrough they'd been chasing.

But most of the team that built it didn't want to release it. The world wasn't ready, and they weren't ready.

This wasn't a gradual process.

Intelligent AI arrived basically overnight.

But Sam wanted to productize it quickly: sell API access, get some revenue, and also maybe put it into some Microsoft products.

Dario Amodei led the GPT-3 team.

He later said he felt psychologically abused by Sam's pressure to ship. And by late 2020, Dario left OpenAI to start Anthropic, a rival AI lab. And most of his team that worked on GPT-3 followed him.

OpenAI now had the money, and they just lost the people who cared the most about using it responsibly.

Except for Ilya. Ilya stayed. He still believed.

It's November 30th, 2022, at an OpenAI recruiting party at the NeurIPS conference in New Orleans, Louisiana.

Most of the OpenAI team is networking, drinking, just having a good time. But one researcher won't leave his laptop.

An OpenAI recruiter tells him to have a drink and be normal and social. The researcher says, "No, all the GPUs are melting. Everything is crashing."

Earlier that day, OpenAI launched a small demo. They took GPT-3.5, an 8-month-old model, and wrapped it in a chat interface. Previously, their models had only been available via the API, so developers only. This was just a little preview for regular people to play with GPT models. Nothing new. This was an old model with just a website interface.

They called it ChatGPT, and internal expectations were so low that the wildest scenario engineers prepped for was 100,000 users. Sam Altman didn't even tell the board that they were launching it.

>> When ChatGPT came out, November 2022, the board was not informed in advance about that. We learned about ChatGPT on Twitter.

But then Japanese Twitter woke up, and it turns out ChatGPT could write keigo, the formal, polite thank-you emails that are hugely culturally important in Japan. And ChatGPT was really good at it. So word spread, and not just in Japan, but everywhere.

Within hours the servers were overloaded.

Within 2 days they had a million users.

Within 2 months, they had 100 million users. ChatGPT became the fastest-growing tech product in history, by accident.

So back at the party, the researcher watching the GPUs melt knew that everything had just changed, possibly forever.

Overnight, OpenAI went from a research lab to the biggest startup in Silicon Valley. ChatGPT was growing faster than Uber at its peak. So, there was no going back. Investors would not allow it. There's no putting the toothpaste back in the tube.

Internally, Google declared a code red to make some sort of response to ChatGPT. Meta rushed out an open-source chatbot, Llama. And when Google did launch their chatbot, Bard, it was a complete disaster.

>> We are missing the... We're missing the phone. We will have to... We have no... Okay, we're going to move on. We can't find the phone.

>> Meanwhile, Wall Street was punishing Apple for moving too slow on AI.

Trillions of dollars in market value now move based on Sam Altman's tweets and companies' perceived relationships with OpenAI. And Sam is suddenly everywhere: magazine covers, congressional hearings...

>> I'm doing this cuz I love it.

>> ...dinner with presidents, multi-billion-dollar fundraising meetings. Resources were getting rerouted from research to supporting ChatGPT.

They had their lead on Google and Sam wasn't going to give it up.

Safety kept sinking lower and lower on the priority list. The rapid growth forced them to hire aggressively and they were hiring a different type of employee.

These new hires hadn't spent a decade thinking about AI alignment. They didn't have p(doom) estimates. They didn't read LessWrong.

They had equity packages and they wanted to see them vest.

So the mission was being diluted with every hire. The culture that attracted the original team, the reason they took the pay cuts to work there originally, was now evaporating.

And Ilya started to notice Sam's behavior shifting pretty dramatically as well. You could no longer rely on his verbal agreements. He'd tell different people what they wanted to hear, and he would set up internal factions against each other because of it.

He told a board member that he got safety approval to release GPT-4 Turbo, a smaller version of GPT-4, but he never did. He completely lied.

He also launched the OpenAI startup fund, a venture capital fund that invested in OpenAI partners entirely in his own name with no financial relationship to OpenAI.

Then Ilya felt gaslit when Sam told him he'd be leading the research direction of the company, meanwhile telling another colleague the same exact thing.

So these two teams ended up working in parallel on the same projects, wasting months of time and compute.

The pattern was clear. Sam was starting to treat OpenAI more like a move-fast-and-break-things traditional Y Combinator startup than the mission-driven nonprofit they had originally built.

Ilya was losing faith in Sam while becoming convinced AGI was imminent.

So much so that in summer of 2023, he tabled most of his AI development research to focus full-time on AI safety.

The thinking was simple. If we don't solve alignment before AGI comes, that would probably be really bad for humanity.

So, he announced Superalignment, a massive project to ensure that superintelligent AI could be controlled, with a 4-year timeline.

Sam committed 20% of OpenAI's compute to the project, a pretty massive public promise while ChatGPT is taking over the world. In the blog post announcement, they say: "We need scientific and technical breakthroughs to steer and control AI systems much smarter than us. To solve this problem within four years, we're starting a new team, co-led by Ilya Sutskever and Jan Leike, and dedicating 20% of the compute we've secured to date to this effort."

But the compute never came, at least not in the promised quantities.

And as months continued to pass, ChatGPT continued to take priority. Enterprise deals and maintaining the lead over Google were much more important to Sam.

Safety kept getting pushed to the next quarter.

In October 2023, OpenAI board member Helen Toner published an AI safety paper.

It was pretty boring academic stuff buried in a niche journal. But one line from this paper would trigger everything that's about to happen. It goes:

"By delaying the release of Claude, Anthropic was showing its willingness to avoid exactly the kind of frantic corner-cutting that the release of ChatGPT appeared to spur."

It was a pretty subtle dig at OpenAI, but not crazy for a safety researcher who's supposed to be holding OpenAI accountable as a board member. But Sam wasn't happy. So, he called Toner and they talked it out, and the issue was resolved. Or so it seemed. A few weeks later, Sam told Ilya that board member Tasha McCauley thought Toner's paper was a fireable offense, that Toner should be removed from the board for this.

Ilya and Tasha barely spoke, but just by chance, Ilya had been voicing concerns to the board about Sam's leadership that week.

When Ilya mentioned Sam's claim to Tasha, she was very confused. She had never said anything like that.

So Sam lied. He was trying to manipulate the board composition and replace a safety focused member with someone friendlier to him. And he'd been caught.

So, Ilya started tallying everything.

Sam broke his Superalignment compute promise: 20% promised and not delivered.

He lied about getting safety approval before releasing some models. He launched a personal VC fund using OpenAI's name with no disclosure to the company. He told Ilya he'd lead research and told another colleague the same thing, setting them against each other. He regularly routed resources away from safety to scale ChatGPT. And now, he was lying to manipulate the board composition.

Pretty much any one of these would be grounds for firing at a tire company.

At a company building potentially the most powerful technology in history, it was pretty obvious what needed to be done. So Ilya opened up his phone and started making calls.

It was time.

It's November 17th, 2023.

Sam joins a video call. Helen Toner, Adam D'Angelo, Tasha McCauley, and Ilya are all in the call, waiting for him.

It was an ambush. The board was removing him as CEO, and the reason, "not consistently being candid with the board," was corporate speak for "you're a liar."

After the video call ended with the board, Sam immediately started working the phones. Within hours, Silicon Valley heavyweights were closing ranks around Sam: Brian Chesky, Reid Hoffman, Paul Graham, Eric Schmidt. Even Elon Musk said that, given the risk and power of advanced AI, the public should be informed of why the board felt they needed to take such a drastic action.

And then, inside OpenAI, the all-hands meeting was brutal. An employee asked Ilya at the meeting, "Will we ever find out why Sam was fired?"

Ilya's answer was a simple no, just the one word. The board had the documentation, and they were just refusing to share it. So there was an immediate mutiny. The company president, Greg Brockman, quit. The head of research quit. Top executives were threatening to leave.

And then the rest of the employees learned that their tender offer was in jeopardy.

A VC firm called Thrive Capital was about to buy their equity at a $90 billion valuation. Life-changing money was now at risk because the board fired Sam without explanation.

Then Microsoft announced they were hiring Sam and Greg to start a new AI lab with an offer to every single OpenAI

employee to join and keep their equity.

At this point, even Mira Murati flipped. Despite supplying evidence for Sam's firing, she was now backtracking, claiming it was just constructive criticism and that she never actually supported removing him. The final blow: over 90% of OpenAI employees signed an open letter demanding the board reinstate Sam and Greg, or they'd all join Microsoft.

Eventually, even Ilya signed the letter.

He planned for a month and built the perfect case.

And just 5 days later, he was signing a letter begging Sam to come back.

Within 5 days of the firing, Sam and Brockman were back, more powerful than ever.

The board that tried to fire Sam was gone. Toner and McCauley were removed, with only D'Angelo remaining. Ilya tried to stop Sam from having too much control.

And somehow the result was Sam in total control.

Sam, Greg, and Mira reached out to Ilya. They were offering him a way to come back to OpenAI, and he was considering it, too. OpenAI was still his life's work.

Then, at the last second, Greg Brockman called and withdrew the offer. Ilya would never set foot in OpenAI's offices again.

After the board coup, Ilya went dark.

>> Is he being held hostage in a secret nuclear facility?

>> No.

>> What about a regular secret facility?

No.

>> What about a nuclear non-secret facility?

>> Not that either.

>> After months of radio silence, Ilya finally emerged in May 2024 with the announcement that he was starting a new AI lab, Safe Superintelligence.

In the press release, they said that they had one goal and one product: a safe superintelligence.

And it went on to say that "our singular focus means no distraction by management overhead or product cycles, and our business model means safety, security, and progress are all insulated from short-term commercial pressures."

Today, SSI sits at a $32 billion valuation.

They've released no products and no research.

Exactly one year later, SSI CEO Daniel Gross took a massive buyout offer to join Meta's AI lab.

Ilya took over as CEO, and he rejected Meta's offer to buy the whole company.

About 20 years earlier, a 17-year-old Ilya knocked on Geoffrey Hinton's door on a summer weekend.

That knock changed everything.

Hinton, years later, when asked about his life's work:

>> It makes me sad. I don't feel particularly guilty about developing AI, like, 40 years ago, because at that time we had no idea that this stuff was going to happen this fast. We thought we had plenty of time to worry about things like that. It's not like I knowingly did something thinking this might wipe us all out, but I'm going to do it anyway.

>> Mhm.

But it is a bit sad that it's not just going to be something for good.
