Sora 2 - It will only get more realistic from here
By AI Explained
Summary
## Key takeaways

- **Two Sora 2 Models Exist**: OpenAI offers both a standard Sora 2 and a higher quality 'Sora 2 Pro,' with viral demos likely showcasing the Pro version, raising questions about general accessibility and OpenAI's profit motives. [00:42], [01:05]
- **Sora 2 Rollout is Deliberate**: The invitation system, US/Canada initial focus, and iOS-only access for Sora 2 Pro are intentional safety measures for an iterative rollout, with API access promised soon. [01:16], [01:37]
- **Training Data Influences Model Performance**: A model's superiority in specific prompts, like generating Cyberpunk scenes or anime, is heavily influenced by its training data, not necessarily overall intelligence. [02:45], [03:17]
- **Sora App Prioritizes User Well-being**: OpenAI's Sora app includes features like periodic mood checks, user nudges towards creation over consumption, and strict content/likeness controls to ensure a net positive user experience. [06:53], [07:13]
- **Periodic Labs Focuses on Automating Science**: With significant funding, Periodic Labs aims to automate scientific discovery by using AI to predict experiments, guide robotic execution, and analyze vast datasets, contrasting with Sora's focus. [10:09], [11:20]
- **AI Will Pass Visual Turing Test**: While Sora 2 is a step towards passing the visual Turing test, the ultimate goal is creating indistinguishable artificial worlds across all senses, a path that is both exciting and treacherous. [14:11], [14:48]
Topics Covered
- Sora 2: Two Versions, Deliberate Slow Rollout?
- AI Model Superiority: It's All About Training Data.
- Will Sora App Be Net Beneficial, Or Just Slop?
- Sora App: A Clever Moat for OpenAI's Video AI?
- AI Will Create Multi-Sensory Worlds Indistinguishable From Reality.
Full Transcript
Sora 2 is out, and for some it will seem like a slop-feed shoggoth; for others, a glimpse at a genuinely generalist AI. Visually, it could be seen as the best that has ever come out of text-to-video, but for others it's barely an improvement on Veo 3 from Google. It's up to you, of course,
but I will try to focus on six elements
you might not have caught from the viral
release video and announcement. So, is it all a distraction, though, from physical science breakthroughs like those promised by Periodic Labs, or the coding supremacy of Claude 4.5 Sonnet?
Depends who you ask. But let's get
started. First, a quick one. One detail
that many may have missed. There are
actually two Sora 2s. OpenAI said Pro users "will be able to use our experimental higher quality Sora 2 Pro, initially on sora.com and then in the app as well." But my question is: where did all the best demos come from, the ones which you're going to see in this video and which went viral? Could it be that most of those were Sora 2 Pro, and therefore what most people will access is just the normal Sora 2? These things are
incredibly expensive to run, don't
forget. And OpenAI do eventually have to
make a profit. And then there's the
rollout. According to the Sora 2 system
card, that invitation system, which is a bit janky, is actually deliberate, to slow things down. That's maybe also why it's only for the US and Canada initially: iOS only, premium but with limits that will actually decrease as new users join, and no API, though apparently that's promised in the coming weeks. All of that is deliberate and part of the safety-focused iterative
rollout strategy. Then there's the
inevitable comparisons initially between
Sora 1 and Sora 2, but I'm going to
throw in Veo 3 demos too for reference, made via Veo 3 preview on Gemini online and Veo 3 quality with Google Flow. Now, note that one of the leads for Sora 2 said that the model is intelligent in a way that we haven't seen with a video model before. So, they're claiming it has the best world model. Image-to-video and video-to-video are not yet allowed, although we'll get to cameos
later. All of this begs the inevitable
question as to which model is the very
best for video generation. And
comparisons are really hard to state
definitively. As I've said earlier on, we don't even know whether this is Sora 2 Pro or Sora 2. And even Veo 3 has preview-quality versions, fast versions, and the main Veo 3. Also, I have seen credible leaks that apparently Veo 3.1 is going to be released in the coming week or so.
I'm also going to make a point that I think is fairly significant, which is this: models, and I'm not just talking about Sora and Veo but even LLMs like Gemini and ChatGPT, are unbelievably, fundamentally dependent on the data sets on which they're trained. So just
because for one particular prompt, say of a gymnast, one model is clearly better
than the other doesn't mean it's better
all around. It might just have more
training data on that domain. Take this
game generation of Cyberpunk from Sora 2.
Now, I've never played that game, but
clearly according to reports, they must
have taken plenty of video tutorials
from that game and fed it into the
training data. Sora 2 can also generate
anime, much better than Veo 3, apparently.
But again, think training data.
>> You better keep that wheel steady, cuz everyone's gunning for us.
>> Found output tokens cost more than input tokens.
Yeah, apparently my words aren't worth as much as the model's.
>> Input tokens are the cheap seats.
>> Then there's questions of copyright.
>> Transformer. That's the power of my stand, Sydney Bing.
>> And ready, begin.
But that is going to have to be for
another video. I will note that certain
claims I've seen online about Sora 2
mastering physics are really overstated.
Take this video in particular. This was
touted by one of the leads on Sora 2 as
an exemplar of Sora 2 understanding
physics. I'm not sure about you, but the
physics in this one seems more video-gamey than real. Incredible realism, but more like that of a video game. Look
how he bounces off the hoop. Now, what
about that almost-social-media app that OpenAI is launching, called Sora? Sam Altman last night said that he could find it easy to imagine the degenerate case of AI video generation that ends up with us all being sucked into a reinforcement-learning-optimized slop feed. Well, clearly OpenAI wanted to
distinguish their app from Vibes by
Meta, which was widely panned. Let me
know what you think. But for many, there
will be nothing less viby in the current
climate than a couple of billionaires
like Zuckerberg and Wang announcing the
launch of a new form of social media
full of, quote, "AI slop." But putting vibes
to one side for a moment, I do think
it's a little bit more nuanced than
that. And to OpenAI's credit, they are
starting with some decent
differentiations. There will be no
infinite scroll for under 18s. Users
will be nudged to create rather than
consume. There will be watermarks, both visible and invisible, on all videos, as well as strict opt-ins for your likeness being used. Inputs will be classified and then potentially blocked, and outputs will go through a reasoning model to see whether they should be blocked. Like I said, you can't just input an image and output a video or go from video to video; that's blocked. And these categories are also blocked from display. So, if you were hoping for some wrongdoing, you'll have to look elsewhere.
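To make that two-stage flow concrete, here's a minimal sketch of the shape of it. Everything in it is hypothetical: the category names and keyword stand-ins are mine for illustration, not OpenAI's actual classifiers.

```python
# A minimal sketch of the two-stage moderation flow described above.
# All names and categories here are hypothetical, not OpenAI's system.

BLOCKED_CATEGORIES = {"explicit", "graphic_violence", "unconsented_likeness"}

def classify_input(prompt: str) -> set[str]:
    # Stage 1: a cheap classifier tags the prompt before any video is made.
    # A real system would use a trained model; keywords keep this runnable.
    flags = set()
    if "nsfw" in prompt.lower():
        flags.add("explicit")
    return flags

def review_output(video_description: str) -> bool:
    # Stage 2: the finished output goes through a reasoning model that
    # decides whether it should be blocked. Stubbed with a keyword check.
    return "gore" not in video_description.lower()

def generate_video(prompt: str) -> str:
    flags = classify_input(prompt) & BLOCKED_CATEGORIES
    if flags:
        raise ValueError(f"prompt blocked at input stage: {flags}")
    video = f"<video for: {prompt}>"  # placeholder for the actual model call
    if not review_output(video):
        raise ValueError("output blocked after reasoning-model review")
    return video

print(generate_video("a golden retriever surfing at sunset"))
```

The design point is that the cheap check runs before any compute is spent, while the expensive reasoning-model check runs only on finished outputs.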
Which brings me to the Cameo feature, which is unique, at the moment at least, to OpenAI's Sora app. For this
feature, you can't just upload a video
of yourself, otherwise you'd get a bunch of deepfakes; you have to record yourself saying things that OpenAI get you to say. That kind of proves that you
are who you are, and then you can insert
your likeness into any new video or
existing video. This is at the moment a
unique feature available for Sora 2.
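The general shape of that kind of challenge-phrase check might look like the sketch below. To be clear, OpenAI's actual cameo verification isn't public, so every name and the word pool here are invented for illustration.

```python
# A purely illustrative sketch of a challenge-phrase liveness check.
# OpenAI's real verification is not public; this only shows the idea.
import secrets

WORD_POOL = ["copper", "orbit", "velvet", "glacier", "lantern", "ember"]

def issue_challenge(num_words: int = 4) -> str:
    # A random phrase that a pre-recorded deepfake is unlikely to contain.
    return " ".join(secrets.choice(WORD_POOL) for _ in range(num_words))

def transcribe(recording: bytes) -> str:
    # Placeholder: a real system would run speech-to-text on the audio.
    return recording.decode()

def verify_cameo(recording: bytes, challenge: str) -> bool:
    # The fresh recording must contain the exact challenge phrase to pass;
    # a real check would also match the voice and face to the account.
    return challenge in transcribe(recording)

challenge = issue_challenge()
print(verify_cameo(f"hi, it's really me: {challenge}".encode(), challenge))
```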
That's why you've been seeing all that
Sam Altman content. And the intention is
that no one can take your likeness and
make a video of you without your
permission. And even if one of your
invited friends does that, you can then
delete the ones you don't like. Given
how low the bar is at the moment for
deepfakes, I actually commend them for
setting some standards. But the real
master plan can be found in Sam Altman's blog post from just 18 hours ago, and that has plenty of juicy details you may have missed. They were clearly very
hesitant about launching a social media
app, and you could see the hesitation on
their faces when some of the leads for
Sora 2 were announcing it. First, apparently, there are going to be periodic checks in the app on how Sora is impacting users' mood and well-being.
I presume a lot of people are going to
spam thumbs up just to avoid being
blocked out of the app. But then comes
the centerpiece promise, which is big if
true. They will have a rule such that
the majority of users looking back on
the past 6 months should feel that their
life is better for using Sora than it
would have been if they hadn't. If that's not the case, they're going to make, quote, "significant changes", and then in brackets, and this is key: "if we can't fix it, we would discontinue offering the service". Almost like a guarantee given to forestall the criticism that they knew would be inevitable about launching a social media app. And by the way, you
can direct message other people. So it
is social media. Taken at face value,
this means that Sora does have to be net
beneficial for humanity to continue.
However, let's just say that if you look
at the track record, not every promise
issued by OpenAI has been fully upheld.
Just to take one example, the CEO of OpenAI, when it was launched, said that in setting up this Manhattan Project for AI called OpenAI, they would obviously comply with and aggressively support all regulation. They now employ a whole
bunch of lobbyists who are partly
responsible for blocking certain pieces
of regulation. My prediction is that
this promise will be quietly forgotten.
Now, I say all that, but I must confess
that with Sora 2 and this app, my
feelings are about as mixed as they
possibly could be. You will very likely
be able to find me sending memes of me
in certain activities to some of my
friends. I think there's going to be huge entertainment value and even some practical utility. As one of the leads for Sora 2, Will Depue, said, one of the biggest bottlenecks in science at the moment is good simulators for RL. But
then we can imagine elders and
eventually ourselves falling for the
kind of slop and not being able to believe anything. Sam Altman even admits: if you just truly want to doomscroll and be angry, then, okay, we'll help you with that. But that's quite a big zoom
out for now. My take is that a social
media app is actually quite a clever way
to build a moat in an environment that
doesn't have many at the moment. It is
so easy to flip from Sora 2 and just use Veo 3, or soon Veo 3.1, or maybe Kling 2.5, which was just announced. When Cream becomes a video generator, you could just hop to that. How do you get people
to stay using your video generator? How
does OpenAI make a profit? Well, if
you're locked into a social media app
and all your friends are on it and you
want to use your own or their likeness, but not have others use your likeness, well, then you have the Sora app. So, I
think it quite cleverly locks you into
their system. OpenAI did also claim in
the launch video that Sora 2 is a step
towards a generalist agent. And I get
that they have to say that because the
company mission is, officially, "we're literally building AGI." So everything
has to get wrapped up into that vision.
But Sora 2 seems more like a side quest
that might add XP but isn't directly on
course. Much more on course for me would
be something like Periodic Labs. I
mentioned in my last video how
exploration and experimentation is one
of the last big blockers toward a
singularity, if you will. Even if you
solve hallucinations and the data
problem and the modeling problem, models
are still passive. They're not exploring
the world. Well, Periodic Labs want them to automate science, run experiments autonomously. I interviewed one of the founders, who came from Google DeepMind, a little while ago for a Patreon video. And another one of their founders, Liam Fedus, came from OpenAI. I believe he was behind ChatGPT in part.
This story, I realize, is almost the
polar opposite of Sora 2 because it's
immensely physical and in the real
world. It's also not available
immediately. But the idea, roughly speaking, is this. If we want to, for example, come up with a room-temperature superconductor or better solar batteries, then there are a few
bottlenecks. First is running enough
experiments. Well, what if we could have
deep learning systems predict what an
experiment will yield, and then have, say,
humanoid robots conduct those
experiments autonomously? That might
remove one bottleneck. Then what about
those terabytes and terabytes of data
generated by existing experiments that
LLMs can't use? What if a lab collected
all of that data in an LLM-friendly format, which could then be fed into the
latest model? Finally, I think we all
know that there are just thousands and
thousands of papers out there that we're
never going to get around to reading. So
what about an AI model optimized for
literature review? It could find from
the literature what are the most
promising experiments to run.
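Pulling those three ideas together, the loop might look something like this minimal sketch. Every function in it is a hypothetical stand-in of mine, not anything Periodic Labs has published.

```python
# A rough, hypothetical sketch of the closed loop described above. The
# three stand-in functions mirror the three bottlenecks: literature
# review, outcome prediction, and robotic execution.
import random

def rank_from_literature(candidates: list[str]) -> list[str]:
    # Literature-review model: score candidate experiments by how
    # promising prior papers make them look. Random stand-in here.
    return sorted(candidates, key=lambda _: random.random(), reverse=True)

def predict_yield(experiment: str) -> float:
    # Deep learning surrogate: estimate the outcome before lab time is spent.
    return random.random()

def run_on_robot(experiment: str) -> dict:
    # Robotic lab executes the experiment and returns a measurement,
    # stored in an LLM-friendly format (plain key-value records).
    return {"experiment": experiment, "result": round(random.gauss(0.5, 0.1), 3)}

def discovery_loop(candidates: list[str], budget: int = 3, threshold: float = 0.4) -> list[dict]:
    dataset = []
    for experiment in rank_from_literature(candidates)[:budget]:
        if predict_yield(experiment) < threshold:
            continue  # skip experiments the surrogate predicts will fail
        dataset.append(run_on_robot(experiment))
    return dataset

print(discovery_loop(["dope sample A", "anneal sample B", "etch sample C"]))
```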
Anyway, the big reveal is that Periodic Labs, with $300 million in funding, is going to
work on all of those. Why even bring
this up? Well, partly to contrast with
Sora 2 and claims of being a generalist
agent, but also I guess for those people
who think all of AI is bad and it's nothing but slop. In fairness, those
results aren't going to come overnight.
So, in the meantime, let's talk about
job opportunities that you could apply
for even today. The sponsors of today's
video are 80,000 hours and in particular
their job board, which you can access
through a link in the description. These
are jobs that are available around the
world, both remote and in person. And
you can see the list is updated daily.
The focus is on positive impact and as
you can see it spans from entry level to
senior roles. Again, if you're curious
check out the link in the description.
The obvious thing to say about Sora 2 is that the moment it's out, that is forever the worst that AI will ever be at video generation. Likewise, Claude Sonnet 4.5, which is claimed to be the best coding model in the world, although they don't fully have the stats to back that up, is, I guess, the worst that coding is ever going to be via an LLM. By the way,
just on that point about them not backing it up, I do get that they have
benchmarks showing that it's the best.
But then there'll be other benchmarks
showing that Codex is the best. So
where's the definitive proof that on all
metrics it's the best coding model? But
that's for another discussion. I've been
testing Claude 4.5 Sonnet for quite a few days, and to everyone's amazement, we actually got an early result on Simple Bench, and yes, this is with thinking enabled, and it was 54%. A big step up from Claude 4 Sonnet, and it does feel in the ballpark of Claude 4.1 Opus when I'm doing coding. On one benchmark at least, SWE-bench Verified, it even beats Opus 4.1, and you might say, well, that's already a model from Anthropic, so what's the big deal? It's like five times cheaper. You try using Opus 4.1 in Cursor and you really do have to get the checkbook out.
For me, this just goes to show that a
few months after each new breakthrough
in AI, there is a breakthrough in price, wherein the earlier breakthrough suddenly becomes as cheap as the models that came before it. Or
to bring that back to video, there will
likely be a video generation model
released by some Chinese company which
is as good as Sora 2 with fewer filters
and way, way cheaper, in, say, 3 to 6 months. Before we end, though, a quick
word on the future because it's almost a
given that in a few years you can
imagine a button on your TV remote that
you could press and just add your stored
face as a selected character in any show
that you're watching. That is coming.
It's just a matter of whether it's 2
years or 4 years away. Suddenly Netflix
will be all about you. But then here's
what I've been thinking about and
forgive me for the digression, but we
already have models that pass the written Turing test. As in, you can't distinguish that you're talking to a model, not a human. And Sora 2 is much closer to passing the visual one. It's not there, despite the hype posts, unless you're visually impaired, especially gullible, or just see a couple of seconds at a glance. But I think we have to admit we are getting closer and closer to passing the visual Turing test: not being able to tell whether the video we're watching is real or fake. But what happens after we pass the visual Turing test, and then the audio Turing test, and then the somatosensory test, so that we feel artificial worlds in our nervous systems and can literally touch them? You can
think of each of our senses like a
benchmark that we're getting closer to
crushing. What happens when we have
models that can create entire worlds
from scratch in real time that are
indistinguishable from reality according
to every sense we humans have? If we can
be fooled visually, why not with audio
or with touch or taste? When that
happens, we might look back to Sora 2 as
one step along that fascinating, exciting, and treacherous path. Let me
know what you think. Thank you so much
for watching and have a wonderful day.
>> What's up, everyone? Welcome to Sora 2. You finally made it. I'm so, so excited to see you here. I've been waiting all week for this moment, and it's real now, you're actually here. Yeah, those GPUs behind me are literally on fire. It's fine. We'll deal with that later. Right.
Knowledge is not a destination. It's a
companion for the road.