Sora 2 - It will only get more realistic from here
By AI Explained
Summary
## Key takeaways

- **Two Sora 2 Models Exist**: OpenAI offers both a standard Sora 2 and a higher quality 'Sora 2 Pro,' with viral demos likely showcasing the Pro version, raising questions about general accessibility and OpenAI's profit motives. [00:42], [01:05]
- **Sora 2 Rollout is Deliberate**: The invitation system, US/Canada initial focus, and iOS-only access for Sora 2 Pro are intentional safety measures for an iterative rollout, with API access promised soon. [01:16], [01:37]
- **Training Data Influences Model Performance**: A model's superiority in specific prompts, like generating Cyberpunk scenes or anime, is heavily influenced by its training data, not necessarily overall intelligence. [02:45], [03:17]
- **Sora App Prioritizes User Well-being**: OpenAI's Sora app includes features like periodic mood checks, user nudges towards creation over consumption, and strict content/likeness controls to ensure a net positive user experience. [06:53], [07:13]
- **Periodic Labs Focuses on Automating Science**: With significant funding, Periodic Labs aims to automate scientific discovery by using AI to predict experiments, guide robotic execution, and analyze vast datasets, contrasting with Sora's focus. [10:09], [11:20]
- **AI Will Pass Visual Turing Test**: While Sora 2 is a step towards passing the visual Turing test, the ultimate goal is creating indistinguishable artificial worlds across all senses, a path that is both exciting and treacherous. [14:11], [14:48]
Topics Covered
- Sora 2: Two Versions, Deliberate Slow Rollout?
- AI Model Superiority: It's All About Training Data.
- Will Sora App Be Net Beneficial, Or Just Slop?
- Sora App: A Clever Moat for OpenAI's Video AI?
- AI Will Create Multi-Sensory Worlds Indistinguishable From Reality.
Full Transcript
Sora 2 is out, and for some it will seem like a slop-feed shoggoth; for others, a glimpse at a genuinely generalist AI. Visually, it could be seen as the best that has ever come out of text-to-video, but for others it's barely an improvement on Veo 3 from Google. It's up to you, of course,
but I will try to focus on six elements
you might not have caught from the viral
release video and announcement. So, is it all a distraction, though, from physical science breakthroughs like those promised by Periodic Labs, or the coding supremacy of Claude 4.5 Sonnet?
Depends who you ask. But let's get
started. First, a quick one. One detail
that many may have missed. There are
actually two Sora 2s. OpenAI said Pro users "will be able to use our experimental higher quality Sora 2 Pro, initially on sora.com and then in the app as well." But my question is: where did all the best demos come from, the ones which you're going to see in this video and which went viral? Could it be that most of those were Sora 2 Pro, and therefore what most people will access is just the normal Sora 2? These things are
incredibly expensive to run, don't
forget. And OpenAI do eventually have to
make a profit. And then there's the
rollout. According to the Sora 2 system
card, that invitation system, which is a bit janky, is actually deliberate, to slow things down. That's maybe also why it's only for the US and Canada initially: iOS only, premium but with limits that will actually decrease as new users join, and no API, though apparently that's promised in the coming weeks. All of that is deliberate and part of the safety-focused iterative
rollout strategy. Then there's the
inevitable comparisons initially between
Sora 1 and Sora 2, but I'm going to
throw in Veo 3 demos too for reference, made via Veo 3 preview on Gemini online and Veo 3 quality with Google Flow. Now, note that one of the leads for Sora 2 said that the model is intelligent in a way that we haven't seen with a video model before. So, they're claiming it has the best world model. Image-to-video and video-to-video are not yet allowed, although we'll get to cameos
later. All of this begs the inevitable
question as to which model is the very
best for video generation. And
comparisons are really hard to state
definitively. As I've said earlier on, we don't even know whether this is Sora 2 Pro or Sora 2. And even Veo 3 has preview-quality versions, fast versions, and the main Veo 3. Also, I have seen credible leaks that apparently Veo 3.1 is going to be released in the coming week or so.
I'm also going to make a point that I think is fairly significant, which is this: models, and I'm not just talking about Sora and Veo but even LLMs like Gemini and ChatGPT, are unbelievably, fundamentally dependent on the data sets on which they're trained. So just
because for one particular prompt, say of a gymnast, one model is clearly better
than the other doesn't mean it's better
all around. It might just have more
training data on that domain. Take this
game generation of Cyberpunk from Sora 2.
Now, I've never played that game, but
clearly according to reports, they must
have taken plenty of video tutorials
from that game and fed it into the
training data. Sora 2 can also generate
anime, much better than Veo 3, apparently.
But again, think training data.
>> You better keep that wheel steady, cuz everyone's gunning for us.
>> Found output tokens cost more than input tokens.
Yeah, apparently my words aren't worth as much as the model's.
>> Input tokens are the cheap seats.
>> Then there's questions of copyright.
>> Transformer. That's the power of my stand, Sydney Bing.
>> And ready, begin.
But that is going to have to be for
another video. I will note that certain
claims I've seen online about Sora 2
mastering physics are really overstated.
Take this video in particular. This was
touted by one of the leads on Sora 2 as
an exemplar of Sora 2 understanding
physics. I'm not sure about you, but the
physics in this one seems more video-gamey than real. Incredible realism, but more like that of a video game. Look
how he bounces off the hoop. Now, what
about that almost-social-media app that OpenAI is launching, called Sora? Sam Altman last night said that he could find it easy to imagine the degenerate case of AI video generation that ends up with us all being sucked into a reinforcement-learning-optimized slop feed. Well, clearly OpenAI wanted to
distinguish their app from Vibes by
Meta, which was widely panned. Let me
know what you think. But for many, there
will be nothing less viby in the current
climate than a couple of billionaires
like Zuckerberg and Wang announcing the
launch of a new form of social media
full of, quote, "AI slop." But putting vibes
to one side for a moment, I do think
it's a little bit more nuanced than
that. And to OpenAI's credit, they are
starting with some decent
differentiations. There will be no
infinite scroll for under 18s. Users
will be nudged to create rather than
consume. There will be watermarks, both visible and invisible, on all videos, as well as strict opt-ins for your likeness being used. Inputs will be classified and then potentially blocked, and outputs will go through a reasoning model to see whether they should be blocked. Like I said, you can't just input an image and output a video or go from video to video; that's blocked. And these categories are also blocked from display. So, if you were hoping for some wrongdoing, you'll have to look elsewhere.
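To make that two-stage flow concrete, here's a minimal sketch of the shape of it. Everything in it is hypothetical: the category names and keyword stand-ins are mine for illustration, not OpenAI's actual classifiers.

```python
# A minimal sketch of the two-stage moderation flow described above.
# All names and categories here are hypothetical, not OpenAI's system.

BLOCKED_CATEGORIES = {"explicit", "graphic_violence", "unconsented_likeness"}

def classify_input(prompt: str) -> set[str]:
    # Stage 1: a cheap classifier tags the prompt before any video is made.
    # A real system would use a trained model; keywords keep this runnable.
    flags = set()
    if "nsfw" in prompt.lower():
        flags.add("explicit")
    return flags

def review_output(video_description: str) -> bool:
    # Stage 2: the finished output goes through a reasoning model that
    # decides whether it should be blocked. Stubbed with a keyword check.
    return "gore" not in video_description.lower()

def generate_video(prompt: str) -> str:
    flags = classify_input(prompt) & BLOCKED_CATEGORIES
    if flags:
        raise ValueError(f"prompt blocked at input stage: {flags}")
    video = f"<video for: {prompt}>"  # placeholder for the actual model call
    if not review_output(video):
        raise ValueError("output blocked after reasoning-model review")
    return video

print(generate_video("a golden retriever surfing at sunset"))
```

The design point is that the cheap check runs before any compute is spent, while the expensive reasoning-model check runs only on finished outputs.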
Which brings me to the Cameo feature, which is unique, at the moment at least, to OpenAI's Sora app. For this
feature, you can't just upload a video
of yourself, otherwise you'd get a bunch of deepfakes; you have to record yourself saying things that OpenAI get you to say. That kind of proves that you
are who you are, and then you can insert
your likeness into any new video or
existing video. This is at the moment a
unique feature available for Sora 2.
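The general shape of that kind of challenge-phrase check might look like the sketch below. To be clear, OpenAI's actual cameo verification isn't public, so every name and the word pool here are invented for illustration.

```python
# A purely illustrative sketch of a challenge-phrase liveness check.
# OpenAI's real verification is not public; this only shows the idea.
import secrets

WORD_POOL = ["copper", "orbit", "velvet", "glacier", "lantern", "ember"]

def issue_challenge(num_words: int = 4) -> str:
    # A random phrase that a pre-recorded deepfake is unlikely to contain.
    return " ".join(secrets.choice(WORD_POOL) for _ in range(num_words))

def transcribe(recording: bytes) -> str:
    # Placeholder: a real system would run speech-to-text on the audio.
    return recording.decode()

def verify_cameo(recording: bytes, challenge: str) -> bool:
    # The fresh recording must contain the exact challenge phrase to pass;
    # a real check would also match the voice and face to the account.
    return challenge in transcribe(recording)

challenge = issue_challenge()
print(verify_cameo(f"hi, it's really me: {challenge}".encode(), challenge))
```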
That's why you've been seeing all that
Sam Altman content. And the intention is
that no one can take your likeness and
make a video of you without your
permission. And even if one of your
invited friends does that, you can then
delete the ones you don't like. Given
how low the bar is at the moment for
deepfakes, I actually commend them for
setting some standards. But the real
master plan can be found in Sam Altman's blog post from just 18 hours ago, and that has plenty of juicy details you may have missed. They were clearly very
hesitant about launching a social media
app, and you could see the hesitation on
their faces when some of the leads for
Sora 2 were announcing it. First, apparently, there are going to be periodic checks in the app on how Sora is impacting users' mood and well-being.
I presume a lot of people are going to
spam thumbs up just to avoid being
blocked out of the app. But then comes
the centerpiece promise, which is big if
true. They will have a rule such that
the majority of users looking back on
the past 6 months should feel that their
life is better for using Sora than it
would have been if they hadn't. If that's not the case, they're going to make, quote, "significant changes", and then in brackets, and this is key: "if we can't fix it, we would discontinue offering the service". Almost like a guarantee given to forestall the criticism that they knew would be inevitable about launching a social media app. And by the way, you
can direct message other people. So it
is social media. Taken at face value,
this means that Sora does have to be net
beneficial for humanity to continue.
However, let's just say that if you look
at the track record, not every promise
issued by OpenAI has been fully upheld.
Just to take one example, the CEO of OpenAI, when it was launched, said that in setting up this Manhattan Project for AI called OpenAI, they would obviously comply with and aggressively support all regulation. They now employ a whole
bunch of lobbyists who are partly
responsible for blocking certain pieces
of regulation. My prediction is that
this promise will be quietly forgotten.
Now, I say all that, but I must confess
that with Sora 2 and this app, my
feelings are about as mixed as they
possibly could be. You will very likely
be able to find me sending memes of me
in certain activities to some of my
friends. I think there's going to be huge entertainment value and even some practical utility. As one of the leads for Sora 2, Will Depue, said, one of the biggest bottlenecks in science at the moment is good simulators for RL. But
then we can imagine elders and
eventually ourselves falling for the
kind of slop and not being able to believe anything. Sam Altman even admits: if you just truly want to doomscroll and be angry, then, okay, we'll help you with that. But that's quite a big zoom
out for now. My take is that a social
media app is actually quite a clever way
to build a moat in an environment that
doesn't have many at the moment. It is
so easy to flip from Sora 2 and just use Veo 3, or soon Veo 3.1, or maybe Kling 2.5, which was just announced. When Cream becomes a video generator, you could just hop to that. How do you get people
to stay using your video generator? How
does OpenAI make a profit? Well, if
you're locked into a social media app
and all your friends are on it and you
want to use your own or their likeness, but not have others use your likeness, well, then you have the Sora app. So, I
think it quite cleverly locks you into
their system. OpenAI did also claim in
the launch video that Sora 2 is a step
towards a generalist agent. And I get
that they have to say that because the
company mission is, officially, "we're literally building AGI." So everything
has to get wrapped up into that vision.
But Sora 2 seems more like a side quest
that might add XP but isn't directly on
course. Much more on course for me would
be something like Periodic Labs. I
mentioned in my last video how
exploration and experimentation is one
of the last big blockers toward a
singularity, if you will. Even if you
solve hallucinations and the data
problem and the modeling problem, models
are still passive. They're not exploring
the world. Well, Periodic Labs want them to automate science, run experiments autonomously. I interviewed one of the founders, who came from Google DeepMind, a little while ago for a Patreon video. And another one of their founders, Liam Fedus, came from OpenAI. I believe he was behind ChatGPT in part.
This story, I realize, is almost the
polar opposite of Sora 2 because it's
immensely physical and in the real
world. It's also not available
immediately. But the idea, roughly speaking, is this. If we want to, for example, come up with a room-temperature superconductor or better solar batteries, then there are a few
bottlenecks. First is running enough
experiments. Well, what if we could have
deep learning systems predict what an
experiment will yield, and then have, say,
humanoid robots conduct those
experiments autonomously? That might
remove one bottleneck. Then what about
those terabytes and terabytes of data
generated by existing experiments that
LLMs can't use? What if a lab collected
all of that data in an LLM-friendly format, which could then be fed into the
latest model? Finally, I think we all
know that there are just thousands and
thousands of papers out there that we're
never going to get around to reading. So
what about an AI model optimized for
literature review? It could find from
the literature what are the most
promising experiments to run.
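Pulling those three ideas together, the loop might look something like this minimal sketch. Every function in it is a hypothetical stand-in of mine, not anything Periodic Labs has published.

```python
# A rough, hypothetical sketch of the closed loop described above. The
# three stand-in functions mirror the three bottlenecks: literature
# review, outcome prediction, and robotic execution.
import random

def rank_from_literature(candidates: list[str]) -> list[str]:
    # Literature-review model: score candidate experiments by how
    # promising prior papers make them look. Random stand-in here.
    return sorted(candidates, key=lambda _: random.random(), reverse=True)

def predict_yield(experiment: str) -> float:
    # Deep learning surrogate: estimate the outcome before lab time is spent.
    return random.random()

def run_on_robot(experiment: str) -> dict:
    # Robotic lab executes the experiment and returns a measurement,
    # stored in an LLM-friendly format (plain key-value records).
    return {"experiment": experiment, "result": round(random.gauss(0.5, 0.1), 3)}

def discovery_loop(candidates: list[str], budget: int = 3, threshold: float = 0.4) -> list[dict]:
    dataset = []
    for experiment in rank_from_literature(candidates)[:budget]:
        if predict_yield(experiment) < threshold:
            continue  # skip experiments the surrogate predicts will fail
        dataset.append(run_on_robot(experiment))
    return dataset

print(discovery_loop(["dope sample A", "anneal sample B", "etch sample C"]))
```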
Anyway, the big reveal is that Periodic Labs, with $300 million in funding, is going to
work on all of those. Why even bring
this up? Well, partly to contrast with
Sora 2 and claims of being a generalist
agent, but also I guess for those people
who think all of AI is bad and it's nothing but slop. In fairness, those
results aren't going to come overnight.
So, in the meantime, let's talk about
job opportunities that you could apply
for even today. The sponsors of today's
video are 80,000 hours and in particular
their job board, which you can access
through a link in the description. These
are jobs that are available around the
world, both remote and in person. And
you can see the list is updated daily.
The focus is on positive impact and as
you can see it spans from entry level to
senior roles. Again, if you're curious
check out the link in the description.
The obvious thing to say about Sora 2 is that the moment it's out, that is forever the worst that AI will ever be at video generation. Likewise, Claude Sonnet 4.5, which is claimed to be the best coding model in the world, although they don't fully have the stats to back that up, is, I guess, the worst that coding is ever going to be via an LLM. By the way,
just on that point about them not backing it up, I do get that they have
benchmarks showing that it's the best.
But then there'll be other benchmarks
showing that Codex is the best. So
where's the definitive proof that on all
metrics it's the best coding model? But
that's for another discussion. I've been
testing Claude 4.5 Sonnet for quite a few days, and to everyone's amazement, we actually got an early result on Simple Bench, and yes, this is with thinking enabled, and it was 54%. A big step up from Claude 4 Sonnet, and it does feel in the ballpark of Claude 4.1 Opus when I'm doing coding. On one benchmark at least, SWE-bench Verified, it even beats Opus 4.1, and you might say, well, that's already a model from Anthropic, so what's the big deal? It's like five times cheaper. You try using Opus 4.1 in Cursor and you really do have to get the checkbook out.
For me, this just goes to show that a
few months after each new breakthrough
in AI, there is a breakthrough in price, wherein the earlier breakthrough suddenly becomes as cheap as the models that came before it. Or
to bring that back to video, there will
likely be a video generation model
released by some Chinese company which
is as good as Sora 2 with fewer filters
and way, way cheaper, in, say, 3 to 6 months. Before we end, though, a quick
word on the future because it's almost a
given that in a few years you can
imagine a button on your TV remote that
you could press and just add your stored
face as a selected character in any show
that you're watching. That is coming.
It's just a matter of whether it's 2
years or 4 years away. Suddenly Netflix
will be all about you. But then here's
what I've been thinking about and
forgive me for the digression, but we
already have models that pass the written Turing test. As in, you can't distinguish that you're talking to a model, not a human. And Sora 2 is much closer to passing the visual one. It's not there, despite the hype posts, unless you're visually impaired, especially gullible, or just see a couple of seconds at a glance. But I think we have to admit we are getting closer and closer to passing the visual Turing test: not being able to tell whether the video we're watching is real or fake. But what happens after we pass the visual Turing test, and then the audio Turing test, and then the somatosensory test, so that we feel artificial worlds in our nervous systems and can literally touch them? You can
think of each of our senses like a
benchmark that we're getting closer to
crushing. What happens when we have
models that can create entire worlds
from scratch in real time that are
indistinguishable from reality according
to every sense we humans have? If we can
be fooled visually, why not with audio
or with touch or taste? When that
happens, we might look back to Sora 2 as
one step along that fascinating, exciting, and treacherous path. Let me
know what you think. Thank you so much
for watching and have a wonderful day.
>> What's up, everyone? Welcome to Sora 2. You finally made it. I'm so, so excited to see you here. I've been waiting all week for this moment, and it's real now, you're actually here. Yeah, those GPUs behind me are literally on fire. It's fine. We'll deal with that later. Right.
Knowledge is not a destination. It's a
companion for the road.