OpenAI Sora 2 Team: How Generative Video Will Unlock Creativity and World Models
By Sequoia Capital
## Summary

### Key takeaways

- **Sora 2: A Leap Beyond Scale**: Sora 2's advancements go beyond mere scaling; it demonstrates emergent agent-like behavior and respects physics, failing in unique semantic ways, unlike prior video generation models. [06:20]
- **Space-Time Tokens: The Building Blocks**: Sora utilizes space-time tokens, akin to 'voxels' in 3D space, to represent video data. This allows the model to consider full global context across space and time, enabling properties like object permanence. [04:34]
- **World Simulators Drive AI Evolution**: Similar to how language models developed internal world models through scale, video models like Sora are building sophisticated world simulators. This capability is crucial for diverse applications, from creative content to scientific discovery. [08:48]
- **Intentional Design Against Mindless Scrolling**: Unlike platforms optimized for consumption, Sora prioritizes creative inspiration. Its ranking algorithm aims to encourage user creation and prevent mindless scrolling by introducing features like 'Cameos' and remixing. [25:08], [30:39]
- **Co-evolving Society with AI**: OpenAI aims for iterative deployment of technology like Sora, not just dropping breakthroughs. The goal is to co-evolve society with AI, allowing people to become comfortable with new capabilities and establish rules of engagement. [00:03], [49:28]

## Topics Covered
- Scaling Sora reveals emergent physics and agent intelligence.
- Can Sora unlock new scientific discoveries?
- Sora optimizes creation, not mindless consumption.
- How Sora's cameos democratize creative storytelling.
- Are digital clones the future of human-AI co-evolution?
## Full Transcript
for OpenAI across the board, it's really
important that we kind of like
iteratively deploy technology in a way
where we're not just like dropping
bombshells on the world when there's
like some big research breakthrough. We
want to co-evolve society with the
technology. And so that's why we really
thought it was important to like do this
now and like do it in a way where, you
know, we've hit this again this kind of
like GPT 3.5 moment for video. Let's
make sure the world is kind of aware of
what's possible now and also, you know,
start to get society comfortable in like
figuring out the rules of the road for
this kind of like longer term vision for
where there are just copies of yourself
running around in Sora in the ether like
doing tasks and like reporting back in
the physical world because that is where
we are headed long term.
Today on Training Data, we sit down with
the team behind OpenAI's Sora, Bill
Peebles, Thomas Dimson, and Rohan
Sahai. You'll hear about space-time
tokens, building internal world
simulators, and how optimizing for
creation instead of consumption is just
better for social platforms. This
conversation goes way beyond video
generation and into questions about how
society will co-evolve with powerful
simulation technologies. We promise that
this was an actual real world
conversation and not a video generation,
but we don't know how to prove that to
you. Let's jump in.
>> Hey guys, thank you for being here at
Sequoia. Congratulations on Sora. Thank
you.
>> Um, maybe you could tell us a little bit about yourselves and how you got to OpenAI and Sora.
>> Yeah. Uh I'm Bill. I'm the head of the
Sora team at OpenAI. Uh I had a pretty
traditional path. Came through undergrad
doing research on video generation then
continued that work at Berkeley. Uh and
then started at OpenAI working on Sora
uh from you know the first day I joined.
>> And I'm Thomas. I work as an engineering lead inside of Sora. Um, I have a bit of a longer story, but I worked at Instagram for about, uh, seven years, doing some of the early kind of machine learning systems and recommender systems there, back when it was a very tiny company — about 40 people. Then I quit and did my own startup for a while, which was Minecraft in the browser, um, which we've talked about a couple times. And, uh, I think OpenAI noticed that we had a very cracked product team there, and so, uh, they acquired our company, and I've been bouncing around different products inside of OpenAI, and on the research side as well, where I was training models. Um, but super happy we landed, kind of, together on Sora to bring this thing to life.
>> It was a really cool product in between, too — the Global Illumination product.
>> Oh yeah, I still believe in it.
>> Yeah, me too.
>> Awesome. I'm Rohan. I've been at OpenAI
for about two and a half years. Started
as an IC on ChatGPT. Um, but then as soon as I saw the video gen research, I quickly got Sora-pilled and made my way over there. Uh, and so currently I lead the Sora product team. Before that, just startups, big companies within kind of the Valley, a bunch of random stuff.
>> Yeah. Cool. Well, Bill, you are the
inventor of the diffusion transformer.
Can you tell us what that is?
>> Yeah. Um, so most people are pretty
familiar with autoregressive
transformers, which is the core tech
that powers a lot of language models
that are out there. So there you
generate tokens one at a time and you
condition on all the previous ones to
generate the future. Diffusion
transformers are a little bit different.
So instead of using autoregressive
modeling as kind of the core objective,
you're using this technique called
diffusion, which at a very high level
basically involves taking some signal,
for example, video, adding a ton of
noise to it, and then training neural
networks to predict the noise that you
applied.
>> And this is kind of a different kind of
iterative generative modeling. So
instead of generating token by token as
you do in autoregressive models,
diffusion models generate by gradually
removing noise one step at a time. And
in Sora one uh we really kind of
popularized this technique for video
generation models. So if you look at all of the other competitor models that are out there, both in the States and in China, most of them are based on DiTs — diffusion transformers. And a big part of that is because DiTs are a really powerful inductive bias for video.
>> So because you're generating the whole
video simultaneously you really solve
issues where quality can like degrade or
change over time which was kind of like
a big problem for prior video generation
systems, which DiTs ended up fixing. So that's kind of why you're seeing them proliferate within video generation stacks.
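To make the noise-prediction objective concrete, here is a minimal sketch of a diffusion training step (an illustration only, not OpenAI's actual code — the interpolation schedule, the shapes, and the `model` interface are simplified assumptions):

```python
import torch

def diffusion_training_step(model, video_tokens):
    """One simplified denoising-diffusion training step.

    video_tokens: (batch, num_tokens, dim) -- the whole clip at once,
    unlike an autoregressive model, which conditions only on the past.
    """
    b = video_tokens.shape[0]
    t = torch.rand(b, 1, 1)                      # random noise level per example
    noise = torch.randn_like(video_tokens)
    noisy = (1 - t) * video_tokens + t * noise   # corrupt the clean signal
    pred_noise = model(noisy, t)                 # network predicts the applied noise
    return torch.mean((pred_noise - noise) ** 2)
```

Sampling then runs in reverse: start from pure noise and apply the model repeatedly, removing a little of the predicted noise at each step, so the entire video is refined simultaneously rather than generated token by token.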
>> When I try to visualize it — I mean, for each diffusion step you have a matrix of pixels, and then you do the entire video at the same time, which you can basically see as different frames, I imagine.
>> Can you visualize that as, you know, a matrix of matrices that basically transforms over time?
>> Yeah, it's a good question. So we really kind of consider things at the granularity of, like, space-time tokens, which is sort of an insane phrase. Um, but whereas, you know, characters, for example, are a very fundamental building block for language, for vision it's really this notion of a space-time patch, right? You can just imagine this little cuboid that composes both the X and Y spatial dimensions as well as a temporal one, and that really is kind of the minimal building block that you can build visual generative models out of. And so diffusion transformers sort of consider these, uh — almost, you can think of it, like, uh, voxel by voxel. Um, and, you know, in the traditional versions of these diffusion transformer models, um, you have all of these little space-time patches talking with all the other ones, and that's how you actually are able to get properties like object permanence to fall out — because, uh, basically you have full global context of everything going on in the video at every position in space-time, which is a very powerful, uh, property for a neural network to have. Mhm.
>> Yeah.
>> And is that the equivalent of the attention mechanism — the object's movement throughout the video?
>> Yeah, that's right. So, in our Sora 1 blog post on video generation models as world simulators, we laid out some visuals, uh, which sort of go into exactly your point here, which is really that attention is a very powerful mechanism, right, for sharing information across space-time. And if you represent data in this way, right, where you patchify it into a bunch of these space-time tokens, um, as long as you're, you know, properly using the attention mechanism, that allows you to transfer information throughout the entire video all at once.
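Concretely, every space-time token attends to every other one, with no causal mask. A bare-bones single-head sketch (assumed simplifications: a real DiT adds multi-head projections, positional information for each patch's x/y/t location, and diffusion-timestep conditioning):

```python
import torch
import torch.nn.functional as F

def global_spacetime_attention(tokens, wq, wk, wv):
    """Self-attention over ALL space-time tokens at once.

    tokens: (num_patches, dim). Nothing is masked, so a patch in frame 1
    can exchange information with a patch in frame 100 -- the full global
    context that lets properties like object permanence fall out.
    """
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / (q.shape[-1] ** 0.5)  # every token against every token
    return F.softmax(scores, dim=-1) @ v     # no causal mask, unlike an LLM

dim = 64
tokens = torch.randn(256, dim)  # e.g. the 256 patches from the sketch above
out = global_spacetime_attention(tokens, *(torch.randn(dim, dim) for _ in range(3)))
print(out.shape)  # torch.Size([256, 64])
```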
>> What are the biggest differences between Sora 1 and 2? And I remember with the original Sora 1, you were already seeing kind of emergent properties, where the more you scale, uh, the more it's able to do things like understand physics. Is Sora 2 purely a function of scaling, or what are the biggest differences?
>> Yeah, that's a great question. Um, you know, we've spent a long time really just doing, like, core generative modeling research, uh, since the Sora 1 launch, to really figure out how we get the next step-function improvement in video generation capabilities. Um, we really kind of operated from first principles, right? So, we really want these models to be extremely good at physics. We want them to kind of feel intelligent in a way that I'd say most prior video generation models don't. By that I really mean, you know, if you look at kind of any of the previous set of models that were out there, you'll notice a lot of these kinds of effects that happen if you try to do any sort of complicated sequence of physical interactions — right, for example, like spiking, gymnastics — classic, uh —
>> riding a dragon, like you did
>> riding a dragon — that was fun, that actually happened for real, Constantine, um — not
>> about that
>> um, uh, you know, there are very clear problems with the past generation of models that we really set out to solve with Sora. And I think one thing
that's really cool about this model
compared to prior ones is that um when
the model makes a mistake, it actually
fails in a very unique way that we
haven't seen before. And so concretely, uh, for example — let's say the text input to Sora is: a basketball star wants to shoot a hoop, right? Shoot a free throw. If he misses, in the model, Sora will not just magically guide the basketball to go into the hoop, right — it won't be overoptimistic about respecting what the user asked for. It will actually defer to the laws of physics most of the time, and the ball will actually rebound off the backboard. And so this is a very
interesting distinction, right, between
like model failure and like agent
failure. Agent as in the agent that Sora
is like implicitly simulating as it's
generating video. Um, and we haven't
really seen this very unique kind of
like semantic failure case in like prior
video models. This is really new with
Sora 2. Um, it's kind of a result of,
you know, just the investment we put in
like really uh doing like the core
generative modeling research to like get
this massive improvement in capability.
Hm. Okay. So, not purely a function of scale. You're actually — you know, there's some concept of agents implicit in this. There are things you're doing beyond just scaling up the model.
>> Well, the notion of agents, I'd say, is actually mostly, like, implicit from scale. Um, like, you know,
>> in the same way where, uh, we kind of showed that object permanence, right, begins to emerge in Sora 1 pre-training once you hit some critical, uh, FLOPs threshold. Uh, we see
similar kinds of things happen as we
like push the next frontier, right? So
you begin to see these agents act more
intelligently. You begin to see the laws
of physics be respected in a way that
they aren't at like lower compute
scales.
>> How does the concept of a space-time latent patch relate to a space-time token, and to object permanence and how things move through the physical world?
>> Yeah, that's a great question. So I'd
say space-time patch and space-time token
are more or less synonymous with one
another. Um I'll use them
interchangeably. You know what's really
beautiful, right, is when people started
scaling up language models um from like
GPT-1 to GPT-2 to GPT-3, we really began to
see the emergence of like world models
internally in these systems. And what's
kind of beautiful about this, right, is
there's incredibly simple tokenizers
that actually go into like creating the
data that we train these systems on. But
despite this very simple representation
right — you know, um, like BPE, characters,
what have you uh when you put enough
compute and data into these systems like
in order to actually solve this task of
predicting the next token you need to
develop an internal representation of
how the world functions right you need
to like simulate things uh and like you
know the models will make lots of
mistakes right now at like low compute
scales but as you continue pushing you
know, from 3 to 4 to 5, you just see
these internal world models get more and
more robust and it's really analogous
for video right and in many ways more
explicit. So I think it's easier to
picture what like a world model or a
world simulator looks like with video
data, right? Because it is literally
representing like the raw observational
bits of like all of reality. But what's
really remarkable, right, is because
these space-time patches are just this
like very simple and like highly
reusable representation that can apply
to like any type of data, right? Whether
it's just like video footage of like
this set, whether it's like anime,
cartoons, like whatever it is. um you're
just able to build like one neural
network that can operate on this vast
extremely diverse set of data and really
build these like incredibly powerful
representations that model like very
generalizable properties of the world
right it's useful to have a world
simulator to predict like how a cartoon
will unfold and likewise it's useful for
predicting how you know this
conversation might unfold and so that
really puts a lot of optimization
pressure on Sora to, like, grok these
like core fundamental concepts in a very
like data efficient way
>> Did you have to put effort into selecting the data such that it reflected the physical world? For example, I'd imagine if you have data from the physical world, it all abides by the laws of physics, but you mentioned anime, which might not always abide by the laws of physics. Did you have to be selective, or did it naturally find patterns that separated that out?
>> That's a really great question. Um, we
did spend a lot of time, you know,
really thinking about,
you know, what does like the optimal
data mix for like a world simulator kind
of look like? And to your point, you
know, I think
>> in some cases we'll make decisions that,
you know, maybe are for like making the
model really fun. Like for example,
people love generating anime, but you
know, do not necessarily like perfectly
represent uh like the laws of physics
that are like directly useful for like
real world applications. So like to put
it another way, right? I think in anime
there are certain primitives that are
simplified that are actually probably
useful for understanding the real world.
You know, people still locomote through
scenes, for example. But like if there's
like some crazy dragon that's like
flying around, that's probably like not
so useful for, like, grokking aerodynamics
or something.
>> Dragon Ball Z is more or less how I
learned athletics,
>> you know? There you go.
>> The motion and Super Saiyan.
>> I think there it is an interesting
question like that. I do not know the
answer to whether somehow like
pre-training on simplified
representations of like the visual world
whether that's like sketches or like
some other modality like you know makes
you more efficient at, like, grokking these
concepts. I think it's actually a very
interesting scientific question that we
we need to understand better.
>> Do you think we're close to exhausting
the number of pre-training tokens there
are out there, or do you think video is just so massive that it's actually one of the more untapped, uh, vats of data?
>> Yeah, the way I kind of think about this is: the intelligence
per bit of video is much lower than
something like text data, but if you
integrate over all of the data that
really exists out there, uh the total is
much higher. So to directly answer your
question, you know, I think it's hard to
imagine ever fully running out of video
data. There's just like so many ways
that it exists in the world um that
like, you know, you will be in a regime where you can continue to just add more and more data to these pre-training runs and continue to see gains for, like, a very long time, I suspect.
>> Yeah. Do you think we'll ever discover new physics? There's the LLM world of, you know, Einstein thinking at the whiteboard — the equivalent of these LLMs thinking.
>> There's also just — if you develop a perfect simulator and you just simulate physics better and better, you might learn things about the world that we haven't learned yet.
>> I totally think that this is like bound
to happen one day. And, like, you know, I think we probably need one more step-function change, I'd say, in model quality to really get to a point where, for example, you can think about doing scientific experiments in the models. But you could imagine,
right, one day you have a world
simulator that is like generalized so
well to the laws of physics that like
you don't even need like a wet lab in
the real world anymore, right? You can
just like run biological experiments
within Sora itself. And like again, this
needs like a lot of work to like really
get to a point where like you have a
system that's robust enough to do this
reliably. Um, but you know, internally — like, again — we view Sora 1 as kind of being the GPT-1 moment for video. It was really the first time things started working for that modality. Sora 2 we really view as, like, GPT-3.5, uh, in terms of it really being able to kickstart, um, you know, the world's creative juices and really break through this kind of usability barrier where we're seeing mass adoption of these models. And we're going to need a GPT-4 breakthrough to really get this to the point where this is useful for, like, the sciences — as we're seeing now with GPT-5, right? Like, I feel like every day on Twitter I see another convex optimization lower bound get improved by GPT-5 Pro
>> um and I think eventually we're going to
see the same thing happening for the
sciences with Sora.
>> Do you think you need physical-world embodiment to get there, or do you think a lot of it can be done effectively in sim?
>> You know, I am like always amazed, you
know, every time we like push another
like 10x compute into these models like
what just like magically falls out of
it. Um, with like very limited changes
in kind of like what we're training on
and like the fundamental like approach
to what we're doing. Um, you know, I
suspect some amount of like physical
agency will certainly help. I I have a
hard time believing it will like make
you worse at like you know modeling like
collisions or like something else. Um
video only is like quite remarkable
though and I wouldn't be surprised if
it's actually kind of like AGI-complete
for like building like a general purpose
world simulator.
>> So for this concept of a general-purpose world simulator — a world model where you can do science experiments in that world — do you think that video is the sole data input, or some combination of video and text are the combined inputs you train this type of model on? Or does it have to be based on more structured laws of physics that are understood, and laws of biology that are understood?
>> I think it probably depends a lot on, um, the specific use case you're envisioning for the world simulator. Like, for example, if you just really want to build an accurate model of how a basketball game is played, I actually think only video data — and maybe audio as well — are kind of sufficient to build that system.
>> Not of me playing basketball. That would be inaccurate — a very bad player of basketball, you know.
>> Uh, actually, Sora's current understanding of how people play basketball, Constantine, may be at your level.
>> Wow. Okay, that — you know, that makes sense.
>> It's possible. It's possible.
>> Uh
>> I think he just dissed you.
>> I — like, it's accurate.
>> But it's better than mine, Constantine,
for what? That was like a Sora 1 situation.
>> You're at Sora 2.
>> We'll toss some hoops. Is that what
they'll say? Like,
>> you know, I'm down. I'm down. Yeah.
Shoot some hoops.
>> Thanks.
>> Toss some hoops.
>> Thomas's first statement in the
podcast.
>> I'm also at your level.
Um, you know, I think it is an
interesting question like what are all
of the modalities that should be present
in like this kind of general purpose sim
system. Like certainly, you know, if you
add more modalities, I have a hard time
believing it will like decrease the
intelligence. I also think there's an
argument to be made that
>> um just, you know, adding more and more
does not provide like significant
marginal value compared to like, you
know, full mastery of like video and
audio for example. I think it's an
interesting open question. I'm not actually sure right now. Um, and
it's something we need to understand
more.
>> Yeah. So cool. Sonya a minute ago mentioned Einstein at a whiteboard, and obviously that makes me think of you, Thomas, and your hair.
>> Me too. Knew it was coming.
>> It had to come. Like, if any hair gives the feeling of space-time tokens, it's definitely yours. At some point — you know, Bill, you're the creator of this revolutionary technology that has changed the way that AI video is created. At some point between Sora 1 and Sora 2, you all together said, hey, there needs to be an application around this; there's some benefit to an application. You brought together some of the best product people in the world. How did that crew come together at OpenAI?
>> Yeah, it's — I mean, the story is never as linear as you might think it is. Um, so I think that — I mean, we've had a product team on Sora since the get-go. Rohan was, like, uh, spearheading that effort in the Sora 1 days, but I think Bill's right when he says it was really like a GPT-1 kind of moment: we were seeing pockets of very interesting things there. Um, but the models were — like, you know, models without sound, videos without sound — it's a very different kind of environment. And so, um, we were working on that surface, mostly targeted at kind of a prosumer, uh, demographic. And separately — I mean, I can probably go into more details of all that — um, separately, we were also just kind of exploring different social applications of AI, uh, inside of OpenAI, and what that could look like. We had a lot of prototypes, most of which were quite bad. Um, and when we started to see some of the magic was actually with image gen, um, before it had been
released, uh, we were playing with it
internally in a social context. And the
social context was really interesting to
see that what people were doing is you'd
sort of like take an image and then
you'd have like a chain of remixes of
that image where like I don't know,
there was a
>> it's a duck and then now the duck's on
somebody's head and now everything's
upside down and they're smoking a
cigarette like uh just a lot of weird
things. Yeah. It's like and um we were
seeing this, we're like, "Oh, this is
kind of like a very interesting thing
that like you know, nobody can really do
that with, like, social media, because it's so hard to create something or riff on something. It's such a high-barrier-to-entry action. Um, maybe you have to go get a camera setup — it's not just thinking of the idea; there's actually a lot of things involved." And so
>> we were like, "Okay, this is a very
magical behavior. How can we kind of
productize that behavior?" Um and we're
mostly thinking of it away from Sora.
Um, some of the Sora research was still
ongoing and I mean there were signs of
life but it wasn't like quite there yet
in productized form. Bill probably had
it in his head somewhere. He's like I
can see the future but that's fine. Uh
I'm a little bit more can't can't quite
see the future yet. So um uh so we were
just exploring that. I think we we tried
a few things and then at some point the
the research was really uh just showing
very clear value of even iterative
deployment style value of like oh this
is something that people will really
want. And so we went into this project
like two or three months ago. It wasn't very long.
>> It was like July 4th. Wow.
>> July 4th. Yeah.
>> Um
>> that's when you disappeared, Thomas.
>> That's when it disappeared. Yeah.
>> Exactly.
>> So, um, and we just kind of locked in — like, okay, we're finally doing it. You know, that's always a moment. Um, and, uh, we started without any magical features, just like, okay, let's just try to get a native video environment where you can hear the audio full screen. Um, and we did some quick generations. Things were very cool, very fun, very interesting. Um, and because of that image gen experience, we sort of had thought, like, okay, the magical thing here is that the barrier to entry is very, very low for creation. Um, coming from Instagram — it's impossible to get people to create on Instagram, and that's the most valuable thing that people do. Um, so what does that unlock? And it's like, okay, well, that remix thing from image gen could still apply here. And so we brainstormed all these things about how remixes could work, and, like, what does a remix mean here? Um, one of those was this, like, cameo thing, which I think also Bill had in his head, but this, you know, wasn't —
>> It's in the ether.
>> It was in the ether for sure. Um, but we
just were, like, hacking together things on the product — let's see if this works. Um, I didn't think it would work at all. Um, but it was on the list, and there were a few other things on the list. Some of them were pretty crazy. It was like a —
>> Why didn't you think it would work?
>> I am bad at predicting technology. Like, uh, it wasn't super clear to me that you could, you know, take a likeness of a person and have that kind of imagined into a video form, um, and whether it would work or not. And so we had early prototypes of different things, of, like, people reacting in the video corner, or, uh, stuff like that.
But when we saw cameos just start to work, even playing internally — we're like, Rohan, do you remember that day where we're like,
>> the feed is entirely cameos
>> it's entirely — it just went from —
>> you know, we didn't have that feature; once we had that feature, product-market fit on the team — everything we were generating was all of each other
>> Inside, you must have seen the meme potential
>> I mean, yeah. I think at first we were just like, this is hilarious, this is amazing, and then, like, a week later we were like, this is still all we do. Um, there's something here.
>> Yeah. I mean, at first we were actually a little bit like, is this good? Like, hey, the cameos — it's just all cameos now.
>> Does anyone else care about this?
>> People care about other people doing stuff. And, um, we kind of got to the point where we're like, no, no, this is actually good. It actually feels like something I'm coming back to see. And it really humanized it, uh, a lot, where a lot of AI video is just, um, kind of static scenes that are quite beautiful, quite interesting, might have extremely complicated, uh, things going on, but they lose that human touch — and, uh, it really felt like that was coming back into it. So
>> Another learning from image gen, too: image gen took off and had viral moments because, I think, you could put yourselves in these scenes in accessible ways that weren't possible before — obviously this massive, like, put-me-in-a-Ghibli-scene moment, um, people taking selfies with their idols, and stuff like that. And so once you actually kind of thought about it, it's like, yeah, the cameo feature makes a lot of sense. You put yourself in all these scenes — that's way more exciting, you and your friends. It's novel. It's not something you could do before.
>> yeah and then that combined with remixes
I mean cameo is kind of remix to begin
with but then you start to think about
okay well now I can riff on Rohan doing
something or whatever it is but Bill had
you wrapped in an action figure package
and it was
>> it's been remixed like an insane number
of times.
>> Thousands of times. Yeah. So, like, uh, just
very very crazy things that kind of go
on and very emergent. A lot of stuff
that I would have never thought of
>> actually. How many generations of you
guys have been like publicly posted at
this point?
>> I have no idea.
>> Uh I know I'm 11,000 or so.
>> I was like a little less than that. I
>> Wow.
>> Yeah. It's crazy.
>> What has surprised you about the types
of users that are really sticking with
Sora? Who is it really a hit with? If
you just go to the latest feed, which is
just like the fire hose,
>> astronaut mode
>> of everything. Yeah, it's space-time Thomas mode. Um, it's wild
out there. But
>> that gives you a pretty good snapshot
into, like, just everything happening. I mean, I think we have, like, uh, almost 7 million generations happening a day. So, you can imagine
there's just a ton of information there.
Uh, it's one of my favorite ways to just
get product feedback. It is so diverse
the type of stuff people are doing, the
type of people. There'll be like a
complete variety of age. Some people
just doing envisioning themselves in
scenes that seem like motivation
oriented. People just memeing with their
friends, people cameoing some of like
the public figures on the platform that
have have done cameos. So, I think the
the diversity has has surprised me. I
was kind of expecting this sort of like,
you know, the Twitter AI crowd to like
heavily dominate the feed. They
definitely dominate like the press
cycles, uh, at least the ones that, you
know, we're most exposed to. But in
terms of people actually using this,
it's it's quite a wide variety. And um
last thing I'll say: it's a bigger departure from the sort of niche AI film crowd that existed before, which is great early adopters, but now you kind of get these — I thought it would start there, but it felt like it started with just a way wider range of people. Um, I think getting to the top of the App Store helps with that. You just get people who are, like, browsing and see this thing.
>> My mother keeps cameoing Thomas.
So weird.
>> There are real stories like that.
>> You said 11,000.
>> She's done 10,000 of them.
>> Uh, Thomas, you wrote the original algorithm, if I'm right, for the Instagram, uh, ranking algo. Uh, there was a lot in the Sora 2 blog post about how you guys are clearly being very intentional about how you want to do ranking in the algo. Can you talk a little bit about lessons learned from Instagram and how you're approaching it over at Sora?
>> Yeah. Uh, I mean, there's a lot to cover in that. Um, I think the first thing to think about when we think about these platforms, or think about Sora specifically, is the thing I was mentioning before about creation. Um, so, uh, Sora enables basically everybody to be a creator on this platform, and that is a very, very different environment from something like Instagram, where you have this extreme power law of the people that are creating. Um, and the power law just naturally gets more, uh —
>> narrow. What's the right word there? But more, uh, head-heavy. Yes. Um, so
sometimes I feel like I have to defend myself on the Instagram, uh, algorithm side. We actually did it for a reason. It was to solve a problem. It wasn't just kind of a random decision to, you know, optimize for ads or something like that. Um, and
the reason we did that was that we noticed what was happening on Instagram over time: um, because it was chronologically ordered, every single person that posted was guaranteed to have the top slot for all their followers. And so if you think about that for a second, the incentive for somebody in that environment is actually to create constantly, because they are guaranteed distribution when they create. Um, and over time, because of this power law becoming heavier and heavier — um, or, like, more head-heavy — um, those types of people, which are great, they provide a lot of value to the ecosystem, uh, but they start to crowd out people you really care about. And so, um, maybe you follow National Geographic or something — not to dunk on National Geographic, I love them — but, um, you know,
if they're posting 20 times a day, your friend's not — they don't have the same, uh, optimization objective; they're probably just posting a picture of their coffee. Um, and so you'd have 20 Nat Geo posts
and then one picture that you actually
really cared about that you never really
scrolled to. Um, and there are not too many solutions to that problem if you have a guaranteed ordering. Um, one of them is that you have to unfollow all these accounts that you maybe care about, but care about less than the person that posts once a day. Um, and the other is that you have to permute the feed. And so we went with that path. We tried it. We tested it out internally. It was kind of controversial to do. Um, but I think that you can actually kind of math this out — it's like a proof that, basically, over time you're going to have to take control over distribution on the platform, uh, in order to prevent these kinds of issues and show people what they actually care about. So that's why we
did it. Um, and it actually showed a lot of value. I remember the early tests. I won't get into the numbers on them, but they were, um, pretty unambiguous, actually, that this was showing you more of the people that you cared about. It was improving your experience with the platform. It actually moved creation, which is unusual — it made people create more, because they were seeing more content that was accessible to them. Um, but I also think that these things can go astray over time, and I won't say the Instagram algorithm is unequivocally bad or unequivocally good. Um, but we started to open up to more unconnected content, and, uh, ad pressure was very strong. There's also a natural company incentive to optimize for just blind consumption, because that's how you make money. Um, so, uh, maybe cheaper content, or maybe just getting people to scroll more and more and more and more. Uh, and that also can encourage people to create less, uh, because it's just a more mindless scrolling mode.
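A toy illustration of the dynamic Thomas is describing (hypothetical scoring — not Instagram's or Sora's actual ranker): in a chronological feed, a poster's impressions scale with post count, so a re-ranker that scores posts by per-viewer affinity and damps each author's repeated slots breaks that incentive.

```python
from collections import defaultdict

def rank_feed(posts, affinity):
    """Re-rank so prolific accounts can't crowd out people you care about.

    posts: list of (author, post_id), newest first.
    affinity: dict author -> how much this viewer cares (0..1).
    Hypothetical scheme: affinity, damped per extra post by the same
    author, with a mild recency prior.
    """
    seen = defaultdict(int)
    scored = []
    for age, (author, post_id) in enumerate(posts):
        penalty = 0.5 ** seen[author]   # each additional post counts less
        recency = 1.0 / (1 + age)
        scored.append((affinity[author] * penalty * recency, post_id))
        seen[author] += 1
    return [pid for _, pid in sorted(scored, reverse=True)]

# 20 Nat Geo posts bury your friend's coffee photo chronologically,
# but the re-ranked feed surfaces it near the top:
feed = [("natgeo", f"ng{i}") for i in range(20)] + [("friend", "coffee")]
print(rank_feed(feed, {"natgeo": 0.4, "friend": 0.9})[:3])  # ['ng0', 'ng1', 'coffee']
```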
>> but you guys have very concretely
committed to doing things to prevent
that kind of behavior.
>> Yeah. Um, we have a lot of mitigations there in place, but I think, uh, what it really comes down to for me is just: what are we trying to do as a platform? And I think the magic of this technology is that everybody is a creator. And so we want this feed to be optimized for you to create — to inspire you to create. And sometimes, when you think of inspiration, you think of, like, oh, it's this beautiful, crazy scene that's so elegant. When I think about that, I think of, like, meme culture, or something really funny, or, like, oh, that's cool, I've got to riff on that. And I think that's a very different brain mode when you're browsing the feed. Um, and of course we have lots of other things in place. So, like, I think it starts with incentives — our incentive right here is to encourage more creation in the ecosystem. Um, but there are certainly use cases we want to prevent. We're not going to get them right all the time. Um, it's very challenging. It's a very living system. It's also very hard to write a recommender system when you have no data and you don't know what to recommend, or you don't know how the platform's going to evolve. Um, but that's basically how I think about the incentives of the feed. And then, Rohan — we have a lot of mitigations in place that I think you've been thinking about, maybe even more deeply than I have, uh, about preventing maybe the extreme cases. So, um, I don't know if you want to talk a little bit.
>> Yeah, happy to. But one thing before — I mean, just one thing to add is that the stated intent of optimizing for creation is working really well. Yeah — almost 100% of people who get past the invite code and all that on the app end up creating on day one. Um, when they come back, it's like 70% of the time they come back they're creating, and 30% of people are actually even posting to the feed. So not just generating for themselves — they're actually posting into the ecosystem, which is an incredible testament to the model, how fun it is, and to how what we're optimizing for is actually working pretty well right now. Um, but yeah, beyond that — I mean, one of the top-of-mind things is, I think we don't want this to just be a mindless scroll, and beyond just optimizing for creation in the ranking algorithm, there are things we can do, like trying to just get you out of this sort of flow state, um, of just consumption, and push you into creative mode. I think there's a great article on this called, like, 'The Curvilinear Nature of Casinos,' where they design it so you never have to make any decisions — it's just, like, you walk in a circle, there are no windows, all that kind of stuff.
Um, we can be very intentional about not doing that — like, you know, whether it's an in-feed unit that's like, hey, you just viewed a couple videos in this domain, why don't you try creating something, um, or other ways to just kind of push you out of that. We actually have things like that in the product. Um, yeah, those are some of the things that come to mind.
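A minimal sketch of that kind of in-feed nudge (entirely hypothetical logic — the product's real triggers aren't public): after a streak of same-topic views, inject a 'try creating' unit instead of another video.

```python
def interleave_creation_nudges(feed_items, topic_of, streak_threshold=3):
    """Insert a creation prompt after N consecutive same-topic views.

    feed_items: ranked list of video ids; topic_of: video id -> topic.
    Hypothetical heuristic for breaking the consumption flow state.
    """
    out, streak, last_topic = [], 0, None
    for item in feed_items:
        out.append(item)
        topic = topic_of[item]
        streak = streak + 1 if topic == last_topic else 1
        last_topic = topic
        if streak >= streak_threshold:
            out.append(f"[create-unit] Try making your own {topic} video")
            streak = 0  # reset after the nudge
    return out

print(interleave_creation_nudges(["a", "b", "c", "d"], {k: "anime" for k in "abcd"}))
# ['a', 'b', 'c', '[create-unit] Try making your own anime video', 'd']
```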
>> I really commend you guys for what you've done — you know, there's a version of the world where video-model-as-world-simulator could have just ended up with us each retreating into our own computer screens, becoming addicted, and retreating into ourselves. And I think the amount to which you're prioritizing the human element and the social element — I think the care you've put into that really shows.
>> I don't think we would have launched, like, a feed of just AI content that didn't have a human feel — I just don't think that excited us. And as soon as we had the product, we had cameo, and we had that feeling internally, um, we were like, "Okay, this is actually a little different."
>> Yeah, I don't know.
>> I don't think it was totally obvious. Again, it was a pretty crazy sprint to go through this. Uh,
>> and it wasn't, like, super obvious to us what would emerge, but
>> I think the idea makes sense in retrospect, but it was a completely non-obvious product decision that cameos would be the thing. Yeah.
>> Um, where it's like, of course you just want to see your friends doing cool things — so it's like, that makes sense. Um, but I was never actually that afraid of competitive pressure in that crazy product phase, because I was like, we sort of had all these non-trivial decisions — obvious in retrospect, but not obvious at the time — that were sort of building on top of each other. It's like, okay, cameos. Well, there's also a version of cameo where you have a crazy flow that's just for you, and it's a one-player-mode cameo, and you go through this onboarding flow and do your stuff. But we were already seeing these interesting dynamics where it's like, well, I could tag Rohan into my video. That's crazy. Like, and then we can have an argument, or I'm gonna have an anime fight — doesn't matter. Um, and I was like, okay, so that's actually the human element. That's the magic of this. It's actually, strangely, more social than a lot of social networks, even though it's all AI-generated content. So, very unintuitive.
>> Totally.
>> Is it a separate — is it a fine-tuned version of Sora 2? Is it a separate model from what's available over the API, or is it the same
>> between the app and
>> product? So we're currently exposing the models in the same state across the API and the app.
>> Okay. Really interesting. Um what are
you seeing people do on the API side? Uh
and is it different from the types of
things people are doing on the
consumer app?
>> The motivation behind even launching an API is just, like, supporting these long-tail use cases. Like, we have this vision of enabling, you know, um, a ChatGPT-scale consumer audience with this tech, but there are tons of very niche things out there. You can imagine — you know, with Sora 1 we went out and talked to a lot of these studios, and what we heard from them is they want to integrate this in this specific part of their stack, in this specific way. And we'd love to support all these long-tail use cases, but we don't want to build a thousand different kinds of interfaces for this stuff. So that's the kind of stuff we're excited to see with the API. So far it's been kind of those slightly more niche companies — not trying to build, like, a first-party social app, but maybe, um, you know, with some filmmaking kind of audience or people they're supporting. Or even — we've definitely seen some people trying to, you know — I think there was some company, um, they were doing something with CAD, where they were using — yeah. Yeah. Yeah. That's cool.
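For a sense of what a long-tail integration looks like, here is a hedged sketch of driving a hosted video generation endpoint over plain HTTP. The `/videos` path, request fields, and `status`/`url` response fields are illustrative assumptions, not OpenAI's documented interface — check the official API reference for the real one.

```python
import os
import time
import requests

API_BASE = "https://api.openai.com/v1"  # paths and fields below are assumed
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

def generate_video(prompt: str, model: str = "sora-2") -> str:
    """Kick off a generation job and poll until it finishes (hypothetical API)."""
    job = requests.post(
        f"{API_BASE}/videos",
        headers=HEADERS,
        json={"model": model, "prompt": prompt},
    ).json()
    while job.get("status") not in ("completed", "failed"):
        time.sleep(5)  # video generation is slow; poll patiently
        job = requests.get(f"{API_BASE}/videos/{job['id']}", headers=HEADERS).json()
    return job["url"]

# e.g. the CAD use case: render a turntable of a part straight from a script
# print(generate_video("slow orbit around a machined aluminum bracket"))
```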
>> Um, so there are cool use cases out there. I think we're still getting a sense of what they are. Yeah, I think there's a lot that can be done with these things. I think about gaming all the time, just based on my background, and, you know, AI and gaming is always a very controversial subject, but it's very clear that there's a place and there's a role. Um, maybe it doesn't have to interrupt the creative process — it can enhance it — and, um, I'm pretty excited to see some of those use cases emerge.
>> Do you think the video models are good enough now for people to be able to build video games on top of the API, or do you think we're still another rev or two away?
>> I have my own take on this. I was going to say, never bet against the ways people can be creative with technology — like, someone will be able to build a game, and maybe has built a game already. Will it look and feel like a —
>> you know, obviously there's latency with this model, so you'd have to do all sorts of crazy stuff to get around that, but
>> like, I think your mind immediately goes to kind of the obvious sorts of things that you would do in gaming, and we've seen some of that sort of stuff, uh, certainly in research blogs and that kind of thing. Uh, my mind often goes to, like, okay, this is a creative tool that's a little bit different. Um, and the types of games
different. Um and the types of games
that really excite me there, I'll just
go off on one, which is like there's a
game called Infinite Craft, uh which is
the world's simplest game. It's a web
game where you just take elements. It's
like fire, water, earth. You have like
four elements to start and you just drag
them and it
>> love this game.
>> It combines into something new. And the thing it combines with is, like — it's LLM-based. So fire and earth might be a volcano. Um, and then volcano plus water might be an underwater volcano, or Godzilla or something like that. You always end up at Godzilla for some reason. I don't know why. Um, but, uh, that's a game where it kind of makes sense — it's like, yeah, you don't really need a crafting tree. The LLM can derive the crafting tree, and it's a process of discovery. Um, and so I think there's a lot of untapped stuff in that space, where, again, I like the idea of a process of discovery. In fact, my philosophical view on LLMs — and video models, to some extent — is that it is a process of discovery. These are all in the weights. You're just unlocking it with, like, a secret code, which is your prompt. Um, and I love that. That is very magical. That was always the thing in gaming that excited me the most — discovering something new, especially if it was a true discovery: it wasn't put there by somebody else; maybe they just enabled the mechanics around it. So I think there's a huge opportunity in that space of, uh, gaming, when you think about games as just a different thing and embrace this technology in a very different way.
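The mechanic Thomas describes is simple to sketch: no crafting tree is stored anywhere — an LLM invents each combination on demand, and a cache keeps discoveries stable. (`ask_llm` is a hypothetical placeholder for any chat-completion call, not a real API.)

```python
recipes = {}  # discovered combinations, so "fire + water" stays consistent

def ask_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client you use (assumption, not a real API)."""
    raise NotImplementedError("wire up your LLM client here")

def combine(a: str, b: str) -> str:
    """Derive a crafting result on demand instead of from a hand-built tree."""
    key = tuple(sorted((a.lower(), b.lower())))
    if key not in recipes:
        recipes[key] = ask_llm(
            f"In a crafting game, combining '{a}' and '{b}' yields exactly one "
            f"new element. Answer with just its name."
        )
    return recipes[key]

# Starting elements, as in Infinite Craft: fire, water, earth, wind.
# combine("fire", "earth")    -> e.g. "Volcano"
# combine("Volcano", "water") -> e.g. "Island" (or, apparently, Godzilla)
```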
>> It reminds me of how, uh, some of the earliest use cases for GPT-3 were kind of these text games.
>> Um, so it's different from how you think of a, you know, playable video game, but actually a lot of these mechanics are very game-like.
>> Exactly. Yeah. I think there are still constraints, and I think that's going to be, like, the mechanism design, and that's still very human. Um, like a lot of the early games with GPT-3 — yeah, it was fun for a minute, and then it kind of went off the rails, and you're like, I don't really know what I'm doing anymore. Um, but again, in some ways Sora feels like a little bit of that, where it's got a little bit of gaming, um, DNA inside of it — it feels very fun and different and exploratory. So I like things like that, and, uh, I think there are going to be more use cases that we can't even think of — it's too creative.
>> What are you guys seeing on the creative filmmaking side? Like, is that an important target market? Do you want to empower the long tail, or do you want to empower the head, so to speak, of the creative market?
>> I think it's a really good question. You know, we've benefited a lot from, um, creatives who were really willing to go all in on even the early technology, like DALL·E 1 and DALL·E 2, and really helped steer us, uh, along the path. And I think it's important that we continue to, you know, build things for those folks, and we are working on some things that are more targeted toward creative power users long term. At the same time, you know, I do think AI is a very democratizing tool, right, at its best. And so what's kind of beautiful about the Sora platform in general, right, is whenever someone strikes gold — right, you see one of these beautiful anime prompts that goes to the very top of the feed for everyone — um, anybody can go and remix that, right? Everyone has the power to build on top of that and learn from all of these people who come in with this incredible knowledge about, uh, how to really get the most out of these tools. Um, and so I am really excited just to see the net creativity of humanity increase as a result of this. Um, but I think a big part of that, right, is continuing to empower the people who are always at the frontier, which are these more pro-oriented, creator-type folks, and so we want to keep investing in them
as well.
>> We've nerded out for a while — like, almost a couple years now — about that vision of feature-film—
>> Yeah.
>> —length content. Like, yes, you have these amazing cameos and shorter content, but at some point, the individual creator... This has been something that you've been excited about for a very long time.
>> Yeah.
>> When do we get there? You know, is there
a point where we have a feature film
that is created on Sora 2?
>> Yeah.
>> And how do we consume it? Is it in the
Sora app? Is it posted somewhere else
online? Do you go to a movie theater and
watch it?
>> Yeah, it's a great question. I mean, I think this will happen in stages to some extent. So, like, if you guys watched, right, the launch video — I mean, that was made by Daniel Frighen, who's on the Sora team, and already, with these tools, right, he is able to pump out these incredibly compelling short stories within days at most. I mean, he literally made that all by himself in almost no time, and he's been continuing to put new ones out there on the OpenAI Twitter since. Um, clearly this is massively compressing the latency, uh, that's associated with filmmaking. Um, I think to get to the point where really anybody can do this, right — like, any kid in their home can just fire up the app or sora.com and go and make this — it's
really an economics problem of, like, the video models. Video is the most compute-intensive modality to work with. It's extremely expensive, and, you know, we're making good progress on the research team, really continuing to figure out ways to make this affordable for everyone long term. Like, right now, for example, you know, the Sora app is totally free. Um, in the future there will probably be ways where people can pay money to get more access to the models, um, just because that's the only way we can really scale this further. Um, but, uh, you know, I think we are not far off from this world where anybody can really have the tools to make amazing content. You know, I think there are going to be a lot of bad movies that get created with this, but, likewise, you know, there's probably the next great film director who is just kind of sitting, uh, you know, in their parents' house, still in high school or something, and just has not had the investment or the tools to be able to really see their vision come to life. And we're going to find absolutely amazing things from giving this technology to the whole world.
>> I'm looking forward to the feature film
length Constantine's Greek Odyssey.
>> Me too.
>> Coming to theaters near you.
>> We're all in it together actually.
Different characters. I play the cyclops
and um it's a it's a good one.
>> I think, um, just to touch on that, one more thing: something I've learned from recommender systems over and over again. The tools getting more people creating is going to be a huge unlock for just, you know, making people more creative in general, because you don't need access to this, like, filmmaking equipment, all that sort of stuff. Um, but we do consistently see that content is also a social phenomenon, in a way — like, uh, movies and everything you see out there is kind of a bit of a social phenomenon in addition to the actual content itself. And so I think we're going to enter a very interesting world where, you know, there are so many people creating, and so much content out there, um, that even getting people to pay attention to it and watch it is going to become more and more important. Um, and I think that's actually going to elevate the quality of content, because anybody can create, and actually it's going to be the consumption that's quite limited — which is very different from the world we live in today.
>> You guys have been very thoughtful and intentional about how you've treated IP holders. Can you say a word on that?
>> You know, we've been in close partnership with a bunch of folks across the industry, really trying to show them this new technology, right, which is actually a huge value proposition, um, for rights holders across the board, right? And we're hearing so much excitement from the folks we're talking with. Like, they really see this as being, you know, a new frontier for — again, you know — every kid in the world having the ability to go and use, um, some of this beloved IP and really bring it into their lives in a way that feels much more personal and custom than what's been possible before.
>> Um, at the same time, you know, we really want to make sure that we're doing this in the right way. So we've been really trying to take feedback and steer our roadmap in a way where we know that, you know, both users are going to have an awesome experience getting to use this IP, and also the rights holders are going to get, you know, properly monetized and rewarded, um, in a way where, you know, everyone wins, basically. So, we're right now actively working on scoping out the exact details of, like, how we're going to, you know, for example, make it so that if you want to cameo your favorite character from some beloved film or something, um, you can do that in a way where, uh, you have access to it, but monetization will flow back to the rights holder, right? So, really trying to figure out this kind of new economy for creators. Uh, we kind of just have to create this from scratch right now. There are a lot of deep questions about how to do this the right way. And, you know, as with everything with this app, we come into it with an open mind, and we hear feedback and we iterate quickly. You know, we're not sure where this is going to totally converge, uh, but we're working closely with people to figure it out.
>> Really cool. What's ahead?
>> Pets.
>> Yeah, I think — I mean, one — when —
>> Sorry, what? Is that one of the most demanded features?
>> It is for me. It is. Bill's demanding. Uh, no —
>> It will remind us — we were just talking about curing diseases and world models, and, uh, now we're on to the future.
>> Uh, this is something — um, no, it's actually — so, that's definitely true. We've committed to that. It's coming. Um, but, I promise — uh, we actually had Bill's dog, Rocket, when we were playing around with this.
>> The good boy.
>> Yeah. And it actually was very, very cool, um, to actually feature a pet. You can imagine where that goes. It doesn't have to necessarily be a pet. Um, it could be anything — a clock, or whatever you have.
>> Um,
>> a clock.
>> Well, yeah. Yeah. Yeah. Um, do
>> you have a special clock?
>> Actually, it's really compelling.
>> I didn't think it could be so compelling until Thomas showed me this clock. It's like a sentient clock.
>> It's a — well,
>> but it's based on a real clock.
>> Yeah. I had a clock. My father was a technology person for a while. This company, Veraritoss, gave him a clock for his, like, whatever anniversary. Anyway, so I have it on my, uh, table somewhere, and, uh, there's this old Simpsons episode where they talk about a walking clock, and for some reason that's just been an earworm in my head for the last 30 years. And so always — it's like, you know, they're telling some joke, and it's like, is it a walking clock? Is it a walking clock? It's like, walking clock! And then it's like, no, man, it's my dog. And so it connected in my brain, where I was like, "Okay — Rocket, walking clock." And then so I tried it —
>> Thomas.
>> Uh, yeah. So it connected in my brain, and we've been playing around with this, um, just to see if we can get it to work and whether there's something special there — which is part of the fun of being on the Sora team: you get to play with this emergent, crazy technology, and maybe it does something you wouldn't even expect. So, I recorded a two-second video of my clock, and then, uh, I gave it some cameo instructions, and I said, "You're just a walking clock. You're a walking clock. You talk like you talk. You're a character." And then I generated my first video, and it was insane. It was crazy. It was a walking clock. And then I had one where, uh, it was talking to Bill, and Bill was like, "I didn't think it would ever land, the pet cameo feature." And then, uh, the walking clock's like, "Here I am. You know, I just landed." So, it's coming.
>> It's all internal memes.
>> Talk about emergent IP.
>> Yeah.
>> Who needs Pokémon when you can have a walking clock?
>> What's the greatest IP?
>> One thing to add in terms of the future — I think, on the feature film question, something I think about all the time is, like, what, you know, what will that actually look like? I mean — caveat: Bill's the only one who's good at predicting the future here. Um, my sense is that, as we get to longer forms, our equivalent of a feature film will look and feel very, very different from what a feature film is today. You know, I don't know exactly what that looks like, but I think, on the subject of creators and what's coming in the world, I think a new medium and a new class of creators — and that new class could include a lot of existing creators, um, and support existing sorts of mediums and stuff like that. But I think we're just in the early innings of what I imagine will be the next film industry, rather than thinking about this as being a feature film. I think there'll be something new. There's some anecdote — I hope this is true, because I say it all the time — that apparently, when the recording camera, like, you know, hit the world, the first thing people did was record plays.
This is like the least interesting thing
you could do with a recording camera.
It's like, what what's the big idea? Oh,
we people don't have to travel around
acting. We can just film them and
distribute it. And then someone was
like, wait a minute, we can make a film
and film in all these different areas.
And I feel like we haven't we're in like
the first inning of so many different
sort of things that people will do with
this technology, especially as the
constraints change with latency and
length and all that kind of stuff.
>> So cool. And a fun film-history nerd fact, and we should check this as well, but I think the original video was made just down the peninsula uh to settle a bet on whether a horse, when it galloped, left the ground with all four legs. And I could see a world where that is an example of new scientific discovery. People didn't actually have an answer to that. Now that you have a new simulation format, what are we going to be able to discover in that?
>> It will be crazy. And I think one broader point here is, you know, this app right now feels very familiar in a lot of ways, right? It's like a social media network at its core. But fundamentally, the way that we really view it internally, right, is with cameos we've kind of introduced the lowest-bandwidth way to give information to Sora about yourself, right? Aspects about your appearance, about your voice, etc. You can imagine over time that that bandwidth will greatly increase, right? So the model deeply understands your relationships with other people. It understands, you know, more than just how you look on any given day. Um, it's, you know, seen how you've grown up, all of these details about yourself, and it will really be able to almost function as like a digital clone, right? So there's really a world where the Sora app almost becomes this like mini alternate reality that's running on your phone. You have versions of yourself that can go off and interact with other people's digital clones. You can do knowledge work. It's not just for entertainment, right? And it really evolves more into a platform, which is really aligned with kind of where these world simulation capabilities are headed long term. Um, and I think when that happens, the kind of immersion we will see is crazy. And, you know, for OpenAI across the board, it's really important that we kind of iteratively deploy technology in a way where we're not just dropping bombshells on the world when there's some big research breakthrough. We want to co-evolve society with the technology. And so that's why we really thought it was important to do this now, and do it in a way where, you know, we've hit this again, this kind of GPT-3.5 moment for video. Let's make sure the world is kind of aware of what's possible now. And also, you know, start to get society comfortable in figuring out the rules of the road for this kind of longer-term vision where, again, there are just copies of yourself running around in Sora, in the ether, just doing tasks and reporting back in the physical world, because that is where we are headed long term.
>> So cool.
>> So you're building the multiverse?
>> Actually, kind of. Yeah.
>> Okay. Well, can a me go and find my soulmate somewhere in there?
>> I mean, anything is possible in the multiverse.
That's a call to action, everyone.
>> It is kind of crazy, though, because and now I'm going to sound totally cuckoo but if we're in a computed, you know, environment, you're building the perfect simulator. That kind of is the way you ultimately understand and break out of the computed environment, right? Like, are we getting closer to the heart of the matrix?
>> Some very deep existential questions. Yeah. Yeah.
>> What's your guys' P(sim), that we're simulated, like this is all
>> Rising.
>> Yeah. Want me to?
>> Oh, I'm low. Yeah.
>> Oh, man.
>> But yeah, it's okay.
>> You're really uh Okay. I respect it.
>> I'm just like, you know what? Sometimes it's got to be real.
>> Yeah.
>> I feel like I'm at like a solid 60%. I don't know. Like, more likely than not at this point.
>> I'm there, too.
>> Whoa.
>> Yeah. Zero.
>> Should we make a call on it?
>> Yeah. Trivially small.
>> Settle that.
>> What's the oracle on that?
>> Sora 10. Sora 10 will answer that. Yeah.
>> Yeah.
>> What do you think are the theoretical limits to Sora?
>> Yeah, it's actually a great question. Um, I've thought a little bit about this. Like, I think there's a question: can you eventually simulate, like, a GPU cluster inside Sora or something? And I assume there are some very well-defined limits on the amount of computation you can run within one of these systems, given the amount of compute you're actually running it on. Um, I've not thought deeply enough about this, but I think there are some existential questions there that need to get resolved. Yeah.
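One way to make that intuition concrete, as a rough editorial sketch rather than anything stated in the conversation: if the host cluster sustains a throughput of $C_{\text{host}}$ operations per second, and faithfully emulating one operation of the simulated machine costs an overhead factor $k \ge 1$ host operations, then the simulated machine's throughput is capped:

```latex
% Back-of-the-envelope bound on simulated computation (editorial
% sketch, not from the conversation): a simulated GPU cluster can
% never out-compute the host cluster that runs it, because each
% simulated operation costs k >= 1 host operations.
C_{\text{sim}} \;\le\; \frac{C_{\text{host}}}{k}, \qquad k \ge 1 .
```

Under this (assumed) framing, the "well-defined limits" Bill alludes to fall out immediately: simulation can reorganize compute, but not amplify it.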
>> Yeah.
>> See, that's why his P(sim) is so high.
>> Fascinating.
>> Wow.
>> Got a few lightning round questions for
the team that we just kind of generated
on the fly here. Um, and take your time.
Jump in whenever you have an answer.
Your favorite cameo on Sora to date and
what happened.
>> That is so tough.
>> I have a hot one.
>> Go.
>> Uh, shocker. Yeah. Okay. So, there was this TikTok trend, and I got obsessed with them, I don't know why, but these Chinese factory tours where they're like, "Hello, I'm the chili this is the chili factory." They get like one like, and it's me. Uh, and they're showing their chili factory and they're like, "It's the chili factory." Like, "This is amazing." Or there's an industrial chemical one. Uh, yeah, I've lost the name, but there's an industrial chemical factory. And the first day, um, I had my cameo options open just because I was like, I just want to see what happens. Uh, and the first day, late at night, I opened my cameos and I was starting to get tagged in factory tour cameos that were all in Chinese. And I was like, I'm in the Chili Factory. And I was so excited. It got zero likes. I liked it. It was just me, but I was like, I'm the Chili Factory guy now. I'm like doing the ribbon cutting at the Chili Factory. Amazing. Uh, that's too deep of a cut. So,
>> Congratulations.
>> Fun fact, I actually have done Chinese factory tours in real life, and they are truly epic.
>> Yeah.
>> Uh, there's this one I just saw of Mark Cuban in jorts dancing around; that got me. Um, but more, back to me scrolling the latest feed and just seeing the wholesome content of people doing things with their friends. Actually, I think that's what brings me the most joy. They're not like super liked, but it's people just getting a lot of, you know, value, obviously, from just making videos with their friends. And Sam has so many bangers. I like the one of him doing this K-pop dance routine about like GPUs or something. It's very good. Actually, I would put it on my Spotify. Like, we had the full song.
>> Wow.
>> It was very good. It was like generated by Sora. It's like very compelling. Yeah.
>> All right. Well, that leads to the next one, because you mentioned Spotify. What does a fully AI-generated work win first: an Oscar, a Grammy, or an Emmy?
>> I think the logical answer is like a short winning an Oscar.
>> Yeah,
>> I think that's probably right.
>> Yeah.
>> What would we win it for? Like, for a
>> The Sora George trilogy.
>> Yeah.
>> We need new content. Yeah.
>> I do think if people stitch things together in interesting ways yeah, I think you can actually start to make some very compelling storytelling in that. And um
>> It doesn't really feel like AI anymore, the content I'm seeing. That was actually something I noticed with Sora as well. Just like, scrolling through, I wasn't even noticing it was AI. Um, it was just kind of interesting content.
>> That's a more interesting question. Will we know?
>> Oh yeah, maybe it's already happened.
>> Maybe it's already happened.
>> I feel like for Oscars, one of the cool things that'll be unlocked is this long tail of epic stories in history, stories of heroism and struggle and all of these things that have been locked up because of the cost of creating. You know, as a history enthusiast, I cannot wait for AI to unlock all of those stories.
>> Have you seen the Bible video app?
>> No, I haven't.
>> Oh, it's really good. I'll show it to
you after.
>> Like, perfect example.
>> Yeah.
>> Or there's this movie, The Last Duel, from a few years ago, about this really terrible crime that was committed in medieval France that was historically relevant and, you know, basically says a lot about humanity. And it just got picked up because eventually Hollywood picked up this important story about humanity. But how many more are there in human history? That's going to be really cool. Um, favorite character from any film or TV show?
>> I have a really random one.
>> Go for it.
>> Uh, you guys seen Madagascar?
>> Yeah.
>> King Julian.
>> Oh, played by Sacha Baron Cohen. He's a lemur. He's a lemur king.
>> Absolutely.
>> It's just like it's a banger.
>> It's his humor meets kid-friendly storytelling. It's just perfect.
>> I play a lot of video games, so I mean your classic answer is going to be like Mario or something like that. Although I'll do the deeper cut of we were always joking about PaRappa the Rapper, like an old PlayStation game, one of the original rhythm games. And it's got a great artistic style and it's got great IP of just this little dog.
>> What is he? A dog?
>> He's a dog. Yeah. Yeah.
>> That's a good pick. When I was a kid, I played the Pokemon trading card game competitively for a while. Um, so I was like really in the Pokemon rabbit hole. So, like, I don't know, Pikachu. Or Mudkip.
>> Super non-consensus
like a fringe deep cut. Um, okay. First world-model scientific discovery. Be as specific as possible. Obviously you're not going to say the discovery.
>> I suspect it will be something related to classical physics, like
>> wow
>> a better theory of turbulence or something. That would be my guess.
>> I was guessing it was going to be something like that. I was like
>> Navier-Stokes. I don't know. Yeah, some fluid dynamics thing that's maybe hard to understand. Um, there are a lot of unsolved problems there. I think sometimes they call it continuum mechanics, where it's like in between. Um, and we don't have good models of them. Something that lends itself to simulation, just the amount of iterations you can do of a simulation unlocking something. I don't yeah, something in that realm.
>> The last thing we'll be able to accurately simulate?
>> I do think there's a set of physical phenomena for which video data is a poor choice of representation, right? So, for example, is it really efficient to learn about, you know, high-speed particle collisions or something from video footage? Maybe. Um, I really think video is at its best when the phenomenon that you're trying to learn about is just natively represented in the physical world. And so when you need to do, like, quantum mechanics or some other discipline where, you know, it's more theoretical, we don't have video footage beyond
>> you can't see it.
>> Yeah. Things that we've manually rendered for educational purposes. It feels like a weaker medium for understanding those things. So, I suspect those would come last.
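For readers who want the reference behind the Navier-Stokes and turbulence name-drops a moment earlier: the incompressible Navier-Stokes equations govern fluid flow, and whether their 3D solutions stay smooth is a famous open problem. A minimal statement, assuming constant density $\rho$, kinematic viscosity $\nu$, and body force $\mathbf{f}$:

```latex
% Incompressible Navier-Stokes equations (the open fluid-dynamics
% problem mentioned above): momentum balance plus an incompressibility
% constraint on the velocity field u(x, t) with pressure p(x, t).
\begin{aligned}
  \frac{\partial \mathbf{u}}{\partial t}
    + (\mathbf{u} \cdot \nabla)\,\mathbf{u}
    &= -\frac{1}{\rho}\,\nabla p + \nu\,\nabla^{2}\mathbf{u} + \mathbf{f}
    && \text{(momentum)} \\
  \nabla \cdot \mathbf{u} &= 0
    && \text{(incompressibility)}
\end{aligned}
```

The nonlinear advection term $(\mathbf{u} \cdot \nabla)\,\mathbf{u}$ is what makes turbulence hard, which is why a learned simulator that handles it well would be notable.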
>> I guess it's the things we don't have
sensors for.
>> Right. Right.
>> Yeah. Maybe the last things we care to simulate is another way of thinking about the answer. I don't know. I mean,
>> People aren't doing much with smell right now. You know, maybe that's
>> Greenfield.
>> I've been meaning to tell you about
that. It's kind of awkward.
>> We're still trying to figure out how to
simulate Thomas with bad hair.
>> Oh, yeah.
>> It remains an unsolved problem. Not even
Sora can do it.
>> Thomas's hair flow. Just general
>> guzzling ketchup. Yes,
>> There was a good round of people being bald. We were all bald. The gens were good, actually. Kind of cool. That's a use case that doesn't get talked about very much, but it's like
>> visualization
>> when you're bald. Yeah. Everybody wants to be bald. No, it's just that you see yourself in some different context. I think that can be quite powerful, even therapeutic in some ways, where you just see yourself in some context that you either want or don't want yourself to be in, and just see yourself.
>> It's a real use case.
>> Yeah. Yeah. True.
Guys, thank you so much for coming. From
space-time tokens to object permanence,
world models that will enable scientific
discovery, the democratization of
creation,
all the way to walking clocks, you guys
have covered it all. Thank you so much.
And uh the future is being created by
you.
>> Thanks, Constantine. Thanks, Sonia.
>> Thank you.
[Music]