
OpenAI Sora 2 Team: How Generative Video Will Unlock Creativity and World Models

By Sequoia Capital

Summary

Key Takeaways

  • **Sora 2: a leap beyond scale.** Sora 2's advances go beyond mere scaling; it demonstrates emergent agent-like behavior and respects physics, failing in distinctive semantic ways unlike prior video generation models. [06:20]
  • **Space-time tokens: the building blocks.** Sora uses space-time tokens, akin to voxels in 3D space, to represent video data. This gives the model full global context across space and time, enabling properties like object permanence. [04:34]
  • **World simulators drive AI evolution.** Just as language models developed internal world models through scale, video models like Sora are building sophisticated world simulators, a capability crucial for applications from creative content to scientific discovery. [08:48]
  • **Intentional design against mindless scrolling.** Unlike platforms optimized for consumption, Sora prioritizes creative inspiration. Its ranking algorithm encourages user creation and discourages mindless scrolling, with features like cameos and remixing. [25:08], [30:39]
  • **Co-evolving society with AI.** OpenAI aims for iterative deployment of technology like Sora, not just dropping breakthroughs on the world. The goal is to co-evolve society with AI, letting people grow comfortable with new capabilities and establish rules of engagement. [00:03], [49:28]

Topics Covered

  • Scaling Sora reveals emergent physics and agent intelligence.
  • Can Sora unlock new scientific discoveries?
  • Sora optimizes creation, not mindless consumption.
  • How Sora's cameos democratize creative storytelling.
  • Are digital clones the future of human-AI co-evolution?

Full Transcript

For OpenAI across the board, it's really important that we iteratively deploy technology, rather than just dropping bombshells on the world whenever there's some big research breakthrough. We want to co-evolve society with the technology. That's why we thought it was important to do this now, and to do it in a way where, you know, we've hit this GPT-3.5 moment for video. Let's make sure the world is aware of what's possible now, and also start to get society comfortable figuring out the rules of the road for the longer-term vision, where there are copies of yourself running around in Sora, in the ether, doing tasks and reporting back in the physical world. Because that is where we are headed long term.

[Music]

Today on Training Data, we sit down with the team behind OpenAI's Sora: Bill Peebles, Thomas Dimson, and Rohan Sahai. You'll hear about space-time tokens, building internal world simulators, and how optimizing for creation instead of consumption is just better for social platforms. This conversation goes way beyond video generation and into questions about how society will co-evolve with powerful simulation technologies. We promise this was an actual real-world conversation and not a video generation, but we don't know how to prove that to you. Let's jump in.

>> Hey guys, thank you for being here at Sequoia. Congratulations on Sora.

>> Thank you.

>> Maybe you could tell us a little bit about yourselves and how you got to OpenAI and Sora.

>> Yeah, I'm Bill. I'm the head of the Sora team at OpenAI. I had a pretty traditional path: I came up through undergrad doing research on video generation, continued that work at Berkeley, and then started at OpenAI working on Sora from the first day I joined.

>> And I'm Thomas. I work as an engineering lead inside of Sora. I have a bit of a longer story: I worked at Instagram for about seven years building some of the early machine learning and recommender systems there, back when it was a very tiny company of about 40 people. Then I quit and did my own startup for a while, which was Minecraft in the browser, which we've talked about a couple of times. I think OpenAI noticed that we had a very cracked product team there, so they acquired our company, and I've been bouncing around different products inside of OpenAI, and on the research side as well, on training. But I'm super happy we landed together on Sora to bring this thing to life.

>> There was a really cool product in between the two: the Global Illumination product.

>> Oh yeah, I still believe in it.

>> Yeah, me too.

>> Awesome. I'm Rohan. I've been at OpenAI for about two and a half years. I started as an IC on ChatGPT, but as soon as I saw the video gen research, I got quickly Sora-pilled and made my way over there. So currently I lead the Sora product team. Before that, just startups and big companies within the Valley, a bunch of random stuff.

>> Yeah. Cool. Well, Bill, you are the inventor of the diffusion transformer. Can you tell us what that is?

>> Yeah. Most people are pretty familiar with autoregressive transformers, which is the core tech that powers a lot of the language models out there. There you generate tokens one at a time, and you condition on all the previous ones to generate the future. Diffusion transformers are a little bit different. Instead of using autoregressive modeling as the core objective, you use this technique called diffusion, which at a very high level involves taking some signal, for example video, adding a ton of noise to it, and then training neural networks to predict the noise that you applied.
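
To make that objective concrete, here is a minimal sketch of a diffusion training step. The tensor shapes and the stand-in MLP denoiser are illustrative assumptions, not Sora's actual architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy version of the recipe described above: take a clean signal, add noise,
# and train a network to predict the noise that was applied. The stand-in
# MLP denoiser and latent shapes are assumptions for illustration only.
denoiser = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 1024))
opt = torch.optim.AdamW(denoiser.parameters(), lr=1e-4)

def training_step(clean):                   # clean: (batch, tokens, 1024) video latents
    noise = torch.randn_like(clean)         # sample Gaussian noise
    t = torch.rand(clean.shape[0], 1, 1)    # random corruption level per example
    noisy = (1 - t).sqrt() * clean + t.sqrt() * noise  # blend signal with noise
    pred = denoiser(noisy)                  # predict the applied noise
    # (real diffusion models also condition the network on t; omitted for brevity)
    loss = F.mse_loss(pred, noise)
    loss.backward(); opt.step(); opt.zero_grad()
    return loss.item()

training_step(torch.randn(8, 256, 1024))
# Generation runs in reverse: start from pure noise and repeatedly subtract the
# predicted noise, refining the whole video a step at a time.
```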

>> It's a different kind of iterative generative modeling. Instead of generating token by token as you do in autoregressive models, diffusion models generate by gradually removing noise one step at a time. With Sora 1 we really popularized this technique for video generation models. If you look at all of the competitor models out there, both in the States and in China, most of them are based on DiTs, diffusion transformers. A big part of that is because DiTs are a really powerful inductive bias for video.

>> Because you're generating the whole video simultaneously, you really solve issues where quality can degrade or change over time, which was a big problem for prior video generation systems and which DiTs ended up fixing. That's why you're seeing them proliferate within video generation stacks.

>> When I try to visualize it: for each diffusion step you have a matrix of pixels, and you do the entire video at the same time, which you can basically see as different frames, I imagine. Can you visualize it as a matrix of matrices that transforms over time?

>> Yeah, it's a good question. We really consider things at the granularity of space-time tokens, which is sort of an insane phrase. Whereas characters, for example, are the fundamental building block for language, for vision it's really this notion of a space-time patch. You can imagine a little cuboid that spans the X and Y spatial dimensions as well as a temporal one, and that really is the minimal building block you can build visual generative models out of. Diffusion transformers consider these almost voxel by voxel. In the traditional versions of these diffusion transformer models, you have all of these little space-time patches talking with all the other ones, and that's how you get properties like object permanence to fall out: you have full global context of everything going on in the video at every position in space-time, which is a very powerful property for a neural network to have.
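
As a rough illustration of what space-time patches mean mechanically, here is a sketch that chops a video tensor into little cuboids spanning x, y, and time. The patch sizes are assumptions, and production systems typically patchify compressed latents rather than raw pixels:

```python
import torch

def patchify(video, pt=4, ph=16, pw=16):
    # video: (T, H, W, C) frames; assume T, H, W divide evenly by the patch sizes
    T, H, W, C = video.shape
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.permute(0, 2, 4, 1, 3, 5, 6)      # gather each cuboid's pixels together
    return v.reshape(-1, pt * ph * pw * C)  # (num_tokens, token_dim)

video = torch.randn(16, 256, 256, 3)        # 16 frames of 256x256 RGB
tokens = patchify(video)
print(tokens.shape)                         # (1024, 3072): 4 * 16 * 16 space-time tokens
```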

>> And is that the equivalent of the attention mechanism, the object's movement throughout the video?

>> Yeah, that's right. In our Sora 1 blog post on video generation models as world simulators, we laid out some visuals that go to exactly your point. Attention is a very powerful mechanism for sharing information across space-time. If you represent data this way, where you patchify it into a bunch of these space-time tokens, then as long as you're properly using the attention mechanism, you can transfer information throughout the entire video all at once.
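
A companion sketch of that global attention, with illustrative dimensions: every space-time token attends to every other token in the clip, which is the all-to-all information flow the object-permanence point above relies on:

```python
import torch
import torch.nn.functional as F

def global_spacetime_attention(tokens, wq, wk, wv):
    # tokens: (N, d), the flattened patches from the whole clip, no locality imposed
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    attn = F.softmax(q @ k.T / k.shape[-1] ** 0.5, dim=-1)  # (N, N) weights
    return attn @ v                         # each token mixes in every other token

d = 64
tokens = torch.randn(1024, d)               # e.g. the 1024 patches from the sketch above
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
out = global_spacetime_attention(tokens, wq, wk, wv)      # (1024, 64)
```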

>> What are the biggest differences between Sora 1 and 2? I remember with the original Sora 1, you were already seeing emergent properties where the more you scale, the more it's able to do things like understand physics. Is Sora 2 purely a function of scaling, or what are the biggest differences?

>> Yeah, that's a great question. We've spent a long time since the Sora 1 launch doing core generative modeling research to figure out how to get the next step-function improvement in video generation capabilities. We really operated from first principles: we want these models to be extremely good at physics, and we want them to feel intelligent in a way that most prior video generation models don't. By that I mean, if you look at any of the previous set of models out there, you'll notice these effects if you try any complicated sequence of physical interactions, spiking a ball, gymnastics, the classics.

>> Riding a dragon, like you did.

>> Riding a dragon. That was fun. That happened for real, actually, Konstantine.

>> Not so sure about that.

>> There were very clear problems with the past generation of models that we set out to solve with Sora 2. And one thing that's really cool about this model compared to prior ones is that when it makes a mistake, it fails in a very unique way that we haven't seen before. Concretely: say the text input to Sora is a basketball star shooting a free throw. If he misses, Sora will not just magically guide the basketball into the hoop out of over-optimism about respecting what the user asked for. It will actually defer to the laws of physics most of the time, and the ball will rebound off the backboard. So there's a very interesting distinction between model failure and agent failure, agent as in the agent Sora is implicitly simulating as it generates video. We hadn't seen this unique kind of semantic failure in prior video models; it's really new with Sora 2. It's a result of the investment we put into core generative modeling research to get this massive improvement in capability.

>> Hm. Okay. So not purely a function of scale. There's some concept of agents implicit in this; there are things you're doing beyond just scaling up the model.

>> Well, the notion of agents, I'd say, is actually mostly implicit from scale. In the same way we showed that object permanence begins to emerge in Sora 1 pre-training once you hit some critical FLOPs threshold, we see similar kinds of things happen as we push the next frontier. You begin to see these agents act more intelligently. You begin to see the laws of physics respected in a way that they aren't at lower compute scales.

>> How does the concept of a space-time latent patch relate to a space-time token, and to object permanence and how things move through the physical world?

>> Yeah, that's a great question. Space-time patch and space-time token are more or less synonymous, so I'll use them interchangeably. What's really beautiful is that when people started scaling up language models, from GPT-1 to GPT-2 to GPT-3, we began to see the emergence of world models internally in these systems. And what's kind of beautiful about that is that incredibly simple tokenizers go into creating the data we train these systems on. Despite this very simple representation, BPE, characters, what have you, when you put enough compute and data into these systems, then in order to actually solve the task of predicting the next token, you need to develop an internal representation of how the world functions. You need to simulate things. The models make lots of mistakes at low compute scales, but as you continue pushing from 3 to 4 to 5, you see these internal world models get more and more robust. It's really analogous for video, and in many ways more explicit. I think it's easier to picture what a world model or a world simulator looks like with video data, because it is literally representing the raw observational bits of all of reality. And what's really remarkable is that because these space-time patches are such a simple and highly reusable representation, they can apply to any type of data: video footage of this set, anime, cartoons, whatever it is. You're able to build one neural network that operates on this vast, extremely diverse set of data and builds incredibly powerful representations that model very generalizable properties of the world. It's useful to have a world simulator to predict how a cartoon will unfold, and likewise it's useful for predicting how this conversation might unfold. That puts a lot of optimization pressure on Sora to grok these core fundamental concepts in a very data-efficient way.

>> Did you have to put effort into selecting the data so that it reflected the physical world? I'd imagine data from the physical world all abides by the laws of physics, but you mentioned anime, which might not always abide by the laws of physics. Did you have to be selective, or did the model naturally find patterns that separated that out?

>> That's a really great question. We did spend a lot of time thinking about what the optimal data mix for a world simulator looks like. And to your point, in some cases we'll make decisions that are about making the model really fun, for example, people love generating anime, even though that doesn't necessarily perfectly represent the laws of physics in a way that's directly useful for real-world applications. To put it another way: in anime there are certain primitives that are simplified but probably still useful for understanding the real world. People still locomote through scenes, for example. But if there's some crazy dragon flying around, that's probably not so useful for grokking aerodynamics.

>> Dragon Ball Z is more or less how I learned athletics, you know?

>> There you go.

>> The motion, and Super Saiyan.

>> There's an interesting question there that I don't know the answer to: whether pre-training on simplified representations of the visual world, whether that's sketches or some other modality, makes you more efficient at grokking these concepts. I think it's actually a very interesting scientific question that we need to understand better.

>> Do you think we're close to exhausting the number of pre-training tokens out there, or is video data just so massive that it's actually one of the more untapped sources of data?

>> Yeah. The way I think about this is that the intelligence per bit of video is much lower than something like text data, but if you integrate over all of the data that exists out there, the total is much higher. So to directly answer your question, I think it's hard to imagine ever fully running out of video data. There are just so many ways it exists in the world that you'll be in a regime where you can keep adding more and more data to these pre-training runs and continue to see gains for a very long time, I suspect.
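
A back-of-envelope calculation, with assumed numbers, of why those totals are so lopsided:

```python
# How many space-time tokens might one minute of video become?
# All numbers here are assumptions purely to illustrate the scale.
fps, seconds = 24, 60
height, width = 720, 1280
pt, ph, pw = 4, 16, 16                      # illustrative patch sizes

frames = fps * seconds                      # 1,440 frames
tokens = (frames // pt) * (height // ph) * (width // pw)
print(tokens)                               # 360 * 45 * 80 = 1,296,000 tokens

# A minute of speech, by contrast, is roughly 150 words (a few hundred text
# tokens): far more meaning per token, but vastly fewer tokens to learn from.
```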

>> Do you think we'll ever discover new physics? There's the LLM world of Einstein thinking at the whiteboard, and its equivalent for these LLMs thinking. And there's also the idea that if you develop a perfect simulator and you just simulate physics better and better, you might learn things about the world that we haven't learned yet.

>> I totally think this is bound to happen one day. We probably need one more step-function change in model quality to really get to a point where, for example, you can think about doing scientific experiments in the models. But you could imagine that one day you have a world simulator that has generalized so well to the laws of physics that you don't even need a wet lab in the real world anymore; you can just run biological experiments within Sora itself. Again, this needs a lot of work to get to a system that's robust enough to do it reliably. But internally we view Sora 1 as being the GPT-1 moment for video: it was really the first time things started working for that modality. Sora 2 we view as GPT-3.5, in terms of being able to kickstart the world's creative juices and break through the usability barrier where we're seeing mass adoption of these models. And we're going to need a GPT-4 breakthrough to get this to the point where it's useful for the sciences, as we're seeing now with GPT-5. I feel like every day on Twitter I see another convex optimization lower bound get improved by GPT-5 Pro, and I think eventually we're going to see the same thing happening for the sciences with Sora.

>> Do you think you need physical-world embodiment to get there, or can a lot of it be done effectively in sim?

>> You know, I'm always amazed, every time we push another 10x of compute into these models, at what just magically falls out of them, with very limited changes to what we're training on and the fundamental approach. I suspect some amount of physical agency will certainly help; I have a hard time believing it will make you worse at modeling collisions or something else. But video-only is quite remarkable, and I wouldn't be surprised if it's actually AGI-complete for building a general-purpose world simulator.

>> For this concept of a general-purpose world simulator, a world model where you can do science experiments: do you think video is the sole data input, or some combination of video and text? Or does it have to be based on more structured laws of physics and laws of biology that are understood?

>> I think it probably depends a lot on the specific use case you're envisioning for the world simulator. For example, if you just want to build an accurate model of how a basketball game is played, I actually think video data alone, and maybe audio as well, is sufficient to build that system.

>> Not of me playing basketball. That would be an inaccurate, very bad player of basketball.

>> You know, you might actually like Sora's current understanding of how people play basketball. Konstantine may be at your level.

>> Wow. Okay, that makes sense.

>> It's possible. It's possible.

>> I think he just dissed you.

>> It's accurate.

>> But it's better than mine, Konstantine. That was like a Sora 1 situation.

>> You're at Sora 2.

>> We'll toss some hoops. Is that what they say?

>> You know, I'm down. I'm down. Yeah. Shoot some hoops.

>> Thanks.

>> Toss some hoops.

>> Thomas's first statement in the podcast.

>> I'm also at your level.

>> You know, I think it's an interesting question what all the modalities are that should be present in this kind of general-purpose sim system. Certainly, if you add more modalities, I have a hard time believing it will decrease the intelligence. I also think there's an argument to be made that adding more and more doesn't provide significant marginal value compared to full mastery of video and audio, for example. I think it's an interesting open question. I'm not actually sure right now, and it's something we need to understand more.

>> Yeah. So cool. Sonya a minute ago mentioned Einstein at a whiteboard, and obviously that makes me think of you, Thomas, and your hair.

>> Me too. Knew it was coming.

>> It had to come. If any hair gives the feeling of space-time tokens, it's definitely yours. Bill, you're the creator of this revolutionary technology that has changed the way AI video is created. At some point between Sora 1 and Sora 2, you all together said: there needs to be an application around this, there's some benefit to an application. You brought together some of the best product people in the world. How did that crew come together at OpenAI?

>> Yeah. I mean, the story is never as linear as you might think it is. We've had a product team on Sora since the get-go; Rohan was spearheading that effort in the Sora 1 days. But I think Bill's right when he says it was really a GPT-1 kind of moment. We were seeing pockets of very interesting things there, but the models were, you know, producing videos without sound. It's a very different kind of environment. So we were working on that surface, mostly targeted at a prosumer demographic. And separately, we were also exploring different social applications of AI inside of OpenAI and what that could look like. We had a lot of prototypes, most of which were quite bad. Where we started to see some of the magic was actually with image gen, before it had been released, when we were playing with it internally in a social context. What was really interesting to see was that people would take an image and then have a chain of remixes of that image, where, I don't know, it's a duck, and now the duck's on somebody's head, and now everything's upside down and they're smoking a cigarette. Just a lot of weird things. We were seeing this and thought: oh, this is a very interesting thing that nobody can really do with social media, because it's so hard to create something or riff on something. It's such a high-barrier-to-entry action. Maybe you have to go get a camera set up; it's not just thinking of the idea, there are actually a lot of things involved. So we thought: okay, this is a very magical behavior, how can we productize it? And we were mostly thinking of it away from Sora. Some of the Sora research was still ongoing, and there were signs of life, but it wasn't quite there yet in productized form. Bill probably had it in his head somewhere; he can see the future, but that's fine. I can't quite see the future yet. So we were just exploring that. We tried a few things, and at some point the research was showing very clear value, even iterative-deployment-style value, of: oh, this is something people will really want. And so we went into this project like two or three months ago. It wasn't very long.

>> It was like July 4th. Wow.

>> July 4th. Yeah.

>> That's when you disappeared, Thomas.

>> That's when I disappeared. Yeah. Exactly.

>> So we locked in: okay, we're finally doing it. You know, that's always a moment. We started without any magical features, just: let's get a native video environment where you can hear the audio full screen. We did some quick generations, and things were showing up very cool, very fun, very interesting. And because of that image gen experience, we had thought: okay, the magical thing here is that the barrier to entry for creation is very, very low. Coming from Instagram, it's impossible to get people to create on Instagram, and that's the most valuable thing people do. So what does that unlock? And that remix thing from image gen could still apply here. So we brainstormed all these things about how remixes could work and what a remix means here. One of those was this cameo thing, which I think Bill also had in his head, but this, you know, wasn't...

>> It was in the ether.

>> It was in the ether, for sure. But we were just hacking things together on the product to see if it would work. I didn't think it would work at all. But it was on the list, and there were a few other things on the list. Some of them were pretty crazy.

>> Why didn't you think it would work?

>> I am bad at predicting technology. It wasn't super clear to me that you could take a likeness of a person and have it imagined into video form, and whether that would work or not. So we had early prototypes of different things, like people reacting in the video corner, stuff like that. But when we saw cameos just start to work, even playing with it internally... Rohan, do you remember that day?

>> The feed was entirely cameos.

>> It went from, you know, we didn't have that feature, to once we had that feature, product-market fit on the team. Everything we were generating was all of each other.

>> Inside, you must have seen the meme potential.

>> I mean, yeah. At first we were just like, this is hilarious, this is amazing. And then a week later we were like, this is still all we do. There's something here.

>> Yeah. I mean, at first we were actually a little bit like, is this good? Hey, the cameos, it's just all cameos now.

>> Does anyone else care about this?

>> People care about other people doing stuff. And we got to the point where we were like, no, this is actually good. It feels like something I'm coming back to see. And it really humanized it. A lot of AI video is static scenes that are quite beautiful, quite interesting, might have extremely complicated things going on, but they lose that human touch, and it really felt like that was coming back into it.

>> Another learning from image gen, too: image gen took off and had viral moments because you could put yourself in these scenes in accessible ways that weren't possible before. Obviously there was the massive put-me-in-a-Ghibli-scene moment, people taking selfies with their idols, and stuff like that. So once you actually thought about it, the cameo feature makes a lot of sense: you put yourself and your friends in all these scenes. That's way more exciting. It's novel. It's not something you could do before.

>> Yeah. And then that combined with remixes. Cameo is kind of a remix to begin with, but then you start to think: okay, now I can riff on Rohan doing something, or whatever it is. Bill had you wrapped in an action-figure package, and it was...

>> It's been remixed an insane number of times.

>> Thousands of times. Yeah. Very crazy things go on, and it's very emergent, a lot of stuff I would have never thought of, actually.

>> How many generations of you guys have been publicly posted at this point?

>> I have no idea.

>> I know I'm at 11,000 or so.

>> I was at a little less than that.

>> Wow.

>> Yeah. It's crazy.

>> What has surprised you about the types of users that are really sticking with Sora? Who is it really a hit with?

>> If you just go to the latest feed, which is just the fire hose...

>> Astronaut mode.

>> ...of everything. Yeah, it's space-time Thomas mode. It's wild out there. But that gives you a pretty good snapshot of everything happening. I think we have almost 7 million generations happening a day, so you can imagine there's just a ton of information there. It's one of my favorite ways to get product feedback. The type of stuff people are doing, and the type of people, is so diverse. There's a complete variety of ages. Some people are envisioning themselves in scenes that seem motivation-oriented. People are memeing with their friends, or cameoing some of the public figures on the platform who have done cameos. So the diversity has surprised me. I was kind of expecting the Twitter AI crowd to heavily dominate the feed. They definitely dominate the press cycles, at least the ones we're most exposed to, but in terms of people actually using this, it's quite a wide variety. And the last thing I'll say is that it's a bigger departure from the niche AI film crowd that existed before, who were great early adopters. I thought it would start there, but it felt like it started with a way wider range of people. I think getting to the top of the App Store helps with that; you get people who are just browsing and see this thing.

>> My mother keeps cameoing Thomas. So weird.

>> There are real stories like that.

>> You said 11,000.

>> She's done 10,000 of them.

>> Thomas, you wrote the original algorithm, if I'm right, for the Instagram ranking algo. There was a lot in the Sora 2 blog post about how you guys are clearly being very intentional about how you want to do ranking in the algorithm. Can you talk a little bit about lessons learned from Instagram and how you're approaching it over at Sora?

>> Yeah. I mean, there's a lot to cover in that. I think the first thing to think about with these platforms, or with Sora specifically, is the thing I was mentioning before about creation. Sora enables basically everybody to be a creator on the platform, and that is a very, very different environment from something like Instagram, where you have this extreme power law of the people who are creating, and the power law just naturally gets more head-heavy.

Sometimes I feel like I have to defend myself on the Instagram algorithm side. We did it for a reason: it was to solve a problem. It wasn't just a random decision to optimize for ads or something like that. The reason was that we noticed what was happening on Instagram over time. Because the feed was chronologically ordered, every single person who posted was guaranteed the top slot for all their followers. If you think about that for a second, the incentive for somebody in that environment is to create constantly, because they're guaranteed distribution when they create. And over time, because of this power law becoming more and more head-heavy, those types of people, who are great and provide a lot of value to the ecosystem, start to crowd out the people you really care about. Maybe you follow National Geographic or something, not to dunk on National Geographic, I love them, but if they're posting 20 times a day, your friend isn't; they don't have the same optimization objective. They're probably just posting a picture of their coffee. So you'd have 20 Nat Geo posts and then one picture you actually really cared about that you never scrolled to. And there aren't too many solutions to that problem if you have a guaranteed ordering. One is that you unfollow all these accounts that you care about, but care about less than the person who posts once a day. The other is that you permute the feed. So we went with that path. We tried it; we tested it out internally. It was very controversial to do. But I think you can actually math this out: it's almost a proof that over time you're going to have to take control over distribution on the platform in order to prevent these kinds of issues and show people what they actually care about. So that's why we did it. And it actually showed a lot of value. I remember the early tests. I won't get into the numbers, but they were pretty unambiguous: it was showing you more people you cared about, it was improving your experience with the platform, and it actually moved creation, which is unusual. It made people create more, because they were seeing more content that was accessible to them. But I also think these things can go astray over time, and I won't say the Instagram algorithm is unequivocally bad or unequivocally good. When we started to open up to more unconnected content, ad pressure was very strong. There's also a natural company incentive to optimize for blind consumption, because that's how you make money: maybe cheaper content, or just getting people to scroll more and more and more. And that can encourage people to create less, because it's a more mindless scrolling mode.
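
A toy illustration of the contrast being described here, with a hypothetical per-post value score standing in for a real recommender model:

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    age_hours: float
    predicted_care: float     # hypothetical model score: how much *you* care
    author_posts_today: int

posts = [
    Post("natgeo", 1.0, 0.30, 20),   # prolific account, 20 posts today
    Post("friend", 6.0, 0.95, 1),    # one post from a close friend
]

# Chronological ordering: the friend's post is buried under the newest ones.
chronological = sorted(posts, key=lambda p: p.age_hours)

def value_score(p: Post) -> float:
    # Toy heuristic: personal relevance, decayed by age, divided across the
    # author's posts so no single account monopolizes the feed.
    return p.predicted_care / (1 + p.age_hours / 24) / p.author_posts_today

ranked = sorted(posts, key=value_score, reverse=True)  # friend surfaces first
print([p.author for p in chronological], [p.author for p in ranked])
```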

>> But you guys have very concretely committed to doing things to prevent that kind of behavior.

>> Yeah. We have a lot of mitigations in place there, but what it really comes down to for me is: what are we trying to do as a platform? The magic of this technology is that everybody is a creator. So we want this feed to be optimized for you to create, to inspire you to create. Sometimes when you think of inspiration, you think of a beautiful, crazy scene that's so elegant. When I think about it, I think of meme culture, or something really funny, where you go: oh, that's cool, I've got to riff on that. And that's a very different brain mode when you're browsing the feed. Of course we have lots of other things in place, but I think it starts with incentives, and our incentive here is to encourage more creation in the ecosystem. There are certainly use cases we want to prevent. We're not going to get them right all the time; it's very challenging, and it's a very living system. It's also very hard to write a recommender system when you have no data and you don't know what to recommend or how the platform is going to evolve. But that's basically how I think about the incentives of the feed. And then, Rohan, we have a lot of mitigations in place that you've been thinking about, maybe even more deeply than I have, for preventing the extreme cases, so I don't know if you want to talk a little bit about that.

>> Yeah, happy to. But one thing to add first: the stated intent of optimizing for creation is working really well. Almost 100% of people who get past the invite code and all that on the app end up creating on day one. When they come back, 70% of the time they're creating, and 30% of people are actually posting to the feed. So they're not just generating for themselves; they're posting into the ecosystem, which is an incredible testament to the model, to how fun it is, and to the fact that what we're optimizing for is actually working pretty well right now. But beyond that, one of the top-of-mind things is that we don't want this to be a mindless scroll. Beyond optimizing for creation in the ranking algorithm, there are things we can do to get you out of that flow state of pure consumption and push you into creative mode. There's a great article on this called something like "the curvilinear nature of casinos": casinos are designed so you never have to make any decisions. You just walk in a circle, there are no windows, all that kind of stuff. We can be very intentional about not doing that, whether it's an in-feed unit that says, hey, you just viewed a couple of videos in this domain, why don't you try creating something, or other ways to push you out of that. We actually have things like that in the product. Those are some of the things that come to mind.

>> I really commend you guys for what you've done there. There's a version of the world where video-model-as-world-simulator could have just ended up with each of us retreating into our own computer screens, becoming addicted and retreating into ourselves. The amount to which you're prioritizing the human element and the social element, the care you've put into that, really shows.

>> I don't think we would have launched a feed of AI content that didn't have a human feel. I just don't think that would have excited us. And as soon as we had the product, we had cameo, and we had that feeling internally, we were like: okay, this is actually a little different.

>> Yeah, I don't know. I don't think it was totally obvious. Again, it was a pretty crazy sprint to go through this, and it wasn't super obvious to us what would emerge.

>> The idea makes sense in retrospect, but it was a completely non-obvious product decision that cameos would be the thing.

>> Where it's like, of course you just want to see your friends doing cool things, so it makes sense. But I was never actually that afraid of competitive pressure in that crazy product phase, because we had all these non-trivial decisions, obvious in retrospect but not obvious at the time, that were building on top of each other. Okay, cameos. Well, there's also a version of cameo with a flow that's just for you: a one-player-mode cameo where you go through an onboarding flow and do your stuff. But we were already seeing these interesting dynamics where it's like, well, I can tag Rohan into my video. That's crazy. And then we can have an argument, or an anime fight, whatever. And I realized: okay, that's actually the human element, that's the magic of this. It's strangely more social than a lot of social networks, even though it's all AI-generated content. Very unintuitive.

>> Totally.

>> Totally.

>> Is it a separate is it fine-tuned

version of Sora 2 or is it like is a

separate model from what's available

over the API or is it the same

>> between the app and

>> product? So we currently exposing like

the models in the same state across API

and and the app.

>> Okay. Really interesting. Um what are

you seeing people do on the API side? Uh

and is it different from the types of

things people are doing on the on the

consumer app?

>> The motivation behind even launching an API is to support these longtail use cases. We have this vision of enabling a ChatGPT-scale consumer audience with this tech, but there are tons of very niche things out there. With Sora 1, we went out and talked with a lot of the studios, and what we heard from them is that they want to integrate this into a specific part of their stack in a specific way. We'd love to support all these longtail use cases, but we don't want to build a thousand different interfaces for this stuff, so that's what we're excited to see with the API. So far it's been a little bit more of the niche companies: not trying to build a first-party social app, but maybe with some filmmaking kind of audience they're supporting. We've even seen, I think, a company doing something with CAD, where they were using it...

>> Yeah. That's cool.

>> So there are cool use cases out there. I think we're still getting a sense of what they are.

>> Yeah, I think there's a lot that can be done with these things. I think about gaming all the time, just based on my background. AI and gaming is always a very controversial subject, but it's very clear there's a place and there's a role. It doesn't have to interrupt the creative process; it can enhance it. I'm pretty excited to see some of those use cases emerge.
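
As a sense of what such a longtail integration might look like, here is a minimal sketch; the endpoint path, parameters, and response field are assumptions for illustration, not documented API details:

```python
import os
import requests

def generate_clip(prompt: str, seconds: int = 8) -> str:
    """Kick off a video generation job; endpoint and fields are assumed."""
    resp = requests.post(
        "https://api.openai.com/v1/videos",   # assumed endpoint path
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={"model": "sora-2", "prompt": prompt, "seconds": seconds},
    )
    resp.raise_for_status()
    return resp.json()["id"]                  # assumed response field

# e.g. a CAD tool rendering a turntable shot of a part it just modeled:
job_id = generate_clip("slow orbit around a machined aluminum bracket, studio lighting")
```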

>> Do you think the video models are good enough now for people to build video games on top of the API, or are we still another rev or two away?

>> I have my own take on this. I was going to say: never bet against the ways people can be creative with technology. Someone will be able to build a game, and maybe has built a game already. Will it look and feel like one? Obviously there's latency with this model, so you'd have to do all sorts of crazy stuff to get around that. But your mind immediately goes to the obvious sorts of things you would do in gaming, and we've seen some of that sort of stuff in research blogs and that kind of thing. My mind often goes to: okay, this is a creative tool that's a little bit different. As for the types of games that really excite me there, I'll just go off on one. There's a game called Infinite Craft, which is the world's simplest game. It's a web game where you just take elements, fire, water, earth; you have four elements to start, and you just drag them together.

>> Love this game.

>> It combines into something new, and the thing it combines into is LLM-based. So fire and earth might be a volcano, and then volcano plus water might be an underwater volcano, or Godzilla or something like that. You always end up at Godzilla for some reason; I don't know why. But that's a game where it kind of makes sense: you don't really need a crafting tree. The LLM can derive the crafting tree, and it's a process of discovery. I think there's a lot of untapped stuff in that space, because again, I like the idea of a process of discovery. In fact, my philosophical view on LLMs, and video models to some extent, is that it is a process of discovery: these things are all in the weights, and you're just unlocking them with a secret code, which is your prompt. I love that. That is very magical. In gaming, that was always the thing that excited me the most: discovering something new, especially if it was a true discovery that wasn't put there by somebody else, where maybe they just enabled the mechanics around it. So I think there's a huge opportunity in that space of gaming, when you think about games as just a different thing and embrace this technology in a very different way.
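
A sketch of that "the LLM derives the crafting tree" idea; the model name and prompt are illustrative, and a simple cache stands in for a persistent recipe store so each pair resolves the same way every time, like a real recipe:

```python
from functools import lru_cache
from openai import OpenAI

client = OpenAI()

@lru_cache(maxsize=None)          # same pair -> same discovery on every call
def combine(a: str, b: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",      # any small chat model works for this sketch
        messages=[{
            "role": "user",
            "content": f"In a crafting game, combining '{a}' and '{b}' yields "
                       f"what single new element? Answer in one or two words.",
        }],
    )
    return resp.choices[0].message.content.strip()

# fire + earth -> perhaps "Volcano"; volcano + water -> perhaps "Godzilla"
print(combine("fire", "earth"))
```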

>> It reminds me of how some of the earliest use cases for GPT-3 were these text games. It's different from how you think of a playable video game, but a lot of these mechanics are very game-like.

>> Exactly. Yeah. There are still constraints, and I think that's going to be the mechanism design, and that's still very human. A lot of the early games with GPT-3 were fun for a minute and then went off the rails, and you're like, I don't really know what I'm doing anymore. But in some ways Sora feels like a little bit of that. It's got a little bit of gaming DNA inside of it, where it feels very fun and different and exploratory. So I like things like that, and I think there are going to be more use cases that we can't even think of. It's too creative.

>> What are you guys seeing on the creative filmmaking side? Is that an important target market? Do you want to empower the long tail, or the head, so to speak, of the creative market?

>> I think it's a really good question. We've benefited a lot from creatives who were really willing to go all in on even the early technology, like DALL·E 1 and DALL·E 2, and really helped steer us along the path. I think it's important that we continue to build things for those folks, and we are working on some things that are more targeted towards creative power users long term. At the same time, I do think AI is a very democratizing tool at its best. What's beautiful about the Sora platform in general is that whenever someone strikes gold, you see one of these beautiful anime prompts go to the very top of the feed for everyone, and anybody can go and remix it. Everyone has the power to build on top of it and learn from all of these people who come in with incredible knowledge about how to get the most out of these tools. So I'm really excited to see the net creativity of humanity increase as a result of this. But a big part of that is continuing to empower the people who are always at the frontier, these more pro-oriented creator types, and we want to keep investing in them as well.

>> We've nerded out for a while, almost a couple of years now, about that vision of feature-film-length content. Yes, you have these amazing cameos and shorter content, but at some point there's the individual creator. This has been something you've been excited about for a very long time. When do we get there? Is there a point where we have a feature film created on Sora 2? And how do we consume it? Is it in the Sora app? Is it posted somewhere else online? Do you go to a movie theater and watch it?

>> Yeah, it's a great question. I think this will happen in stages to some extent. If you watched the launch video: that was made by Daniel Frighen, who's on the Sora team, and with these tools he's already able to pump out these incredibly compelling short stories within days at most. He literally made that all by himself in almost no time, and he's been continuing to put new ones out on the OpenAI Twitter since. Clearly this is massively compressing the latency associated with filmmaking. To get to the point where really anybody can do this, where any kid in their home can just fire up the app or sora.com and go make this, is really an economics problem of the video models. Video is the most compute-intensive modality to work with; it's extremely expensive. We're making good progress on the research team, continuing to figure out ways to make this affordable for everyone long term. Right now, for example, the Sora app is totally free. In the future there will probably be ways for people to pay money to get more access to the models, just because that's the only way we can really scale this further. But I think we're not far off from a world where anybody really has the tools to make amazing content. There are going to be a lot of bad movies created with this, but likewise, there's probably a next great film director who's just sitting in their parents' house, still in high school, and has not had the investment or the tools to really see their vision come to life. And we're going to find absolutely amazing things from giving this technology to the whole world.

>> I'm looking forward to the feature-film-length Konstantine's Greek Odyssey.

>> Me too.

>> Coming to theaters near you.

>> We're all in it together, actually. Different characters. I play the cyclops, and it's a good one.

>> Just to touch on that one more thing: something I've learned from recommender systems over and over again is that the tools getting more people creating will be a huge unlock for making people more creative in general, because you don't need access to all that filmmaking equipment. But we do consistently see that content is also a social phenomenon: movies and everything else you see out there are a bit of a social phenomenon in addition to the actual content itself. So I think we're going to enter a very interesting world where there are so many people creating, and so much content out there, that getting people to pay attention to it and watch it becomes more and more important. And I think that's actually going to elevate the quality of content, because anybody can create, and it's the consumption that's going to be quite limited, which is very different from the world we live in today.

>> You guys have been very thoughtful and intentional about how you've treated IP holders. Can you say a word on that?

>> You know, we've been in close partnership with a bunch of folks across the industry, really trying to show them this new technology, which is actually a huge value proposition for rights holders across the board. We're hearing so much excitement from the folks we're talking with. They really see this as a new frontier: every kid in the world having the ability to go use some of this beloved IP and bring it into their lives in a way that feels much more personal and custom than what's been possible before. At the same time, we really want to make sure we're doing this the right way. So we've been taking feedback and steering our roadmap so that users have an awesome experience getting to use this IP, but the rights holders also get properly monetized and rewarded, in a way where everyone wins. Right now we're actively working on scoping out the exact details of how, for example, if you want to cameo your favorite character from some beloved film, you can do that in a way where you have access to it, but monetization flows back to the rights holder. So we're really trying to figure out this new economy for creators. We kind of have to create it from scratch. There are a lot of deep questions about how to do this the right way, and as with everything with this app, we come into it with an open mind, we hear feedback, and we iterate quickly. We're not sure where this will totally converge, but we're working closely with people to figure it out.

>> Really cool. What's ahead?

>> Pets.

>> Sorry, what?

>> Is that one of the most demanded features?

>> It is. Bill's demanding it.

>> It will remind us: we were just talking about curing diseases and world models, and now we're on to the future.

>> No, it's actually... that's definitely true. We've committed to that. It's coming. But I promise, we actually had Bill's dog, Rocket, when we were playing around with this.

>> The good boy.

>> Yeah. And it was actually very, very cool to feature a pet. You can imagine where that goes. It doesn't necessarily have to be a pet. It could be anything: a clock, or whatever you have.

>> A clock?

>> Well, yeah. Do you have a special clock?

>> Actually, it's really compelling. I didn't think it could be so compelling until Thomas showed me this clock. It's like a sentient clock.

>> But it's based on a real clock.

>> Yeah, I had a clock. My father was a technology person for a while, and this company, Veraritoss, gave him a clock for some anniversary of his. Anyway, I have it on my table somewhere. And there's this old Simpsons episode where they talk about a walking clock, and for some reason that's been an earworm in my head for the last 30 years. They're telling some joke, and it's: is it a walking clock? Is it a walking clock? It's a walking clock! And then it's: no, man, it's my dog. And so it connected in my brain where I was like, okay, Rocket, walking clock. And then I tried it.

>> Thomas.
>> Yeah. So it connected in my brain, and we've been playing around with it just to see if we can get it to work and whether there's something special there. That's part of the fun of being on the Sora team: you get to play with this emergent, crazy technology, and maybe it does something you wouldn't even expect. So I recorded a two-second video of my clock, and then I gave it some cameo instructions. I said, "You're just a walking clock. You're a walking clock. You talk like you talk. You're a character." And then I generated my first video, and it was insane. It was crazy. It was a walking clock. Then I had one where it was talking to Bill, and Bill was like, "I didn't think we would ever land the pet cameo feature." And the walking clock's like, "Here I am. You know, I just landed." So, it's coming.
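The flow Thomas describes, record a short reference clip, attach persona instructions, then generate videos featuring that character, might look something like the sketch below. This is purely illustrative: `create_cameo` and `generate_video` are hypothetical names invented here, not real Sora API calls.

```python
# Hypothetical sketch of the cameo flow described above; these names are
# invented for illustration and are not a real OpenAI/Sora API.
from dataclasses import dataclass

@dataclass
class Cameo:
    name: str
    reference_video: str  # path to a short clip, e.g. two seconds of the clock
    instructions: str     # persona notes the model conditions on

def generate_with_cameo(client, cameo: Cameo, prompt: str) -> str:
    """Register the cameo from its reference clip, then generate a video featuring it."""
    with open(cameo.reference_video, "rb") as clip:
        cameo_id = client.create_cameo(video=clip, instructions=cameo.instructions)
    job = client.generate_video(prompt=prompt, cameos=[cameo_id])
    return job.url

# Usage, mirroring the walking-clock example (client is the hypothetical API handle):
# clock = Cameo(
#     name="walking clock",
#     reference_video="clock.mp4",
#     instructions="You're just a walking clock. You talk like you talk. You're a character.",
# )
# print(generate_with_cameo(client, clock, "The clock tells Bill the pet cameo feature landed."))
```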

>> It's all internal memes.
>> Talk about emergent IP.
>> Yeah.
>> Who needs Pokemon when you can have a walking clock?
>> What's the greatest IP?

>> One thing to add in terms of the future, on the feature film question: something I think about all the time is what that will actually look like. Caveat: Bill's the only one here who's good at predicting the future. My sense is that as we get to longer forms, our equivalent of a feature film will look and feel very, very different from what a feature film is today. I don't know exactly what that looks like, but on the subject of creators and what's coming in the world, I think there will be a new medium and a new class of creators, and that new class could include a lot of existing creators and support existing mediums. I think we're just in the early innings of what I imagine will be the next film industry, rather than thinking of this as the future of film. There'll be something new. There's some anecdote, and I hope it's true because I say it all the time, that when the recording camera hit the world, the first thing people did was record plays. That is about the least interesting thing you could do with a recording camera. It's like, what's the big idea? Oh, people don't have to travel around acting; we can just film them and distribute it. And then someone said, wait a minute, we can make a film and shoot in all these different places. I feel like we're in the first inning of so many different things people will do with this technology, especially as the constraints change with latency and length and all of that.

>> So cool. And a fun film history nerd fact, which we should check: I think one of the original videos was made just down the peninsula, to settle a bet on whether a horse, when it galloped, left the ground with all four legs. That is an example of a new scientific discovery. People didn't actually have an answer to it. Now that you have a new simulation format, what are we going to be able to discover with it?

>> It will be crazy. And one broader point here: this app right now feels very familiar in a lot of ways, right? It's a social media network at its core. But fundamentally, the way we view it internally is that with cameos we've introduced the lowest-bandwidth way to give Sora information about yourself: aspects of your appearance, your voice, etc. You can imagine that over time that bandwidth will greatly increase, so the model deeply understands your relationships with other people. It understands more than just how you look on any given day; it's seen how you've grown up, all of these details about yourself, and it will really be able to function almost as a digital clone. So there's really a world where the Sora app almost becomes this mini alternate reality running on your phone. You have versions of yourself that can go off and interact with other people's digital clones. You can do knowledge work. It's not just for entertainment, right? It really evolves into a platform, which is aligned with where these world simulation capabilities are headed long term. And I think when that happens, the kinds of immersion we will see are crazy. For OpenAI across the board, it's really important that we iteratively deploy technology in a way where we're not just dropping bombshells on the world when there's some big research breakthrough. We want to co-evolve society with the technology. And so that's why we thought it was important to do this now, and to do it in a way where, you know, we've hit this GPT-3.5 moment for video. Let's make sure the world is aware of what's possible now, and also start to get society comfortable with figuring out the rules of the road for this longer-term vision where there are just copies of yourself running around in Sora, in the ether, doing tasks and reporting back to the physical world. Because that is where we are headed long term.

>> So cool. So you're building the multiverse.
>> Actually, kind of. Yeah.
>> Okay. Well, can a "me" go and find my soulmate somewhere in there?
>> I mean, anything is possible in the multiverse.
>> That's a call to action, everyone.

>> It is kind of crazy, though, because, and now I'm going to sound totally cuckoo, but if we're in a computed environment, you're building the perfect simulator. That kind of is the way you ultimately understand, and break out of, the computed environment, right? Like, are we getting closer to the heart of the Matrix?
>> The Matrix.
>> Some very deep existential questions. Yeah. Yeah.

>> What's your guys' P(sim), that we're simulated, that this is all...
>> Rising.
>> Yeah. Want me to?
>> Oh, I'm low. Yeah.
>> Oh, man.
>> But yeah, it's okay.
>> You're really... okay, I respect it.
>> I'm just like, you know what? Sometimes it's got to be real.
>> Yeah.
>> I feel like I'm at a solid 60%. I don't know. More likely than not at this point.
>> I'm there, too.
>> Whoa.
>> Yeah. Zero.
>> Should we make a call on it?
>> Yeah. Trivially small.
>> Settle that.
>> What's the oracle on that?
>> Sora 10. We'll ask Sora 10. Yeah.
>> Yeah.

>> What do you think are the theoretical limits to Sora?
>> Yeah, it's actually a great question. I've thought a little bit about this. There's a question of whether you can eventually simulate, say, a GPU cluster inside Sora. And I assume there are some very well-defined limits on the amount of computation you can run within one of these systems, given the amount of compute you're actually running it on. I've not thought deeply enough about this, but I think there are some existential questions there that need to get resolved. Yeah.
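One way to make those limits concrete, as a back-of-the-envelope framing rather than anything claimed in the conversation: if the host hardware sustains $C$ operations per second and faithfully simulating one level of reality costs an overhead factor $k > 1$, then a simulation nested $d$ levels deep has at most

$$C_d = \frac{C}{k^d}$$

operations per second available to it. Compute shrinks geometrically with depth, so a simulated GPU cluster can never out-compute the physical one it runs on.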

>> Yeah.
>> See, that's why his P(sim) is so high.
>> Fascinating.
>> Wow.

>> We've got a few lightning round questions for the team that we just generated on the fly here. Take your time; jump in whenever you have an answer. Your favorite cameo on Sora to date, and what happened?
>> That is so tough.
>> I have a hot one.
>> Go.
>> Shocker. Yeah. Okay, so there was this TikTok trend that I got obsessed with, I don't know why: these Chinese factory tours, where they're like, "Hello, this is the chili factory." They get like one like, and it's me. They're showing their chili factory, and they're like, "It's the chili factory." It's amazing. Or there's an industrial chemical one; I've lost the name, but there's an industrial chemical factory. And the first day, I had my cameo options open, just because I wanted to see what happens. Late that night, I opened my cameos, and I was starting to get tagged in factory tour cameos that were all in Chinese. And I was like, I'm in the chili factory! I was so excited. It got zero likes; I liked it. It was just me, but I was like, I'm the chili factory guy now. I'm doing the ribbon cutting at the chili factory. Amazing. That's too deep of a cut.
>> Congratulations.
>> Fun fact: I've actually done Chinese factory tours in real life, and they are truly epic.

>> Yeah.

>> There's this one I saw of Mark Cuban in jorts, dancing around, that got me. But back to my earlier point: just scrolling the latest feed and seeing the wholesome content of people doing things with their friends is actually, I think, what brings me the most joy. The videos aren't super liked, but people are obviously getting a lot of value from just making videos with their friends. And Sam has so many bangers. I like the one of him doing this K-pop dance routine about GPUs or something. It's very good. Actually, I would put it on my Spotify. We had the full song.
>> Wow.
>> It was very good. It was generated by Sora. It's very compelling. Yeah.

>> All right, well, that leads to the next one, because you mentioned Spotify. What does a fully AI-generated work win first: an Oscar, a Grammy, or an Emmy?
>> I think the logical answer is a short winning an Oscar.
>> Yeah, I think that's probably right.
>> Yeah.
>> What would we win it for?
>> The George trilogy.
>> Yeah.

>> We need new content.
>> I do think if people stitch things together in interesting ways, you can actually start to make some very compelling storytelling with this.
>> It doesn't really feel like AI anymore, the content I'm seeing. That was actually something I noticed with Sora as well: I wasn't even noticing it was AI. It was just interesting content.
>> That's a more interesting question: will we know?
>> Oh yeah, maybe it's already happened.
>> Maybe it's already happened.

>> I feel like for the Oscars, one of the cool things that will be unlocked is this long tail of epic stories in history, stories of heroism and struggle, all of these things that have been locked up because of the cost of creating. As a history enthusiast, I cannot wait for AI to unlock all of those stories.
>> Have you seen the Bible video app?
>> No, I haven't.
>> Oh, it's really good. I'll show it to you after.
>> A perfect example.
>> Yeah.
>> Or there's this movie from a few years ago, The Last Duel, about a really terrible crime committed in medieval France that was historically relevant and says a lot about humanity. It only got made because Hollywood eventually picked up this important story, but how many more are there in human history? That's going to be really cool. Favorite character from any film or TV show?

>> I have a really random one.
>> Go for it.
>> You guys seen Madagascar?
>> Yeah.
>> King Julien.
>> Oh, played by Sacha Baron Cohen. He's a lemur. He's the lemur king.
>> Absolutely. It's just a banger.
>> It's his humor meets kid-friendly storytelling. It's just perfect.

>> I play a lot of video games, so the classic answer would be Mario or something like that. But I'll do the deeper cut: we were always joking about PaRappa the Rapper, the old PlayStation game, one of the original rhythm games. It's got a great artistic style, and it's got great IP of just this little dog.
>> What is he? A dog?
>> He's a dog. Yeah.

>> That's a good pick. When I was a kid, I played the Pokemon trading card game competitively for a while, so I was really down the Pokemon rabbit hole. So, I don't know: Pikachu, or Mudkip.
>> Super non-consensus. Like a fringe deep cut. Okay: first world-model scientific discovery. Be as specific as possible; obviously you're not going to name the discovery itself.
>> I suspect it will be something related to classical physics...
>> Wow.
>> ...like a better theory of turbulence or something. That would be my guess.
>> I was guessing it was going to be something like that.
>> Navier-Stokes, I don't know. Yeah, some fluid dynamics thing that's maybe hard to understand. There are a lot of unsolved problems there. I think sometimes it's called continuum mechanics, where it's in between scales and we don't have good models of it. Something that lends itself to simulation, where the sheer number of iterations you can run unlocks something. Something in that realm.
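The "iterations of a simulation" point is easiest to see with a toy. A minimal sketch, purely illustrative and not anything from Sora: an explicit finite-difference integrator for the 1D viscous Burgers' equation, a standard scalar stand-in for Navier-Stokes, where many cheap repeated steps let you watch smooth initial data steepen toward shock-like behavior.

```python
# Toy classical-physics simulation: 1D viscous Burgers' equation
#   u_t + u * u_x = nu * u_xx
# integrated with explicit finite differences on a periodic grid.
import numpy as np

def burgers_step(u, dx, dt, nu):
    """Advance u by one explicit time step."""
    u_x = (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)        # central first derivative
    u_xx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # central second derivative
    return u + dt * (-u * u_x + nu * u_xx)

n, nu = 256, 0.05
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
dx = x[1] - x[0]
dt = 0.2 * dx**2 / nu  # conservative step size for explicit stability
u = np.sin(x)          # smooth initial wave

# The cheap part is the iteration: hundreds of steps cost almost nothing,
# and the wave visibly steepens as the nonlinear term takes over.
for _ in range(500):
    u = burgers_step(u, dx, dt, nu)
print(f"max |u_x| after 500 steps: {np.abs(np.gradient(u, dx)).max():.3f}")
```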

>> The last thing we'll be able to accurately simulate?

>> I do think there's a set of physical phenomena for which video data is a poor choice of representation. For example, is it really efficient to learn about high-speed particle collisions from video footage? Maybe. I really think video is at its best when the phenomenon you're trying to learn about is natively represented in the physical world. So when you need to do quantum mechanics, or some other discipline that's more theoretical, we don't have video footage beyond...
>> You can't see it.
>> Yeah. Beyond things we've manually rendered for educational purposes. It feels like a weaker medium for understanding those things, so I suspect those would come last.
>> I guess it's the things we don't have sensors for.
>> Right. Right.

>> Yeah. Maybe the last things we care to simulate is another way of thinking about the answer. I don't know.
>> People aren't doing much with smell right now. Maybe that's greenfield.
>> I've been meaning to tell you about that. It's kind of awkward.
>> We're still trying to figure out how to simulate Thomas with bad hair.
>> Oh, yeah.
>> It remains an unsolved problem. Not even Sora can do it.

>> Thomas's hair flow. Just general...
>> Guzzling ketchup. Yes.
>> There was a good round of people being bald. The bald gens were actually good. Kind of cool.
>> That's a use case that doesn't get talked about very much, but it's like visualization: seeing yourself bald.
>> Yeah, everybody wants to be bald. No, it's just that you see yourself in some different context. I think that can be quite powerful, even therapeutic in some ways: you see yourself in some context that you either want or don't want to be in, and you just see yourself.
>> It's a real use case.
>> Yeah. Yeah. True.

Guys, thank you so much for coming. From space-time tokens to object permanence, world models that will enable scientific discovery, and the democratization of creation, all the way to walking clocks, you have covered it all. Thank you so much. And the future is being created by you.
>> Thanks, Konstantine. Thanks, Sonya.
>> Thank you.

[Music]