World Labs' Fei-Fei Li on Creating Large World Models

By Bloomberg Live

Summary

Topics Covered

Animal Intelligence Started with Seeing and Moving
The Simulator Is the Linchpin of World Models
$6 Billion for Humanoids Is Too Small
The Vacuum in AI Discourse Feeds Anxiety
Standardized Tests Are Obsolete; Teach Human Agency

Full Transcript

Everyone is focused on LLMs: ChatGPT, Claude, large language models.

But you have raised $1 billion to build something different.

Large world models. Make the case for us. What is the bet you are making that others aren't?

So this is my co-founded startup, World Labs, and we are all in on spatial intelligence. And the means to spatial intelligence is building a large world model.

So what is the case for us? The case for us is a 500-million-year story: animal intelligence starts with seeing and moving in the physical world. Evolution began with us as animals, knowing what the world is, knowing who we are, knowing how to move around it and interact with it. And much of life, human life, human work life, human private life, has a lot to do with perceiving, understanding, reasoning about, and interacting with the world, including imaginary worlds of creativity and of productivity, as virtual worlds.

So unlocking that capability in machines, unlocking the capability of generating any 3D, 4D worlds, unlocking the capability of reasoning within any world, unlocking the capability of teaching agents or robots, or assisting humans to interact with the world, is what spatial intelligence is about. And that's what we are focusing on.

So what can world models ultimately do that LLMs will never be able to?

Can words put out fires? Can words cook an omelet?

I think there's so much, right? So, for example, creativity. People design. Whether we're designing interior space, we're designing machines, we're designing homes, we're designing stories, so much of that is beyond words.

We also use agents, whether we use agents in a virtual world, whether it's for entertainment like gaming or for more serious industrial applications, whether it's digital twins, design, inspection, or many kinds of optimization tasks.

Or we build robots to help us do a lot of things, from putting out fires to helping in healthcare scenarios to manufacturing. All those are downstream applications of unlocking spatial intelligence and building world models.

So what do you think the ChatGPT moment for world models will be like? How will we know this has arrived?

Yeah, that's a great question, Emily, because chat is such a consumer behavior that "ChatGPT moment" tends to be used to describe a viral, public consumer moment of getting so close to what AI can do. In the world of world models, the kind of spatial intelligence we're trying to unlock, I'm still trying to figure out if there is a corresponding consumer moment, because the kind of applications we are talking about tend to go first to the professionals: professional creators, professional designers, professional developers, professional researchers and engineers who use it for robotics and industrial design and all that.

So maybe we will not necessarily have a consumer moment, but maybe we will.

And you know, I would love to design my home in a much easier way and just change the color of the curtains, you know, with a click.

All right. That sounds pretty cool.

So in the last six months, Yann LeCun left Meta to work on world models.

Google shipped Project Genie. Nvidia has its own world models, Cosmos. Nvidia is also one of your investors.

What do you have that they don't? And which competitor out there worries you the most?

Yeah. So first of all, we started World Labs in 2024.

I still remember when we were out talking about world models and spatial intelligence. It was just a year after ChatGPT, and people were still totally talking about LLMs. And so we really had a head start in understanding that this is going to be the next frontier of AI.

I'm very excited by that. So, what do they have? We don't know. First of all, I think we have an incredible team. We have the conviction.

They don't have the godmother, that's for sure.

But the world is big, and I think this is just like LLMs.

I think there will be many companies doing incredible work in world models.

Just 24 hours ago, we kind of got fed up that the term world model has been so confusing and used in so many different ways that we actually put out a blog explaining a functional taxonomy of world models instead of mushing everything together.

And the way I see it, right now there are three kinds of things being called world models when it comes to spatial intelligence.

One is what I call a renderer, where the model puts beautiful pixels on the screen, mostly like a video generation model, and the consumer is mostly human eyeballs. And while the model commits to beautiful pixels on the screen, it doesn't necessarily commit to physics and dynamics and geometric correctness, because that's just for human eyeballs to consume, not necessarily for computation and other tasks.

Then another kind of world model is what we call a planner, which is more for machines, more for robots, where, whatever the input is, the state of the world or the action, it outputs a correct action to take for the next step.

And you see that kind of world model a lot for robotics applications.

And you hear that in that context. The third kind, which I think is the linchpin of the three, is a simulator. It is actually consumed by humans as well as machines, and it is trying to respect the structure, the physics, and the dynamics of the world and really simulate the 3D and 4D information of the world as well as the semantic information.

And the simulator could become a renderer, the simulator could become a planner, but this layer is a huge critical path, in my opinion, to unlock spatial intelligence. And that's what World Labs is working on.

All of this rolls up into robotics.

So I want to get your take on the field. And humanoids in particular.

Funding for humanoids hit $6 billion. But, you know, they still can't load my dishwasher as fast as I can. They still can't go get my Amazon packages. Will world models, will World Labs, close the gap between hype and reality?

That's a loaded question, Emily.

First of all, that is my job. Yes, I get it.

First of all, robotics is going to be one of the most important revolutions in human industrialization. $6 billion is too small, right?

If you look at self-driving car investment, if you look at language model investment, it took way more than $6 billion. I'm not saying now.

I think it will take time to invest, and it will also hopefully not take the hype, but take the thoughtfulness to invest in the right effort.

And for example, unlocking world modeling and spatial intelligence and the simulation layer, all this is part of that important effort.

Well, are we going to close the gap? I do believe World Labs is working on one of the most critical technologies in the space of physical intelligence.

And obviously that's the hope.

You've been more measured on AI safety, skeptical of the doom narrative but also of heavy-handed regulation.

When you look across the industry, where do you see real safety work versus safety theater? Is anyone getting it right?

So in general I've just been more measured on every rhetoric, which makes me very boring, to be honest. I think there's just so much hype.

obviously we need to build the right technology.

We need to guardrail the technology.

Whether you use the word responsible, you use the word safety, you use the word trustworthy, building the right technology and product so that it can empower, enhance, augment humanity and not harm them is the goal of any work we do, whether it's AI or not.

So who is doing it right?

I really hope that for every company, every product that's being built, the people behind it are very mindful of that and are thinking about, you know, what data are we using? What system are we building, what evaluations are we conducting, what guardrails are we putting in?

How do we communicate with our users and customers?

How do we work with regulators so that when the rubber hits the road we are, you know, being responsible? I do believe a lot of this work is happening. It's not happening in the theater, to be honest. For example, in the pharmaceutical and health care industry, companies are incorporating AI. Literally, I just came from the hospital to come to your panel because I have a family member about to get a surgery in the next hour or so, and I was just in her hospital looking at where AI is already being used and where AI could be used, and it's already happening. Doctors are using AI to help them with charting. Radiologists are using AI to assist them in reading the MRI and the CT scans. I do hope that we have more AI to help our nurses, to help family members. I got this long radiology report last night, and the first thing I did was send it to AI so that it could help explain it to me. So all this is happening. Safety measures are happening. But there needs to be more, in the right way, in a scientifically grounded way.

And that's the conversation that should be taking place instead of what you call the theater.

Well, thank you for coming, and I hope your person is okay. We all do.

The backlash is real. It's being called the AI hate wave.

I'm sure you've seen the video. Former Google CEO Eric Schmidt getting booed at a college graduation. You spend a lot of time with students.

What are they saying? And if they're scared, are the fears justified?

Yeah. I do spend a lot of time with students. To be fair, my students are pretty privileged because they're Stanford students.

I think it's even more important, and I try to do it myself, that we spend time with our teachers, with our nurses, with our parents, grandparents. And that's actually something that I try to do. I try to talk to K-12 educators.

I try to go to places and talk to people where they feel that they're not part of the conversation. And even so, our students reflect some of this mixed sentiment.

There is anxiety. There is a sense of hope.

There is also excitement. There is also confusion.

There is also, simultaneously, a sense of dignity and agency when AI can help me do things that I couldn't do before, and a sense of loss of dignity and agency: is AI going to take my job?

So I think the sentiment is mixed.

And I really want to point out that a lot of this sentiment happens when there is a vacuum of thoughtful public discourse. Right now, the oxygen, the air is all sucked into the polarized extremes of doomerism or total utopia.

And while hype takes all the oxygen in the room, that void brews this kind of anxiety. And it's actually that void we really need to care about, because that's where real people live.

That's where real people are seeking answers.

As a scientist and an educator and an entrepreneur, I'm on ground zero with students, with educators, with entrepreneurs. And I really do believe it is one of my responsibilities to not hype, to try to speak with both science and humility, and to inspire people to recognize this is a technology that can truly empower a lot of our work and life, can truly help us, you know, have a better health care system, better scientific discovery, a better environment, better education, if we do the right thing.

We're both moms. We both have young teenagers. How do you think AI will change learning and the college experience?

AI must change learning. AI must change K-to-16 learning. I think this is one of the biggest opportunities for humanity in the next decade to come.

The most precious resource of our entire world is human capital.

And when we have gotten a technology that can answer standardized tests, whether it's Common Core kinds of tests all the way to International Math Olympiad exams, where AI can do better than the average human, it's not about humans being bad.

It's about we need to change the education system.

We need to change how we evaluate. We need to change the way we empower teachers to teach, to educate the next generation of students, where they can use these tools, be empowered, and do things that we can never imagine.

So do you think our kids will still learn?

Absolutely. If we teach them right.

If society prepares them right, all of the kids today should not be scared of AI.

They should feel the human agency to lead AI, to use AI in the right way, and to use AI to make the impact that they want to make for the world.

Anthropic CEO Dario Amodei has suggested AGI is 2 to 3 years out. We'll get there by scaling the current paradigm. Demis Hassabis says we're at the foothills of the singularity. You've said you don't even engage with the term AGI. Are they wrong, or is the disagreement about what we're calling the goal?

I don't engage with the term AGI because the founding fathers of artificial intelligence as a scientific field had this dream of thinking and doing machines.

That is a scientific quest. And that quest has been my lifelong career, and I am still on that quest. Now, I'm combining that scientific quest with making products that can make people's lives better.

And that is the field called artificial intelligence.

And I'm okay with people calling it whatever they want. They can call it an apple.

That's fine. I'm focusing on building a technology that can truly make a difference in people's lives and work.

What's the one thing you'll have shipped this year that we'll be talking about next year?

I hope that we will be shipping a model of spatial intelligence that will inspire incredibly exciting product opportunities that people haven't seen before.
