OpenAI DevDay 2024 | Fireside chat with Sam Altman and Kevin Weil
By OpenAI
Summary
## Key takeaways

- **AGI is a blurry, not binary, transition**: Sam Altman believes AGI won't be a sudden switch but a gradual, blurry exponential curve. The exact milestone will be debated, similar to the Turing test, but the continuous progress will feel significant. [03:58], [04:30]
- **Research drives OpenAI's core advancements**: OpenAI's commitment to research is stronger than ever, with breakthroughs like o1 demonstrating its critical role. The company thrives on discovering new paradigms rather than just optimizing existing ones. [05:19], [06:04]
- **Product development is unpredictable due to rapid AI evolution**: Building products at OpenAI is uniquely challenging because the capabilities of AI models evolve every few months. This unpredictable pace requires adapting roadmaps based on scientific advancements rather than fixed plans. [07:45], [08:46]
- **Iterative deployment is a key safety strategy**: OpenAI prioritizes iterative deployment and learning from real-world usage as a crucial safety measure. This approach allows them to confront emerging safety challenges and opportunities as they arise, rather than relying solely on theoretical planning. [10:45], [12:10]
- **AI agents will fundamentally change how we work**: AI agents are poised to be a major shift, enabling multi-day interactions with environments and people. This will dramatically accelerate task completion, moving from monthly to hourly, and eventually to near-instantaneous results, reshaping human capabilities. [14:44], [16:32]
- **Safety and alignment are critical hurdles for agents**: While AI capabilities for agents are advancing rapidly, safety and alignment remain the primary challenges. Ensuring robustness and reliability is paramount before granting agents the ability to interact extensively with computer systems. [18:35]
Topics Covered
- OpenAI's AGI Levels Framework: From Chatbots to Innovators
- Sam Altman on AGI: It's not binary, it's a blurry exponential
- Product Development in a Rapidly Evolving AI Landscape
- AI for Government: Automating Workflows Now, Not Waiting for AGI
- AI as a Universal Translator
Full Transcript
[ Cheers and applause ]
-Hello.
-How's it going? Good to see everybody.
-Thanks for coming.
-All right, I think everybody knows you.
For those who don't know me,
I'm Kevin Weil, Chief Product Officer at OpenAI.
I have the good fortune of getting to turn
the amazing research that our research teams do
into the products that you all use every day
and the APIs that you all build on every day.
I thought we'd start with some audience engagement here.
So on the count of three, I'm going to count to three,
and I want you all to say,
of all the things that you saw launched here today,
what's the first thing you're going to integrate?
The thing you're most excited to build on, all right?
You've got to do it, right?
One, two, three. All right. -Happy to hear that.
-I'll say personally,
I'm super excited about our distillation products.
I think that's going to be really, really interesting.
Yeah.
I'm also excited to see what you all do with Advanced Voice Mode,
with the real-time API,
and with vision fine-tuning in particular.
So okay.
So I've got some questions for Sam.
I've got my CEO here in the hot seat.
Let's see if I can't make a career-limiting move.
So we'll start with an easy one, Sam.
How close are we to AGI?
-You know, we used to, every time we finished a system,
we would say like, in what ways is this not an AGI?
And it used to be like very easy.
You kind of, like make a little robotic hand
that does a Rubik's Cube, or a Dota bot,
and it's like, it does some things,
but definitely not an AGI.
It's obviously harder to say now,
and so we're trying to like stop talking about AGI
as this general thing, and we have this levels framework,
because the word AGI has become so overloaded.
So like real quickly, we use one for chatbots,
two for reasoners, three for agents, four for innovators,
and five for organizations, like roughly.
I think we clearly got to Level 2,
or we believe we clearly got to Level 2, with o1.
And it, you know,
can do really quite impressive cognitive tasks.
It's a very smart model.
It doesn't feel AGI-like in a few important ways,
but I think if you just do the one next step of making it,
you know, very agent-like, which is our Level 3,
and which I think we will be able to do
in the not-distant future, it will feel surprisingly capable.
Still probably not something that most of you
would call an AGI, though, maybe some of you would,
but it's going to feel like,
all right, this is like a significant thing.
And then the leap -- and I think we do that pretty quickly,
the leap from that to something
that can really increase the rate
of new scientific discovery,
which for me is like a very important part of having an AGI,
I feel a little bit less certain on that,
but not a long time.
Like I think all of this now is going to happen pretty quickly,
and if you think about what happened from last DevDay
to this one, in terms of model capabilities,
and you're like -- I mean, if you go look at like --
if you go from like o1 on a hard problem back to like 4 Turbo
that we launched 11 months ago, you'll be like,
"Wow, this is happening pretty fast."
And I think the next year will be very steep progress.
The next two years, I think, will be very steep progress.
Harder than that, harder to see with a lot of certainty,
but I would say like not very, and at this point,
the definitions really matter.
And the fact that the definitions matter this much
somehow means we're like getting kind of close.
-Yeah.
And, you know, there used to be this sense of AGI
where it was like it was a binary thing,
and you were going to go to sleep one day,
and there was no AGI, and wake up the next day,
and there was AGI.
I don't think that's exactly how we think about it anymore,
but how have your views on this evolved?
-Yeah.
You know, the one -- I agree with that.
I think we're like, you know, in this like kind of period
where it's going to feel very blurry for a while,
and, you know, is this AGI yet, or is this not AGI,
or kind of like at what point?
Yeah, it's just going to be this like smooth exponential,
and, you know,
probably most people looking back in history
won't agree like when that milestone was hit,
and will just realize it was like a silly thing.
Even the Turing test,
which I thought always was like this very clear milestone,
you know, there was this like fuzzy period.
It kind of like went whooshing by, and no one cared.
But I think the right framework is it's just
this one exponential.
That said,
if we can make an AI system
that is like materially better than all of OpenAI
at doing AI research,
that does feel to me like some sort of important discontinuity.
It's probably still wrong to think about it that way.
It probably still is the smooth exponential curve,
but that feels like a real milestone.
-Mm-hmm.
Is OpenAI still as committed to research
as it was in the early days?
Will research still drive the core of our advancements
in our product development?
-Yeah, I mean, I think more than ever.
There was like a time in our history
when the right thing to do was just to scale up compute,
and we pursued that with conviction.
And we have a spirit of like we'll do whatever works,
you know?
Like we have this mission, we want to like build safe AGI,
figure out how to share the benefits.
If the answer is like rack up GPUs, we'll do that.
And right now, the answer is, again, really push on research.
And I think you see this with o1.
Like that is a giant research breakthrough
that we were attacking from many vectors
over a long period of time
that came together in this really powerful way.
We have many more giant research breakthroughs to come.
But the thing that I think is most special about OpenAI
is that we really deeply care about research
and we understand how to do it. I think
it's easy to copy something you know works.
And, you know, I actually don't even mean that as a bad thing.
Like when people copy OpenAI, I'm like,
"Great, the world gets more AI.
That's wonderful."
But to do something new for the first time,
to like really do research in the true sense of it,
which is not like,
you know, let's barely get SOTA at this thing,
or like let's tweak this,
but like, let's go find the new paradigm and the one after that
and the one after that, that is what motivates us.
And I think the thing that is special about us as an org,
besides the fact that we, you know,
marry product and research and all this other stuff together,
is that we know how to run that kind of a culture
that can go push back the frontier.
And that's really hard.
But we love it.
And that's, you know,
I think we only have to do that a few more times
and then we get to AGI.
-Yeah, I'll say like the litmus test
for me coming from the outside, from, you know,
sort of normal tech companies,
of how critical research is to OpenAI
is that building product at OpenAI
is fundamentally different than any other place
that I have ever done it before.
You know, normally you have some sense of your tech stack.
You have some sense of what you have to work with,
what capabilities computers have,
and then you're trying to build the best product, right?
You're figuring out who your users are,
and what problems they have,
and how you can help solve those problems for them.
There is that at OpenAI.
But also,
the state of like
what computers can do just evolves every two months,
three months,
and suddenly computers have a new capability
that they've never had in the history of the world
and we're trying to figure out how to build a great product
and expose that for developers and our APIs and so on.
And, you know, you can't totally tell what's coming.
It's coming through the mist a little bit at you
and gradually taking shape.
It's fundamentally different than any other company
I've ever worked at.
And it's, I think, because research is so --
-Is that the thing that has most surprised you?
-Yes.
Yeah, and it's interesting
how even internally we don't always have a sense --
you have like,
"Okay, I think this capability is coming,
but is it going to be, you know,
90% accurate or 99% accurate in the next model?"
Because the difference really changes
what kind of product you can build.
-Yeah.
-And you know that you're going to get to 99,
but you don't quite know when and figuring out
how you put a roadmap together in that world
is really interesting.
-Yeah, the degree
to which we have to just like follow the science
and let that determine what we go work on next
and what products we build and everything else is,
I think, hard to get across.
Like we have guesses about where things are going to go.
Sometimes we're right, often we're not.
But if something starts working
or if something doesn't work that you thought
was going to work,
our willingness to just say we're going to like
pivot everything and do what the science allows,
and you don't get to like pick what the science allows,
that's surprising.
-I was sitting with an enterprise customer
a couple weeks ago and they said,
"You know, one of the things we really want --
this is all working great, we love this,
one of the things we really want
is a notification 60 days in advance
when you're going to launch something."
And I was like, "I want that too."
All right, so I'm going through --
these are a bunch of questions from the audience,
by the way,
and we're going to try and also leave some time at the end
for people to ask audience questions.
So we've got some folks with mics when we get there,
so be thinking, but next thing.
So many in the alignment community
are genuinely concerned
that OpenAI is now only paying lip service to alignment.
Can you reassure us? -Yeah.
I think it's true we have a different take on alignment
than like maybe what people write about on whatever that
like internet forum is.
But we really do care a lot about building safe systems.
We have an approach to do it
that has been informed by our experience so far.
And touching on that other question --
you don't get to pick where the science goes --
we want to figure out how to make capable models
that get safer and safer over time.
And, you know, a couple of years ago,
we didn't think the whole Strawberry
or the o1 paradigm was going to work in the way
that it has worked.
And that brought a whole new set of safety challenges,
but also safety opportunities.
And rather than kind of like planning from a theoretical,
you know, "once superintelligence gets here,
here are the, like, 17 principles,"
we have an approach of figure out
where the capabilities are going
and then work to make that system safe.
And o1 is obviously our most capable model ever,
but it's also our most aligned model ever by a lot.
And as these models get better intelligence, better reasoning,
whatever you want to call it,
the things that we can do to align them and things
we can do to build, really, safe systems across the entire stack,
our tool set keeps increasing as well.
So we have to build models that are generally accepted as safe
and robust to be able to put them in the world.
And when we started OpenAI,
what the picture of alignment looked like
and what we thought the problems
that we needed to solve were going to be turned out
to be nothing like the problems that actually are in front of us
and that we have to solve now.
And also, when we made the first GPT-3,
if you asked me for the techniques
that would have worked for us to be able to now deploy
our current systems as generally accepted to be safe and robust,
they would not have been the ones that turned out to work.
So by this idea of iterative deployment,
which I think has been one of our
most important safety stances ever,
and sort of confronting reality as it's in front of us,
we've made a lot of progress and we expect to make more.
We keep finding new problems to solve,
but we also keep finding new techniques to solve them.
All of that said,
I think worrying about the sci-fi ways this all
goes wrong is also very important.
We have people thinking about that.
It's a little bit less clear kind of what to do there,
and sometimes you end up backtracking a lot,
but...I also don't think it's fair
to say we're only going to work on the thing in front of us.
We do have to think about where this is going,
and we do that too.
And I think if we keep approaching the problem
from both ends like that,
most of our thrust is on the, like, okay, here's the next thing,
we want to deploy this, what needs to happen to get there,
but also like what happens if this curve just keeps going?
That's been an effective strategy for us.
-I'll say also, it's one of the places
where I really like our philosophy
of iterative deployment.
When I was at Twitter back, I don't know, 100 years ago now,
Ev said something that stuck with me,
which is,
"No matter how many smart people you have inside your walls,
there are way more smart people outside your walls."
And so when we try and get our --
you know, it'd be one thing if we just said we're going to try
and figure out everything that could possibly go wrong within
our walls, and it would be just us and the red teamers
that we can hire, and so on.
And we do that. We work really hard at that.
But also, launching iteratively and launching carefully
and learning from the ways that folks like you all use it,
what can go right, what can go wrong,
I think is a big way that we get these things right.
-I also think that as we head into this world of agents
off doing things in the world,
that is going to become really, really important.
As these systems get more complex
and are acting over longer horizons,
the pressure testing from the whole outside world,
I really believe, will be critical.
-Yeah. So we'll go actually, we'll go off of that
and maybe talk to us a bit more about
how you see agents fitting into OpenAI's long-term plans.
-What do you think?
-I think they're a huge part of the --
I mean, I think the exciting thing is this set of models,
o1 in particular, and all of its successors,
are going to be what makes this possible,
because you finally have the ability to reason,
to take hard problems,
break them into simpler problems, and act on them.
I mean, I think 2025 is going to be the year
that this really goes big.
-Yeah.
I mean, chat interfaces are great, and they will,
I think, have an important place in the world,
but the -- when you can like ask a model --
when you can ask like ChatGPT or some agent something,
and it's not just like you get a kind of quick response,
or even you get like 15 seconds of thinking,
and o1 gives you like a nice piece of code back,
or whatever,
but you can like really give something
a multi-turn interaction with environments,
or other people,
or whatever,
and like think for the equivalent of multiple days
of human effort,
and like a really smart, really capable human,
and like have stuff happen. We all say that, we're all like,
"Yeah, AI agents are the next thing.
This is coming, this is going to be another thing."
And we just talk about it like,
"Okay, you know, it's like the next model in evolution."
I would bet, and we don't really know until we get to use these,
that it's -- we'll of course get used to it quickly,
people get used to any new technology quickly,
but this will be like a very significant change to the way
the world works in a short period of time.
-Yeah, it's amazing.
Somebody was talking about getting used to
new capabilities and AI models,
and how quickly -- actually, I think it was about Waymo,
but they were talking
about how in the first 10 seconds of using Waymo,
they were like, "Oh my God, is this thing --"
Like there's a bike, let's watch out.
And then 10 minutes in, they were like,
"Oh, this is really cool."
And then 20 minutes in,
they were like checking their phone, bored.
You know, it's amazing
how much your sort of internal firmware updates
for this new stuff very quickly.
-Yeah, like I think
that people will ask an agent to do something for them
that would have taken them a month,
and it'll finish in an hour, and it'll be great,
and then they'll have like 10 of those at the same time,
and then they'll have like 1,000 of those at the same time,
and by 2030 or whatever, we'll look back and be like,
"Yeah, this is just like what a human is supposed
to be capable of, what a human used to like,
you know, grind at for years, or whatever,
or many humans used to grind at for years.
Like I just now like ask the computer to do it,
and it's like done in an hour."
And that's -- why is it not a minute?
Like this, you know. -Yeah.
And it's also,
it's one of the things
that makes having an amazing developer platform great,
too, because, you know, we'll experiment,
and we'll build some agentic things, of course.
And like, we've already got --
I think just like,
we're just pushing the boundaries
of what's possible today.
You've got groups like Cognition doing amazing things in coding,
Harvey and Casetext in legal,
and you've got Speak doing cool things with language translation.
Like, we're beginning to see this stuff work,
and I think it's really going to start working
as we continue to iterate these models.
-One of the very fun things for us,
about having this developer platform,
is just getting to like watch the unbelievable speed
and creativity of people that are building these experiences,
like developers very near and dear to our heart.
It was kind of like the first thing we launched.
And just many of us came from building on platforms.
But so much of the capability of these models,
and great experiences,
have been built by people building on the platform,
we'll continue to try to offer like great first-party products,
but we know that will only ever be like a small narrow slice
of the apps, or agents,
or whatever people build in the world.
And seeing what has happened in the world
in the last 18, 24 months,
it's been like quite amazing to watch.
-Well, we'll keep going on the agent front here,
what do you see as the current hurdles
for computer controlling agents? -Safety and alignment.
Like if you are really going to give an agent the ability
to start clicking around your computer,
which you will,
you are going to have a very high bar for the robustness,
and the reliability, and the alignment of that system.
So technically speaking, I think that,
you know, we're getting like pretty close
on the capability side,
but the sort of agent safety, and trust framework,
that's going to, I think, be the long pole.
-And now I'll kind of ask a question,
it's almost the opposite of one of the questions from earlier,
do you think safety could act as a false positive
and actually limit public access to critical tools
that would enable a more egalitarian world?
-The honest answer is yes, that will happen sometimes.
Like we'll try to get the balance right,
but if we were full YOLO,
didn't care about like safety and alignment at all,
could we have launched o1 faster?
Yeah, we could have done that.
It would have come at a cost, there would have been things
that would have gone really wrong.
I'm very proud that we didn't.
The cost, you know,
I think would have been manageable with o1,
but by the time of o3, or whatever,
like maybe it would be pretty unacceptable.
And so we start on the conservative side. Like,
you know, another thing people are complaining about is,
"Voice Mode, like it won't say this offensive thing,
and I really want it to,
and, you know, you're a horrible company, and let it offend me."
You know what? I actually mostly agree.
If you are trying to get o1 to say something offensive,
it should follow the instructions of its user
most of the time.
There's plenty of cases where it shouldn't.
But we have like a long history of
when we put a new technology into the world,
we start on the conservative side,
we try to give society time to adapt,
we try to understand where the real harms are
versus the sort of like, kind of more theoretical ones.
And that's like part of our approach to safety,
and not everyone likes it all the time.
I don't even like it all the time.
But if we're right,
that these systems are -- and we're going to get it wrong too.
Like sometimes we won't be conservative
enough in some area. But if we're right
that these systems are going to get as powerful
as we think they are, as quickly as we think they might,
then I think starting that way makes sense.
And, you know, we like relax over time.
-Totally agree.
What's the next big challenge for a startup
that's using AI as a core feature?
I'll say -- -You first.
-I've got one, which is, I think, one of the challenges,
and we face this too,
because we're also building products on top
of our own models, is trying to find the kind of the frontier.
You want to be building --
these AI models are evolving so rapidly,
and if you're building for something
that the AI model does well today, it'll work well today,
but it's going to feel old tomorrow.
And so you want to build for things
that the AI model can just barely not do.
You know, where maybe the early adopters will go for it,
and other people won't quite,
but that just means that when the next model comes out,
and as we continue to make improvements,
that use case that just barely didn't work,
you're going to be the first to do it
and it's going to be amazing.
But figuring out that boundary is really hard.
I think it's where the best products
are going to get built though.
-Totally agree with that.
The other thing I would add is,
I think it's like very tempting to think
that a technology makes a startup.
And that is almost never true.
No matter how cool a new technology,
or a new sort of like tech tidal wave is,
it doesn't excuse you from having to do all
of the hard work of building a great company
that is going to have durability,
or like accumulated advantage over time.
And we hear from a lot of startups,
and at YC this was like a very common thing,
which is like, I can do this incredible thing,
I can make this incredible service,
and that seems like a complete answer,
but it doesn't excuse you from any
of like the normal laws of business.
You still have to like build a good business,
and a good strategic position.
And I think a mistake is that
in the unbelievable excitement and updraft of AI,
people are very tempted to forget that.
-This is an interesting one,
the mode of voice is like tapping directly
into the human API.
How do you ensure ethical use of such a powerful tool
with obvious abilities of manipulation?
-Yeah, you know, Voice Mode was a really interesting one for me.
It was like the first time that I felt
like I sort of had gotten like really tricked by an AI in that
when I was playing with the first beta of it,
I couldn't like -- I couldn't stop myself.
I mean, I kind of, like I still say like please to ChatGPT,
but in Voice Mode,
I like couldn't not kind of use the normal niceties.
I was like so convinced like it might be a real --
like, you know.
And obviously it's just like hacking some circuit
in my brain, but I really felt it with Voice Mode.
And I sort of still do.
I think this is a more --
this is an example of like a more general thing
that we're going to start facing,
which is as these systems become more and more capable,
and as we try to make them as natural as possible
to interact with,
they're going to like hit parts of our neural circuitry
that was like evolved to deal with other people.
And, you know,
there's like a bunch of clear lines about things
we don't want to do. Like we don't --
like there's a whole bunch
of like weird personality growth hacking
like I think vaguely socially manipulative stuff we could do.
But then there's these like other things
that are just not nearly as clear cut.
Like you want the Voice Mode to feel as natural as possible,
but then you get across the uncanny valley,
and it like, at least in me, triggers something.
And, you know, me saying like, please
and thank you to ChatGPT, no problem.
Probably a good thing to do. You never know.
But I think this like really points at the kinds
of safety and alignment issues
we have to start paying a lot of attention to.
-All right, back to brass tacks.
Sam, when is o1 going to support function tools?
-Do you know? -Before the end of the year.
There are three things that we really want to get in for --
all right, go for it.
We're going to record this, take this back to the research team,
show them how badly we need to do this.
But, I mean, there are a handful of things
that we really wanted to get into o1.
And we also -- you know, it's a balance of,
should we get this out to the world earlier
and begin learning from it,
learning from how you all use it?
Or should we launch a fully complete thing that is,
you know, in line with it,
that has all the abilities
that every other model that we've launched has?
I'm really excited to see things like system prompts,
and structured outputs,
and function calling make it into o1.
We will be there by the end of the year.
It really matters to us too.
[ Applause ]
-In addition to that,
just because I can't resist the opportunity to reinforce this,
like we will get all of those things in
and a whole bunch more things you all have asked for.
The model is going to get so much better so fast.
Like we are so early.
This is like, you know, maybe it's the GPT-2 scale moment,
but like, we know how to get it to GPT-4.
And we have the fundamental stuff in place
now to get it to GPT-4.
And in addition to planning for us to build all of those things,
plan for the model to just get like rapidly smarter.
Like, you know, hope you all come back next year
and plan for it to feel like way more of a year of improvement
than from 4 Turbo to o1.
You don't have to clap, that's fine.
It'll be really smart.
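For context on what that support will look like, here is a minimal sketch of function calling as it works today in the OpenAI Python SDK with gpt-4o. The assumption (not confirmed on stage) is that o1 will adopt the same tools interface once support lands, so the model name should be the only thing that needs to change; the get_weather schema is an illustrative placeholder.

```python
from openai import OpenAI

client = OpenAI()

# A function schema the model is allowed to call. The name and
# parameters here are invented for the example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: swap in o1 once it supports tools
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
)

# When the model decides to call the function, the arguments come
# back as structured JSON rather than free text.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```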
-What feature or capability of a competitor
do you really admire?
-I think Google's notebook thing is super cool.
-Yeah. -What do they call it?
-NotebookLM. -NotebookLM yeah.
I was like, I woke up early this morning
and I was like looking at examples on Twitter
and I was just like, this is like -- this is just cool.
Like this is just a good, cool thing.
And like I think not enough of the world is like shipping new
and different things.
It's mostly like the same stuff.
But that I think is like --
that brought me a lot of joy this morning.
-Yeah. -That was very well done.
-One of the things I really appreciate about that product
is they -- I mean, there's the --
just the format itself is really interesting.
But they also nailed the podcast style voices.
-Yes.
-It's like they have really nice microphones.
They have these sort of sonorous voices.
Did you guys see,
somebody on Twitter was saying like the cool thing
to do is take your LinkedIn and put it, you know,
PDF it and give it to these -- give it to NotebookLM.
And you'll have two podcasters riffing back
and forth about how amazing you are
and all of your accomplishments over the years.
I'll say mine is,
I think Anthropic did a really good job on Projects.
It's kind of a different take on what we did with GPTs.
GPTs are a little bit more long-lived.
It's something you build and can use over and over again.
Projects are kind of the same idea,
but like more temporary, meant to be kind of stood up,
used for a while,
and then you can move on,
and that different mental model makes a difference.
And I think they did a really nice job with that.
All right, we're getting close to audience questions.
So be thinking of what you want to ask.
So at OpenAI
how do you balance what you think users may need
versus what they actually need today?
-Also a better question for you.
-Yeah, well, I think it does get back to a bit
of what we were saying around trying to build
for what the model can just like not quite do, but almost do.
But it's a real balance too,
as, you know,
we support over 200 million people every week on ChatGPT.
You also can't say, "No, it's cool,
like deal with this bug for three months or this issue.
We've got something really cool coming."
You've got to solve for the needs of today.
And there are some really interesting product problems.
I mean, you think about,
I'm speaking to a group of people who know AI really well,
think of all the people in the world
who have never used any of these products.
And that is the vast majority of the world still.
You're basically giving them a text interface.
And on the other side of the text interface
is this like alien intelligence that's constantly evolving
that they've never seen or interacted with.
And you're trying to teach them all of the crazy things
that you can actually do with all the ways it can help,
can integrate into your life, can solve problems for you.
And people don't know what to do with it.
You know, like you come in and you're just like,
people type like, "Hi."
And it responds, you know, "Hey, great to see you."
Like, "How can I help you today?"
And you're like,
"Okay, I don't know what to say."
And then you end up, you kind of walk away and you're like,
"Well, I didn't see the magic in that."
And so it's a real challenge figuring out how you --
I mean, we all have 100 different ways
that we use ChatGPT and AI tools in general,
but teaching people what those can be
and then bringing them along as the model changes
month by month by month
and suddenly gains these capabilities way faster than we
as humans gain new capabilities,
it's a really interesting set of problems.
And I know it's one that you all solve in different ways as well.
-I have a question.
Who feels like that they've spent a lot of time with o1
and they would say like,
"I feel definitively smarter than that thing?"
Do you think you still will by o2?
You still will? No one.
No one taking the bet of like being smarter than o2.
So one of the challenges that we face is like,
we know how to go do this thing
that we think will be like at least probably smarter
than all of us in like a broad array of tasks.
And yet we have to like still like fix the bugs and do the,
"Hey, how are you," problem.
And mostly what we believe in
is that if we keep pushing on model intelligence,
people will do incredible things with them.
You know, we want to build the smartest,
most helpful models in the world and
people then find all sorts of ways to use that
and build on top of that.
It has definitely been an evolution
for us to not just be entirely research focused
and then we do have to fix all those bugs
and make this super usable.
And I think we've gotten better at balancing that.
But still, as part of our culture,
I think we trust that if we can keep pushing on intelligence
and it looks like we still have a lot of room to run there,
people will build just incredible things
with that capability.
-Yeah, and I think it's a core part of the philosophy
and you do a good job pushing us to always,
well, basically incorporate the frontier of intelligence
into our products,
both in the APIs and into our first-party products.
Because it's easy to kind of stick to the thing you know,
the thing that works well,
but you're always pushing us to like get the frontier in,
even if it only kind of works
because it's going to work really well soon.
So I always find that a really helpful push.
You kind of answered the next one.
You do say please and thank you to the models.
I'm curious, how many people say please and thank you?
-Isn't that so interesting? -I do too.
I kind of can't -- I feel bad if I don't.
-Yeah, me too. -Okay.
Last question, and then we'll go into audience questions
for the last 10 or so minutes.
Do you plan to build models specifically made for
agentic use cases?
Things that are better at reasoning and tool calling.
-We plan to make models that are great at agentic use cases.
That'll be a key priority for us over the coming months.
"Specifically" is a hard thing to ask for
because I think it's also just
how we keep making smarter models.
So yes, there's like some things like tool use and function
calling that we need to build in that'll help.
But mostly we just want to make
the best reasoning models in the world.
Those will also be the best agentic
models in the world.
-Cool, let's go to audience questions.
I don't know who's got the mic.
All right, we got a mic.
-How extensively do you dogfood
your own technology in your company?
And do you have any interesting examples
that may not be obvious? -Yeah.
I mean we put models up for internal use even
before they're done training.
Like we use checkpoints
and try to have people use them for whatever they can
and try to sort of like build new ways
to explore the capability of the model internally
and use them for our own development or research
or whatever else as much as we can.
We're still always surprised by the creativity
of the outside world and what people do.
But basically
the way we have figured out every step along our way
of what to push on next, what we can productize,
like what the models are really good at
is by internal dogfooding.
That's like our whole -- that's
how we feel our way through this.
We don't yet have like employees that are based off of o1,
but as we like move into the world of agents,
we will try that.
Like we will try having like,
you know, things that we deploy in our internal systems
that help you with stuff.
-There are things that get closer to that.
I mean, like customer service,
we have bots internally that do a ton
of the work of answering external questions
and fielding internal people's questions on Slack and so on.
And our customer service team is probably, I don't know,
20% the size it might otherwise need to be because of it.
I know Matt Knight and our security team
have talked extensively about all the different ways
we use models internally to automate a bunch
of security things
and, you know, take what used to be a manual process
where you might not have the number of humans
to even like look at everything incoming, and have models
separating signal from noise and highlighting
to humans what they need to go look at, things like that.
So I think internally there are tons of examples and people
maybe underestimate the --
you all probably will not be surprised by this
but a lot of folks that I talk to are --
the extent to which it's not just using a model in a place,
it's actually about using like chains of models
that are good at doing different things
and connecting them all together to get one end-to-end process
that is very good at the thing you're doing,
even if the individual models have, you know,
flaws and make mistakes.
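To make that chained-models pattern concrete, here is a minimal sketch assuming a hypothetical security-alert triage pipeline; the model choices, prompts, and alerts are invented for illustration, not OpenAI's internal setup. A cheap first-stage model separates signal from noise, and a stronger second-stage model only sees what passes.

```python
from openai import OpenAI

client = OpenAI()

def is_signal(alert: str) -> bool:
    """Stage 1: a small, cheap model separates signal from noise."""
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only YES or NO: does this security "
                        "alert need human review?"},
            {"role": "user", "content": alert},
        ],
    )
    return r.choices[0].message.content.strip().upper().startswith("YES")

def summarize(alert: str) -> str:
    """Stage 2: a stronger model explains only the flagged alerts."""
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Summarize this alert for a security engineer "
                        "and suggest one next step."},
            {"role": "user", "content": alert},
        ],
    )
    return r.choices[0].message.content

# Invented example alerts; only the suspicious one reaches stage 2.
for alert in ["500 failed logins from one IP in 5 minutes",
              "Routine TLS certificate renewal completed"]:
    if is_signal(alert):
        print(summarize(alert))
```

Each individual model can make mistakes, but the end-to-end chain is much better at the overall task than either stage alone, which is the point being made above.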
-Thank you.
I'm wondering
if you guys have any plans on sharing models for
like offline usage?
Because with this distillation thing,
it's really cool that we can generate our own models
but for a lot of use cases,
you really want to kind of like have a local version of it.
-You want to take that? -We're open to it.
It's not like high priority on the current roadmap.
If we had like more resources and bandwidth,
we would go do that.
I think there's a lot of reasons you want a local model
but it's not like a this year kind of thing.
-Okay. Hi.
My question is: there are many agencies in the government
at the local, state, and national levels
that could really greatly benefit from the tools
that you guys are developing
but have perhaps some hesitancy on deploying them because of,
you know, security concerns, data concerns, privacy concerns.
And I guess I'm curious to know if there are any sort of,
you know, planned partnerships with governments,
world governments, once AGI, whatever that is, is achieved,
because obviously if AGI can help solve problems
like world hunger, poverty, climate change,
government is going to have to get involved with that, right?
And I'm just curious to know if there is some,
you know, plan in the works when and if that time comes.
-Yeah, I think,
I actually think you don't want to wait until AGI,
you want to start now, right?
Because there's a learning process
and there's a lot of good
that we can do with our current models.
So we've announced a handful of partnerships
with government agencies.
Some states -- I think Minnesota, Pennsylvania, and some others --
also with organizations like USAID.
It's actually a huge priority of ours
to be able to help governments around the world get acclimated,
get benefit from the technology.
I mean, of all places, government feels like somewhere
where you can automate a bunch of workflows
and make things more efficient, reduce drudgery, so on.
So I think there's a huge amount of good we can do now.
And if we do that now,
it just accrues over the long run
as the models get better and we get closer to AGI.
-I have a pretty open-ended question.
What are your thoughts on open-source?
So whether that's open weights; just general discussion.
Where do you guys sit with open-source?
-I think open-source is awesome.
Again, if we had more bandwidth, we would do that too.
We've like gotten very close
to making a big open-source effort a few times.
And then, you know, the really hard part is prioritization.
And we have put other things ahead of it.
Part of it is like there's such good open-source models
in the world now that I think that segment --
the thing we always have been most tempted by
is like a really great on-device model,
and I think that segment is fairly well served.
I do hope we do something at some point,
but we want to find something that we feel
like if we don't do it,
the world will just be missing this thing
and not make like another thing
that's like a tiny bit better around benchmarks,
because we think there's like a lot of good stuff out there now.
But like spiritually, philosophically,
very glad it exists, would like figure out how to contribute.
-Hi, Sam. Hi, Kevin.
Thanks for inviting us to DevDay.
It's been awesome. All the live demos worked.
It's incredible.
Why can't Advanced Voice Mode sing?
And as a follow-up to this,
if it's a company legal issue, in terms of copyright,
et cetera, is there daylight between
how you think about safety in terms of your own products
on your own platform
versus giving us developers kind of the --
I don't know, sign the right things off
so we can make our Advanced Voice Mode sing.
Could you address this?
-You know, the funny thing is Sam asked the same question.
"Why can't this thing sing? I want it to sing.
I've seen it sing before."
Actually, there are things obviously
that we can't have it sing, right?
You can't have it sing copyrighted songs.
We don't have the licenses, et cetera.
And then there are things that it can sing.
You could have it sing Happy Birthday
and that would be just fine, right?
And we want that too.
It's a matter of once you --
basically it's easier with finite time to say no
and then build it in,
but it's nuanced to get it right.
And we -- you know,
there are penalties to getting these kinds of things wrong.
So it's really just where we are now.
We really want the models to sing too.
-People were tired of waiting for us to ship Voice Mode,
which is like very fair.
We could have like waited longer
and kind of really got the classifications and filters
on, you know, copyrighted music versus not,
but we decided we would just ship it and we'll add more.
-But I think Sam has asked me like four or five times
why it can't sing. -It is a nice feature.
I mean, we still can't like offer something
where we're going to be in like really bad legal hot water,
developers or first-party or whatever.
So yes, we can maybe have some differences,
but we still have to be able to like comply with the law.
-Could you speak a little to the future of
where you see context windows going
and kind of the timeline for when --
how you see things balanced
between context window growth and RAG,
basically information retrieval?
-I think there's like two different takes on that matter.
One is like,
when is it going to get to like kind of normal long context,
like context length, 10 million or whatever,
like long enough that you just throw stuff in there
and it's fast enough, you're happy about it.
And I expect everybody is going to make
pretty fast progress there, and that'll just be a thing.
Long context has gotten weirdly less usage
than I would have expected so far.
But I think, you know, there's a bunch of reasons for that.
I don't want to go too much into it.
And then there's this other question of like,
when do we get to context length,
not like 10 million, but 10 trillion?
Like when do we get to the point
where you throw like every piece of data
you've ever seen in your entire life in there?
And, you know, like that's a whole different set of things.
That obviously takes some research breakthroughs.
But I assume that infinite context will happen
at some point, and some point is like less than a decade.
And that's going to be just a totally different way
that we use these models.
Even getting to the like 10 million tokens of very fast
and accurate context, which I expect measured in like months,
something like that, you know,
like people will use that in all sorts of ways
and it'll be great.
But, yeah, the very, very long context,
I think is going to happen and is really interesting.
-I think we maybe have time for one or two more.
-Don't worry, this is going to be your favorite question.
So with voice and all of the other changes
that users have experienced since you all
have launched your technology,
what do you see as the vision for the new engagement layer,
the form factor,
and how we actually engage with this technology
to make our lives so much better?
-I love that question.
It's one that we ask ourselves a lot, frankly.
There's this, and I think it's one
where developers can play a really big part,
because there's this trade-off
between generality and specificity.
I'll give you an example.
I was in Seoul and Tokyo a few weeks ago,
and I was in a number of conversations with folks
with whom I didn't have a common language,
and we didn't have a translator around.
Before, we would not have been able to have a conversation.
We would've just sort of smiled at each other and continued on.
I took out my phone.
I said, "ChatGPT, I want you to be a translator for me.
When I speak in English, I want you to speak in Korean.
When you hear Korean, I want you to repeat it in English."
And I was able to have a full business conversation,
and it was amazing.
And you think about the impact that that can have,
not just for business,
but think about travel and tourism and people's
willingness to go places
where they might not have a word of the language.
You can have these really amazing impacts.
But inside ChatGPT,
that was still a thing that I had to --
like ChatGPT is not optimized for that, right?
Like you want this sort of digital,
universal translator in your pocket
that just knows that what you want it to do is translate.
Not that hard to build, but I think there's --
we struggle with trying to build an application
that can do lots of things for lots of people,
and that keeps up, like we've been talking about a few times,
with the pace of change
and with the capabilities, you know,
agentic capabilities and so on.
I think there's also a huge opportunity
for the creativity of an audience like this
to come in and like solve problems
that we're not thinking of,
that we don't have the expertise to do.
And ultimately, the world is a much better place
if we get more AI to more people,
and it's why we are so proud to serve all of you.
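The single-purpose translator Kevin describes can be sketched in a few lines. This is a text-only stand-in, under stated assumptions: the live demo used Advanced Voice Mode (a production version would stream audio, for example via the Realtime API), and the model name and system prompt here are illustrative. The system prompt does the work of pinning the app to one task.

```python
from openai import OpenAI

client = OpenAI()

# Pin the task in the system prompt so the app only ever translates,
# mirroring the spoken instruction Kevin gave ChatGPT above.
SYSTEM = ("You are a translator. If the user writes English, reply with "
          "the Korean translation. If the user writes Korean, reply with "
          "the English translation. Output only the translation.")

def translate(utterance: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o",  # any current chat model works for this sketch
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": utterance}],
    )
    return r.choices[0].message.content

print(translate("Nice to meet you. Shall we get started?"))
```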
-The only thing I would add is,
if you just think about everything
that's going to come together,
at some point in not that many years in the future,
you'll walk up to a piece of glass,
you will say whatever you want. There will be like --
there will be incredible reasoning models,
agents connected to everything.
There'll be a video model streaming back to you,
like a custom interface just for this one request.
Whatever you need is just going to get rendered
in real time in video.
You'll be able to interact with it,
you'll be able to like click through the stream
or say different things, and it'll be off doing like, again,
the kinds of things that used to take like humans
years to figure out. And it'll just,
you know, dynamically render whatever you need,
and it'll be a completely different way
of using a computer,
and also getting things to happen in the world,
that is going to be quite wild. -Awesome, thank you.
That was a great question to end on.
I think we're at time.
Thank you so much for coming to DevDay with us.
-Thank you all.
[ Applause ]
-All right, we can't wait to see what you all build.
-Thank you.