OpenAI DevDay 2024 | Fireside chat with Sam Altman and Kevin Weil
By OpenAI
Summary
## Key takeaways

- **AGI is a blurry, not binary, transition**: Sam Altman believes AGI won't be a sudden switch but a gradual, blurry exponential curve. The exact milestone will be debated, similar to the Turing test, but the continuous progress will feel significant. [03:58], [04:30]
- **Research drives OpenAI's core advancements**: OpenAI's commitment to research is stronger than ever, with breakthroughs like o1 demonstrating its critical role. The company thrives on discovering new paradigms rather than just optimizing existing ones. [05:19], [06:04]
- **Product development is unpredictable due to rapid AI evolution**: Building products at OpenAI is uniquely challenging because the capabilities of AI models evolve every few months. This unpredictable pace requires adapting roadmaps based on scientific advancements rather than fixed plans. [07:45], [08:46]
- **Iterative deployment is a key safety strategy**: OpenAI prioritizes iterative deployment and learning from real-world usage as a crucial safety measure. This approach allows them to confront emerging safety challenges and opportunities as they arise, rather than relying solely on theoretical planning. [10:45], [12:10]
- **AI agents will fundamentally change how we work**: AI agents are poised to be a major shift, enabling multi-day interactions with environments and people. This will dramatically accelerate task completion, moving from monthly to hourly, and eventually to near-instantaneous results, reshaping human capabilities. [14:44], [16:32]
- **Safety and alignment are critical hurdles for agents**: While AI capabilities for agents are advancing rapidly, safety and alignment remain the primary challenges. Ensuring robustness and reliability is paramount before granting agents the ability to interact extensively with computer systems. [18:35]
Topics Covered
- OpenAI's AGI Levels Framework: From Chatbots to Innovators
- Sam Altman on AGI: It's not binary, it's a blurry exponential
- Product Development in a Rapidly Evolving AI Landscape
- AI for Government: Automating Workflows Now, Not Waiting for AGI
- AI as a Universal Translator
Full Transcript
[ Cheers and applause ]
-Hello.
-How's it going? Good to see everybody.
-Thanks for coming.
-All right, I think everybody knows you.
For those who don't know me,
I'm Kevin Weil, Chief Product Officer at OpenAI.
I have the good fortune of getting to turn
the amazing research that our research teams do
into the products that you all use every day
and the APIs that you all build on every day.
I thought we'd start with some audience engagement here.
So on the count of three, I'm going to count to three,
and I want you all to say,
of all the things that you saw launched here today,
what's the first thing you're going to integrate?
The thing you're most excited to build on, all right?
You've got to do it, right?
One, two, three. All right. -Happy to hear that.
-I'll say personally,
I'm super excited about our distillation products.
I think that's going to be really, really interesting.
Yeah.
I'm also excited to see what you all do with Advanced Voice Mode,
with the real-time API,
and with vision fine-tuning in particular.
So okay.
So I've got some questions for Sam.
I've got my CEO here in the hot seat.
Let's see if I can't make a career-limiting move.
So we'll start with an easy one, Sam.
How close are we to AGI?
-You know, we used to, every time we finished a system,
we would say like, in what ways is this not an AGI?
And it used to be like very easy.
You kind of, like make a little robotic hand
that does a Rubik's Cube, or a Dota bot,
and it's like, it does some things,
but definitely not an AGI.
It's obviously harder to say now,
and so we're trying to like stop talking about AGI
as this general thing, and we have this levels framework,
because the word AGI has become so overloaded.
So like real quickly, we use one for chatbots,
two for reasoners, three for agents, four for innovators,
and five for organizations, like roughly.
I think we clearly got to Level 2,
or we believe we clearly got to Level 2, with o1.
And it, you know,
can do really quite impressive cognitive tasks.
It's a very smart model.
It doesn't feel AGI-like in a few important ways,
but I think if you just do the one next step of making it,
you know, very agent-like, which is our Level 3,
and which I think we will be able to do
in the not-distant future, it will feel surprisingly capable.
Still probably not something that most of you
would call an AGI, though, maybe some of you would,
but it's going to feel like,
all right, this is like a significant thing.
And then the leap -- and I think we do that pretty quickly,
the leap from that to something
that can really increase the rate
of new scientific discovery,
which for me is like a very important part of having an AGI,
I feel a little bit less certain on that,
but not a long time.
Like I think all of this now is going to happen pretty quickly,
and if you think about what happened from last DevDay
to this one, in terms of model capabilities,
and you're like -- I mean, if you go look at like --
if you go from like o1 on a hard problem back to like 4 Turbo
that we launched 11 months ago, you'll be like,
"Wow, this is happening pretty fast."
And I think the next year will be very steep progress.
The next two years, I think, will be very steep progress.
Harder than that, harder to see with a lot of certainty,
but I would say like not very, and at this point,
the definitions really matter.
And the fact that the definitions matter this much
somehow means we're like getting kind of close.
-Yeah.
And, you know, there used to be this sense of AGI
where it was like it was a binary thing,
and you were going to go to sleep one day,
and there was no AGI, and wake up the next day,
and there was AGI.
I don't think that's exactly how we think about it anymore,
but how have your views on this evolved?
-Yeah.
You know, the one -- I agree with that.
I think we're like, you know, in this like kind of period
where it's going to feel very blurry for a while,
and, you know, is this AGI yet, or is this not AGI,
or kind of like at what point?
Yeah, it's just going to be this like smooth exponential,
and, you know,
probably most people looking back in history
won't agree like when that milestone was hit,
and will just realize it was like a silly thing.
Even the Turing test,
which I thought always was like this very clear milestone,
you know, there was this like fuzzy period.
It kind of like went whooshing by, and no one cared.
But I think the right framework is it's just
this one exponential.
That said,
if we can make an AI system
that is like materially better than all of OpenAI
at doing AI research,
that does feel to me like some sort of important discontinuity.
It's probably still wrong to think about it that way.
It probably still is the smooth exponential curve,
but that feels like a real milestone.
-Mm-hmm.
Is OpenAI still as committed to research
as it was in the early days?
Will research still drive the core of our advancements
in our product development?
-Yeah, I mean, I think more than ever.
There was like a time in our history
when the right thing to do was just to scale up compute,
and we pursued that with conviction.
And we have a spirit of like we'll do whatever works,
you know?
Like we have this mission, we want to like build safe AGI,
figure out how to share the benefits.
If the answer is like rack up GPUs, we'll do that.
And right now, the answer is, again, really push on research.
And I think you see this with o1.
Like that is a giant research breakthrough
that we were attacking from many vectors
over a long period of time
that came together in this really powerful way.
We have many more giant research breakthroughs to come.
But the thing that I think is most special about OpenAI
is that we really deeply care about research
and we understand how to do it. I think
it's easy to copy something you know works.
And, you know, I actually don't even mean that as a bad thing.
Like when people copy OpenAI, I'm like,
"Great, the world gets more AI.
That's wonderful."
But to do something new for the first time,
to like really do research in the true sense of it,
which is not like,
you know, let's barely get SOTA at this thing,
or like let's tweak this,
but like, let's go find the new paradigm and the one after that
and the one after that, that is what motivates us.
And I think the thing that is special about us as an org,
besides the fact that we, you know,
marry product and research and all this other stuff together,
is that we know how to run that kind of a culture
that can go push back the frontier.
And that's really hard.
But we love it.
And that's, you know,
I think we only have to do that a few more times
and then we get to AGI.
-Yeah, I'll say like the litmus test
for me coming from the outside, from, you know,
sort of normal tech companies,
of how critical research is to OpenAI
is that building product at OpenAI
is fundamentally different than any other place
that I have ever done it before.
You know, normally you have some sense of your tech stack.
You have some sense of what you have to work with,
what capabilities computers have,
and then you're trying to build the best product, right?
You're figuring out who your users are,
and what problems they have,
and how you can help solve those problems for them.
There is that at OpenAI.
But also,
the state of like
what computers can do just evolves every two months,
three months,
and suddenly computers have a new capability
that they've never had in the history of the world
and we're trying to figure out how to build a great product
and expose that for developers and our APIs and so on.
And, you know, you can't totally tell what's coming.
It's coming through the mist a little bit at you
and gradually taking shape.
It's fundamentally different than any other company
I've ever worked at.
And it's, I think, because research is so --
-Is that the thing that has most surprised you?
-Yes.
Yeah, and it's interesting
how even internally we don't always have a sense --
you have like,
"Okay, I think this capability is coming,
but is it going to be, you know,
90% accurate or 99% accurate in the next model?"
Because the difference really changes
what kind of product you can build.
-Yeah.
-And you know that you're going to get to 99,
but you don't quite know when and figuring out
how you put a roadmap together in that world
is really interesting.
-Yeah, the degree
to which we have to just like follow the science
and let that determine what we go work on next
and what products we build and everything else is,
I think, hard to get across.
Like we have guesses about where things are going to go.
Sometimes we're right, often we're not.
But if something starts working
or if something doesn't work that you thought
was going to work,
our willingness to just say we're going to like
pivot everything and do what the science allows,
and you don't get to like pick what the science allows,
that's surprising.
-I was sitting with an enterprise customer
a couple weeks ago and they said,
"You know, one of the things we really want --
this is all working great, we love this,
one of the things we really want
is a notification 60 days in advance
when you're going to launch something."
And I was like, "I want that too."
All right, so I'm going through --
these are a bunch of questions from the audience,
by the way,
and we're going to try and also leave some time at the end
for people to ask audience questions.
So we've got some folks with mics when we get there,
so be thinking, but next thing.
So many in the alignment community
are genuinely concerned
that OpenAI is now only paying lip service to alignment.
Can you reassure us? -Yeah.
I think it's true we have a different take on alignment
than like maybe what people write about on whatever that
like internet forum is.
But we really do care a lot about building safe systems.
We have an approach to do it
that has been informed by our experience so far.
And touching on that other question --
you don't get to pick where the science goes --
we want to figure out how to make capable models
that get safer and safer over time.
And, you know, a couple of years ago,
we didn't think the whole Strawberry
or the o1 paradigm was going to work in the way
that it has worked.
And that brought a whole new set of safety challenges,
but also safety opportunities.
And rather than kind of like planning from a theoretical,
you know, "once superintelligence gets here,
here are the, like, 17 principles,"
we have an approach of figure out
where the capabilities are going
and then work to make that system safe.
And o1 is obviously our most capable model ever,
but it's also our most aligned model ever by a lot.
And as these models get better intelligence, better reasoning,
whatever you want to call it,
the things that we can do to align them and things
we can do to build, really, safe systems across the entire stack,
our tool set keeps increasing as well.
So we have to build models that are generally accepted as safe
and robust to be able to put them in the world.
And when we started OpenAI,
what the picture of alignment looked like
and what we thought the problems
that we needed to solve were going to be turned out
to be nothing like the problems that actually are in front of us
and that we have to solve now.
And also, when we made the first GPT-3,
if you asked me for the techniques
that would have worked for us to be able to now deploy
our current systems as generally accepted to be safe and robust,
they would not have been the ones that turned out to work.
So by this idea of iterative deployment,
which I think has been one of our
most important safety stances ever,
and sort of confronting reality as it's in front of us,
we've made a lot of progress and we expect to make more.
We keep finding new problems to solve,
but we also keep finding new techniques to solve them.
All of that said,
I think worrying about the sci-fi ways this all
goes wrong is also very important.
We have people thinking about that.
It's a little bit less clear kind of what to do there,
and sometimes you end up backtracking a lot,
but...I also don't think it's fair
to say we're only going to work on the thing in front of us.
We do have to think about where this is going,
and we do that too.
And I think if we keep approaching the problem
from both ends like that,
most of our thrust is on the, like, okay, here's the next thing,
we want to deploy this, what needs to happen to get there,
but also like what happens if this curve just keeps going?
That's been an effective strategy for us.
-I'll say also, it's one of the places
where I really like our philosophy
of iterative deployment.
When I was at Twitter back, I don't know, 100 years ago now,
Ev said something that stuck with me,
which is,
"No matter how many smart people you have inside your walls,
there are way more smart people outside your walls."
And so when we try and get our --
you know, it'd be one thing if we just said we're going to try
and figure out everything that could possibly go wrong within
our walls, and it would be just us and the red teamers
that we can hire, and so on.
And we do that. We work really hard at that.
But also, launching iteratively and launching carefully
and learning from the ways that folks like you all use it,
what can go right, what can go wrong,
I think is a big way that we get these things right.
-I also think that as we head into this world of agents
off doing things in the world,
that is going to become really, really important.
As these systems get more complex
and are acting over longer horizons,
the pressure testing from the whole outside world,
I really believe, will be critical.
-Yeah. So we'll go actually, we'll go off of that
and maybe talk to us a bit more about
how you see agents fitting into OpenAI's long-term plans.
-What do you think?
-I think they're a huge part of the --
I mean, I think the exciting thing is this set of models,
o1 in particular, and all of its successors,
are going to be what makes this possible,
because you finally have the ability to reason,
to take hard problems,
break them into simpler problems, and act on them.
I mean, I think 2025 is going to be the year
that this really goes big.
-Yeah.
I mean, chat interfaces are great, and they will,
I think, have an important place in the world,
but the -- when you can like ask a model --
when you can ask like ChatGPT or some agent something,
and it's not just like you get a kind of quick response,
or even you get like 15 seconds of thinking,
and o1 gives you like a nice piece of code back,
or whatever,
but you can like really give something
a multi-turn interaction with environments,
or other people,
or whatever,
and like think for the equivalent of multiple days
of human effort,
and like a really smart, really capable human,
and like have stuff happen. We all say that, we're all like,
"Yeah, AI agents are the next thing.
This is coming, this is going to be another thing."
And we just talk about it like,
"Okay, you know, it's like the next model in evolution."
I would bet, and we don't really know until we get to use these,
that it's -- we'll of course get used to it quickly,
people get used to any new technology quickly,
but this will be like a very significant change to the way
the world works in a short period of time.
-Yeah, it's amazing.
Somebody was talking about getting used to
new capabilities and AI models,
and how quickly -- actually, I think it was about Waymo,
but they were talking
about how in the first 10 seconds of using Waymo,
they were like, "Oh my God, is this thing --"
Like there's a bike, let's watch out.
And then 10 minutes in, they were like,
"Oh, this is really cool."
And then 20 minutes in,
they were like checking their phone, bored.
You know, it's amazing
how much your sort of internal firmware updates
for this new stuff very quickly.
-Yeah, like I think
that people will ask an agent to do something for them
that would have taken them a month,
and it'll finish in an hour, and it'll be great,
and then they'll have like 10 of those at the same time,
and then they'll have like 1,000 of those at the same time,
and by 2030 or whatever, we'll look back and be like,
"Yeah, this is just like what a human is supposed
to be capable of, what a human used to like,
you know, grind at for years, or whatever,
or many humans used to grind at for years.
Like I just now like ask the computer to do it,
and it's like done in an hour."
And that's -- why is it not a minute?
Like this, you know. -Yeah.
And it's also,
it's one of the things
that makes having an amazing developer platform great,
too, because, you know, we'll experiment,
and we'll build some agentic things, of course.
And like, we've already got --
I think just like,
we're just pushing the boundaries
of what's possible today.
You've got groups like Cognition doing amazing things in coding,
Harvey and Casetext in legal,
and you've got Speak doing cool things with language translation.
Like, we're beginning to see this stuff work,
and I think it's really going to start working
as we continue to iterate these models.
-One of the very fun things for us,
about having this developer platform,
is just getting to like watch the unbelievable speed
and creativity of people that are building these experiences,
like developers very near and dear to our heart.
It was kind of like the first thing we launched.
And just many of us came from building on platforms.
But so much of the capability of these models,
and great experiences,
have been built by people building on the platform,
we'll continue to try to offer like great first-party products,
but we know that will only ever be like a small narrow slice
of the apps, or agents,
or whatever people build in the world.
And seeing what has happened in the world
in the last 18, 24 months,
it's been like quite amazing to watch.
-Well, we'll keep going on the agent front here,
what do you see as the current hurdles
for computer controlling agents? -Safety and alignment.
Like if you are really going to give an agent the ability
to start clicking around your computer,
which you will,
you are going to have a very high bar for the robustness,
and the reliability, and the alignment of that system.
So technically speaking, I think that,
you know, we're getting like pretty close
on the capability side,
but the sort of agent safety, and trust framework,
that's going to, I think, be the long pole.
-And now I'll kind of ask a question,
it's almost the opposite of one of the questions from earlier,
do you think safety could act as a false positive
and actually limit public access to critical tools
that would enable a more egalitarian world?
-The honest answer is yes, that will happen sometimes.
Like we'll try to get the balance right,
but if we were full YOLO,
didn't care about like safety and alignment at all,
could we have launched o1 faster?
Yeah, we could have done that.
It would have come at a cost, there would have been things
that would have gone really wrong.
I'm very proud that we didn't.
The cost, you know,
I think would have been manageable with o1,
but by the time of o3, or whatever,
like maybe it would be pretty unacceptable.
And so we start on the conservative side. Like,
you know, another thing people are complaining about is,
"Voice Mode, like it won't say this offensive thing,
and I really want it to,
and, you know, you're a horrible company, and let it offend me."
You know what? I actually mostly agree.
If you are trying to get o1 to say something offensive,
it should follow the instructions of its user
most of the time.
There's plenty of cases where it shouldn't.
But we have like a long history of
when we put a new technology into the world,
we start on the conservative side,
we try to give society time to adapt,
we try to understand where the real harms are
versus the sort of like, kind of more theoretical ones.
And that's like part of our approach to safety,
and not everyone likes it all the time.
I don't even like it all the time.
But if we're right,
that these systems are -- and we're going to get it wrong too.
Like sometimes we won't be conservative
enough in some area. But if we're right
that these systems are going to get as powerful
as we think they are, as quickly as we think they might,
then I think starting that way makes sense.
And, you know, we like relax over time.
-Totally agree.
What's the next big challenge for a startup
that's using AI as a core feature?
I'll say -- -You first.
-I've got one, which is, I think, one of the challenges,
and we face this too,
because we're also building products on top
of our own models, is trying to find the kind of the frontier.
You want to be building --
these AI models are evolving so rapidly,
and if you're building for something
that the AI model does well today, it'll work well today,
but it's going to feel old tomorrow.
And so you want to build for things
that the AI model can just barely not do.
You know, where maybe the early adopters will go for it,
and other people won't quite,
but that just means that when the next model comes out,
and as we continue to make improvements,
that use case that just barely didn't work,
you're going to be the first to do it
and it's going to be amazing.
But figuring out that boundary is really hard.
I think it's where the best products
are going to get built though.
-Totally agree with that.
The other thing I would add is,
I think it's like very tempting to think
that a technology makes a startup.
And that is almost never true.
No matter how cool a new technology,
or a new sort of like tech tidal wave is,
it doesn't excuse you from having to do all
of the hard work of building a great company
that is going to have durability,
or like accumulated advantage over time.
And we hear from a lot of startups,
and at YC this was like a very common thing,
which is like, I can do this incredible thing,
I can make this incredible service,
and that seems like a complete answer,
but it doesn't excuse you from any
of like the normal laws of business.
You still have to like build a good business,
and a good strategic position.
And I think a mistake is that
in the unbelievable excitement and updraft of AI,
people are very tempted to forget that.
-This is an interesting one,
the mode of voice is like tapping directly
into the human API.
How do you ensure ethical use of such a powerful tool
with obvious abilities of manipulation?
-Yeah, you know, Voice Mode was a really interesting one for me.
It was like the first time that I felt
like I sort of had gotten like really tricked by an AI in that
when I was playing with the first beta of it,
I couldn't like -- I couldn't stop myself.
I mean, I kind of, like I still say like please to ChatGPT,
but in Voice Mode,
I like couldn't not kind of use the normal niceties.
I was like so convinced like it might be a real --
like, you know.
And obviously it's just like hacking some circuit
in my brain, but I really felt it with Voice Mode.
And I sort of still do.
I think this is a more --
this is an example of like a more general thing
that we're going to start facing,
which is as these systems become more and more capable,
and as we try to make them as natural as possible
to interact with,
they're going to like hit parts of our neural circuitry
that was like evolved to deal with other people.
And, you know,
there's like a bunch of clear lines about things
we don't want to do. Like we don't --
like there's a whole bunch
of like weird personality growth hacking
like I think vaguely socially manipulative stuff we could do.
But then there's these like other things
that are just not nearly as clear cut.
Like you want the Voice Mode to feel as natural as possible,
but then you get across the uncanny valley,
and it like, at least in me, triggers something.
And, you know, me saying like, please
and thank you to ChatGPT, no problem.
Probably a good thing to do. You never know.
But I think this like really points at the kinds
of safety and alignment issues
we have to start paying a lot of attention to.
-All right, back to brass tacks.
Sam, when is o1 going to support function tools?
-Do you know? -Before the end of the year.
There are three things that we really want to get in for --
all right, go for it.
We're going to record this, take this back to the research team,
show them how badly we need to do this.
But, I mean, there are a handful of things
that we really wanted to get into o1.
And we also -- you know, it's a balance of,
should we get this out to the world earlier
and begin learning from it,
learning from how you all use it?
Or should we launch a fully complete thing that is,
you know, in line with it,
that has all the abilities
that every other model that we've launched has?
I'm really excited to see things like system prompts,
and structured outputs,
and function calling make it into o1.
We will be there by the end of the year.
It really matters to us too.
[ Applause ]
-In addition to that,
just because I can't resist the opportunity to reinforce this,
like we will get all of those things in
and a whole bunch more things you all have asked for.
The model is going to get so much better so fast.
Like we are so early.
This is like, you know, maybe it's the GPT-2 scale moment,
but like, we know how to get it to GPT-4.
And we have the fundamental stuff in place
now to get it to GPT-4.
And in addition to planning for us to build all of those things,
plan for the model to just get like rapidly smarter.
Like, you know, hope you all come back next year
and plan for it to feel like way more of a year of improvement
than from 4 Turbo to o1.
You don't have to clap, that's fine.
It'll be really smart.
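For context on what that support will look like, here is a minimal sketch of function calling as it works today in the OpenAI Python SDK with gpt-4o. The assumption (not confirmed on stage) is that o1 will adopt the same tools interface once support lands, so the model name should be the only thing that needs to change; the get_weather schema is an illustrative placeholder.

```python
from openai import OpenAI

client = OpenAI()

# A function schema the model is allowed to call. The name and
# parameters here are invented for the example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",  # assumption: swap in o1 once it supports tools
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
)

# When the model decides to call the function, the arguments come
# back as structured JSON rather than free text.
message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    print(call.function.name, call.function.arguments)
```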
-What feature or capability of a competitor
do you really admire?
-I think Google's notebook thing is super cool.
-Yeah. -What do they call it?
-NotebookLM. -NotebookLM yeah.
I was like, I woke up early this morning
and I was like looking at examples on Twitter
and I was just like, this is like -- this is just cool.
Like this is just a good, cool thing.
And like I think not enough of the world is like shipping new
and different things.
It's mostly like the same stuff.
But that I think is like --
that brought me a lot of joy this morning.
-Yeah. -That was very well done.
-One of the things I really appreciate about that product
is they -- I mean, there's the --
just the format itself is really interesting.
But they also nailed the podcast style voices.
-Yes.
-It's like they have really nice microphones.
They have these sort of sonorous voices.
Did you guys see,
somebody on Twitter was saying like the cool thing
to do is take your LinkedIn and put it, you know,
PDF it and give it to these -- give it to NotebookLM.
And you'll have two podcasters riffing back
and forth about how amazing you are
and all of your accomplishments over the years.
I'll say mine is,
I think Anthropic did a really good job on Projects.
It's kind of a different take on what we did with GPTs.
GPTs are a little bit more long-lived.
It's something you build and can use over and over again.
Projects are kind of the same idea,
but like more temporary, meant to be kind of stood up,
used for a while,
and then you can move on,
and that different mental model makes a difference.
And I think they did a really nice job with that.
All right, we're getting close to audience questions.
So be thinking of what you want to ask.
So at OpenAI
how do you balance what you think users may need
versus what they actually need today?
-Also a better question for you.
-Yeah, well, I think it does get back to a bit
of what we were saying around trying to build
for what the model can just like not quite do, but almost do.
But it's a real balance too,
as, you know,
we support over 200 million people every week on ChatGPT.
You also can't say, "No, it's cool,
like deal with this bug for three months or this issue.
We've got something really cool coming."
You've got to solve for the needs of today.
And there are some really interesting product problems.
I mean, you think about,
I'm speaking to a group of people who know AI really well,
think of all the people in the world
who have never used any of these products.
And that is the vast majority of the world still.
You're basically giving them a text interface.
And on the other side of the text interface
is this like alien intelligence that's constantly evolving
that they've never seen or interacted with.
And you're trying to teach them all of the crazy things
that you can actually do with all the ways it can help,
can integrate into your life, can solve problems for you.
And people don't know what to do with it.
You know, like you come in and you're just like,
people type like, "Hi."
And it responds, you know, "Hey, great to see you."
Like, "How can I help you today?"
And you're like,
"Okay, I don't know what to say."
And then you end up, you kind of walk away and you're like,
"Well, I didn't see the magic in that."
And so it's a real challenge figuring out how you --
I mean, we all have 100 different ways
that we use ChatGPT and AI tools in general,
but teaching people what those can be
and then bringing them along as the model changes
month by month by month
and suddenly gains these capabilities way faster than we
as humans gain new capabilities,
it's a really interesting set of problems.
And I know it's one that you all solve in different ways as well.
-I have a question.
Who feels like that they've spent a lot of time with o1
and they would say like,
"I feel definitively smarter than that thing?"
Do you think you still will by o2?
You still will? No one.
No one taking the bet of like being smarter than o2.
So one of the challenges that we face is like,
we know how to go do this thing
that we think will be like at least probably smarter
than all of us in like a broad array of tasks.
And yet we have to like still like fix the bugs and do the,
"Hey, how are you," problem.
And mostly what we believe in
is that if we keep pushing on model intelligence,
people will do incredible things with them.
You know, we want to build the smartest,
most helpful models in the world and
people then find all sorts of ways to use that
and build on top of that.
It has definitely been an evolution
for us to not just be entirely research focused
and then we do have to fix all those bugs
and make this super usable.
And I think we've gotten better at balancing that.
But still, as part of our culture,
I think we trust that if we can keep pushing on intelligence
and it looks like we still have a lot of room to run there,
people will build just incredible things
with that capability.
-Yeah, and I think it's a core part of the philosophy
and you do a good job pushing us to always,
well, basically incorporate the frontier of intelligence
into our products,
both in the APIs and into our first-party products.
Because it's easy to kind of stick to the thing you know,
the thing that works well,
but you're always pushing us to like get the frontier in,
even if it only kind of works
because it's going to work really well soon.
So I always find that a really helpful push.
You kind of answered the next one.
You do say please and thank you to the models.
I'm curious, how many people say please and thank you?
-Isn't that so interesting? -I do too.
I kind of can't -- I feel bad if I don't.
-Yeah, me too. -Okay.
Last question, and then we'll go into audience questions
for the last 10 or so minutes.
Do you plan to build models specifically made for
agentic use cases?
Things that are better at reasoning and tool calling.
-We plan to make models that are great at agentic use cases.
That'll be a key priority for us over the coming months.
"Specifically" is a hard thing to ask for
because I think it's also just
how we keep making smarter models.
So yes, there's like some things like tool use and function
calling that we need to build in that'll help.
But mostly we just want to make
the best reasoning models in the world.
Those will also be the best agentic
models in the world.
-Cool, let's go to audience questions.
I don't know who's got the mic.
All right, we got a mic.
-How extensively do you dogfood
your own technology in your company?
And do you have any interesting examples
that may not be obvious? -Yeah.
I mean we put models up for internal use even
before they're done training.
Like we use checkpoints
and try to have people use them for whatever they can
and try to sort of like build new ways
to explore the capability of the model internally
and use them for our own development or research
or whatever else as much as we can.
We're still always surprised by the creativity
of the outside world and what people do.
But basically
the way we have figured out every step along our way
of what to push on next, what we can productize,
like what the models are really good at
is by internal dogfooding.
That's like our whole -- that's
how we feel our way through this.
We don't yet have like employees that are based off of o1,
but as we like move into the world of agents,
we will try that.
Like we will try having like,
you know, things that we deploy in our internal systems
that help you with stuff.
-There are things that get closer to that.
I mean, like customer service,
we have bots internally that do a ton
of the work of answering external questions
and fielding internal people's questions on Slack and so on.
And our customer service team is probably, I don't know,
20% the size it might otherwise need to be because of it.
I know Matt Knight and our security team
have talked extensively about all the different ways
we use models internally to automate a bunch
of security things
and, you know, take what used to be a manual process
where you might not have the number of humans
to even like look at everything incoming, and have models
separating signal from noise and highlighting
to humans what they need to go look at, things like that.
So I think internally there are tons of examples and people
maybe underestimate the --
you all probably will not be surprised by this
but a lot of folks that I talk to are --
the extent to which it's not just using a model in a place,
it's actually about using like chains of models
that are good at doing different things
and connecting them all together to get one end-to-end process
that is very good at the thing you're doing,
even if the individual models have, you know,
flaws and make mistakes.
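To make that chained-models pattern concrete, here is a minimal sketch assuming a hypothetical security-alert triage pipeline; the model choices, prompts, and alerts are invented for illustration, not OpenAI's internal setup. A cheap first-stage model separates signal from noise, and a stronger second-stage model only sees what passes.

```python
from openai import OpenAI

client = OpenAI()

def is_signal(alert: str) -> bool:
    """Stage 1: a small, cheap model separates signal from noise."""
    r = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Answer only YES or NO: does this security "
                        "alert need human review?"},
            {"role": "user", "content": alert},
        ],
    )
    return r.choices[0].message.content.strip().upper().startswith("YES")

def summarize(alert: str) -> str:
    """Stage 2: a stronger model explains only the flagged alerts."""
    r = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Summarize this alert for a security engineer "
                        "and suggest one next step."},
            {"role": "user", "content": alert},
        ],
    )
    return r.choices[0].message.content

# Invented example alerts; only the suspicious one reaches stage 2.
for alert in ["500 failed logins from one IP in 5 minutes",
              "Routine TLS certificate renewal completed"]:
    if is_signal(alert):
        print(summarize(alert))
```

Each individual model can make mistakes, but the end-to-end chain is much better at the overall task than either stage alone, which is the point being made above.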
-Thank you.
I'm wondering
if you guys have any plans on sharing models for
like offline usage?
Because with this distillation thing,
it's really cool that we can generate our own models
but for a lot of use cases,
you really want to kind of like have a local version of it.
-You want to take that? -We're open to it.
It's not like high priority on the current roadmap.
If we had like more resources and bandwidth,
we would go do that.
I think there's a lot of reasons you want a local model
but it's not like a this year kind of thing.
-Okay. Hi.
My question is: there are many agencies in the government
at the local, state, and national levels
that could really greatly benefit from the tools
that you guys are developing
but have perhaps some hesitancy on deploying them because of,
you know, security concerns, data concerns, privacy concerns.
And I guess I'm curious to know if there are any sort of,
you know, planned partnerships with governments,
world governments, once AGI, whatever that is, is achieved,
because obviously if AGI can help solve problems
like world hunger, poverty, climate change,
government is going to have to get involved with that, right?
And I'm just curious to know if there is some,
you know, plan in the works when and if that time comes.
-Yeah, I think,
I actually think you don't want to wait until AGI,
you want to start now, right?
Because there's a learning process
and there's a lot of good
that we can do with our current models.
So we've announced a handful of partnerships
with government agencies.
Some states -- I think Minnesota, Pennsylvania, and some others --
also with organizations like USAID.
It's actually a huge priority of ours
to be able to help governments around the world get acclimated,
get benefit from the technology.
I mean, of all places, government feels like somewhere
where you can automate a bunch of workflows
and make things more efficient, reduce drudgery, so on.
So I think there's a huge amount of good we can do now.
And if we do that now,
it just accrues over the long run
as the models get better and we get closer to AGI.
-I have a pretty open-ended question.
What are your thoughts on open-source?
So whether that's open weights; just general discussion.
Where do you guys sit with open-source?
-I think open-source is awesome.
Again, if we had more bandwidth, we would do that too.
We've like gotten very close
to making a big open-source effort a few times.
And then, you know, the really hard part is prioritization.
And we have put other things ahead of it.
Part of it is like there's such good open-source models
in the world now that I think that segment --
the thing we always have been most tempted by
is like a really great on-device model,
and I think that segment is fairly well served.
I do hope we do something at some point,
but we want to find something that we feel
like if we don't do it,
the world will just be missing this thing
and not make like another thing
that's like a tiny bit better around benchmarks,
because we think there's like a lot of good stuff out there now.
But like spiritually, philosophically,
very glad it exists, would like figure out how to contribute.
-Hi, Sam. Hi, Kevin.
Thanks for inviting us to DevDay.
It's been awesome. All the live demos worked.
It's incredible.
Why can't Advanced Voice Mode sing?
And as a follow-up to this,
if it's a company legal issue, in terms of copyright,
et cetera, is there daylight between
how you think about safety in terms of your own products
on your own platform
versus giving us developers kind of the --
I don't know, sign the right things off
so we can make our Advanced Voice Mode sing.
Could you address this?
-You know, the funny thing is Sam asked the same question.
"Why can't this thing sing? I want it to sing.
I've seen it sing before."
Actually, there are things obviously
that we can't have it sing, right?
You can't have it sing copyrighted songs.
We don't have the licenses, et cetera.
And then there are things that it can sing.
You could have it sing Happy Birthday
and that would be just fine, right?
And we want that too.
It's a matter of once you --
basically it's easier with finite time to say no
and then build it in,
but it's nuanced to get it right.
And we -- you know,
there are penalties to getting these kinds of things wrong.
So it's really just where we are now.
We really want the models to sing too.
-People were tired of waiting for us to ship Voice Mode,
which is like very fair.
We could have like waited longer
and kind of really got the classifications and filters
on, you know, copyrighted music versus not,
but we decided we would just ship it and we'll add more.
-But I think Sam has asked me like four or five times
why it can't sing. -It is a nice feature.
I mean, we still can't like offer something
where we're going to be in like really bad legal hot water,
developers or first-party or whatever.
So yes, we can maybe have some differences,
but we still have to be able to like comply with the law.
-Could you speak a little to the future of
where you see context windows going
and kind of the timeline for when --
how you see things balanced
between context window growth and RAG,
basically information retrieval?
-I think there's like two different takes on that matter.
One is like,
when is it going to get to like kind of normal long context,
like context length, 10 million or whatever,
like long enough that you just throw stuff in there
and it's fast enough, you're happy about it.
And I expect everybody is going to make
pretty fast progress there, and that'll just be a thing.
Long context has gotten weirdly less usage
than I would have expected so far.
But I think, you know, there's a bunch of reasons for that.
I don't want to go too much into it.
And then there's this other question of like,
when do we get to context length,
not like 10 million, but 10 trillion?
Like when do we get to the point
where you throw like every piece of data
you've ever seen in your entire life in there?
And, you know, like that's a whole different set of things.
That obviously takes some research breakthroughs.
But I assume that infinite context will happen
at some point, and some point is like less than a decade.
And that's going to be just a totally different way
that we use these models.
Even getting to the like 10 million tokens of very fast
and accurate context, which I expect measured in like months,
something like that, you know,
like people will use that in all sorts of ways
and it'll be great.
But, yeah, the very, very long context,
I think is going to happen and is really interesting.
-I think we maybe have time for one or two more.
-Don't worry, this is going to be your favorite question.
So with voice and all of the other changes
that users have experienced since you all
have launched your technology,
what do you see as the vision for the new engagement layer,
the form factor,
and how we actually engage with this technology
to make our lives so much better?
-I love that question.
It's one that we ask ourselves a lot, frankly.
There's this, and I think it's one
where developers can play a really big part,
because there's this trade-off
between generality and specificity.
I'll give you an example.
I was in Seoul and Tokyo a few weeks ago,
and I was in a number of conversations with folks
with whom I didn't have a common language,
and we didn't have a translator around.
Before, we would not have been able to have a conversation.
We would've just sort of smiled at each other and continued on.
I took out my phone.
I said, "ChatGPT, I want you to be a translator for me.
When I speak in English, I want you to speak in Korean.
When you hear Korean, I want you to repeat it in English."
And I was able to have a full business conversation,
and it was amazing.
And you think about the impact that that can have,
not just for business,
but think about travel and tourism and people's
willingness to go places
where they might not have a word of the language.
You can have these really amazing impacts.
But inside ChatGPT,
that was still a thing that I had to --
like ChatGPT is not optimized for that, right?
Like you want this sort of digital,
universal translator in your pocket
that just knows that what you want it to do is translate.
Not that hard to build, but I think there's --
we struggle with trying to build an application
that can do lots of things for lots of people,
and that keeps up, like we've been talking about a few times,
with the pace of change
and with the capabilities, you know,
agentic capabilities and so on.
I think there's also a huge opportunity
for the creativity of an audience like this
to come in and like solve problems
that we're not thinking of,
that we don't have the expertise to do.
And ultimately, the world is a much better place
if we get more AI to more people,
and it's why we are so proud to serve all of you.
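The single-purpose translator Kevin describes can be sketched in a few lines. This is a text-only stand-in, under stated assumptions: the live demo used Advanced Voice Mode (a production version would stream audio, for example via the Realtime API), and the model name and system prompt here are illustrative. The system prompt does the work of pinning the app to one task.

```python
from openai import OpenAI

client = OpenAI()

# Pin the task in the system prompt so the app only ever translates,
# mirroring the spoken instruction Kevin gave ChatGPT above.
SYSTEM = ("You are a translator. If the user writes English, reply with "
          "the Korean translation. If the user writes Korean, reply with "
          "the English translation. Output only the translation.")

def translate(utterance: str) -> str:
    r = client.chat.completions.create(
        model="gpt-4o",  # any current chat model works for this sketch
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": utterance}],
    )
    return r.choices[0].message.content

print(translate("Nice to meet you. Shall we get started?"))
```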
-The only thing I would add is,
if you just think about everything
that's going to come together,
at some point in not that many years in the future,
you'll walk up to a piece of glass,
you will say whatever you want. There will be like --
there will be incredible reasoning models,
agents connected to everything.
There'll be a video model streaming back to you,
like a custom interface just for this one request.
Whatever you need is just going to get rendered
in real time in video.
You'll be able to interact with it,
you'll be able to like click through the stream
or say different things, and it'll be off doing like, again,
the kinds of things that used to take like humans
years to figure out. And it'll just,
you know, dynamically render whatever you need,
and it'll be a completely different way
of using a computer,
and also getting things to happen in the world,
that is going to be quite wild. -Awesome, thank you.
That was a great question to end on.
I think we're at time.
Thank you so much for coming to DevDay with us.
-Thank you all.
[ Applause ]
-All right, we can't wait to see what you all build.
-Thank you.