Yao Shunyu: Let Me Go a Little Crazy! Training Models at Anthropic & Gemini, Heroism Is Over
By Zhang Xiaojun Podcast
Summary
Topics Covered
- Traits That Actually Matter in AI
Full Transcript
English subtitles were generated by AI and are for reference only.
Hello everyone, I'm Xiaojun. Today our guest is Yao Shunyu, a researcher at Google DeepMind. There are two famous Yao Shunyus in Silicon Valley. One previously worked at OpenAI and then left to become Tencent's Chief AI Scientist; he's been on our show before. Today I've invited the other Yao Shunyu, who was previously at Anthropic and is now at Google DeepMind. We'll start by talking about the recent series of massive model changes.
So next is my interview with Shunyu. That Anthropic as a company is able to implement this kind of relatively top-down mechanism is something quite unique. But is this difficult for other model companies?
Very difficult. OpenAI, for example, can't do it, and Gemini also finds it difficult. Big companies and startups have fundamentally different strategies, because for startups what's important is making bets: I have to bet on something. I think right now basically everyone is a surfer; fundamentally it's the wave that matters, not the surfer. But anyway, it just feels like this AI thing doesn't really require much brains. It really doesn't.
Then what does it require?
I think in this industry, the most important traits are being reliable, being detail-oriented, and taking responsibility for what you do. Those are the most important traits.
Aren't there two Yao Shunyus in Silicon Valley?
Why don't you first introduce yourself to everyone, and then explain the difference between the two Yao Shunyus?
Sure. My name is Yao Shunyu, and obviously there's also a friend with an almost identical name (Yao Shunyu, Chief AI Scientist at Tencent, former OpenAI researcher). Our main career paths also have some overlap, so it might look very difficult to tell us apart. I used to study physics.
I did my undergrad at Tsinghua, where I worked on condensed matter theory. Later I went to Stanford to do theoretical high-energy physics, quantum information, and black-hole-related areas. After leaving Stanford I went to Berkeley briefly, stayed two weeks as a postdoc, then quit and went to Anthropic, where I stayed for a year.
Around late September to early October last year I joined Gemini. If everyone insists on telling us apart, I think the biggest difference is that the other Shunyu has always been doing CS, computer science-related stuff, from the start, while I came to this halfway, in a sense; I mainly did theoretical physics before.
Are you two good friends?
You two seem to have known each other since college, and you were in the same year, right? (Yes)
What kind of person is he?
What kind of person are you?
Evaluate him, and evaluate yourself too. (laughs)
Yeah, we've known each other since undergrad, because we were in the same year at Tsinghua. He of course studied computer science from the start, so he was in the Yao Class, the computer science experimental class, and I studied physics, so I was in the Ji Class. Later he went to Princeton and I went to Stanford, which might be another somewhat puzzling point: it seems like people generally think Stanford is where computer science people should go and Princeton is where physics people should go, but we happened to do the opposite. Haha. So that might also have caused some confusion. And we really are quite different. I think he's a much more interesting person than I am, and I've learned a lot from him, things that are quite different from my own strengths. For example, he spends a lot of time thinking about human-AI interaction and some product-related things. For me, he's a very different kind of friend, and I've learned a lot from him.
When you were both in Silicon Valley, how often did you meet?
Do you still call each other frequently now?
How frequently?
We did meet quite frequently when we were in Silicon Valley, maybe every few weeks, but it seems like we mainly met just to hang out. Hahaha.
Doing what?
It was really just purely for fun: going out for a walk, chatting about random stuff, sometimes having a meal, playing cards, things like that. And after he went back, we actually still call each other often.
What did you talk about in the most recent call?
I think it was one or two weeks ago.
Ah, how did you know?
Probably just every few months we catch up a bit and share recent updates, yeah.
Has he tried multiple times to get you to join him?
Maybe he has, I guess, but I don't think it matters. Hahaha.
Why don't you go?
I think for myself, I haven't figured it out yet. It's mostly my own reasons, and I didn't join any Chinese companies either. The main reason is that around August or September last year, when I left Anthropic, my biggest motivation in deciding where to go was that I wanted to learn something different. I didn't seriously consider being able to lead a project or anything like that; at that time I was more focused on prioritizing learning something new. That's why I chose to go to Gemini.
I noticed you two are always being compared and discussed together.
Is it more of a bother or more enjoyable for you?
I don't really feel anything about it, because I'm not really someone who pays attention to social media.
Shunyu said last year that AI has entered the second half, and this became a very famous viewpoint. What do you think of today's AI? What stage is it at?
Can you give it a definition?
Yeah, for me, I might not see so clearly what the first half or the second half means; that definition has never been particularly clear to me. For me, AI has indeed entered a stage where everyone has started to worry less about one thing, whether AI can do it, and more about whether the problem itself is well-defined. I think this is a huge difference.
For example, a year ago, or maybe early last year, I was at Anthropic, and what everyone was worried about was like, 'Hey, OpenAI's reasoning is so strong; do we have a chance to catch up?'
and how likely are we to surpass them?
Everyone was still very worried about this.
I think now, at least among Gemini, OpenAI, and Anthropic, none of the three is really worried about not catching up.
Mm-hmm.
And I think what might be harder for everyone now is figuring out what to actually do.
This is something that is a bet, but I think it's also something that requires a lot of human insight. Yeah.
So that also means model capabilities have been leveled out.
Right? They've become homogenized, commoditized.
So there's not a huge difference between the models.
In terms of good versus bad, there's not a huge difference.
But they need to differentiate.
I think from the actual user experience, you can feel the differences between these three companies' models.
But the difference is, in the past you could see this difference on paper too.
What do you mean by 'on paper'?
'On paper' means, like, publicly available benchmarks, these standardized measurement frameworks; there are many kinds.
And for example, people used to look at SWE-bench Yeah, yeah, yeah, you could look at SWE-bench.
And for math, back then people would compare things like AIME on the simpler end and IMO on the harder end. Back then it felt like you could tell just from the numbers:
'Hey, this model seems stronger at reasoning' 'that model seems stronger at coding' 'that model is stronger at this.'
Now, on paper, everyone is actually pretty close.
And when you look at the numbers on paper, like SWE-bench, you'll find, 'Hey,' the best seems to be only maybe one or two percentage points better than the not-so-good ones, and everyone is around 80%. A slightly higher or slightly lower number around there is mostly just noise rather than signal.
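Yao's noise-versus-signal point can be checked with simple binomial arithmetic. The sketch below is illustrative only: it assumes a benchmark of roughly 500 independent tasks (about the size of SWE-bench Verified) and treats the pass rate as a binomial proportion.

```python
import math

def pass_rate_stderr(accuracy: float, n_tasks: int) -> float:
    """Standard error of a pass rate measured on n_tasks independent tasks."""
    return math.sqrt(accuracy * (1 - accuracy) / n_tasks)

# At ~80% accuracy on ~500 tasks, one standard error is ~1.8 points,
# so a 1-2 point gap between two models sits inside measurement noise.
se = pass_rate_stderr(0.80, 500)
print(f"one standard error: {se * 100:.1f} points")       # ~1.8
print(f"95% interval: +/- {1.96 * se * 100:.1f} points")  # ~3.5
```

On these assumptions, a 95% interval spans roughly plus or minus 3.5 points, which is exactly why two models reporting 80% and 82% are statistically indistinguishable.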
Yeah. But on the other hand, in actual usage, people can still experience the differences.
I think, mm-hmm.
From what I personally know, Claude is still the best-performing one as a general-purpose, tool-using agent.
And in pure coding, maybe Codex has caught up a bit recently, narrowing the gap a little. Gemini might be better at pure reasoning, and in some more everyday usage scenarios it might still be better for now; in coding and agents, it's still in a state of catching up.
Mm-hmm. These capabilities—
are they deliberately choosing which direction to prioritize or is it simply a matter of good versus bad?
Is it a capability issue or a prioritization issue?
I think there is actually an element of prioritization involved.
Especially in the past, it was mainly about prioritization.
When everyone could see the differences on paper, prioritization was definitely the dominant factor.
Because maybe like Claude has always valued this tool-use capability more.
Mm-hmm.
And including coding.
So maybe OpenAI also placed a lot of emphasis on reasoning for a while.
Yeah, and of course now they're starting to focus on coding too.
So back then, prioritization definitely accounted for most of it.
Because if you're more willing to prioritize something, it means you can spend more effort building the right infrastructure and the right data. Data especially is something that, in a sense, takes a lot of time and effort, so back then it was definitely driven by willingness. But at this point I think both factors are at play, because on paper everyone looks pretty similar, and even if you do some more internal testing, the numbers come out not that different. Then the harder thing becomes how you define your problem, how you define the behavior you want. And when this isn't defined very clearly, a lot of the model differences actually come from things you wouldn't even imagine.
What do you mean by 'things you wouldn't imagine'?
If you ask me now, it's hard for me to give you a very clear answer; maybe after some time, looking back, I'll be able to. But I can give an example of something you wouldn't imagine. If we go back maybe one, two, or even three years, back then, if you collected pre-training data from the web, you'd see models learning to write code. Of course, there wasn't this agentic way of writing code back then; it was just writing a piece of code. And you'd find that models wrote code very well, but back then people didn't know why. The unexpected reason behind this might be that if you just randomly collect from the web without any data filtering, the quality of code data would naturally be a bit higher than the rest, because if you look at web pages, you'll find GitHub's quality is significantly higher than other normal web pages.
Before we get into today's topic, I'd like to talk about some recent news about models. Everyone's been talking about OpenClaw recently. As a frontline researcher, what do you think of this new product form? What discussions are happening around you?
What's interesting is
I feel like the discussion outside the industry seems more intense than inside it.
Oh, is no one inside the industry talking about it?
People inside are talking about it, but I think for industry insiders it's not really a particularly surprising thing.
Oh, what do you mean?
Like, maybe inside the company some people have already done similar experiments or demos; it just wasn't packaged as a product and seriously marketed, polished, and launched. And of course, the reality is that if you look at the earliest version of the OpenClaw code on GitHub, that code was in a sense not particularly clean. But I think what's important is that it showed everyone this possibility. And after showing this possibility, the OpenClaw author himself joined OpenAI, and then probably these model labs or some larger startups will catch up quickly and polish this into a truly usable product.
So I understand that before OpenClaw was released, people at Google were already working on this; it just hadn't been released yet, because big companies have longer processes, right?
At least personally, that's the impression I've gotten. What we're seeing is exactly that. Right.
So behind this OpenClaw-like product form, what does it inherently tell us?
At this point in early this year, I think technically speaking it doesn't really prove much. I mean, this OpenClaw product of course relies on many things the model can do, but those capabilities didn't only become ready in early this year. I think maybe last year, like when the Claude series released Opus 4.5, Opus was actually already ahead of OpenAI and Gemini 3 in terms of tool-use capabilities.
So I think at that point, doing this thing, it was already something you could demonstrate.
And actually, it didn't blow up immediately upon release.
It only went viral some time after the launch.
Hmm.
So, for me personally, technically it's not really something so surprising.
It's a natural overflow of model capabilities.
Right, right, right, I'd say so.
But I think the surprise for everyone might be that perhaps nobody had realized this before.
It made everyone realize this could actually be done.
Realize what?
Realized that you can, like, let the model... I mean, you can control many different models to do many different things, then aggregate all of that, and after aggregating, do this kind of very, very long-horizon task.
This kind of work.
I think maybe previously, people hadn't widely reached a consensus on this.
This thing showed everyone this kind of possibility.
You see, what went viral early last year was Manus, and what went viral early this year is OpenClaw.
So from Manus to OpenClaw, what changed?
Is it a change in model capabilities, or a change in the product?
This is also something I've never really understood.
Hmm.
Like, What is the qualitative difference between Manus and OpenClaw?
It's something I actually haven't quite figured out myself.
To be honest, haha.
OK.
Or in other words, maybe OpenClaw went viral, but if you were to ask me retroactively why Manus couldn't do this, I don't understand why Manus couldn't do it.
Maybe they just didn't get it right.
But you see, whether it's Manus or OpenClaw, they both chose to sell.
Manus was sold to Meta (Note: This acquisition has since been revoked; our program was recorded before the revocation).
OpenClaw was sold to OpenAI.
What does this phenomenon tell us?
Why did they both sell?
I think, hmm, my own feeling is that for something to survive long-term, it still needs to have some moats.
The moat is the model.
I think at least for now, many moats are on the model side.
But whether product-side moats will emerge in the future, I think that's hard to say.
Because everyone...
This is all an age-old topic in the market.
Many people talk about this.
Things like data flywheels and such.
For now, I don't think there's any scenario that has truly formed a data flywheel.
Even purely AI-native application scenarios.
I think currently, besides agentic coding, other than writing code, there's no scenario that is truly AI-native. Chatbots became hugely successful because, in a sense, a chatbot is actually an extension of search. That's why it's not independent of search: think about it, the most common way people interact with chatbots is that they have a question and ask the chatbot, and that's essentially what search has always done. But what a chatbot offers that's far better than search is that it's very interactive. You can ask follow-up questions, and it can even help you summarize the information you get, distilling it into a condensed answer to your question. That's something search could never give you before. Of course it's not exactly the same need, but from a broad demand perspective it's fairly similar to the demand that existed before.
Manus and OpenClaw are, I think, the most famous wrappers right now. But the wrappers ended up being sold to model companies (Note: Meta's acquisition of Manus was later reversed; our show was recorded before the reversal). Doesn't that show that wrappers still can't escape the grip of model companies? The escape velocity isn't enough; it's not fast enough, is it?
I think for wrappers to survive in the current environment, there are two approaches I can roughly imagine.
One approach is what you just said: escape fast enough. That is, my growth is fast enough that by the time model companies catch on, I've already captured significant user mindshare, and by the time model companies catch up to my product, I've already evolved my own model. I think Cursor is trying to take this path.
So Cursor, in this AI-native scenario, is pretty much the fastest-growing startup I can think of.
Even a company like that is feeling a strong sense of crisis right now. How strong is that sense of crisis?
Anyway, my feeling is that Cursor's relationship with Anthropic has entered a very delicate phase. They used to be close, seamless partners: Anthropic provided the model, Cursor provided the product. Later Anthropic developed Claude Code itself, and Claude Code has become very successful. Now Cursor is trying to build its own model, working hard training its Composer. So I don't even think we need to talk about the future; it's already happening. They're already in a fairly competitive relationship, and if they lose this competition, I think it would be quite problematic. Because coding, at its core, is essentially a professional need serving professional users; it's a productivity tool, and a common scenario with productivity tools is winner-takes-all. I think this applies to Cursor, to Anthropic, and to any company doing coding; it's probably something they're all quite worried about. So that's what I was saying: that's one path.
(It has to be fast.) That is, you grow fast enough, you grow like crazy before anyone even thinks about acquiring you, so that by the time they want to acquire you, you're big enough. The other way is for the market to be small enough that model companies can't even be bothered. I think Midjourney is exactly that example. Even though you could say Gemini could make an effort to replicate what Midjourney does, and it might take some effort, money, and data to pull off, the market is small enough that Gemini probably wouldn't want to spend much time on it; it's beneath them. I think that might also be a way to survive.
So even Cursor hasn't escaped the model's grasp today.
Has anyone successfully escaped?
For the big ones, I haven't seen any so far. For smaller ones, maybe Midjourney is an example. Of course there must be other examples; I just haven't seen them yet. For smaller ones, I think there will be examples.
Does Lovart count?
I think they have a shot. Anyway, you can't do the general-purpose thing. I think this is something the founder has to decide: whether you want to bet on something big with a one-in-ten-thousand chance of survival and swing for the fences, or go with a one-percent chance of survival and lock down something small first.
If it were you, what would you choose?
Hahahaha. If it were me, deep down I'd definitely want to swing for the fences, but honestly, I genuinely think you can't get there overnight. So if it were me, I'd choose to secure a small win first, but I'd pick a small one with huge upside potential.
Why do you think OpenAI acquired OpenClaw?
Why did Meta want to buy Manus (Note: Meta's acquisition of Manus was later revoked; our show was recorded before the revocation) Why doesn't Google acquire anyone?
Oh, Google did acquire someone: Google bought the Windsurf team.
Okay, Windsurf.
Yeah. I don't get it. Haha.
What do you mean you don't get it?
Honestly, I just don't get it. I think for Meta, the biggest benefit of acquiring Manus, aside from how much they spent, was gaining a really strong product team in Asia.
What does being in Asia signify?
On one hand, obviously everyone knows China's AI talent pool is still quite deep. Although purely from a technical standpoint Chinese AI perhaps hasn't really caught up with the US yet, there are obviously many talented AI people in China, whether in pure technology or in product. In terms of product, I think China essentially has better talent than the US. So for Meta, I think Manus became a foothold in Singapore that lets them attract talent from, for example, China, Singapore, or East Asia. And I actually haven't fully figured out how important this product itself is to Meta, or in other words,
Why couldn't Meta just build this product themselves?
But whether it's Manus or OpenClaw They were in fact born from outside teams Why weren't they built by this group of Silicon Valley researchers?
Have you thought about that?
Yeah, for me, this question... I think once a company gets big, its burden gets bigger too. Like, I might be a researcher, and we can build something really interesting-looking, a very distinctive product, but once I make that product public,
there's a ton of responsibility that comes with it. First, you can't just launch the product and tell all your users, 'You need to go buy another computer to do this, otherwise it might gain access to everything on your computer, all the permissions, and crash your system.' So a big company, take Google for example, would never release a product like this. It takes a lot of time to polish the product, and you have to make sure there are no legal risks and that it won't damage your brand with users. Plus, if you ship it, you probably have to allocate some relatively fixed resources to serve the model or the product line. So yeah, for big companies
I think there's quite a lot of burden. But for individuals it doesn't matter: it's an open-source project anyway, so what if my code is terrible, come help me write it. Right? Hahaha, yeah.
I think whether it's Manus or OpenClaw, they point to a direction that is also a possible narrative for 2026. What are your thoughts on 2026, and what are your expectations?
I think there are really so many possibilities. For me, in terms of model capabilities, I think
models... I sometimes really love saying this slogan: models should achieve 'train with finite context, use as infinite context' (finite in training, infinite in use). In other words, you train it with a limited context length (context window), but in usage it can use a very, very long, even nearly infinite, context length. I think this has a chance of being realized this year. And once this is achieved, I think it will unlock many new applications. To give the simplest example, you could let the model interact with you continuously and continuously receive your information; as it runs, it will continuously evaluate the current context and your conversation and possibly discard information it deems unimportant. Then it becomes the personal assistant everyone dreams of.
Yeah, technically speaking, I think this will definitely be realized this year no matter what. But of course, what people haven't reached consensus on yet is how to technically achieve it. Obviously there are many technical paths, but right now it's more about trying to see which path can work; there might be several paths that all work. Then we'll have to test them experimentally under common user scenarios to see which path is the most efficient. I think we're more at this stage right now, rather than a stage where no one has ideas: everyone has ideas, but we need to figure out which idea is the right one.
Standing here in Q1 2026, as a frontline researcher, do you think the pace of model improvement is slowing down?
I think not at all. Not at all.
How does its velocity curve compare to '25, and what's changed from '24?
Mm, it's hard to say quantitatively, because you need to give me a standard before I can answer quantitatively. If the standard you give is, say, how many points some benchmark like SWE-bench gains each month, then that will definitely slow down, because by definition the benchmark maxes out at 100%, so the closer you get, the slower it gets. But this doesn't necessarily mean users feel the model's capability growth has slowed, because going from 50% to 60% might feel like, hey, that's a bit better, but quite possibly the gains from 70% to 75% feel even greater than those from 50% to 60%. That's entirely possible.
If it's from 80% to 90%, or 90% to 100%, wouldn't the difference feel even more significant?
Not necessarily, because past maybe 80% to 90%, users might not notice any difference; it might even get worse.
You said it doesn't get slower at all. Based on what criteria?
I think it's based on my personal feeling as a researcher. My personal impression is that the model's ability to learn things is getting stronger and stronger. It used to take a lot of effort to get the model to learn to do something, but now it probably doesn't require that much effort. The most important thing is that you need to clearly define the problem and figure out how to build the right data. Of course, data is broader now, including environments and such. And the rest often seems to fall into place naturally.
Why is the learning ability getting stronger?
There could be many reasons, but I think one is pre-training. Over the past few months, I think model pre-training has actually been getting stronger. I think this might be somewhat controversial in a sense, because a few months ago many people were already discussing whether the scaling law had reached its limit. My experience is that it hasn't, and in the next four months I don't see any signs of it ending either.
Why do people think it's reaching its limit?
Well, I obviously don't know why people think it's reached its limit, because I myself don't feel it has. But my guess would be that when someone thinks a pattern has reached its limit, it's basically one of a few situations. One is that they feel the applicable range of the pattern has reached its limit: maybe, fundamentally speaking, the scaling law simply can't extend infinitely, which could be true, but that's just a guess. Another possibility is that they feel one of the pattern's conditions can no longer be met, for example that data has already hit a wall, so they simply haven't extended it further. But actually there's a third possibility: there's a bug somewhere in their work that they haven't noticed, so they think it's reached its limit. From my observation, for probably the vast majority of people who hit a wall, it's because of the third reason: there's a bug.
What kind of bug?
There are many possible kinds of bugs. One possibility is that when you're working on scaling laws, some scientific assumptions weren't quite right: for example, what token horizon you choose, that is, for each model size, what expected training data volume you pick, and where that data comes from. It's possible these more scientific choices weren't made carefully. That's one possibility. But there's another possibility, which is that there's simply a bug. Actually, I don't think this is surprising: in the industry, fixing a single bug often brings far greater progress than fancy tricks. And of course there are other situations; these two examples are just the ones I've seen quite often.
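As a concrete reference point for the 'token horizon' choice Yao mentions: the Chinchilla scaling analysis (Hoffmann et al., 2022) suggests roughly 20 training tokens per parameter for compute-optimal pre-training. A back-of-the-envelope sketch; the 20x ratio is a heuristic rather than a law, and labs tune it in practice:

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal training tokens under the Chinchilla heuristic
    (~20 tokens per parameter). Choosing this 'token horizon' badly
    across model sizes is exactly the kind of scientific-assumption
    issue that can make a scaling-law fit look like a wall."""
    return n_params * tokens_per_param

for n in (1e9, 7e9, 70e9):
    print(f"{n / 1e9:>3.0f}B params -> ~{chinchilla_tokens(n) / 1e9:.0f}B tokens")
# ->   1B params -> ~20B tokens
#       7B params -> ~140B tokens
#      70B params -> ~1400B tokens
```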
So how do you deal with bugs? How do you solve bug problems?
I feel like this is more of a mindset issue. When you encounter a bug, if you think it can't be fixed, you'll say we've hit a wall; if you think, oh, this can definitely be fixed, then you'll feel we haven't hit a wall yet, because everyone definitely encounters bugs. This might be like what you said: some things are more about belief. But for me, a more important thing is the working system: when something is different from what you predicted, can you systematically rule out the various possibilities? I think this is a very important thing, and it's something both Gemini and Anthropic do well, especially in pre-training. When behavior at a certain scale is different from what you imagined, people can design reasonable ablation experiments that help you test whether the factors you imagined are actually the real factors. I think this systematic approach to problem-solving is the key.
You think
model capabilities can still improve. Then among the driving forces, data, compute, and algorithms, which do you think is the main one?
I think they all contribute, but in a sense data and compute are two things that are very strongly correlated, because when your compute goes up, you'll naturally attract more data, and when data goes up, you'll naturally need more compute. As for algorithms, I think algorithmic progress often has a phase transition. There's a phase where you haven't figured out what to do at all, and at that stage algorithms are extremely critical, because you might have no way to scale up at all and get stuck there. But at a certain point you might discover the most important thing in the algorithm, and then it might suddenly go from completely impossible to possible. After that, algorithmic improvements are more of a gradual improvement: they might improve your computational efficiency or your efficiency in using data. Let me give an example from the perspective of language model pre-training. The leap in algorithms was,
I mean, the development of the Transformer But after the Transformer was discovered It's been mostly gradual and smooth Improving its efficiency Or your use of data Or the efficiency of compute usage has been improving Right So the current drivers are compute and data I think within the relatively clear frameworks we have now The main drivers are compute and data By clear framework, I mean For example, pre-training and post-training
Whether it's post-training based on reinforcement learning Or based on supervised learning That is, post-training with supervised learning For example, within these two relatively clear paradigms Indeed, compute and data are the main drivers But it's undeniable That in some other directions, the driving factors might be different Hmm, what do you mean?
To give a simple example For instance, multimodal generation Hmm Well I think it's probably something that, algorithmically speaking Hasn't been fully figured out yet So that's still a scientific problem That hasn't been solved yet Right But language is no longer a scientific problem Natural language generation I think, for now Before this technical approach hits a wall I think it's relatively clear scientifically But in terms of engineering There's still so, so, so much to be done
How much more do you think pre-training can improve?
Improving model capabilities through pre-training How much more How much further can it go Can we expect That's just how people are I mean, when you haven't hit the wall, you Don't actually know how long the road is What I can What I can see is that we haven't hit the wall yet But I don't know when we'll hit it either If I really had to estimate a timeline As I just said I think four months
The next four months will still see progress But in the AI field No one can predict what happens after four months Hmm, so over the past few months When you look at pre-training and model capabilities You're still very excited Is this the general mindset and state around you?
I think so I think so Is this within a small environment at Google Or in the entire Silicon Valley environment I think it's hard to say for all of Silicon Valley Because Silicon Valley is too big a place People working on products might be excited about products Right, for product people What excites them most might be something like OpenClaw Hmm But for people working on models It's probably
That we get more excited about this kind of model progress Hmm I think Uh For people working on models Is excitement a consensus?
Over the past four months I personally think so Oh, I personally think so At least within the circle I have access to I think at Anthropic and Google, people Or at Gemini, people are probably thinking more about How our AI will keep progressing And soon we'll be replaced After being replaced, what should we do?
Haha, rather than worrying about what to do when models hit a wall Hahaha Speaking of which Why Over the past few months Coding has been developing the fastest Why is this the case?
I think the coding scenario First of all, coding itself Hasn't just been developing the fastest over the past few months I think coding itself Actually From Claude 3.5 (new) Or some people out there called it Claude 3.6 (yeah) After that It's been in a state of rapid development ever since And I think That was early last year Or the end of the year before That was October of the year before last
October of the year before last Yeah yeah It should be, maybe October or November But around that time From then on it's been in a state of rapid development I think the coding scenario has Two biggest advantages The first advantage is its reward signal That is, its feedback signal Is very well-defined Because
For example, something like a software engineering task Often the situation is I need to write some code To implement a feature A feature (Yeah) This feature needs certain inputs And produces certain outputs This is something very easy to test So its feedback signal is very clear If your input and output match up
Then it means your implementation is successful If not, then it's unsuccessful (Yeah) But this is just one example In coding-related work There are many, many Many such well-defined feedback signals And another big advantage is Coding data has a very natural foundation That foundation is GitHub GitHub has aggregated over the past few, roughly
Decades A lot of high-quality code written by many excellent programmers And starting from that code You can build a tremendous number of environments I think these two things, from a model perspective Are why coding can be done very well Of course, I think from a product perspective There's another reason Which is that coding The demand for this product
Is in a sense Relatively singular It's not like when you build something like a social media app Or a game Where everyone might have different tastes And it might be hard To satisfy everyone's needs Then you might need recommendation algorithms But with coding The good thing is that excellent programmers writing code
Actually have fairly similar styles What kind of style Clean and concise Yeah, right, good code is (Not messy) There are some shared standards For example, like you said The code is concise Structurally clear Suitable for future development And has reasonable abstractions And of course many other standards But I think good programmers tend to have A fairly consensus-driven standard
On this matter So from a product perspective It actually makes the coding product much simpler In your current work What percentage of code do you write with Claude Code How many times more productive does it make you You just asked a question that almost got me fired Google doesn't allow using Claude Code Hahahahaha, oh right
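The "well-defined feedback signal" described a few turns earlier, where a feature takes certain inputs and must produce certain outputs so success is trivially testable, can be sketched as a minimal reward function. This is a hypothetical illustration, not any lab's actual training code:

```python
import os
import subprocess
import sys
import tempfile

def coding_reward(candidate_code: str, test_cases: list[tuple[str, str]]) -> float:
    """Score a candidate program by the fraction of input/output tests it passes.

    This is the 'verifiable feedback' idea: the reward is defined purely by
    whether outputs match expectations, with no human judgment in the loop.
    """
    passed = 0
    for stdin_text, expected_stdout in test_cases:
        # Write the candidate to a temp file and run it as a subprocess.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code)
            path = f.name
        try:
            result = subprocess.run(
                [sys.executable, path],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=5,  # runaway candidates simply score zero on that case
            )
            if result.stdout.strip() == expected_stdout.strip():
                passed += 1
        except subprocess.TimeoutExpired:
            pass
        finally:
            os.remove(path)
    return passed / len(test_cases)

# Toy candidate: read an integer from stdin and print its double.
candidate = "n = int(input()); print(n * 2)"
print(coding_reward(candidate, [("3", "6"), ("10", "20")]))  # → 1.0
```

A scalar like this is exactly the kind of clean signal that can drive reinforcement learning on code, which is much harder to define for tasks like product management, as the conversation notes later.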
I think, for me A conservative estimate Maybe 90% of the code is model-generated But it might be I need to spend a lot of time reviewing the code To see if it's written appropriately Written reasonably Whether it's really what I wanted it to write And I think after having AI-assisted tools The most important thing about writing code Has become
How you design it How you design the logic of your code And which files it needs to interact with And what things need to be done And you need to give the model Maybe provide some reasonable context I mean, like For example, this code You can use it as a reference to take a look Right, actually outputting code I think models are way more capable than humans
So for me If you actually count How many lines of code I wrote by hand How many lines of code the model wrote I'd say conservatively, the model wrote over 90% If not conservative, maybe 99% or 100% The remaining 10% is what it can't write Or what I didn't let it write Conservatively 90% Giving myself some credit Hahaha I think what it can't write And the part only I can write is becoming less and less
Less and less and less What was it like in the past It was what it couldn't write I think Very early on, maybe about a year and a half ago At that time To be honest, on the market Only Claude was able to Actually write this kind of software engineering code At that time You could still feel many flaws in the model For example Sometimes when it wrote code
It would only focus on this one file It wouldn't pay much attention to multiple The relationships between multiple files And if, say, a class Its definition was buried many layers deep Or it wasn't directly nested in this This direct tree structure The model probably couldn't find it Now I think this is happening less and less Hehe Really less and less As a researcher
Your programming workload How many times that of the past Because from the perspective of writing code It's quite hard to quantify this But if we talk about, say, running experiments And the efficiency of implementing ideas I think compared to a year or even a year and a half ago It could be 20 or even 50 times faster
Right, because models have really become It can be pretty insane You can open several at the same time And you have several ideas And test them simultaneously And sometimes even The model can help you monitor some experiments Monitor some results and stuff So It's really quite a significant efficiency boost Right but If we talk about personal working hours
I feel like it has made my working hours longer Why is that It's just that Because development speed has increased The more you try, the more you want to try There are more and more ideas to try So it feels like before, you might have had this situation You have something Like this file You haven't seen it before You might not quite understand it yourself Then you'd definitely have to spend time finding that person
And you'd schedule That person, maybe a few hours later But now it's not like that You just see this file You don't understand it, just ask Claude or Gemini Gemini might tell you the result in five seconds And you just keep going Hahaha, so in terms of working hours I feel like working hours have actually gotten longer And the intensity has increased too Well, Google isn't that Google anymore Is that so Not that
Google where you can coast along Not that work-life balance Google I feel like in the GenAI field No one can just coast along Hahaha So what hours are you keeping these days I usually start around 9 in the morning At 9 AM, I might first get up and check emails And look at the experiments from the night before Then get to the office around 10 And then at night
If I'm alone in the US I might stay until around 10 or 11 Of course, if my family is here If my wife is here I might go home a bit earlier But at home I'd be working anyway So I think in the GenAI field No one is just lying around Unless You've completely lost interest in technology And have no ambition for yourself Then no one would care if you just lay there
But I think most people are quite self-driven They just want to do it themselves, right Do you think other fields Will have more of these Claude Code moments Where will the next explosion happen after coding You asked a good question If I could see it clearly I might have gone out to start a company already Hahahaha Right, but but It's true that besides coding We can already see That many
Other directions are already having a big impact But if we only talk about those directions They might not be a good market direction Because Coding is special in that It itself is A very large market But if you look at some other directions They might not be Such a large market For example Some people say the next direction is
This kind of AI-generated content or something But AI-generated content How big is that market Right I think If you say this content Is for people to consume Then people have limited time No matter how much content you generate People's time is only 24 hours a day
Right Unless it completely replaces people Like replacing TV Then that would be another story Like the Vision Pro that came out before Then that would be another story But that would be A bigger story So I think Besides coding Everyone is still looking for The next big market And if there is one I think there will be
But it's just Not necessarily that big I think the most likely one might be This kind of interactive education Or maybe You said coding is not a direction for you Because coding itself is already very big Yeah, it's already a huge market Do you think AI researchers How should we treat coding Should we use coding to validate our ideas Or should we make coding itself the end goal
I think there are two types of people One type is They genuinely want to make coding better Another type is They want to use coding As a means to validate AI capabilities I think both are fine Both directions are fine But I think The people who genuinely want to make coding better They need to think more about products And the people who want to validate AI capabilities They need to think more about
How to build better benchmarks Right I think both directions are very meaningful Just their focus is different Do you think The current state of AI research Is more like A gold rush Or more like A scientific revolution I think it's a bit of both There are many things that AI actually can't easily do
But conversely, humans might do better For example, being a product manager To be honest, I think Being a good product manager Is something I currently can't figure out How to train AI to do Why is that There's no standard There's no standard (no metric) Like what makes a good product I can't really figure it out There's no very objective standard
You have to build it and let people use it Only then do you know it's good Then everyone will say it's good Right, I think That's something with very unclear feedback signals Then I don't know how to train AI to do that Right When will programmers be completely replaced Will there be such a day Mm-hmm I think that day will come But it won't come all at once
It won't be like programmers are all still there And after one night The next day all programmers are fired It won't be like that It will definitely be a gradual process But Everyone can already see this gradual process now Because some companies have already started laying people off Right, I think In a sense AI is a In a sense Of course it's a very good thing But from another perspective It might also be
A very unfortunate thing That AI is a very centralized technology It will make a small number of people stronger But will make most people lose Their unique value Right, so I think For traditional software engineering The final result might be Now 1/1000 of the people do the work of everyone in the past
Earning 100 times the current salary Then what advice do you have for programmers I think Haha, I think maybe Embrace new things I think that's very important I think One very important thing for future programmers might be How to effectively collaborate with AI Mm-hmm like There are many things that AI might do Not that well Like how to
Reasonably design an implementation plan for something And how to design it So that it might align with the company's Future development Those kinds of things You might have a hard time telling a model To make it understand these things Those things might still need humans to do But maybe things like specific Very specific Like the work many programmers did in the past Where your manager tells you to implement this plan
And give it to me by next Friday I think that kind of work Might not exist in the future Then what kind of programmers would be in that 1/1000 What are their traits 1/1000 is just a figurative number I really don't know if it would be 1/1000 Or 1/10,000 Or 1/100,000 Or maybe 1% Don't be so pessimistic I'm a famous pessimist So don't take it too seriously And I think Good programmers in the future
First, technically speaking They will definitely be very strong Because if you're technically weak There's no reason Why AI can't replace you But being technically strong might not be the only thing It might be a necessary condition But not a sufficient one Another thing I think will be very important is that you have to understand how your part of the work fits into a large organization or a big company how to
how to adapt and integrate into it (Mm-hmm) This might also be an important thing Mm-hmm and And of course there might be many other things For example whether this person's planning ability is strong enough If their planning ability is strong they can definitely take this big very complex thing and break it down into many relatively smaller things and hand them over to different AIs to do But right now these three abilities seem important
Things that AI might not be able to fully do yet doesn't mean it won't be able to in six months Maybe in six months you come ask me And we find that the last of these AI can already do Then only two things remain Another six months later Maybe the remaining two can also be done Then maybe my answer would become more pessimistic So No one can predict what will happen in six months
I can only speak from the current perspective Over this past Spring Festival Another thing many people paid attention to was Seedance Will Seedance make Google anxious I think actually Possibly yes But this anxiety Hasn't reached me yet Maybe it gives the Google DeepMind team responsible for multimodal generation some pressure But if you ask me
I think I might not think they have much to be anxious about Like I think It doesn't reflect any paradigm shift More importantly, I think ByteDance whether it's the product effect or possibly in terms of data and such These details are done very very well I think indeed ByteDance has historically had a relatively strong advantage in multimodal generation
But I think at least personally I haven't experienced that it's a paradigm shift Then maybe It's not enough to make everyone very anxious Right but there is definitely pressure Right Does Seedance's strength come from model capability Or product capability I haven't worked At ByteDance So I don't know the specific details But if you ask me to guess I think the model probably accounts for the majority
Mm-hmm What does good model capability come from Comes from data Because there probably isn't fundamental innovation in algorithms First of all because multimodal, as we just said, still belongs to that kind of scientific problem Right, multimodal generation Still belongs to a relatively scientific problem (Has multimodal understanding been solved) Compared to generation it's definitely more systematic Has a more systematic understanding
But compared to text tokens Definitely still not that The paradigm isn't that fixed yet I think in generation it might be Because it's still something where the paradigm hasn't been fixed Maybe each company uses somewhat different techniques big or small differences And um right now we can mostly just see In terms of effects
Maybe ByteDance and Google DeepMind are In terms of effects The two that do it better Mm-hmm, so it might also come from details Done better Right if you ask me to guess I would guess data Data If you ask me to guess I'd guess data but I haven't worked at ByteDance either So I'm just guessing blindly haha (Mm-hmm) What do you think about Wu Yonghui going from Google to ByteDance (ByteDance large model team Seed lead)
Who am I to judge haha To evaluate Yonghui I think Of course, I haven't worked with Yonghui in the past, so actually I can't really give a very good assessment or an objective evaluation But I think after I joined Gemini, I saw more of Yonghui's good side I think, by sneaking a peek at his past code commits
and the projects he's led, my feeling is that he's one of the few people I've met at such a high level and also very senior yet still has very strong technical skills I think that's extremely rare So I think I think I'm probably not yet at the level to evaluate Yonghui at that level But if you ask me I think Yonghui is extremely strong
You say, taking a snapshot in Q1 2026 Do you think the capability gap between Chinese and US models is widening or narrowing?
How far apart are they?
I think Um If we take a snapshot right now and look at the development trends over the past year or the past year and a half Obviously the gap between China and the US is getting smaller and smaller But whether this gap will eventually close completely or even if China surpasses the US I think that's an open question
I think for Chinese AI researchers and research institutions, it's also an opportunity And I think one very real thing is that China is indeed at a significant disadvantage in terms of actual compute resources It's at a big disadvantage But this significant disadvantage may have actually forced out some interesting things For example, Chinese model companies
are actually quite good at distilling from others Right Recently Dario (Anthropic Co-founder and CEO) called out three companies for distilling from them I think distillation itself is actually an open secret But I think there are different ways to approach distillation There's brute-force distillation and smart distillation Two different approaches Um
What do you mean by brute-force distillation?
To give the simplest example of brute-force distillation: It's taking a bunch of tokens generated by Claude and forcibly training on them If you do something like this I feel First, it's not very ethical from a business standpoint And intellectually, it's rather foolish Because the companies doing this
essentially demonstrate one thing they don't even know what they want to do The only thing they can do is copy others and make their model look a bit better on the benchmarks Right, but essentially it shows that they don't even know what they should be doing That's brute-force distillation But actually, distillation also involves some very interesting scientific questions For example, is there a possibility that Just a random example Like, could it be that
in my process of generating my own training data pipeline I use other models as assistants Or the answers generated by my own model use other models as their evaluators This is actually, I think, commercially a bit of a gray area But from a technical perspective, it's quite interesting Because if you think about it, in a sense Chinese labs may have become
pioneers in Multi-Agent training Oh And it's true Multi-Agent Because if they use models from different companies with these smarter approaches and integrate them into a single training system each model's distribution might be very different The distribution of their language is very different This is true Multi-Agent It might be more so than
for example, using several Geminis together It's something more technically interesting So I think, for me, smart distillation I don't know, commercially whether it'll end up being clearly wrong or clearly right But technically it's actually quite interesting Which companies are you referring to with these two types of distillation?
Can we bleep out the names in post-production?
(Sure) Hahahaha First of all, I haven't worked in a Chinese lab So I don't know exactly who But my feeling is XXX probably used brute-force distillation And XXX might have done brute-force distillation before But later they probably gradually tried to shift toward smarter distillation I think it's fairly obvious The one that probably distills less is ByteDance I feel like ByteDance's model
is still quite distinctive Hmm, what makes it distinctive?
For example, this model How smart would you say it is?
I think Doubao is definitely not as smart as Gemini or Claude But first of all, Doubao For example, Doubao's voice generation is extremely good Wait, is that difficult?
Technically, Doubao is indeed the best at it Because I find that for life questions I just want to ask Doubao Because it's so fast But other models Why don't they optimize this product feature?
I think it still has to do with their user base In the US I think people are more focused on how to improve work efficiency Don't you have life questions?
I do in my life First of all I personally am indeed pretty boring in my personal life So I don't have many interesting life dilemmas to ask Doubao The questions I have more often in life are all technical ones Asking a smart model like Gemini is the best Hahahaha Right I don't have this urge to open Doubao at midnight for late-night emotional support It's not just emotions, but many things Like when you're cooking
Hmm You might run into some problem You might need someone to tell you right away But you don't have such a person Hmm those I think it's probably more of a data issue And probably for US companies the main priority right now is intelligence or work efficiency Someday in the future Will it become these daily matters?
I think it's possible The fact is If you ask about these daily topics actually you'll find that Gemini from generation to generation does better and better Hmm Actually, many of my friends including myself in the past When I was at Anthropic before I might ask Claude to write code But for daily lookups I would ask Gemini, right Have you used Doubao?
I've actually only used it once or twice I noticed you guys don't really use it much Hmm, first of all Is it a pecking order thing?
(There's an intelligence pecking order) Hahaha, no no, not that serious I just think first of all It's like people in China trying to use American models There are some complicated things involved Oh Me using Chinese models in the US is actually quite complicated too Second, I simply don't have the motivation for it Especially since I think in my life Work is work
When I'm relaxing, I just find different work to do So for me My best companions are Claude and Gemini But it might not be like that for others So it might just be my personal thing The one or two times I used Doubao myself It was because someone showed me the Doubao phone Hahahaha right So what do you think of the Doubao phone?
I think it's a great idea Personally, in terms of results They actually did a pretty good job Of course, what I don't know is Technically, how well optimized it is I mean, it I think it executes some tasks in real time From a results perspective, there's no problem But I don't know how much overhead it has If that overhead is very, very large
Then it's probably a technical issue that needs to be solved. Mm-hmm.
Because you don't want, you know Your model to book a high-speed train ticket for you And end up costing more than the ticket itself That would definitely be unacceptable Right so Technically speaking I personally don't know how mature it is And from a product perspective For everyone, it's still quite Can't say surprising But it's something that gets people pretty excited And I think Apple probably wanted to do something like this before
It's just that Apple's own models haven't been that great Apple doesn't seem to care much about its AI strategy Now, I think Apple definitely cares about AI strategy Because Siri, the phone assistant Was in Apple's product launches A very, very important highlight But their own models didn't catch up Now they might be trying to do this through a partnership with Gemini
To try to make it happen As for whether they care about it now First of all, I don't know If you ask me to guess, I'd definitely say they care But if you ask me to explain Why from the outside it doesn't look like they care that much My only guess is that If from the outside it looks like you care a lot And you still can't pull it off Then you just look stupid Ah
Saving face Ah, right, hahaha (I don't care) Then let's talk about Doubao's model You just said Doubao's model is quite distinctive Can you be more specific?
One is that its voice is really well done That's the first point I think the voice is really well done It's the most distinctive thing I can feel I mean, I think the voice quality might be To put it politely, probably one of the best in the world To put it bluntly I think it's simply the best in the world Mm. Is that hard?
Mm I haven't gotten to that level myself So I don't know if it's hard or not But I think it might be something that takes a lot of effort Whether in terms of data or various optimizations Is it a product thing or a model thing?
It has to be a model thing It might also include some product aspects But it's definitely a model thing Right. And then
I think that's one aspect And on the other hand, I don't have that much personal experience Because I haven't actually used it that much So it's probably more from Feedback from friends and family That is Hey, this Doubao model is just fun to talk to It's just fun to chat with Haha right But I think that Is more of some subjective feedback I think one is the voice
And another is that it Generates very fast, which is also very important Because many models Are showing you their chain of thought But I'm talking about trivial things in your daily life I don't want to see its chain of thought Right. I don't think this is technically difficult
It's just that maybe People haven't spent more time on it yet On this And the fact is If you try Gemini 2.5 Pro and Gemini 2.5 Flash You'll find Gemini 2.5 Flash When completing the same problem It's already much faster than before And much less fluff So I don't think this is a Mm-hmm, in my view it's not a technical difficulty
It's more about when to pay attention to it And do something about it I think maybe it's now Right now these American companies Are all still in the stage of Working hard to push the upper limits of intelligence forward And ByteDance Of course it's also pushing the upper limits But I think It might just be doing very well in user optimization too Also doing quite well Recently there's another topic
That Chinese robots are very hot right now At the Spring Festival Gala I don't know if you have any observations about this I've watched some performances Also searched for some prices on Amazon I was really surprised they're so cheap Haha, did you buy one No haha I wouldn't have any use for it even if I bought one But indeed I used to I don't know, in my mind I thought humanoid robots And
Of course at the software level there's nothing really But mainly hardware I thought for hardware to be this mature It would probably cost something like Several million dollars or something But it seems when I checked The price is much cheaper than that I think this still reflects China's hardware industry chain Still has a lot of advantages But I Don't really know if it As a robot In terms of hardware
I think it's indeed very very strong And from the software perspective I haven't quite figured it out I think robot models Are also something with relatively large disagreement right now Right What do you mean What I mean is I think robot models are probably more in the Feature engineering era Like you have a given environment A given scenario
You optimize for that scenario People know how to do that Mm-hmm but doing RL Doing reinforcement learning Building appropriate virtual environments Still virtual This kind of data Then you do training Can improve But it doesn't have strong generalization I think this is Whether there is generalization Is actually a watershed for many AI directions
A deterministic scenario A very singular scenario Can you do this well This wasn't solved just in recent years It could be done more than ten years ago Language is the same In the era before Transformer-like architectures It wasn't completely impossible Right, back then You could also train very strong models to do translation Mm-hmm You could train a very strong model To do semantic analysis
But what you couldn't do is Improve all abilities across the board By improving at one point Mm-hmm I think this is a watershed And I think language models After Transformer and GPT Entered that kind of stage Crossed a threshold Where you can improve all abilities by improving at one point You might train on one point And it will abstract this ability And generalize it to all related things
But I think robots haven't reached that stage More still before that stage Where I have a single scenario A single thing Then I can optimize for that So what do you think About these robotics teams in Silicon Valley And there are also a lot of robotics people inside Gemini Mm What do you think That direction is a bit...
What would you call it Is it a sub-direction of yours Or a parallel direction Or what I think In the past, it was quite a parallel direction But now, for robotics I think people are also trying To see if they can leverage language models As a base model And then train something like For example, VLA (Vision-Language-Action model) Especially multimodal models Right, right, right, and um So now
It has become something closely related To the language model track Mm And personally, my feeling is They will become very important in the future But they haven't found their own path yet But what they're doing is really interesting I highly recommend everyone go check out Robotics labs They're way more interesting than language model labs Language model labs
Feel like normal offices But robotics labs, they really Have people controlling these robots Collecting all kinds of data And watching the robots, like Picking up all sorts of items from shelves and stuff Doing things like that I think it's a very interesting thing Which one did you go to Ah, I went to Wait, Gemini's own lab No, not Gemini Google DeepMind's own lab I've been to see it And also that Dyna I've also been to see
They have a clothes-folding robot Right, their scenario might be a bit more narrow Like folding clothes Is one robot, maybe doing some other things Like pouring water and stuff Right, like that Your intuitive feeling In LLM terms, what year is robotics at now It hasn't reached the GPT-1 moment yet, right Definitely not I think it definitely hasn't, right Mm It's like everyone still hasn't Figured out how to scale up
I think for me Whether it's robotics or multimodal generation Neither has reached that point Then let's get into today's main topic We're still very interested in you And chat about How you went from someone who studied physics Into the world of AI Mm Where did you grow up How did you grow up I I was born in Ningxia In a very, very small city
Called Dawukou See, that confused expression of yours Already shows how small this city is Mm This city existed in the past because of a coal mine Also because of Shitanjing A coal mine And then this city came into being Right, so I was born there But I Went to Shanghai with my parents during elementary school And so The latter half of elementary school and my middle and high school were in Shanghai
Then I went to Beijing for undergrad What I just mentioned Undergrad in Beijing Then PhD in the US Right You had good grades since you were young, right You got into university through physics competition And studied theoretical physics at Tsinghua and Stanford Right, I didn't get in through physics competition Hahaha I think I was quite mediocre when I was young Hahahaha Ah first of all
The middle school and elementary school I attended were both nobodies Hahaha I think I The middle school I attended at that time, competitions Were not something you should consider It was that kind of middle school Called Shangnan Middle School East Campus Another school that makes everyone confused A school that leaves people baffled
Okay, since we're here, which elementary school was it What was the elementary school called (Dezhou Second Village Elementary School) My context management ability is too strong I can't even remember what it's called actually Hahaha mm-hmm right And right It was that middle school It was um In a small environment within one class There were still some classmates who wanted to do things properly But overall
I think that middle school was in a relatively laid-back state Right and I think maybe my grades were okay (What do you mean by okay) Okay means at that time the situation was Shanghai high schools had so-called At that time there were the so-called four top schools Shanghai High School Hua Er (East China Normal University's No. 2 Affiliated High School) And the high schools affiliated with Jiao Tong and Fudan Right And at that time the situation was I could get into these four schools
But couldn't get into the best classes in these four schools But at that time I really wanted to do competitions Because I had never done competitions before You started competitions in middle school I didn't do competitions in middle school Oh, I never did competitions in middle school Why did you want to do competitions if you never did them Because I never did them So I wanted to do them
How did you get that idea (Hahaha, that's just how I am) My personality is I always love doing things I'm not good at Hahahaha right And at that time I hadn't done competitions But I knew about them So I felt that compulsory education Not compulsory education, but before going to college I should give it a try So but then My grades weren't good enough for that
So Going to the four top schools, the best four schools I couldn't get into their competition classes Then I discovered there was a slightly worse school That school was Gezhi High School A slightly worse school But that school had a competition class And I felt this competition class In today's terms it's an underdog Hahahaha Impressive
In the words of that time, I felt like the barefoot aren't afraid of those wearing shoes Hahahaha I think, mm-hmm Worth a shot So actually at that time, back then At that time Shanghai still had this so-called early admission system Where before the high school entrance exam You could sign a contract with a school And then you would reserve a spot at that school in advance And then go directly there
And then it was very natural to go And then go do competition high school So you were actually between the regular classes of Shanghai's four top schools And the competition class of Gezhi High School Without hesitation Chose Gezhi High School's competition class Of course I can't say I can't say that when I made the choice Getting into the best four high schools Was a sure thing Although my score was indeed enough later
At that time the high school entrance exam hadn't happened yet Right right but at that time I felt Even if I could get in I should go to an underdog place and take a gamble Why Because I wanted to do this What was your purpose for wanting to do competitions I think the main thing at that time was wanting to experience it I felt I hadn't done it I had to find an opportunity to do it
Why did you have to do it First, I felt it was indeed difficult Ah, it was indeed more There was just this excitement about difficulty Right It's indeed At least at that time Before I started The impression everyone gave me was That this thing was much more challenging Than the stuff you learn without doing competitions Mm-hmm The people who do this seem really strong If you don't do it you're just the smoothest stone
Among all the mediocre rocks So at that time I felt I should do it So I went and did it Of course doing it actually brought some benefits Looking back later If I hadn't done competitions at that time I probably wouldn't have gotten into Tsinghua Oh, did you get bonus points or something At that time actually The competition direct admission system had already declined significantly Only those who made the national training team could get direct admission
My high school Anyway I think I wasn't at the level of making the national training team So let's not talk about that But before taking the senior year competition exam By a twist of fate I went to Tsinghua for a summer camp And by a twist of fate on the last day of the summer camp I heard they were doing Independent enrollment But mainly aimed at Beijing students I frantically texted the admissions office teacher
Saying I wanted to take the exam with them He agreed And then he agreed to let us take the exam You all or just you He agreed to just Me And the few people from our high school who went together Those high school classmates from Shanghai who went to that summer camp Oh what did you say in the text to convince him I've forgotten the specifics of that text But the general idea of that text was
You give Beijing students the exam Why not give Shanghai students the exam Oh, you were quite righteous about it Did you think they were playing favorites at that time I didn't think they were playing favorites I just felt they had this opportunity Why not give it to us Everyone's competing on the same playing field You were classmates at that time And so I sent this message And they actually let us take the exam How many people
I can't quite remember Maybe from Shanghai There were probably about seven or eight people in that exam room You sent that text Maybe Maybe other high schools had other students who sent texts too But from our high school I was the one who sent it Oh so They were all Shanghai high schools Students who went to Beijing for that summer camp Students who attended the summer camp Students who attended the summer camp
And then they let us take the exam And then we signed That easy to talk to Right, so what I learned from that incident The most important life lesson is Be bold Haha If you don't fight for it you'll never get it Even if you fight for it you might not get it But if you don't fight for it you definitely won't get it Were you nervous when you sent that text
You were still in high school I can't remember anymore At that time I felt Was this a very bold thing for me No, at that time I was completely thinking I have to fight for it now If I don't fight for it today I won't be able to fight for it tomorrow haha Like The day I heard about it I immediately started frantically texting Frantically texting who Texting the admissions office That Tsinghua admissions office teacher
Texting one person or multiple people Can't remember, probably one teacher Did he reply quickly Mm-hmm mm-hmm I think Tsinghua Just said yes I don't know if they discussed it among themselves But anyway in the end they said they agreed And then we took the exam together Right So that's why I've always had quite a soft spot for Tsinghua I just feel that this school is willing to give people opportunities
to provide equal opportunities for everyone How did you do on that exam?
Well, when I came out, I felt like I totally bombed it Because I couldn't solve half a problem But later I found out others missed even more So I did get in after all Hahaha yeah exactly How many of your Shanghai classmates got in that year?
Ah, I think two Independent recruitment Was it a score reduction or something?
It lowered the cutoff to the first-tier university line Lowered to the first-tier line Oh So how did you do on the gaokao?
Later, sure enough, my gaokao wasn't high enough for Tsinghua But I could get into any school except Tsinghua and Peking University Oh So why Online it says you were recommended for admission I think it's just that people who didn't go to school during those years find it hard to really understand what happened back then Because two cohorts before mine you could still get recommended admission with a provincial first prize
A provincial first prize got you recommended admission What about your time?
In our time, with a provincial first prize you made the provincial team then represented the provincial team at the national competition and only by making the national training team could you get recommended admission I made the provincial team and went to the national competition But I didn't make the national training team Right So in my year, I didn't have a recommended admission slot Oh Were you good at competitions?
I think I was pretty mediocre Like Isn't not being the best basically the same as being mediocre?
And I obviously wasn't the best So I was just mediocre What was your family's attitude toward you doing competitions?
What was their attitude?
The best thing about my parents is they didn't really interfere much They may have tried to control me at some point but later found they couldn't Oh, how so?
I just didn't listen to them Oh I think in most Chinese families It's already considered pretty good when kids discuss things with their parents I usually just informed them Haha, informed them of what?
Informed them, oh, I'm going to the independent recruitment exam Yeah and Including filling out applications for high school and college My parents might not have even seen my application forms Oh, they're pretty laid-back, huh?
I think they just when you can't understand what someone is doing the best thing is to not meddle I think my parents understood this very well Yeah hahaha So you're pretty rebellious, huh?
I think I am Pretty My personality is I really care about what I want to do If it's something I've figured out I want to do Don't try to stop me And I'll definitely do my absolute best But if it's something I don't want to do Forcing me won't help, I won't do it. Right
Are you very competitive?
Pretty strong Yeah, but I think I'm more competing with myself pushing myself, I guess Not really willing to compete with others Oh right Of course, if well it's something I think is important and you also think it's important then I definitely have to outdo you, hehe So then you got to Tsinghua, that was even more amazing You studied quantum physics, why?
Yeah, I was doing condensed matter theory at the time Why did you choose this major?
A twist of fate Looking back now Of course I can come up with some very reasonable-sounding explanations But honestly, going back to that time I think it was just a twist of fate So at that time we were in the Jixian class And the Jixian class had a very good tradition First of all, although the Jixian class was in the physics department It didn't restrict what students could do
Actually 2/3 of the students in the Jixian class wouldn't do physics Ah And for Why did you enter this class Uh At that time the entire Tsinghua physics department was Jixian class Maybe not anymore now Anyway it was at that time And another good tradition it had was It encouraged students to learn through practice So it encouraged students To enter research labs as early as possible And learn through research
And at that time I really wanted to do theory Was it because you found it difficult It feels like you have a fascination with difficulty Maybe it's also a kind of illness I can talk more about this later What are the bad consequences of this illness Hahaha Right and then then right Then I wanted to do theory And of course the Jixian class Or what we call the Xuetang class Had a smaller class And then the
Teacher recommended saying hey The Institute for Advanced Study is a great place Tsinghua Institute for Advanced Study The research institute founded by Mr. Chen-Ning Yang Is a great place So I went there to find a teacher And there happened to be A teacher who was still young at that time called Called Wang Zhong, he was my undergraduate teacher Mm-hmm, at that time he didn't have many students either And we chatted Of course I knew nothing
But he was quite patient And gave me Gave me some papers to read And after reading I discussed with him Later I discovered condensed matter theory Especially the project we were doing at that time Was related to topological insulators And these kinds of directions Actually Was a direction very suitable for undergraduates to get started with It didn't require too much background knowledge You only needed to know
The most basic thing is you need to know quantum mechanics Statistical mechanics Solid state physics Which are actually very very easy to learn Basic knowledge But it might really test The depth of your understanding of this knowledge So for undergraduates It's actually a particularly good direction Where you can get started quickly And do some actual projects And then we did some work together Among which possibly
The work in open quantum systems Looking back now is still quite important work Right and then In a sense I think looking back now Doing that work Doing research during that period Is actually very very similar to doing AI now It's more that you have an idea You have an understanding And at that stage you can You can do a numerical experiment
To verify whether your idea and understanding are correct You find AI is actually the same AI is also you have an idea You have an understanding You design some experiments To verify whether your understanding is correct And then you design some model Training pipeline To implement your ideas Right so actually these two are very similar Can you talk about your non-Hermitian system research Ah, I can talk about it I'll try to speak in human terms
But it's also possible I'll actually be talking nonsense So those who don't want to listen can skip ahead Hahahaha Slide the progress bar You can set two markers on the progress bar Right and then right Non-Hermitian systems are like this One of the most basic assumptions of quantum mechanics is An isolated system Its evolution is described by unitary evolution 'Unitary evolution' probably sounds like jargon
Sorry What unitary evolution means is It's a linear process And this linear process Can be described by an operator Called the Hamiltonian Ah, the Hamiltonian, in a certain sense It's somewhat like the energy of the system But not exactly It's somewhat analogous to So It determines how the system evolves over time And if it's An isolated system This Hamiltonian will be a Hermitian matrix
A Hermitian matrix is one where you transpose it And then take the complex conjugate And it's the same as the original But real systems The vast majority are not isolated systems For example, you Me, as a human being Definitely have to exchange information with the outside world And exchange matter Materials are the same If you put a piece of material there Unless you create an extremely high vacuum You always have to interact with the substrate
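The conditions being described here can be written compactly. This is standard textbook notation added for reference, not formulas from the talk itself:

```latex
% Isolated system: evolution generated by the Hamiltonian H
i\hbar\,\frac{\partial}{\partial t}\,|\psi(t)\rangle = H\,|\psi(t)\rangle,
\qquad |\psi(t)\rangle = e^{-iHt/\hbar}\,|\psi(0)\rangle .

% The evolution operator U(t) = e^{-iHt/\hbar} is unitary
% exactly when H equals its conjugate transpose (Hermitian):
U^{\dagger}U = \mathbb{1} \iff H^{\dagger} \equiv (H^{T})^{*} = H .
```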
You have to exchange with the external environment So real systems Are mostly not isolated systems And non-isolated systems Won't be described by a unitary process And the corresponding Hamiltonian Won't be Hermitian either That's where the term 'non-Hermitian' comes from It's essentially for studying open quantum systems Quantum systems that exchange with the outside world Their behavior And at that time, something very puzzling was discovered We were initially trying to study
Some topological phenomena in these open quantum systems And then we found The theoretical results from hand calculations Just couldn't match the numerical results no matter what More precisely The hand calculation result Assumed the system Had periodic boundary conditions For example, on a ring Or on the surface of a torus And numerically Because it's closer to the actual situation It would calculate with open boundaries
For example, the behavior of a material in a square shape And these two results just couldn't be reconciled So we tried to understand this And later found The basic paradigm people used to describe Hermitian systems A fundamental paradigm Is the so-called Bloch wave Which assumes the eigenstates of the system are Linear combinations of waves This Sine and cosine waves, that kind of thing Linear combinations of such waves
This assumption actually breaks down in non-Hermitian systems It becomes wrong The fact is Later we found In non-Hermitian systems Actually, the energy eigenstates All Can potentially accumulate at one edge of the system Right, and then we systematically established this Set of descriptive methods
And then built a framework To describe a non-Hermitian system with open boundaries How to describe its eigenstates And thereby describe its time evolution And some dynamics So That was the work at that time And later there was a lot of Because it was actually a A paradigm shift So later there was a lot of Follow-up work
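The numerics-versus-theory mismatch described above is easy to reproduce. Below is a minimal sketch (my own illustration, not code from the talk) using the Hatano-Nelson chain, the standard toy model for this physics: with asymmetric hopping and open boundaries, every eigenstate piles up at one edge of the system (the non-Hermitian skin effect) instead of spreading out like a Bloch wave. The parameter values are illustrative.

```python
import numpy as np

# Hatano-Nelson model: 1D chain with asymmetric nearest-neighbour hopping.
# Parameter values are illustrative, not taken from the talk.
N, t_right, t_left = 20, 1.0, 0.5
H = np.diag(np.full(N - 1, t_right), k=1) + np.diag(np.full(N - 1, t_left), k=-1)

# Open boundary conditions: diagonalize the (non-Hermitian) matrix directly.
eigvals, eigvecs = np.linalg.eig(H)

# Non-Hermitian skin effect: unlike Bloch waves, every eigenstate
# concentrates its weight at one edge of the chain.
weights = np.abs(eigvecs) ** 2           # columns are normalized eigenstates
left_half = weights[: N // 2].sum(axis=0)
print(np.all(left_half > 0.9))           # every state hugs the left edge
```

Under periodic boundary conditions the same matrix would give extended Bloch-like states, which is exactly the hand-calculation assumption that fails here.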
But later I actually switched directions So I didn't continue much in this direction Why didn't you continue with it It's hard to catch a paradigm shift, isn't it It's hard to catch a paradigm shift Yes yes This is the weakness of human nature I feel like I always love challenging myself with things I don't know Hahaha especially at that time Just I don't know what I was feeling in that direction
Maybe looking back at that work a few years later It would become the most important work in that direction Later when you do some more work It might indeed make you more famous Get more citations Write more good journal articles Find a good faculty position But it feels like for a scientific career It wouldn't be that exciting So at that time I wanted to switch to something else Switch to something I wasn't good at
Do it right And then So when doing my PhD I switched directions To do high energy theory High energy theory, right High energy physics, right So your undergraduate and PhD were also different Also different It's not just jumping from physics to AI Actually your undergraduate and PhD both look like physics But the directions had already changed significantly Right, two directions with almost no connection Oh, that's quite amazing Including your choice of competitions
Going to Gezhi High School was also quite amazing Right What kind of human nature is this I think it's just To put it badly, I love torturing myself Hahaha, to put it nicely, challenging myself Hahaha Mm-hmm, are you happy being tortured I think if someone tortures themselves just for the sake of being tortured Then that person has psychological issues But If a person is being tortured in order to learn more things
And enrich their experiences and abilities I think it's worth it Your undergraduate teacher Teacher Wang Zhong was also an underdog, right Does he count No hahaha He was doing quite well How can you say that about him haha (At that time) I just said he was very young No no no, he was very young But he My impression of him has always been He is a very sharp person Very capable of seeing problems
Trying to understand problems Understanding them very clearly Indeed he might not be like many teachers who are Very famous In society or very dazzling At least not at that time Now he's very famous At that time he wasn't that famous yet But I think in terms of ability I think he's very strong Right, and actually he started out When he was doing his PhD he studied with Teacher Shoucheng
Teacher Zhang Shoucheng So People who can be chosen by Teacher Shoucheng Basically won't be too bad Mm-hmm Did he say anything about you changing directions for your PhD He didn't say anything I think he is He is someone who doesn't like to interfere with others Hahahaha I don't know what he was thinking inside But I think He is someone who doesn't like to interfere with others Eh, quantum physics
What kind of worldview is it as a whole It and, um I think I think the biggest difference is I think, um There are many Many differences from classical physics But I think They are two corresponding concepts, right Classical physics and quantum physics They are theories at different energy and time Or spatial scales That is, essentially our world is all quantum Of course right now We don't know what exists at smaller scales
Right, like At smaller scales There are many different ideas For example, string theory is an idea And then look at other ideas Quantum gravity is also an idea, things like that Right, but none of those can be verified yet At the tiniest, tiniest scales That can be experimentally verified The effective theory is quantum physics Of course, this includes quantum mechanics and quantum field theory
And classical physics is When the spatial scale you're looking at Is relatively large This quantum physics Will gradually, gradually reduce to classical physics Actually, it's more about at different scales Having different effective theories This thing Is actually a very profound idea in physics It's what's called the renormalization group What the renormalization group says Is that
The theory describing a system At different energy scales May look completely different Right, even if they ultimately, at the root Are all one grand unified theory Of course, right now There isn't really a true grand unified theory But even if one exists And they share the same root at the origin At different scales They may still look completely different So classical physics and Quantum physics Are more like two descriptions at different scales
Speaking of quantum physics There are several terms that seem related For example, the butterfly effect For example, quantum entanglement Can you talk about these I think this is something everyone can understand And I don't know physics either Don't blame me, everyone I don't know quantum physics either Right, I think Quantum entanglement Is indeed something relatively well-known And quite unique to quantum physics And then it's very simple It's like, say I have two particles
For example, they're in an entangled state And then maybe they're actually very far apart But actually If I perform some measurement on one of them Or perturbation It will also affect the state of the other This is real This is real, right What kinds of things have quantum entanglement What kinds of two objects There are many There are many actual situations It's actually When you look closely enough
At a small enough, microscopic scale The vast majority of particles may be in entangled states But practically speaking You can For example, create one spin and another spin First bring them together Then collapse them into an entangled state Then you can pull one of them very far away Then it becomes an entanglement A state entangled over a long distance And I think even, I remember a few years ago There were people who specifically did experiments
Preparing a bacterium and some other thing Into a quantum entangled state What do you mean by prepare Preparing them into a quantum entangled state This can be manually operated This is something that can be manually operated Why, how do you operate it Generally speaking It's through some Some measurements and the action of evolution operators Can put it Into this state But the hard part here Is actually how to implement this experimentally This process
You can imagine It's like you perform some quantum measurements And some, some so-called quantum gate operations Actually It's quite difficult Which brings us back to the question just now That every system is actually not isolated You might have these two spins And you think, hey If I prepare them this way Don't I get an entangled state?
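The two-spin state being described is, in standard textbook notation, a Bell pair (notation added for reference, not from the talk):

```latex
% A maximally entangled state of two spins A and B: measuring one spin
% immediately fixes the outcome for the other, however far apart they are.
|\Phi\rangle = \frac{1}{\sqrt{2}}
  \left(\,|\uparrow\rangle_{A}|\downarrow\rangle_{B}
      - |\downarrow\rangle_{A}|\uparrow\rangle_{B}\,\right)
```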
Then I just separate them and I'm done But the real problem is These two particles actually live in our world Other particles constantly Bump into them Or external heat disturbs them a bit And the state is gone just like that So the hard part is How to actually implement this process experimentally Right, and then Another example of entanglement might be more well-known I should actually mention that example Which is Schrödinger's cat Schrödinger's cat
That's a much more famous example It says its state is actually a superposition Of a radioactive source emitting a particle And the cat being dead That's one state The other state is the radioactive source not emitting a particle And the cat being alive, a superposition of these two So for example If you measure that radioactive source And find that it emitted a particle You know the cat is dead
No matter how far apart the cat and the source are Right, so that's entanglement But the butterfly effect is a Is a different thing And the butterfly effect Well the famous part of the butterfly effect Is actually from classical physics What people hear about in classical physics The butterfly effect is that famous example Where maybe a butterfly in South America Flaps its wings Half a month later
A typhoon hits North America But from a more mathematical formulation It says that at time At the initial moment If you make a very tiny perturbation And then measure the impact of this perturbation How large it becomes in the future You'll find That this perturbation grows exponentially Right, that's mathematically A description of the classical butterfly effect
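The "more mathematical formulation" mentioned here is the Lyapunov exponent of classical chaos theory (standard dynamical-systems notation, added for reference):

```latex
% Classical butterfly effect: a tiny initial perturbation \delta x(0)
% grows exponentially in time, at a rate set by the largest
% Lyapunov exponent \lambda > 0 of the system.
\|\delta x(t)\| \sim e^{\lambda t}\,\|\delta x(0)\|
```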
But something people were puzzled about before Is how could this phenomenon exist in quantum systems Because as we just said An isolated quantum system undergoes unitary evolution It's a very linear process So in a certain sense If you have one state That is, one vector and another vector With not too large an angle between them initially Then after some evolution This angle shouldn't change So this situation where initial states are Very slightly different And in the future, bam, the difference grows exponentially Seems, from quantum mechanics, like Something unlikely to happen But as we just said Our world is actually quantum at the microscopic level And becomes classical at the macroscopic level But they're part of the same continuum How can one have it and not the other That's what people were trying to understand And of course Later people gained a better understanding
Which is that actually When discussing the butterfly effect in quantum systems You shouldn't discuss the change between two states This change Instead you should discuss something Called local observables That is, the change in local observables That actually corresponds to what you see In classical physics, those changes So after four years of studying quantum physics What were you thinking at the time What do you think physics helped you with
When you were about to graduate as a senior I think the biggest benefit of studying physics as an undergraduate Is first of all Think things through clearly Reading isn't about reading a lot But about reading deeply Reading a lot doesn't mean you can discover new things But if you have A perspective different from others on something That's what's more valuable To society This one thing And another thing is don't trust theory too much
Don't trust pure theory too much Why did I come to this conclusion Because the main reason that discovery happened at that time Was because we could do numerics It started because numerics and theory didn't match Then we carefully studied that problem And discovered this thing Then why did you go study high energy physics for your PhD That's also a theory This brings us back to the topic we just discussed That always loving to challenge very difficult things
Sometimes also brings some bad results What bad results For example I feel like I think my PhD, for myself personally I learned a lot Grew a lot But for this world It didn't produce any contribution Haha, this high energy theory direction It's difficult enough Very very difficult And um But the bad thing about it is It's actually not particularly verifiable
There are no objective evaluation criteria Because High energy theory has developed to the point where Experiments completely can't catch up at this stage Experiments completely can't catch up to what you're discussing in theory Whether it's energy scales Or these microscopic scales Right How does it progress What does its progress depend on If not experiments One source of progress
Comes from mathematical self-consistency Mm-hmm, like for example You propose a framework To describe these things Then can you be self-consistent with existing Already verified theories at lower energy scales Like for example You study string theory Then naturally the question everyone asks is Can string theory at low energy Return to quantum field theory And then return to classical physics Then this self-consistency is one criterion
I think this is very reasonable A very scientific thing Of course there are also some unscientific factors That when this field completely lacks experiments And objective standards There definitely won't be just one framework that appears There definitely won't be just one self-consistent framework that appears At this time who does well Who doesn't do well Actually depends on The subjective judgments of some old-timers in the field
Did someone hurt you I wasn't hurt by anyone It's just that the longer I stayed in that field The more I felt this thing was stupid, like A person's life isn't that long Why waste your own time Serving old-timers Right So it feels like spending 5 years learning a lot of knowledge Buying a big lesson This lesson is
This big lesson is to (do experiments) Hey, it's about doing Things with relatively objective evaluation criteria Mm-hmm, or from another perspective Or from another perspective Like Do things that can have an impact on this world So actually your undergraduate went relatively smoothly, right In the quantum physics research field Very quickly You very quickly had very good academic results And it was paradigm-level change But you quickly felt it wasn't attractive anymore
So You wanted to challenge something more difficult in your PhD Right And during the PhD period it was actually quite lonely At least in terms of results it was like that Hahaha The outside world couldn't tell From the outside it all looks like a very glamorous resume PhD at Stanford Right, I think In terms of actual research output I think No one would say my PhD papers were bad But if I'm being completely honest
How much impact did they have on the world?
I think almost none No impact, practically zero Right, so for me personally I was really unhappy with that But it also wasn't like, you know Anyone could say I was slacking off I really wasn't slacking off You can still meet all the external expectations Right How do you pull that off?
Well, this is something that You know how it really feels, right?
Right exactly I think meeting external expectations Or meeting the standards of a small circle It's like training a model Once you're in that small circle And you know what their evaluation criteria are It's easy to do well Even if you don't actually believe in those standards You can still meet them Mhm But deep down, you know you don't buy into them Because sometimes even when you don't believe in it And you hit those marks
You can fool yourself and just keep moving forward But I eventually realized I couldn't fool myself Couldn't lie to myself Mhm Right When did you realize that?
I think probably around The last two years of my PhD I started having that feeling But back then, I hadn't really figured it out yet Hadn't figured out what to do if not this So I spent some time Exploring different directions For example At first I mostly looked into Quantum computing Or quantum information, that kind of direction
Then I got a postdoc offer After getting the postdoc offer It felt more urgent Because when you're still in school You can still have a student mindset After leaving school, it's your own career(事业) You have to carve out a path for yourself So at the time I felt Quantum computing and AI were probably two
I think they offer young people More opportunities So what was your postdoc direction?
The postdoc had no direction It was basically just theoretical physics A postdoc is a very independent position You basically do whatever you want Right, it's more like In a way, it's kind of like doing charity Huh?
Who's doing charity?
Well, there are probably some Whether it's government organizations that care about research Or private organizations They donate money To the university Or allocate funding to the school The school uses that money to hire postdocs Who then do research in a department And share their research Broadly with other people in the department I think it's more about creating a kind of social atmosphere This kind of This kind of work Right, and so
So there really aren't many restrictions You can basically do whatever you want But I didn't actually do The postdoc for very long I was probably at Berkeley for two or three months in reality But officially, I was only there for two weeks What do you mean by officially?
I mean I had actually already gone there before I officially started Because I was already in the Bay Area anyway I went there before I officially started But after I officially started I only stayed for two weeks before quitting What happened during those two weeks?
Nothing happened in those two weeks I wasn't even planning to start the position But the people at Berkeley were just too nice They were like, uh No worries, just wait until things are settled Come for as long as you can Oh, so you told them you were actually talking to Anthropic Right I told them Actually I think I might go do AI Maybe I shouldn't join Mm-hmm But it wasn't just Berkeley
I think in the Bay Area The professors at both these schools are very nice They really take care of you They felt you hadn't fully finalized things yet So better to hold onto the current job first Do you think physics helped you later when doing AI In what ways I think in terms of hard skills there wasn't much help In terms of pure tool-based skills Actually the transfer from physics to AI Is very very little
But I think if you really have to ask I think maybe the main Main No Can't say it's ability It's personality Maybe Maybe physics people want to get to the bottom of things more Want to understand something more And want to do things more systematically Because we're used to this very systematic Whether it's experimental methods Or theoretical methods So I think this might be A good thing
But I don't think this is unique to physics people either Like Why wouldn't computer science people have this trait I know many computer science people Who also have this trait Many chemistry people also have this trait Biology students also have this trait So I don't think it's unique to physics Right but actually it's quite interesting There are indeed many in this field Especially with language models This kind of large scale AI
There are indeed many people from physics backgrounds Who have been very successful Right especially at Anthropic this company When many people describe this generation of AI They all say it's a black box Can you use a scientific perspective To understand this black box The operating principles of artificial intelligence I think Everything in this world is a black box Like even physics Something everyone thinks they understand
Actually doesn't really have An understanding from its microscopic behavior All the way to macroscopic manifestations Like whether it's quantum mechanics Or quantum field theory They all describe behavior at that energy scale Essentially the system is still a black box You still don't know at its most microscopic level What kind of dynamics AI is the same Whether it's a black box or not Is actually all relative We indeed don't understand language models to the level of
Neurosurgery-level precision It's not that I understand this behavior To the extent of Saying this behavior is caused by which neuron Which artificial neuron's which activation Producing this behavior We don't have that Haven't reached that level of understanding Except in some very sparse Very small networks Like Anthropic Has this so-called Interpretability Interpretability team They might do some similar work
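As an editorial aside, the neuron-level probing described here can be illustrated on a toy network. This is a minimal sketch invented purely for illustration, not Anthropic's actual interpretability tooling: a tiny random two-layer network where we simply ask which hidden "artificial neuron" activates most strongly for a given input.

```python
import random

# Toy illustration (not real interpretability tooling): in a tiny random
# two-layer network, find which hidden "artificial neuron" fires hardest
# for one particular input.
rng = random.Random(0)

N_IN, N_HIDDEN = 3, 4
W1 = [[rng.gauss(0, 1) for _ in range(N_HIDDEN)] for _ in range(N_IN)]

def hidden_activations(x):
    """ReLU activations of the hidden layer for input vector x."""
    return [max(0.0, sum(x[i] * W1[i][j] for i in range(N_IN)))
            for j in range(N_HIDDEN)]

x = [1.0, -0.5, 2.0]
acts = hidden_activations(x)
top_neuron = max(range(N_HIDDEN), key=lambda j: acts[j])
print(f"activations: {acts}")
print(f"most active neuron for this input: #{top_neuron}")
```

On a 4-neuron toy net this attribution is trivial; the point of the passage is that doing it for billions of parameters in a production model is what remains out of reach.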
But in practically usable language models We haven't reached such understanding But it doesn't mean we have no understanding at all For example the Scaling Law It describes how, as model size and data scale up Models improve in perplexity Get better and better under this metric Mm-hmm so if you say there's no understanding at all Well if the Scaling Law
Doesn't count as a small part of understanding Then can we also say We actually don't understand this world at all either This world is also a complete black box So Scaling Law is a scientific law It's an empirical law An empirical law Right But The boundary between empirical laws and scientific laws is quite blurry For example If we look back at these thermodynamic various different laws
The first law, the second law The Clapeyron equation and whatnot all this messy stuff When they were first discovered they were also empirical laws It's just that later on as time went by we gradually understood their microscopic mechanisms Then they might have become scientific laws Right, I think maybe something like Scaling Law or things like that Right now it's definitely still very impressive But in the future, when the technology becomes more fixed
and people start to understand it more and more the microscopic process will it become a scientific law if such a definition exists I think it's possible Can you explain in scientific terms this so-called intelligence emergence First of all, this term itself isn't very scientific So naturally there's no way to use scientific language to describe something unscientific Intelligence emergence?
Well, I think intelligence emergence to me it's more of a subjective feeling rather than an objective phenomenon When many people talk about intelligence emergence what they might have in mind is that previous language models could only do one type of thing like only translation only analysis only certain things But now it seems like the model can do everything But this thing Again, I think it's like to me
it's more of a technical emergence rather than a behavioral emergence It's that through research we discovered how to do this kind of large-scale training and then be able to lift all capabilities across the board I think this is the more fundamental thing As for intelligence emergence itself Actually, I think, um everyone probably has a different definition in mind Right Your definition is
To me, there's no definition Haha, to me The only qualitative difference is whether there's been a technical breakthrough that allows us to scale up and lift all capabilities across the board This, to me is a well-defined thing You ended up choosing AI between quantum computing and AI How did this shift happen Right, I think I still spent some time understanding
where the bottlenecks lie in both directions I think the good thing is they both give young people opportunities The good thing is both have opportunities But quantum computing seemed to you to be closer to your main path at that time, right Well, that's why I needed to understand the details Because after understanding the details, I found out it's not It's the opposite Because quantum mechanics Oh, not quantum mechanics I mean quantum computing
I think its main bottleneck right now is actually in the experiments It's not about how you design those algorithms or design those operators It's more about how you implement it experimentally That's something I'm actually not good at It's actually quite unrelated to many things I'm interested in It's actually relatively unrelated On the other hand, the things related to me are more Like AI, as I just mentioned It's more about having an idea
and then you can use some numbers to verify it This numerical aspect in AI might be training a model or something like that Right and this is actually quite similar to doing physics It even is That's why I've always liked to compare this With 18th century physics Make comparisons It's more like physics of that era In that era theory and experiment weren't separated There were no theoretical physicists Experimental physicists You just did physics Just did physics
You could do experiments yourself And also do theoretical speculation I think AI is a bit like that era So actually The distance from theoretical physics to experimental physics Is farther than directly jumping to AI Farther mm-hmm Actually farther And in terms of interest it's also farther You don't like experimental physics (I think) You don't like doing experiments I think, um It's indeed not where my interest lies Mm-hmm although I'm not willing to do it myself
But I am indeed very interested In knowing how other people's experiments are going Hahahaha Doesn't AI require doing experiments Yes, but it's more like numerics Right it's not quite like That thing where you go to the lab and build an optical table And whatnot You also have to I think experiments are really something Maybe because I don't understand I haven't reached that level
So some things seem quite mystical to me For example Everyone knows how to build this optical table But some people can build it for you Some people just can't build it after 6 years This is hands-on ability I just don't get it Hahahaha I sometimes think This thing is a bit mystical Oh Mm-hmm so numerics are still better Numerics are much clearer
Right right right, for me Doing numerical experiments Or like AI Training models And studying various different techniques To look at certain details This thing is actually um, is I can understand why it's done this way Mm-hmm but when it comes to building the table I'm completely at a loss You've done it before I of course have Everyone has probably done basic Physics students definitely all
Done basic experimental training But more importantly I have many friends who do experiments Whether visiting their labs And watching how they do experiments Or chatting with them about how to design experiments I feel like there are many things I can't quite understand But indeed some of them do it well Some don't do it well So you say doing AI research now Is like doing thermodynamics research in the 18th century What it's actually expressing is
Although everyone can't very clearly Scientifically explain and understand this thing But it won't stop it from developing Right it's more like Why Comparing to thermodynamics of that era In that era Everyone actually didn't understand the microscopic theory of heat Everyone didn't know what heat was Just like now we can't understand Right just like now Everyone can't understand
Which matrix element in this language model Is doing what Actually no one understands But it doesn't prevent you from having some good empirical laws Like the various laws of thermodynamics back then And the various Scaling Laws now So from this perspective, yes At that level it's something like that And from a researcher's perspective
It's that other point I was making Theory and experiment actually go hand in hand So how did you end up interviewing at Anthropic How did your Anthropic journey unfold I think the main thing was I had former colleagues at Anthropic Haha yeah Former colleagues So Anthropic actually has a lot of people from physics backgrounds especially theoretical physics backgrounds Why is that In terms of their hiring choices why did they choose this group of people I think
Of course, many Mmm A lot of people might come up with reasons like physicists are good at this or that But from my personal perspective I think the main reason is still connections Just connections Because in Anthropic's founding team there were actually three or four fairly technical people at the time and two of them are still very much on the technical front lines in leadership
Both of them came from physics backgrounds And the people they might have recruited also came from physics backgrounds So it just continued that way But actually, at this stage After I joined they barely hired any more people with no AI background at all. Right.
So it's also a I think it's also a product of its era Right, and then Anyway, I decided to go into AI at that point So I tried to reach out to a few places And then You only looked at Anthropic?
No, I also reached out to OpenAI and GDM That is, Google DeepMind But Google DeepMind because it was too slow back then Hahaha, so I didn't Just didn't end up in consideration But Too slow You mean their interview process was slow But later Obviously later They made huge strides with Gemini They moved really fast after that Haha yeah
And then Anthropic Well anyway What about OpenAI I reached out to OpenAI too But OpenAI probably didn't find a particularly good fit in terms of projects and people And Anthropic was because I reached out at that time And then it was my first that manager my first manager And he used to do theoretical physics too And he said at the time
We're trying to do reinforcement learning Trying to do this kind of large-scale reinforcement learning There are many scientific questions to understand That was in '24 Around August or September At that time actually reinforcement learning wasn't as mature as it is now Back then most people didn't really know how to do it Because o1 hadn't been released yet Back then, o1 was just It was just something Everyone knew was out there
But no one had seen the results yet But Anthropic didn't actually know how to do it back then They had a general idea at the time But there were many details that needed careful study So he told me, hey There's this thing Would you like to come interview And I thought, hey It might be a good opportunity How did you perceive reinforcement learning back then No clue, haha You roughly know pre-training Post-training yeah exactly
I roughly knew the pipeline But I didn't really know the specifics of how industrial-grade language models are trained Mm I only knew how it's done in academia Right, and then So looking back, what I knew then In hindsight, it was basically nothing Right, and then, mm More than anything I felt at the time that this was an uncertain thing But it was a good opportunity
So I just went for it Mm Of course there was some interview prep and the interview process, right How did you prepare What did you talk about At the time Who did I interview with Anthropic, some of my later colleagues interviewed then And then The interview questions weren't too hard Anyway haha right But for me I didn't know how to prepare back then either I just went through all the courses I could find
Learned everything I could on my own Did all the assignments I could do And then I hand-rolled a whole system myself That Andrej Karpathy He has that famous project called I think it's called nanoGPT or something Anyway, he has one where You can train a tiny GPT model inside a Google Colab Notebook And I hand-rolled that And then I went to the interview And that was it Right And got the offer pretty quickly And then, right
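For readers curious what "hand-rolling" a tiny language model involves: Karpathy's nanoGPT is a small transformer in PyTorch, which is beyond a transcript aside, but the basic loop of "fit next-character statistics, then sample" can be sketched with a drastically simplified character-level bigram model. Everything here (the training text, the sampling setup) is invented for illustration.

```python
from collections import defaultdict
import random

# Drastically simplified stand-in for a hand-rolled character-level model:
# a bigram model, not the transformer nanoGPT actually implements.
text = "hello hello help hero"

# "Training": count character-to-next-character transitions
counts = defaultdict(lambda: defaultdict(int))
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def sample_next(ch, rng):
    """Sample the next character in proportion to observed bigram counts."""
    nxt = counts[ch]
    chars, weights = list(nxt.keys()), list(nxt.values())
    return rng.choices(chars, weights=weights, k=1)[0]

rng = random.Random(0)
out = "h"
for _ in range(10):
    out += sample_next(out[-1], rng)
print(out)  # a short sampled string starting with 'h'
```

The real exercise replaces the count table with a learned transformer, but the train-then-sample shape is the same.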
Got the offer And then Your first direction was large-scale reinforcement learning Actually, back then two teams reached out Two team managers Came to talk to me One was doing evaluation Basically model evaluation And the other was doing reinforcement learning I chose reinforcement learning You chose reinforcement learning back then Because it was more unclear, right Mm-hm, and back then
Anthropic wasn't the big company it is now The company was actually quite small back then How many people When I joined Our big team only had about 10 Or 11 people What was the big team called It was called Horizon Right, and then Back then that big team So like the parallel teams to this big team What were they That big team later basically became
The team that covered every aspect of reinforcement learning Right, but back then Its whole larger group Was just reinforcement learning The whole larger group Well, for a startup It's hard to say what that group's goal was Because They probably had many different goals at various points But just at that stage The main goal was probably doing reinforcement learning Right, and then Of course there were also teams more focused on data below that
Teams more focused on environments and infra and infrastructure And teams more focused on research and algorithms And the team I joined Was more on the research and algorithms side Mm, how many people did Anthropic have back then Uh, back then probably Around seven or eight hundred in total But the whole company Seven or eight hundred, right What was your first impression when you joined
I think I think my impression of Anthropic Has actually been pretty Pretty consistent I mean, after joining My impression of the company was that it had very strong execution It's just that It's actually a relatively top-down company Right and then So after many things are decided They go all in And The atmosphere between employees in the company is actually very good
Everyone Doesn't hide things And especially when I first joined it was very small So Everyone knew each other So the atmosphere was very good And I think If you're doing Just doing language model related things Actually looking back now That was a very very good learning opportunity Where you could get exposed to every aspect of Training this model
And could find corresponding people to ask Did Anthropic at that time already have What we all know now That very firm bet Yes yes Where did this bet come from Why did this bet exist I don't know its complete source One obvious source I could see Was the previous generation model After Claude 3 was released On Twitter, which might not have been called X yet
Many people on Twitter were discussing That Claude 3 seems to write code better than GPT-4 In that era GPT-4 was still a model with a huge gap from everyone else So being able to do one important thing better than GPT-4 Was quite impressive So it was discovered through trial I think at least that's one of the reasons It was very quick feedback on the market
Right, this is also something I think this company is very strong at Its execution is very very strong Once it gets a signal That makes it feel very reasonable Something this company should do Then it will go all in It doesn't have that redundancy of large organizations Why was its coding definitely better than GPT-4 Can't say haha Oh there is a reason There is a reason There is a reason, right
But it's a random reason Not because I chose this So this result happened It's a purely technical reason But Indeed, I don't I can't determine whether it was randomly tried at first Or deliberately chosen If you ask me to guess I would definitely think it was randomly tried Oh A purely technical reason There was someone who did something There was indeed a certain team that did something Was it top-down Or bottom-up
I think at first it might have been bottom-up But later it became a top-down thing To quickly capture some market Right, internal and market signals Right right Needing to quickly go all in I think this is something Anthropic is very very strong at It's very very reactive Reacts very quickly Where does its execution come from Does it come from this person Dario From some trait of his I feel like
Mm-hmm Anthropic As a company It can implement this Relatively top-down mechanism Is a very unique thing Why Because Implementing top-down actually has one very difficult point That the person making technical decisions Must also be the company's decision maker Mm-hmm Mm-hmm First of all you have to be technically convincing
You can then Convince the researchers below to do this thing On the other hand, you have to be the decision-maker at the company You have to be able to take responsibility for the company Anthropic has that going for it That is, its technical leader Is actually a cofounder of the company Who are you referring to?
Not Dario Amodei Like Jared Kaplan And Sam McCandlish And both of them are cofounders of the company They make this decision themselves It's their company So they have the authority to do this top-down Then Dario, as CEO Does he get to say yes or no?
I don't know about their decision-making discussions Hahaha okay What role did Dario play?
I can only say The technical leader has the decision-making power I can only say For my work at that time The person I worked with the most was Jared But is this hard for other model companies?
Very hard. For example, OpenAI couldn't do it When Ilya was there, wasn't it possible?
When Ilya was there, it might have been possible But Ilya later, on one hand I don't know for what reason He seemed to have lost his decision-making power And then he left So...
What about other companies?
I think other companies all find it pretty difficult Even Gemini finds it pretty difficult But I think Gemini has a completely different playbook It's a bit different That is, um I think big companies and startups Their playbooks are fundamentally different Because for startups, what's important is to make bets That is, I have to bet on something If I want to bet It means there's risk So that means
I can make decisions very quickly And push decisions through strongly So perhaps in this situation Top-down is a big advantage, I think So I think organizationally, Anthropic Has an advantage over OpenAI But as a big company It might have a different mindset Because a big company's mindset might be Not only can I minimize the gambling aspect But I can also have reserves in every area
And then if anything succeeds I can catch up And if I succeed at something myself I might even take the lead That's probably the big company mindset So at Gemini Google is a very traditional Very bottom-up organization At the company level There may be some well-defined frameworks To evaluate whether your work is good or bad To guide you to do things the company needs But essentially It's still you deciding what you do yourself
So you think Anthropic can make bets (referring to betting heavily on coding) Because of its unique culture Organization and culture, yes This sounds like Something other companies should be able to do too But it's very strangely found that Other companies find it hard to do While Anthropic can do it Yes, I think it still requires technical credibility Or the company's leaders need to have credibility I think this is actually quite difficult
You're not even talking about the CEO having credibility It's the #1 technical person having credibility Yes, to me I think it's very important for the #1 technical person to have credibility But at the same time The CEO may not have become an obstacle Yes Is this hard?
Ah, I think it depends on your This cofounding team Whether there's enough mutual trust This is also crucial I think Anthropic is also strong in this regard Very strong among startups Its cofounding team Not a single person has left the company If you look at their past They are a group of people who have truly fought battles together In the past
They originated from, they were all former OpenAI employees Mm-hmm right And Many of them were even Co-authors on a series of key papers Co-authors, because like The Scaling Law paper Was Jared Kaplan and Sam And of course Dario And some others Maybe Tom Brown was there too I can't quite remember if Tom Brown was there And the GPT-3 paper had Tom Brown
And Benjamin Mann And Jared Kaplan and Sam were both there Dario was also there So they are people who have been in the trenches together I think mutual trust is still very key Mm-hmm, many companies might just be doing their thing And can't even keep this small group united Then how can you expect This big company to stay united You're talking about OpenAI right Mm-hmm hahaha When you joined Anthropic
What was the most important Project the company was working on Did you participate in that big project Right At that time the goal was to do large-scale Large-scale reinforcement learning And use it to improve coding ability That was the most important thing at that time Mm-hmm and we were doing this This team The research focus at that time was this thing This is also why this team later gradually grew bigger And became more and more important
And The final result was Everyone trained this 3.7 together The Claude 3.7 model Hey you said internally there was a 3.6 This is Not internally called It's from the outside Claude 3.5 actually had two versions One might be the June version Another October version, and then You can also see Anthropic this company Used to have no product capability either Actually calling two models by one name
Hahahaha So later, to distinguish them, outsiders Called the later version of 3.5 "3.6" So Anthropic followed this outside convention And called it 3.6 And called this newer model 3.7 So If you look at the actual product timeline of this company It's actually 3.5, 3.5new, 3.7 How could there be a 3.5new
What were they thinking Haha I can only say Anthropic at that time Probably really had no product ideas So your first project was 3.7 or 3.5 3.7 3.7 Or 3.5new 3.5new Actually I Didn't participate, almost didn't participate But 3.5new Already showed signs of coding
Really? When you first started At the time of 3.5new Already saw Anthropic's model Would be stronger than other models in agentic coding Why is that Can't say hahaha So when you went in It was exactly when They knew about this thing That management also knew about this sign Right and when they wanted to make bets You had very good luck I think I think, right I think when I joined Everyone had definitely already seen
This thing could be done and was important But didn't quite know how to do it And when I went in I was researching with everyone how to do it Right so the method was large-scale reinforcement learning Right from the big picture perspective But of course There are many technical details that need to be researched What know-how is in here Haha there are lots of NDA (Non-Disclosure Agreement) contents Hahaha
Would NDAs be written in such detail Actually in principle In principle Employees cannot, during their employment and after leaving Disclose any information related to the company's internals Of course in reality Everyone probably has a sense of where the line is That is If this technology hasn't been made public You definitely won't discuss it publicly But I think although I can't discuss it publicly But
I think Doing simple things cleaner than anyone else Is the most critical thing What do you mean by clean You also used this word just now Right it's it's I think there are many fancy techniques For example doing reinforcement learning The simplest algorithm is Policy Gradient But that doesn't mean it's the only algorithm There are other algorithms Like various complex Search algorithms and such But
Are these complexities necessary And these complexities might bring you Some efficiency That is efficiency improvements But they might bring you some For example Infrastructure difficulties Then how do you trade off these things These things actually need to be understood in research How to balance these different factors And choose the best path The most stable path
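For readers unfamiliar with the "simplest algorithm" named here, a minimal REINFORCE-style Policy Gradient sketch on a toy two-armed bandit looks like this. The rewards, learning rate, and setting are all invented for illustration; this only shows the shape of the algorithm and says nothing about how any lab actually trains language models.

```python
import math
import random

# Minimal REINFORCE on a 2-armed bandit (illustration only).
rng = random.Random(0)
true_reward = [0.2, 0.8]      # arm 1 pays more on average
logits = [0.0, 0.0]           # policy parameters
lr = 0.1

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

for step in range(2000):
    probs = softmax(logits)
    # Sample an action from the current policy
    a = 0 if rng.random() < probs[0] else 1
    # Noisy reward from the chosen arm
    r = true_reward[a] + rng.gauss(0.0, 0.1)
    # REINFORCE update: grad of log pi(a) w.r.t. logit i is (1[i=a] - pi(i))
    for i in range(2):
        grad = (1.0 if i == a else 0.0) - probs[i]
        logits[i] += lr * r * grad

probs = softmax(logits)
print(f"final policy: {probs}")  # should strongly prefer arm 1
```

The trade-offs the speaker describes (fancier algorithms buying sample efficiency at the cost of infrastructure complexity) all build on top of this basic update.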
Right and I think a lot of know-how Is actually in these These details How to handle all these aspects of details Then how was coding described as important at that time I think Is it considered a branch of large language models An important branch Or what I think everyone might have different ideas For me For me There are two reasons it's important One reason is One reason is
What Anthropic has been talking about That coding itself Is also part of language model research If you can do coding very well It might make your research efficiency Improve by multiples Mm-hmm, forming a research flywheel This is one reason For me Another reason Is because coding is actually a model Using tools and interacting with the environment A very good abstraction
First of all the benefits of this abstraction What are the benefits of this abstraction For example the feedback signal is clear And data is abundant And Actually it's very hard in other scenarios To find Tool-using scenarios that have both these traits simultaneously So for me this is a good abstraction Some research done in this area Might be useful for more general
Those abilities to use tools and interact with the environment Some useful Useful lessons What was Cursor's status at that time At that time Cursor was still a Pure product company I think in a sense It seems like before I went to Anthropic During that period Claude and Cursor were both in relatively underdog states
And somehow at 3.5new, which is 3.6 The outside world's 3.6 generation First the model capability went up Then Cursor discovered This model Could really do this kind of Agentic coding tool It's just a shell Right but this shell wrapping this model Suddenly let the public experience Not the public The public here means the software engineering community At that time, um
I realized Wow, this really seems like a productivity tool So after that, it just took off So around that time Anthropic realized Cursor is a future competitor I don't know about that You'd have to ask Dario, hahahaha, alright How was 3.7 made This was a watershed moment For Anthropic It was a watershed model I think for Anthropic's post-training It was a watershed Before 3.7
Post-training was in a relatively, um Small-scale state It was more like patching up the model That kind of state People didn't value post-training, right?
It's not that they didn't value it Everyone from the start For a long time No one really figured out How post-training should scale up Oh, but during that period Whether OpenAI or Anthropic Or even like China's DeepSeek, right They realized how to scale this up And how to scale it up You have to find The right environment Where the feedback signal is clear enough
And the environment itself is a strong data source And then On top of that You can make the training very stable Then it can work Yeah, I remember back then Actually no one knew What OpenAI's secret project was Just knew it was called Strawberry Called Strawberry And then, um People thought it would bring a new paradigm A new paradigm of post-training reinforcement learning But no one knew much more than that Yeah actually
I think when I joined Anthropic People already had a pretty good idea About how this should roughly be done The general direction of how to do it And then Later on, as time went on As I learned more and more about this field I discovered At that moment The way OpenAI was doing things And Anthropic were actually quite different How so?
In terms of the specific algorithms And the way they used data They were actually quite different Although both are called post-training and reinforcement learning Um, although both are called that But of course I don't think those are the fundamental differences In terms of the big picture They're the same They found some Found some very regression-like Very clear signals Very objective And the data itself is relatively clean And learnable for the model
And do stable reinforcement learning training on top of it In the big picture, that's the direction But the specific implementations differ quite a lot But later it was proven The specific implementation Each company actually went in different directions But they all succeeded Um, and at the time OpenAI's goal wasn't coding either From what I understood, the narrative was Pre-training as the first paradigm The gold mine is almost exhausted So now we're opening a second gold mine
Which is post-training and reinforcement learning To let the Scaling Law continue, right I think for a long time OpenAI had this idea I don't know if their thinking has changed now For me My thinking has gone through shifts Around the era of 3.7 At that time I also had the feeling that for pre-training the party was almost over That kind of feeling
And right when you were about to join Right when I first joined And at that time when doing these 3.7 related These kinds of experiments I also once had this idea But later as my understanding deepened I felt I discovered Actually there's still room to do things And um Pre-training Scaling Law
It doesn't tell you to keep getting bigger It's actually a very systematic framework That can tell you what kinds of things are more effective Right mm-hmm And So later discovered Actually there are still many things to do The fact is Later Anthropic And Gemini's pre-training Have also been continuously progressing OpenAI itself was stuck for a long time Haha, are they paying attention to pre-training again now
They should have been paying attention to pre-training for quite a while It's just recently they might have made some progress So pre-training and post-training as two paradigms Neither has reached its plateau I think neither has But you say predicting how far it will go Can't do that Right I think I think reaching a plateau has Two possibilities Two possibilities
One possibility is the technology itself has reached Where you still have things you want the model to do But these two technologies just can't teach it Another possibility is The things you want to do have reached a plateau I think now it's the latter Right now we know oh There's a Chatbot You can teach it to do this And then there's coding You can teach it to do this And then we don't know
Right, don't know what else to teach it That is to say This model is still a very smart kid Right you can actually teach it many things Right but we humans as teachers Now don't know what the next thing to teach is Right right Or how to reasonably teach it Using current paradigms Speaking of 3.7, what other know-how was there How many months did this take, all in all
From starting training to release Probably took about four or five months From when you first joined From when everyone started Doing research for this thing That probably took two or three months And then later from starting training to training completion With bumps along the way Many things to handle And there was a lot of new infrastructure Actually infrastructure is really important Very time-consuming
And then probably took about two months or so What important work did you do in it I don't think I did anything important Hahaha I think My personal contribution I personally My contribution to any model My statement Is always I feel like I'm not that important to that thing I think more importantly I was very lucky To have the opportunity To join an important project at that time
And did some things Mm-hmm, because in a sense I think AI in recent years This thing itself is unstoppable It doesn't depend on whether you do it or not If you don't do it someone else can do it just as well So I think in this era Actually all things that give individuals credit Are somewhat suspect
Of being hyped But indeed I think for me I am very lucky Being able to join at that stage was a big deal And, well, I learned a few things So you were lucky to be there at that stage What did you do in Anthropic's large-scale reinforcement learning team I think around the 3.7 era, what we mainly worked on was still
working on this agentic coding thing how to scale this thing up or how to prepare like how to set up all kinds of environments and data including what algorithmic problems you'd run into Most of the research at the time was on this part Any tips on this?
Looking back, there aren't really any particularly useful tips, haha I think When it comes to technical tips this is actually something that on one hand, people are really eager to hear about but companies won't let you talk about and in reality isn't very useful Why?
Because a lot of algorithm design isn't actually about the algorithm itself in isolation It's very strongly dependent on your infrastructure A simple example is a problem people often discuss which is during reinforcement learning the sampler machine, the one that generates these traces, these tokens, that machine and the trainer used to actually train the model
and then update the model weights — that machine these two machines might be different But the difference is partly due to numerical differences and partly because of using this kind of asynchronous training architecture so naturally fundamentally they're different So different companies might have different degrees of this difference so your algorithm design will also differ Some companies might have these two differences
being very, very large then the biggest part of your algorithm might be how to control this and how to keep the training stable Things like the actual training effectiveness will be weighted slightly less But some companies might have particularly excellent infrastructure so the difference between these two isn't that big then you can probably spend more effort on the training effectiveness So a lot of these small tips
are actually not very useful A lot of know-how is actually not very useful I say this because I've indeed noticed that many other labs — well, not people in these three labs probably really want to know like how Anthropic does this or how Gemini does that But sometimes I'm reluctant to answer One main reason is that fundamentally I think answering this question would mislead them
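The sampler/trainer mismatch described above is, in publicly known algorithms like PPO, commonly handled by weighting each sample with the probability ratio between the two policies and clipping it. This is a generic textbook-style sketch, not Anthropic's or Gemini's actual recipe; the function name and numbers are illustrative:

```python
import math

def clipped_surrogate(trainer_logprob, sampler_logprob, advantage, eps=0.2):
    """PPO-style clipped objective for a single sample.

    The sampler that generated the trajectory and the trainer updating the
    weights may disagree (stale weights, numerical differences, async
    architecture), so each sample is weighted by the probability ratio,
    and the ratio is clipped so divergent samples can't destabilize training.
    """
    ratio = math.exp(trainer_logprob - sampler_logprob)
    clipped_ratio = max(1 - eps, min(1 + eps, ratio))
    # Take the more pessimistic of the two surrogates (objective to maximize).
    return min(ratio * advantage, clipped_ratio * advantage)
```

When sampler and trainer agree exactly the ratio is 1 and clipping is a no-op; the larger the divergence between the two machines, the more the clip dominates, which matches the point above that how much effort goes into this depends on your infrastructure.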
Modern AI training is a large system You actually need to understand all aspects of this system to have a holistic understanding of what makes something useful because of what rather than saying the thing itself is useful What happened from 3.7 to 4.5?
Both pre-training and post-training, yes And um Of course it's just more scaling up And data Whether it's data or training the compute is at a much larger scale But I think in terms of paradigm, there wasn't anything particularly major that changed How many people was it when you left Anthropic?
Close to 2,000, I think More than doubled Ah um So during your time at Anthropic it happened to be going through its most dramatic transformation Ah, I probably just caught the tail end of it being a small company Actually, I think after three or four months, the company already started and suddenly there were way more people Did the culture change?
There were still some rather chaotic phases And then Especially around the time when I left The period right before I left I think culturally, it went through some some chaos Because some people came in from outside and there was probably some conflict with the original culture Oh, the previous culture was I think before, it was just pretty simple Yeah, it was very simple It was more like a small workshop Everyone was friends
And everyone knew what the others were doing And No one was particularly you know, doing too much self-promotion or anything like that Doing pointless things No one was doing pointless things Everyone had a lot on their plate And the company back then probably had a stronger sense of urgency And later on, people probably felt that with more people this kind of culture would definitely take some hits What kind of atmosphere did it bring?
I think There were indeed some people I personally didn't like very much Of course, that doesn't mean they're actually bad I'm just saying I personally didn't like them I mean, I probably don't like people who talk a lot in this field Like, I think 'idea is cheap' Ideas are cheap Many ideas are actually quite obvious, everyone knows them The hard part is how to implement them How to break it down into small
actionable steps and actually get it done I don't think I like those who spend a large part of their day on Slack, I mean Slack is a workplace software used in the US and spending a lot of time on Slack talking about grand principles I think it's just not very useful, haha Why did you suddenly leave later on?
Had you completed some milestone at the time?
How long had you been thinking about it?
At the time, I think I'd been thinking about it for a month or two about a month or so a little over a month That was fast, yeah yeah I think one aspect was Um, it was I actually didn't really agree with Dario's anti-China stance Ah, I think as a company CEO For him personally
whatever views he holds, I think it's fine But as a company CEO I think pushing this view to such an extreme was a very emotional reaction Yeah, and this was a relatively minor reason But on the bigger picture There are many companies Like I just mentioned There were some cultural shocks at the company And including myself I probably wanted to learn some different things I mean, Anthropic
is after all very focused And you might be doing If you really want to work on everything related to language models in all aspects And working on this kind of tool use, this Agentic stuff and coding and such then Anthropic is actually great You can learn a lot But there are many things Anthropic doesn't do For example, no one at Anthropic is doing this kind of multimodal generation
You want to learn but there's nowhere to learn it And Anthropic probably didn't spend too much energy on this kind of more low-level engineering infrastructure Right So probably wanting to learn more things was also one of my motivations for leaving at the time What percentage was the anti-China stance?
Because of Dario's personal reasons I've said in public, all combined, maybe 40% But take this number loosely This number just tells you It's not the main reason But it is indeed a very big reason Not controlling Not a controlling reason Right not a controlling reason But it's a majority-shareholder-sized reason Your choice is also quite amazing Because most people
When it's still an underdog Joining will create more emotional attachment Willing to accompany the company for a longer time But you instead jumped to Google Because many researchers once they enter Google They feel Google doesn't give enough scope Mm-hmm So they instead want to jump to places like xAI Or smaller organizations like Anthropic
Your move seems to be the opposite Right I think Actually depends on what you yourself want If what you really want is I have a very clear Like you said a very clear scope And this thing Is closely related to my final product model I must get one of my ideas Into this model Then Google might be a very bad place Because after all there are so many researchers
So many already mature organizations Doing this thing Has a very complicated process But I think Gemini is very If what you want is research freedom Freedom to explore And want to learn from broader humanity I think in this world You probably can't find a second place stronger than Gemini So So it's I think
Essentially it still depends on what you yourself want But I think many people when they leave Regardless of where they leave from After switching to another place The main reason they might feel unhappy Is because they didn't figure out what they wanted For example if you came to Google But told me At first you thought you wanted research freedom And more motivation was learning And after you went Discovered you still wanted product impact
Then you might feel very uncomfortable haha You don't pursue impact You also said this Now AI is a very large system And is a Very large collaborative effort What are you pursuing in it I think it's divided into stages I think At Anthropic After experiencing too much Product-related things I might also want to change my mindset
To learn some different things But you say is there any day I might switch back to this mindset And want to produce some product influence That's also possible How do you quantify product influence This is very clear internally Really Hard to quantify I think Because when publishing papers there was still first author This kind of lead author Now Mm-hmm actually there's no way to quantify The reality is there's no way to quantify
This is also why I think in this era Actually talking about each individual's influence Is a very very ethereal thing I think essentially it's still the organization that did Such a thing Or the world needs this So producing product impact is a subjective feeling At least on the model side it is At least on the model side it is Right and then Of course actually you can I think you can
The details are about what things you yourself have done Specific technical contributions And the effects produced technically This can be discussed objectively But more subjective things are You were saying how much did this effect account for in the final product No one can really say for sure Can you describe what you did on 3.7 What kind of technical work did you do that actually had an impact on the model It was mainly related to agentic coding
and the environment around it And some algorithmic work as well On the algorithmic side, it was mainly about making the training more stable To be honest But I do think there were definitely some algorithmic improvements but they didn't achieve particularly ideal results To be honest It's definitely better than the previous algorithms Yeah But I don't think that was my personal contribution
I think it was a collective effort from everyone, haha Right, every time I ask you you always say it's a collective effort It's not an era of individual heroism anymore Right, I think the era of individual heroism for language models has probably passed When was it?
It was the Transformer moment Right, at that point when the technology hadn't yet reached the scale-up stage The person who discovered that technology might be a hero Or a small group that discovered it might be heroes After that technology was found for probably a long time from the model side, it's all been I think more about collectivism whether this group can work together whether they can toward a common goal spending their own time together
and their own energy That's the most important thing Rather than what each individual contributed The reason you say collectivism is because the capability actually comes from AI, is that right?
The reason I say collectivism is because I think AI as a field is fundamentally simple Like I don't think there's any Except maybe that leap moment where the idea might require some really deep insights In the process after that many ideas are actually very trivial Very stupid, basically Anyone could think of them Anyone could do them It's just that you got lucky and happened to seize the opportunity to do it
Including when you described Anthropic doing coding, it seemed like there was some randomness to it too But you have to seize it
Right, right. But I think when it comes to coding it might still involve more than the technical stuff on the model side a bit more corporate heroism, perhaps That is, whether you can bet on it fast enough Yeah, Anthropic was indeed very strong in that regard But if Anthropic hadn't done it today some other company probably would have I think so. It's inevitable
So it's all about emergent capabilities in AI It's just about whether you can seize that capability Whether it's a company or an individual Right right I think before usable language models before large-scale language models emerged a lot of things were not inevitable Like whether someone could invent something whether a language model could be trained at scale and whether the GPT paradigm could be discovered There was a lot of uncertainty
But like you said, for example if there had been no Google Brain back then Transformer might not have been discovered It might have taken many, many years before another well-funded organization with talented people discovered it That would have been a huge impact But after entering that stage especially now, the situation has reversed Any organization that wants to stop AI progress can't do it
Anthropic has Anthropic is very concerned about AI safety But does Anthropic have the ability to stop AI development?
It doesn't If you stop developing Others will continue Your voice will only get smaller Right, actually right now it's It's more like this kind of situation The world is pushing us forward Rather than us pushing the world forward I feel like in the future it'll be even harder for us to stop AI Haha, I think we already can't stop it I just think Trying to prevent one specific thing from happening with AI
Probably isn't the right mindset to begin with This also relates to what we were just talking about Because we were just talking about Anthropic One of Anthropic's very important motivations Is so-called AI safety I think when it comes to AI safety The motivation when it was founded Right What does that have to do with it now The relationship now is complicated, meaning A natural
Question people might ask is A company focused on AI safety Why is it now training frontier models Anthropic's explanation is that First, I need to have the most cutting-edge model Only then do I have a voice to push my AI safety agenda So actually, its thinking all along has been I want to build the best model in the world Everyone will have to listen to me To push forward my safety policies
But from my personal perspective I think this idea is very naive Looking at this now It's not going to happen What's more likely to happen is Everyone will have great frontier models And you won't be able to stop anything from happening Maybe for this issue What we should focus on and think more about now is If you really want to avoid AI Bringing about some crisis
What would be a more self-enforcing approach Let me give an example of a self-enforcing mechanism Like nuclear weapons, for example Nuclear weapons are also something that everyone thinks, hey This might have the power to destroy the world But with nuclear weapons, in the end The way they were ultimately controlled Is multi-party control In this world There are many countries with nuclear weapons They all have the ability to destroy each other
So stability is maintained through this kind of balance of power I think if you want to stop AI from doing bad things Maybe Ultimately, you'll need a similar mechanism to achieve that Rather than hoping Pinning your hopes on One company setting a law to do something Mm right And it sets it itself It can only govern itself Mm, you also just mentioned Anthropic has an interpretability team Right How far has their interpretability gotten
In some relatively simple Relatively sparse neural networks They can do some interesting research For example Look at what a certain output Or input text or image What its internal representation looks like And then maybe you invert that representation somehow
What kind of thing it can output after that Doing this kind of research You also just mentioned a viewpoint That AI is essentially simple Can you describe what you mean by this This is a conclusion Right, I think this is This isn't even a conclusion It's just my statement It could be right or wrong Oh, and my explanation for this This is your view Right, my explanation for this
My explanation for this statement is I think the reason it's essentially simple is That you can run experiments Like, compared to things that are fundamentally difficult Like physics, for example The difference is Without experimental data at that energy scale You simply can't understand the theory at that energy scale But AI isn't bound by this constraint It doesn't matter if you don't fully understand it It can still move forward
And also right now The fact is I can do any experiment I can think of It's just that possibly I need some time To scale up the compute Or get the infrastructure ready But there's no fundamental difficulty Right So I've always been saying I feel AI doesn't give people the sense That it's hitting a wall because First, you can try many things
Second It's not that everyone has run out of ideas With no ideas left to try More often it's that there are too many ideas Need to try them one by one Take time Mm-hmm Feels like humans are so insignificant In front of these experiments Yes so I think very soon AI might start doing experiments itself How soon is very soon? Within 4 months? I think in the next 6-12 months AI will do experiments itself
I think of course this statement Is not very well-defined Sorry I said something very vague Like um AI improving itself Or speeding up its own development process This is actually already happening Right Like we discussed earlier It's already helping us To achieve some of the things we want And speed up our experimental pace But I think in the next six to twelve months
What it currently can't do is Whether it can From start to finish complete an AI research project Like not only can it write the code It can also run the experiment Run the experiment Can also see the results See the results Can also analyze the results Analyze the results Know where it went wrong Then propose new hypotheses Design new code Run new experiments This chain is not yet complete But I think
This chain Might be the next thing to gradually become complete Based on your various reasons At the moment you left Decided to leave Anthropic What were your expectations for this company's future I think when I left I was actually quite pessimistic about this company But later obviously I was overly pessimistic Hehehe why pessimistic The reason I was pessimistic at that time was I think when I left Anthropic
Anthropic actually um Its main revenue source was API Selling tokens And This is a bad business Is a bad business Because this business Is only a good business for one company Which is Google Because this This business eventually leads to price wars Eventually it will be price wars In price wars if you don't have the complete chain
There's not much advantage But later Anthropic obviously on the product side I think indeed there were many clever ideas Did many good things Whether it's Claude Code getting better and better And Claude Cowork And various Work and efficiency related things All slowly converged So it feels like it has now become more than What I thought at the time
If you ask me which of OpenAI and Anthropic would die first Of course they won't really die Just which would become less important first At that time I would think hey Maybe Anthropic would become less important first But later first OpenAI got punched by Google Then Anthropic itself got on track So now it seems Anthropic has more advantage Haha Have you ever regretted it Mm-hmm not really I think for me personally
My personal motivation was still wanting to switch places Improve myself I think for this For the thing I wanted to do This choice wasn't wrong You also mentioned Anthropic's products have many clever ideas Especially this year Like Cowork and such Where does this come from I think I didn't see Cowork's development process So I don't know And Claude Code I think the person, the product Might also
Really have some opportunities for individual heroism Is it a researcher or a product manager Boris Cherny I think Claude Code almost At least the beginning of this thing Was him wanting to do this thing himself To improve his own or colleagues' work efficiency Finally became something Important to everyone What kind of person is Boris I didn't have too much personal contact with him
I mostly just saw his work, when at the company He's a researcher right Right but he's mainly on the product side So Anthropic does have a dedicated product department Didn't used to be so separated Later had a separate one Right, Anthropic seems to really understand AI products Right I think I think this is why When we first started talking Felt that product managers Might still be quite hard to replace with AI currently
Hahaha mm-hmm Good product managers Hey he doesn't seem to be the previous generation of product managers He's not the kind who arranges features and such He seems to know how to collaborate with AI Some kind of product manager Right I think the previous generation of product managers might But not entirely The previous generation also had some Interaction Interaction-level changes But every interaction-level change Actually brings a very big product
Like maybe Douyin Is a product with interaction-level change Then it immediately brought huge Mm-hmm opened new directions And I think Maybe Claude Code is also a product at this level Claude Code and Cowork were both by Boris I don't know who did Cowork OK I already left I see Then tell me about after you arrived at Google DeepMind, has your work focus changed Work focus changed or not
Mm-hmm still Some changes happened And I anyway mainly focus on Doing ML coding And some relatively long horizon things These two things Actually just now both were roughly mentioned Like ML coding Actually it mainly wants to achieve The complete AI-training-itself process we just talked about
Of course in this process There are many practical problems Many practical details to solve I think in the big picture Everyone actually has quite a consensus on how to do it But still back to details There are many things to handle in details Like how to choose appropriate data How to choose appropriate feedback signals And it brings new infrastructure challenges And
Now it's about slowly figuring out these things Slowly figuring them out And Like long horizon Is the other thing we just talked about That is wanting to achieve That this model can Still that slogan Train with finite context But use it as if infinite I think wanting to make this training Length longer and longer and longer
Might not be making a single training This segment's length keep increasing Might not be a very realistic solution But a very realistic thing is How do you under limited context Do longer work Actually if you think about it Humans are actually like this Human context is actually very very short If you ask me now what I ate last night I can't remember at all Ah you might still remember Hahaha I can't remember at all Because why
Because it's not critical to my current scenario Right Like even if I knew what I ate last night So what So I choose to forget it So human context is essentially very short But they can selectively forget And selectively retrieve To bring back these important Information relevant to the current scenario So I think that might also be for me A very interesting direction These two things are actually somewhat related
Somewhat complementary Why, these two things Actually both are within the large category of models using tools and with environment And different models Different people interacting Within this category The node everyone completed in the past Is Agentic coding, which is both tools and environment Environment is this virtual machine Or interacting within your own computer
And this thing Actually horizontally it grows different usage scenarios Then doing AI research Is actually horizontally Another scenario in this scenario This scenario Actually not only horizontally is it a new scenario Vertically It also makes the scale of this thing longer Because completing a code completion or something
Is a very quick thing But doing a complete AI research Or doing this kind of computer science research Is a very long process Right so It's actually like a T-shape Horizontal extension Vertical extension too Is long horizon still a scientific problem Mm-hmm there are scientific problems Also engineering problems I think its scientific problems are more about
How to try different solutions After trying in a more scientific way To find the path we ultimately want to take This solution What are the ways Mm-hmm I might not be able to say too specifically But broadly speaking Some solutions are from the pre-train perspective From the pre-training perspective Some solutions Are similar to this sparse attention Sparse attention
For example DeepSeek also has some work And academia also has a lot of work And from the post-training perspective Also have post-training solutions Like for example externally Like what you use every day, Cursor and such They have very strong context management Managing this context ability Like it can let the model choose I think this middle segment is unimportant Just throw it away And that segment is important so store it in some file Retrieve it when needed
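The context-management idea just described, keep a short live context, throw away or offload the rest, and retrieve stored pieces when they become relevant again, can be sketched as a toy class. The interface is purely illustrative, not Cursor's or any model's actual mechanism:

```python
class ToyContextManager:
    """Minimal sketch of selective forgetting and retrieval.

    The live window stays short, like human working memory; evicted
    segments move to an external store (think: a file on disk) and come
    back only when a query makes them relevant to the current scenario.
    """

    def __init__(self, live_limit: int = 4):
        self.live_limit = live_limit
        self.live = []    # the short context the model actually attends to
        self.store = []   # offloaded segments, searchable later

    def add(self, segment: str) -> None:
        self.live.append(segment)
        # When the live context overflows, forget the oldest segment
        # from the live window and archive it instead.
        while len(self.live) > self.live_limit:
            self.store.append(self.live.pop(0))

    def retrieve(self, query: str) -> list:
        # Selective retrieval: only segments relevant to the query return.
        return [s for s in self.store if query in s]
```

This mirrors the dinner example in the transcript: what you ate last night is evicted because it isn't critical to the current scenario, but a query that makes it relevant could bring it back.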
These two broadly speaking These two solutions Both have people researching Of course the specific implementation details Are more than the examples I just mentioned The examples I just mentioned Are relatively public examples The specific implementation details Of course each company has its own little secrets Well, I think ultimately it all comes down to that And then
I personally spend a lot of more time on post-training approaches Because Well, first of all, because I myself haven't actually spent official work time on pre-training Pre-training is more of an interest to me something I want to learn about But I myself haven't actually done that much work on it And on the other hand,
I think post-training approaches actually align better with my own understanding of this My understanding of this is exactly what we've been talking about whether you can train with short context but still handle long-context tasks Pre-training approaches essentially still require you to have long context Training it requires the data to contain it Right yeah.
Right, right. So
so it doesn't quite fit my philosophy on this problem Oh right.
So do you think it's possible now?
Training for long with short I think It's definitely possible but we're not sure which approach works best Gemini does long-context really well Why is that?
There are some tricks [laughter] There are some tricks that really surprised me, haha Oh, this is about pre-training, right?
Doing long context well definitely requires both sides But I'm just saying, for me, the pre-training side that trick still really surprised me [laughter] Right, OpenAI doesn't do it as well as Gemini on long context But there are also different opinions Some people say that with this Gemini 3 generation long context actually got a bit worse and stuff like that. Right.
Again, when you joined Gemini it felt like people didn't have high expectations for Gemini No, I already had pretty high expectations for Gemini at the time Haha, what year and month was that?
I joined at the end of September last year That was before Gemini released Gemini 3 You had high expectations for it What about others?
I think people in the industry still had a pretty good impression of Gemini back then I mean, I think before, everyone thought Google was in real trouble under OpenAI's impact I think people's perception probably shifted with the Gemini 2.5 generation Because 2.5 was clearly you could tell Google was getting the hang of it Of course, even before that, Gemini's
1.5 also had some, you know, small things where it was already pretty strong in specific areas It was clearly no longer far behind But 2.5 was really truly a generation I think it was when people actually started using the model Anyway, I myself have used 2.5 quite a bit used it quite a lot You went to Gemini because you saw 2.5?
My going to Gemini had nothing to do with that Mainly it's because I knew what kind of atmosphere Gemini had There were a lot of people doing different kinds of research And I also knew some people actually doing really interesting research And many Gemini engineers I think their technical skills are extremely, extremely strong I think I learned so, so much from them And um
that's the reason for me But I think from everyone's perception I think people in the industry, after seeing Gemini 2.5 probably realized that Gemini was catching up So for you that wasn't a signal for you to join Gemini, right?
It wasn't a signal for me to join Then why did you join Gemini?
Well, like I just said, Mainly because I wanted to accomplish something back then Actually, I wanted to have that But you know Gemini has strong people Right? Yeah, exactly
It's because when they approached me, they'd definitely want me to go talk to their people, right?
So from those conversations You can actually get a sense of how things are Oh, so they came to you Yeah But I think in the end it became a two-way street So hahaha Wasn't OpenAI an option for you back then?
If you wanted to leave Anthropic OpenAI was also an option at the time OpenAI should still have been stronger than Gemini In terms of momentum, right?
At that time But Though back then Weren't there all those internal politics Infighting was starting to emerge I think so So OpenAI was indeed an option for me back then And of course there were also options like xAI And I think The main reason I didn't end up at OpenAI Was that I had concerns about its Culture, at least at that time I had pretty big concerns about its culture
I just felt that To put it bluntly, people who actually get things done There weren't as many as at Gemini Even fewer than at Anthropic Right? I really care about that
Hahaha, yeah. So a sense of cultural and personal connection brought you to Gemini. Yeah. And then you also caught that Gemini 3 inflection point, right?
Hmm Gemini 3 should have been a major turning point for them A turning point period, right?
I think in terms of actual impact, it was two things that created a major turning point for Gemini, turning it into a heavyweight player in the market: Nano Banana and Gemini 3, two things back to back. I think if there had only been Gemini 3, it probably wouldn't have had such great results. Because when your market share is less than
Even 10% Whether your model is slightly better or worse It just spreads too slowly But what Nano Banana did was First, it went viral in the market, it was a huge hit Then a ton of people downloaded the Gemini app And then Gemini 3 was released right after Retaining those users So Now it's become a major player I think if Gemini hadn't thrown this punch
OpenAI's position would be really comfortable Its market share is so high that Whatever you do with the model It doesn't actually matter that much to them To be honest I think when ordinary people use models Their perception of the model's capabilities Is actually very, very weak Most people don't even use the o-series models Most people just use the regular ChatGPT one
Right, so I think for Gemini, Nano Banana building up the user volume and then Gemini 3 retaining those users was critical. How many ChatGPT users did it actually take away?
Hmm, I don't know the exact numbers now But my feeling is Gemini's market share is probably around 20% But I haven't really checked the current data carefully Looking at it with hindsight These two factors Together contributed to Gemini's challenge to OpenAI today So from an insider's perspective you must have known earlier
What happened and why Google would undergo such changes Yeah, I think First of all, Google's technical reserves Have always been sufficient Hmm, enough talent Yeah, they've always been sufficient And then Organizationally speaking It became increasingly clear later on It's having a better framework to let Everyone work together on this thing So there might slowly be some progress
Right and then I think in a sense As an outsider In a sense I think OpenAI saved Google's life Oh because everyone used to worry This chatbot Would completely replace search Right if this really happened Google would actually be in a tough spot But fortunately OpenAI did this thing first
Then made Google realize this thing is important But it didn't take this thing all the way Didn't take this thing to the extreme Didn't completely kill off search Maybe just ate some market share As a result Let Google itself catch up on chatbots too Now the one in a tough spot is them What if For example there's a company, just hypothetically In a fictional world A company not only made a chatbot But also marched forward triumphantly
Doing better and better Really just ate up your search in one go Completely didn't give you a chance to fight back Then it would be very tough Did the chatbot not eat up search Because OpenAI didn't do it well Or why Or because it can't kill off search I think Both sides actually have reasons That is first um Current chatbot interaction methods Actually won't completely eat up search Because it's stronger than search
Like we said earliest just now The one point it's stronger than search Is that it has strong interactivity You can follow up And It can help you condense some very complex information This is where it's very strong So this portion of usage scenarios It will indeed steal people from search but There are still some very stupid scenarios in search Where you have a very simple thing You don't want to waste this time On a chatbot
Like, I just search "buy rice", I search, buy, and it's done. Do I have to ask ChatGPT which one is good, and it's still spinning there, spinning for ages, then gives you a link, you click again, then go to the webpage to buy? Right, there's no need for that. So in terms of actual usage, its current form is not enough to completely eat up search. Right, and
Of course from another perspective It might not have reached the peak in the chatbot thing either It really let Google catch up Now it's not quite caught up yet In terms of product I think in terms of product it's not caught up But in terms of model it has already caught up But if you want investors to invest in OpenAI They would say When they placed their bet They recognized clearly OpenAI is actually a product company
Its moat is actually product and brand Then from today's perspective It seems Google hasn't been able to in this matter Catch up Can't say surpass OpenAI Catch up to OpenAI Right I think This is actually Anyway this is all from my perspective as an outsider An observer's perspective You're a commentator today Hahaha From an observer's perspective I think Google has traditionally been a bit slow with products
Has always been relatively slow. So do you think OpenAI has an advantage when it comes to products?
I think it's possible.
Right.
And what's one thing Google is particularly good at?
Finding an extremely simple product form.
Everyone looks the same.
Then it just competes with you relentlessly on technology.
And you can't outcompete it.
Oh right.
That's exactly what Google is good at.
Because search engines are exactly like that.
Search is a classic example.
Everyone has the same search box.
One button, but it just searches faster than you.
And more accurately than you.
There's nothing you can do about it.
Mm-hmm.
So that's why.
Like.
It feels like all along.
Google has been in this state of doing very well, but...
Wall Street never really bought into it.
Everyone always wondered where this company's moat really is.
There's no product ingenuity.
No retention mechanisms either.
But it has survived until now.
So what's the reason its technology is so good?
I think it's still about the people, right?
I think it's the culture.
It's said to be.
A place that particularly, particularly values.
In the past, it particularly valued engineers.
Later, it particularly valued research.
That's the kind of culture.
So it's very well suited for.
Products where technological capability spills over.
Capability-based products.
Right, if you look at it from this angle.
Then do you think OpenAI's position is secure?
Now?
I don't think anyone's position is secure right now.
Hahahaha right.
I think the form of AI.
Still has a long way to go.
Mm-hmm.
We're not at any endgame yet.
That's the feeling about this.
Right.
It feels like back home there's already a bit of this sentiment.
Yeah, I don't get it.
Like, why don't I get it?
I'm really puzzled.
Like.
So back home, people think we're fighting over a super app.
A super app is zero-sum, right?
I think conditioned on the chatbot thing (taking the chatbot as the condition to build on) that's the super app.
Then maybe there's something to fight over.
But the problem is.
Is this form the super app form?
What if someone else.
Comes out with a completely different form one day.
And your functionality becomes a subset.
Of that thing.
That's quite possible, right?
I don't think there's anything.
I don't see anything impossible.
Why wouldn't the chatbot be the ultimate form?
But after all these years, this is all we've seen.
Right, it's all just a chat box.
I think on this matter, I really don't have any.
Rational or quantitative criteria.
To explain it.
More like you just feel like this whole thing is stupid.
Like this model clearly has so many capabilities.
But the way we use it is a chatbot (Note: This video was recorded over 2 months ago, when the agent paradigm was not yet clear).
It just doesn't quite make sense.
You know what I mean, so.
We need a product manager.
To unlock the model's capabilities.
Hahaha.
Humans have only communicated with AI through chatbots until now.
That seems stupid to you, right?
It's stupid because.
Then what should we use to communicate with AI?
Haven't figured it out.
If I had figured it out, I'd already be doing it.
Hahahaha.
Hey, you didn't tell me.
What exactly changed inside Google.
To lead to what the outside world saw.
The rapid leap in model capabilities.
Right, like I just said, it's one thing.
I think the organization has more clarity now.
And.
Once the organization is clear.
Did the organization change?
Right.
Especially pre-training.
Has become very, very clear now.
That is, who is responsible for what, and who the responsible person is at every node; these things are very clear. Was it chaotic before? It was very chaotic in the earliest days. I wasn't there in the earliest days, but according to descriptions from colleagues or people I knew, it was more chaotic before. Mm-hmm, right, right. And now at least pre-training has become very, very clear. And plus, Google
Has always had this relatively strong technical background, and it does things relatively systematically. So I feel pre-training at Google is a very, very controllable, predictable thing. You can know the next generation won't be bad; you might even know how good it will be. Anthropic also achieves this, through its top-down management. Mm-hmm, not bad. Then is Google bottom-up?
It's still bottom-up right It's definitely more top-down than before Compared to the earliest days But compared to Anthropic It's still more bottom-up Like different cultures can both work Right right For model training Right that's I think big companies have big company ways Startups have startup ways So big companies are You also just said It's a completely different narrative It's a different
Method, what is Google's method Now I think Google more says Like this kind of relatively deterministic thing Like pre-training Is already a relatively deterministic paradigm Then maybe Google will be more like Making it into an engineering project Google's engineering management ability is very strong So it can slowly do it well Mm-hmm what is an engineering project Engineering project means You are actually Actually very very
Very top-down organization And very clear What we need to do in the next stage Then go do this thing What nodes need to be handled in between And even doing research is like Having a very clear framework Telling you how to Verify whether your results are good or bad Evaluate whether your results are good or bad Right so this is Something Google is very strong at
In any big engineering project in the past. So pre-training, I think, has now entered Google's comfort zone. And post-training of course has more uncertainty, so maybe post-training is currently still more bottom-up; everyone can try more broadly. You say pre-training is also a kind of RL. Why do you say that? I think, from a pure technical perspective, it's hard to say what the essential difference is between pre-training, or supervised learning, that is SFT, and RL. Because pre-training and SFT are essentially not that different. That is, you just take the data you get as your ground truth, then you treat that as your expert, treat that as your expert output,
Then you align toward the distribution of that expert output. Reinforcement learning might be one level broader: the original output is not a given expert output but something I produced myself, and among my outputs there are good results and bad results, so you want to move closer to the good ones and away from the bad ones, something like that. So in a sense, pre-training and SFT are a subset of reinforcement learning. But the two do, in this era, have their differences. For me the biggest difference lies in the data. For pre-training data, what matters more is having a good distribution: broad enough, or aligned well enough with the scope you want to cover, but the data quality doesn't need to be extremely high. For post-training it's the opposite: the distribution may be much narrower, but the quality requirements on the data are very high. Yeah, right. So for now, for me, the most fundamental difference between the two is still in the data distribution rather than in algorithms or training paradigms. So how do different labs organize these teams?
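(Editor's note: a toy numerical sketch of the "SFT is a subset of RL" point above. The 5-token vocabulary and logits are made up; the check is that the expected REINFORCE policy-gradient, with reward 1 on the expert token and 0 elsewhere, points along the same direction as the SFT gradient.)

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def grad_log_prob(logits, token):
    # Gradient of log softmax(logits)[token] w.r.t. logits: onehot - p
    g = -softmax(logits)
    g[token] += 1.0
    return g

# Toy "policy": logits over a 5-token vocabulary.
logits = np.array([0.5, -1.0, 2.0, 0.0, 1.0])
expert_token = 2

# SFT view: push up the log-prob of the expert token.
sft_grad = grad_log_prob(logits, expert_token)

# RL (REINFORCE) view: sample from the policy, weight each sample's
# grad log-prob by its reward. With reward = 1 on the expert token and
# 0 elsewhere, the expected gradient is p(expert) * sft_grad, i.e. the
# same direction as SFT, just rescaled.
p = softmax(logits)
rl_expected_grad = sum(
    p[t] * (1.0 if t == expert_token else 0.0) * grad_log_prob(logits, t)
    for t in range(len(logits))
)

assert np.allclose(rl_expected_grad, p[expert_token] * sft_grad)
```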
Are pre-training and post-training different?
Or are they the same?
Anthropic and Google are pretty similar Both of them have one team for pre-training and another team for post-training OpenAI might be more chaotic In the early days initially they had three teams They had pre-training and they also had reinforcement learning the Strawberry team and they also had a post-training team
And, well, I never worked there, but my understanding is its post-training team wasn't really what other companies call post-training. Its RL team, Strawberry, and its post-training together are actually what other companies call post-training plus product. Oh, so they might have divided it in a different way and sliced it up. They treat the later stages as product work, as part of it; their post-training is actually intertwined with product, they're building the product. Is it just that the name hasn't been updated?
Not entirely Because at most companies, the product team doesn't really train models anymore They mostly communicate the desired traits the model traits, to the team training the model But it seems like their post-training is in a sense its own product team but it can also train models Is that because their understanding of product is that people who train models should also build the product Yeah yeah possibly It could be a good thing
Yeah, but their org has also changed a lot since then So I don't know what their org looks like now You guys have released several models recently and I saw you were involved in all of them Gemini 3 Deep Think Gemini 3.1 Pro Well, I think I can only say that I was fortunate to be involved Hahaha yeah Again, it all feels like collective work Hahaha yeah How did you become such a public figure now
getting singled out and mentioned separately every time I don't get it I actually don't think it's great Every time I see it I feel like how am I going to face my colleagues in the office tomorrow Hahaha Does it feel awkward?
At the office it's fine. I think my colleagues are just good people; they probably don't care too much about these things. But honestly, I feel like every project I've been part of, whether at Google or at Anthropic, would have happened the same without me. The results wouldn't get any worse. I think everyone now is a surfer. Essentially it's a wave
Not you, the surfer. Mm-hmm, is the wave AI? Right, it's AI. This thing itself is the wave, and it will move forward. Whether you surf it or not, the wave will crash on shore. It's just that some people surf it, and some people might be a bit late and miss the crest. Okay. You were fortunate to participate in these two projects. What did you do, mainly? Mainly probably some small details in algorithmic design, which we would discuss together, and some things on the data side. But the data-side things, I think, might have more impact on future work. Do these models have paradigm changes? Mm-hmm, I don't think so. No change is as big as going
From not knowing how to do large-scale reinforcement learning To large-scale reinforcement learning That level of change No change is big enough to that extent There are definitely some small changes Can you talk about these small changes These new models There are definitely some small changes Recently I feel models are already numb A bunch of domestic models And many foreign models too OpenAI you all
Mm-hmm domestic GLM, ByteDance DeepSeek has been expected but hasn't released yet Kimi can you highlight the key points for everyone I think In a sense None are that worth paying attention to Hey what are people competing over now Feels like chaos I think some things people are competing over Actually looking at it now In this era Already not that important Because of inertia from the past
Everyone would compete for first place on various benchmarks to prove their model's basic capabilities are strong. But by now, the benchmarks that get public attention are somewhat maxed out. Think about it: at the earliest everyone paid attention to SWE-bench, and soon everyone was hitting 80-something. Fortunately no one exceeded 83, well, until recently, when OpenAI released a post saying they exceeded 83.
Some of those problems are not well-defined, so fortunately no one had exceeded it; whoever exceeds it would be a bit embarrassed. Anyway. And before, everyone's reasoning story went from finishing AIME to IMO. After IMO, what next? Then came ARC-AGI and such benchmarks. Mm-hmm, before Gemini 3, everyone probably forgot, the highest score at that time was maybe around 10 or so, and everyone was like, wow, hard as climbing to heaven.
Then Gemini 3 made it 30-something. Then Claude 4.5 or 4.6, it should have been 4.6, got to 60-something. Then Gemini 3 Deep Think hit 80-something. So this is also maxed out. So now it feels like just relying on hitting these publicly recognized model capability benchmarks
Actually doesn't have much meaning anymore And um So from this perspective I just Essentially there aren't too many key points Although everyone is releasing very fast Mm-hmm Releasing fast also shows Actually this problem has become easy For everyone Everyone knows the know-how now There are no secrets anymore Right right It's still this, it's still that It's still that same thing The surfing theory, right It's still this
The wave is moving forward What's the next goal everyone might be looking for What's the next paradigm-level change Will there still be one Ah, I think The two things I just mentioned are I think ML coding and long horizon, right And these two are I think, I think Um Yes yes I think it might be something that hasn't reached paradigm-level change But I think it is
Something very valuable for Google Because first of all, ML coding is Because Google itself is a major player in AI research And it's also the most full-stack in AI research That is Not only does it have these model training parts It also has hardware design The part connecting hardware to models If this entire system can be accelerated Or better managed
That could be very valuable for this company Long horizon goes without saying Everyone knows Everyone thinks it's very important Right So I think that might be, for me Can't say it's paradigm-level Definitely not at the paradigm level But it's something I think is very valuable That needs to be able to, within the next few months Show some light at the end of the tunnel, and um I think paradigm-level
Might still be the more uncertain things, like multimodal generation, that kind of thing. I think there might be a hero, or a group of heroes. Haha, and um, right. That kind of thing might have some... Um, also much talked about is continual learning. What about world models? I think continual learning and this kind of long horizon, as I just said, have no fundamental difference. People used to think these two things were very different, because continual learning changes some of the model's weights, while this kind of context management, the kind open source does a lot of, doesn't change model weights. But actually, if you think about it, there's no fundamental difference between the two. Because those tokens in the context
Their own KV cache is also a kind of weight, isn't it So You think between these two approaches, which one can Which one will be more useful More useful in the long run I think it's unclear But essentially they Are both for doing what I just mentioned, long horizon This type of thing And world models Ten thousand people have ten thousand world models What does that mean? The definition isn't clear That is
First of all, I don't know what a world model is. And secondly, when people talk about the world models they're building, they might be talking about different things. For example, the world model Gemini builds might be different from, say, what Fei-Fei Li's lab is building; they're not the same thing. Um, sigh. Describe the difference? I don't particularly understand what labs like Fei-Fei Li's are actually doing, what it's actually like.
But um, Gemini's world model is more of an end-to-end kind of training. Take video generation, for example: given a description, generate a video. But the result it wants to achieve is that not only can I generate a video, I can generate a scenario. What is a scenario? A scenario means I generate the state at this moment, and then I can also give it a condition, the condition being that under this state I took some action, and then the state at the next moment becomes a function of the previous moment's state and the action. And it's trained end-to-end for this capability. Right, so this might be one approach. And first, I don't know what result everyone ultimately wants
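(Editor's note: a toy sketch of the "scenario" idea just described, where the next state is a function of the previous state and an action. The grid world and all names here are hypothetical; in the setup described above, the `step` transition would be a network trained end-to-end over video or state tokens, not hand-written rules.)

```python
from dataclasses import dataclass

# Toy stand-in for the scenario idea: state at time t+1 is a function
# of (state at time t, action at time t).
@dataclass(frozen=True)
class State:
    x: int
    y: int

ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def step(state: State, action: str) -> State:
    # A world model would learn this transition end-to-end; here it's
    # just deterministic grid movement for illustration.
    dx, dy = ACTIONS[action]
    return State(state.x + dx, state.y + dy)

# Rolling out a scenario: each next state is conditioned on the
# previous state and the chosen action.
s = State(0, 0)
for a in ["up", "up", "right"]:
    s = step(s, a)
assert s == State(1, 2)
```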
And I also don't know what everyone's Definition of their own world model is So I think it's more of an exploratory state We haven't talked about one organization just now, xAI We just talked about Anthropic Talked about OpenAI Talked about DeepMind What about xAI xAI I don't understand haha As a commentator let's talk about it Why are they so turbulent recently I think they've always been quite turbulent
Hahaha why so turbulent recently I don't know either And Actually I don't have that much contact with xAI And Some people I contacted have also left now Actually I don't know what happened to them Hahaha When you were talking about Anthropic just now You said The technical number one being able to make bets Is very important
Then at Google who is this number one Who is this hero I think heroes Might be different people at different stages Mm-hmm but behind every hero there is one person Sergey Brin Google's cofounder Oh right I think ultimately many many big decisions Might not be decided by him on how to do them
But in the end he has to be the one to make the final call Mm-hmm even now What about Demis Hassabis I think the person who appears more on the front lines Is Koray Kavukcuoglu Right Yes DeepMind CTO And he's now also that Google SVP Oh what is Demis responsible for I think Demis might manage more of those Things leaning toward science Like for example drug design
Isomorphic Labs and such things Right right right Oh Gemini He doesn't manage much At least from my perspective The person I see more is Koray Of course it's possible that Company management matters Actually there are many parts I can't see Then I'm not clear about that You also mentioned AI is a whole system Mm-hmm What understanding do you have about how to systematically do AI Now After these two years of your work
Several aspects One aspect is from the whole system perspective It needs a relatively scientific attitude That you need to clearly understand like Scaling Law You need to clearly understand What assumptions you have made And when I make a change What factors are actually related to it What factors are not related Right And this is from the organizational perspective From the people's perspective Actually requires people to be very reliable
Requires very responsible people Actually every system Every evaluation framework Is very easily hacked Because you can always do something To make your metrics look very good But a trustworthy Or down-to-earth person He would actually think If the thing he did works well Is it really For example effective at large scales
Did I miss some factors in between Right Actually doing things systematically Sounds like one sentence But actually doing it is very complex There are many details Many resistances It actually goes against human nature Oh Because every individual's human nature Might be to make their own things Show up better But for a company or an organization The most beneficial thing Is to make the entire company's system
Very solid systematically This is actually the best for you personally Because once this system is solid You can leverage this system To produce more output But the bad thing is This system will make your individual heroism Not shine But you can rest assured that others' individual heroism Also won't shine But if you are in a system Where individual heroism can shine Then this system might
Not be particularly stable Because one person leaving Might cause the entire thing to collapse For example like OpenAI You say you love to challenge difficult things But this industry seems to require Doing simple things well repeatedly Actually I think the so-called simple things Doing them well repeatedly
Is actually a very difficult thing. Because human nature doesn't like doing repetitive things. The most difficult thing in this industry is actually doing simple things cleanly. Why? Because everyone can do simple things. To do them cleaner than others, researchers themselves need a good understanding of how the system works, and a sense of responsibility to the company. Otherwise it's easy to end up in a situation where, say, your method is better than others' when you only consider training, but worse once you consider training plus sampling, and you can always choose to report only the training part. That's really bad. So this requires both your personal sense of responsibility, and a system built by the organization that can catch, as much as possible, these intentional or unintentional boundary-gaming things. But as an individual, how do you know what's best for the whole? Actually, I think if a researcher can't think in terms of the whole picture, then in this era they're not a good researcher. Mm. I think this is very different from doing research in academia. Oh? Because doing research in academia is essentially a state of "one person fed, the whole family content": I'm responsible for my own project, right? I'm responsible for my own reproducibility. But in a company, more often it's: I have to be responsible to the company. These are two completely different mindsets. So where does this sense of duty of yours come from? No idea, hahahaha. I think I just can't bring myself to do otherwise. Hahaha. What do you mean, can't bring yourself? Well, being responsible to the company is part of your contract with it. Honestly, I don't see any reason not to do that. So does individual heroism undermine this cohesion? I think if you're just doing it for personal heroism and acting on that basis, it's very likely to undermine the bigger picture. Of course, in reality you might be very capable and actually become a hero; that's also possible. Since you've also been through two organizations,
what kind of organization do you think is better at fostering intelligence in this era I think this is actually a very controversial topic I mean as we were just discussing different organizations some tend to be more top-down some more bottom-up so the natural question is for example which of these two types fosters more innovation The traditional view was bottom-up was a necessary condition for fostering innovation because everyone needs freedom, right
only with freedom can there be innovation But purely bottom-up you find it doesn't actually work either because it just becomes chaotic That's what Google was like before Was it?
Yes At least in my impression from what I understand, that's how it was It was just chaotic People didn't even know what the point of what I was doing was That might not be great either So you probably need someone or a small group who can blend these two approaches somewhat Mm-hmm That's why I think whether an organization runs well or not it looks like an organizational issue but ultimately it comes down to the tech leader
Mm-hmm It's about whether this tech leader has the qualities to keep the organization running stably Because the optimal state is often the most unstable one It easily collapses toward a worse state Right, so you need a leader to control that So do you think it should always be the tech leader doing this rather than the CEO Well of course every company's CEO may have different responsibilities But there needs to be a leader
I think you need at least one leader who has two qualities to be able to do this One quality is that they can fight fires themselves It's not just talking about what to do What to do What to do but rather when something really runs into trouble they can step in and lead the team to solve the problem Of course most of the time a leader probably won't have time to do this
But at least they have the capability The second important quality is that they need to understand others Even if it's something that they wouldn't do themselves they can understand why what others are doing matters They can tolerate and accommodate others That might be another quality What do you think about Google's TPU In what ways does it outperform GPUs What are its weaknesses I think From a purely hardware perspective
it's hard to say which hardware is truly better or worse especially at this kind of large-scale commercial deployment Because fundamentally GPUs and TPUs In terms of usage the biggest difference, setting aside the hardware differences in terms of usage the biggest difference is GPUs have a better open-source ecosystem TPUs don't But this actually isn't an issue at large-scale commercial deployment It's not a problem Because for example, Google itself uses TPUs so naturally they'll spend time building
this infrastructure And infrastructure is For example, if you're only running a thousand cards it could be a heavy burden But if you're running a cluster of hundreds of thousands of cards then building out the infrastructure isn't really that big of a deal And in practice So basically when it comes to large-scale commercial deployment neither one is inherently superior or inferior But these two do have some differences in design philosophy Take GPUs, for example
At least for the more recent GPU generations I haven't used them much Like the Hopper generation of GPUs The H-series GPUs The design philosophy is that inside one node there might not be that many cards say, just eight cards and these eight cards can all interconnect with one another NVLink (NVIDIA's high-speed interconnect bus) is extremely fast So within one node, there's basically no communication bandwidth bottleneck between GPUs But TPUs take the opposite approach
It means they've given up all-to-all interconnection between cards and instead try as much as possible to fit as many cards as possible into one big rack It has this kind of 3D Torus design (3D Torus topology) So each card only connects to its nearest neighbors along the three axes but the entire cluster can be connected into one big Torus And if your compilers or your sharding (data sharding strategy)
logic is written well enough you can take advantage of this architecture Effectively speaking you get more memory capacity and also avoid a lot of communication bottlenecks What's the downside?
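As an aside, the wrap-around neighbor structure of the 3D torus he describes can be sketched in a few lines of Python. This is only an illustration of the topology (a hypothetical 4x4x4 pod); real pod shapes and link counts vary by TPU generation:

```python
# Sketch of a 3D torus interconnect's neighbor structure.
# Hypothetical 4x4x4 pod used purely for illustration.

def torus_neighbors(coord, shape):
    """Return the coordinates directly linked to `coord` in a wrap-around 3D torus."""
    dims = list(shape)
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coord)
            c[axis] = (c[axis] + step) % dims[axis]  # wrap around at the edges
            neighbors.append(tuple(c))
    return neighbors

# Each chip links only to its nearest neighbors along the three axes
# (two per axis), so link count per chip stays constant as the pod grows.
print(torus_neighbors((0, 0, 0), (4, 4, 4)))
```

The point of the design is visible here: unlike an all-to-all node, the number of links per chip does not grow with cluster size, which is what lets the rack scale while still forming one connected torus.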
I think one downside is that compared to GPUs at least at a small scale it's definitely a more rigid structure So its ease of use or its general versatility might not be as strong Recently many neo labs have emerged in Silicon Valley What do you think of this trend?
Why are they all jumping ship from these big model companies to start neo labs I don't really get it Haha, my feeling is that the vast majority of neo labs will die. And
Well, I think some labs genuinely have good people And some labs might actually be starting to do some real work For example, like Thinking Machines is still delivering some new things But some neo labs Please bleep out the names Haha, like XXX, that XXX I have absolutely no idea what they're trying to do And These two have actually been away from the field for a long time I think in 2026
China will place a lot of emphasis on the consumer-side narrative Who becomes that super app What do you think?
Do you think this It seems like nobody in Silicon Valley talks about this Right, because American enterprise is just...
It's companies Or rather, the productivity software market is just too big and the profit margins are too high So for the US there was basically only ChatGPT doing consumer before and there wasn't much money in it Not much profit So now everyone will probably focus first on productivity software or enterprise And So the trends in China and the US have already diverged I think Not just AI
The entire internet industry in the past was like this too It was all different What China is really strong at is the consumer side It can come up with, like really, really complex product features or structures and in a way that seems very indirect to you In a very unnatural way to snowball that profit For example What do I mean by indirect?
(laughs) Like, take something like Douyin (TikTok) It's not like you watch a video and I charge you 20 cents per video, right?
It says you can watch videos for free but I can quietly slip in ads I can quietly do live streaming I can quietly do e-commerce But that doesn't work for productivity software Productivity software is very straightforward Like, I help you write code My cost is 150 a month I sell it to you for 200, I make 50 It's that straightforward Mm Yeah, I think what the US has shown in the past
is that with these very straightforward products it can push technology to the extreme But there's never been a product that felt so sophisticated that you can't live without it yet you don't feel like it's taking your money but it's actually making money from you Hearing you say that, I suddenly feel Meta should just copy ByteDance Yeah, but I don't think Meta is as strong as ByteDance Because Meta can't find its own niche either And
there's no American company doing this No one has found the niche that Doubao occupies Then Meta should just copy Doubao It doesn't need such strong model capabilities either But I still think that fundamentally the Americans doing consumer products aren't good enough Far behind China This is the accumulation of the past decade, right?
Yeah Mm Because the positive feedback loop in the US over the past decade all came from doing B2B A lot of enterprise stuff Or it's just too easy to make money in the US Mm When it's too easy to make money you won't rack your brains over how to make money Hey Haven't a lot of people come to chat with you?
Any interesting people?
Oh, well A lot of people from China came Tech companies I think they're all pretty interesting And I did find that Chinese people doing products probably think in more sophisticated ways More sophisticated Yeah, they think more...
Their thought process is more convoluted Yeah, it's a completely different style from the US America is like As I just said about America It's like you build something and sell it directly Yeah, it's simple That's how it is You just need this capability Once you have it, you just need to be cheaper than others Then I can earn more than you And you can't do anything about it Okay What about China?
China seems to be all about this pattern Not making money at first But once it starts making money you can't stop it It can really form that self-sustaining loop When it really gets that flywheel spinning you can't break in anymore Do you think American companies understand ByteDance now?
My feeling is no Not yet It's already so big Oh, you mean whether they take it seriously?
Of course they do Everyone definitely knows that in terms of its valuation ByteDance is a severely undervalued company I think that's very clear to everyone And I think it's also clear that in the consumer market On this end, I actually think No American company can compete with ByteDance But after all it's a Chinese company At least in terms of public perception
After all it's a Chinese company So do people understand it I don't think people understand it But look at Meta It's also actively poaching people from ByteDance Mm-hmm, do you have any idols in the AI industry Or people you admire Although you've been in the AI industry for a short time No no no, nothing I just feel When I came to this industry The era of individual heroism had already passed So there are no heroes
Sometimes you even think old-era heroes are a bit stupid Ah right So really there's nothing Who do you think is quite stupid Let's not talk about this No comment hahahaha Right, I think it's Different from doing physics When doing physics there were still some people I think were really much smarter than me When I was doing my PhD my young advisor
Douglas Stanford I think he's just much smarter than me Maybe also seeing him Made me feel I wasn't very useful in that field With him around what do they need me for Right haha You came to AI to do a dimensionality reduction attack (an idiom for an overwhelming advantage) right Not a dimensionality reduction attack But anyway it feels like this AI thing Doesn't really need brains
Really doesn't need brains Then what does it need I think The most important trait in this industry Is being reliable Doing things carefully And being responsible for what you do This is the most important trait How much brains do those things need I think They're all things undergraduates can do But you say AI has no individual heroism Yet now an AI researcher is priced so high
Like a star player transfer I don't know if it's a good thing or bad thing For me personally Of course I'm very happy I benefit from this Right hehehe But um Actually speaking I don't know if this thing Is a good thing Why do you think the price has become so high I think maybe on one hand Everyone thinks this thing is scarce But actually it might not be that scarce Because training a person
Although this thing isn't that hard But training a person requires an environment You need to have that opportunity to be exposed to this thing To learn this thing Without that opportunity No matter how smart you are it's useless Maybe in the past people who could encounter that opportunity Weren't that many So in the market it might be relatively scarce From this perspective Mm-hmm But I think another aspect is also
Maybe the hype about people is a bit excessive Right Really like to mythologize individuals Now Right I think Really Just say it again This is a collectivist thing haha Then many people are also very curious Because Maybe many companies also want to recruit AI people Then you think the most important thing is still being reliable What metrics are there for this
How can you quickly judge whether a person is reliable Whether they do things carefully Everyone has some methods they use to measure I of course also have some of my own tricks I used to design an interview question Let me briefly explain it It shouldn't be confidential So I should be able to talk about it Um So the interview question is actually quite simple I need this person to, within 24 hours,
complete a reinforcement learning project from scratch They have to choose on their own what kind of model I tell them what resources are available and they choose what model to use what data to use what algorithm to use and train the model Within 24 hours I give them 24 hours to get this done And after the 24 hours are up they'll have a one-hour discussion with me So this thing
isn't that hard in the AI era Without AI this would be impossible No one could do it in 24 hours But with AI, it's actually quite easy Because AI can do the whole thing for you But why still do this?
There are two reasons There are many reasons Among them Two reasons why it was designed this way One reason is that I think in this era, evaluating someone evaluating someone like whether they write good code is actually useless Because most people don't need to write code themselves anymore What's more important is whether they can effectively leverage AI So that's one aspect of evaluating this The second aspect is that there's a trap here
If you let AI do everything but you don't really try to understand what AI did for you you'll be exposed during that one-hour discussion That's a That's where people fail So the other thing this tests is whether you've truly formed a collaboration with AI Or if you just completely handed it off That's something I personally value very much That also reflects whether this person is someone reliable Of course, this
The design of this question itself also has some rather dark cleverness to it Like why it was designed as 24 hours is to see how much this person values this opportunity Can they stay up all night Right hahaha If they're willing to pull an all-nighter they can survive these 24 hours If they can't make it then it just means they probably don't value this opportunity that much Haha So for people younger than you
Do you think AI is still a blue ocean a place with lots of opportunities I think purely working on language models is no longer a blue ocean I think it's too late, the last train has already left The last train has already left Which last train is that?
I feel like I got in on that last train And there might have been some people after I got in some new people But I think they won't have the opportunity to encounter such good opportunities Like being able to do something in a relatively small team Chances to encounter such opportunities will be rare Right, and then But I think AI is a very vast field Language models are just a tiny, tiny part of it
A very small part There are many other things Like the multimodal generation we just mentioned There may still be many opportunities there Robotics probably has even more opportunities And even more extreme, there's like whether you can use AI to help with real scientific problems Like helping with quantum control and things like that Then it might be more blue ocean Those are all blue sky things Right so
I think for People young enough Maybe doing the hottest thing right now Is not the right choice Doing things no one has done now Might be more of a good choice Right How will you develop in the future Will you be at Google for a long time I think probably not Hahahaha Saying this so publicly I think probably not I think I will still try to challenge myself Right and Need to torture myself
Right need to torture myself But I just might need to find something Worth torturing myself for If AI is not fundamentally difficult Won't you find it boring Where is your challenge Although it's not difficult But knowing and not knowing There is still a gap From completely not knowing the details To slowly understanding the details Understanding how it works and such These things I think still require spending time and effort
And after you understand I think this thing will also be helpful for your future Like whether you do product related Or develop toward other AI directions I think all In the long term Will be helpful Where do you want to develop in the future I think anything is possible Haha haven't figured out how to torture myself You probably won't jump to another big company again Probably not Mm-hmm
What differences do you feel between what you learned at Anthropic And what you learned at Google DeepMind I think they're quite different I think Anthropic Is where you can understand one thing One line, language model Every aspect of this line very thoroughly It gives you that opportunity And at Google It's more horizontal It has many different aspects Many different people And you can also see different perspectives Also see different research directions
You can see all of them Right Anthropic is because it bets firmly enough So you can understand more vertically Right Have you thought about using AI to solve physics problems (Your theoretical physics) Someone is doing it So I don't think I need to do it haha You don't have essential interest in this I think this thing First Currently it's not the highest priority for me I think if one day
I solve the highest priority thing on my hands And haven't found anything else to do I might go do this thing What is your highest priority now My highest priority now is To push the two things I just mentioned ML coding and long horizon To a relatively stable state Where colleagues can carry them forward That I think is my highest priority
Of course there might be other priorities later But Using AI to do physics I think is something Many people are already trying to do One more of me is not too many One less of me is not too few Might as well let others do it first Do you have any physicists you particularly admire Yes, there are quite a few Don't know where to start Hahahaha Physicists yes AI scientists no
But this is related to a person's growth experience I think Like An adult finds it hard to truly worship a person A child might Who have you worshipped I think in physics Actually there are many who are really quite strong But those everyone talks about People from 100 years ago let's not talk about Like Einstein Heisenberg and such let's not talk about And including everyone later knows
Like Chen-Ning Yang (Frank Yang) and such let's also not talk about And Like when I was doing topology before Actually there was someone who later also won the Nobel Prize Haldane You'll find these people Have some almost abnormal foresight They seemed out of place in their era But look at Haldane When he first did the Haldane model and these fractional
quantum Hall effect related things it was many decades before everyone finally figured out these topological states Mm-hmm At that time he could already feel this thing was important And kept pushing it himself I think this is not easy Of course I think If you really want to find a similar person in AI I think maybe Geoffrey Hinton When everyone felt this thing Was optional or not that certain
He kept working in this direction Then I think This might be a hero-level figure In AI after him I think there might also be some heroic collectives Like for example the Transformer authors Noam, Ashish, Niki and the others That might be a heroic collective You said something that made a very deep impression on me I don't have any mentors in this industry
Don't have any old friends I can criticize whoever I want This might be the benefit of not doing AI Hahaha the benefit of not coming from AI Right like Really have no burden No old-timer is your relative So if you think he's stupid He is stupid Can just say he's stupid directly It doesn't matter Were you like this before too I think I was quite restrained when I was a student
Oh But later I found restraint useless No benefit to myself No benefit to others either Better to be more direct Expressing your own ideas is the most critical thing I think directly expressing your own ideas Is something where in the short term people will definitely hate you But in the long term everyone will appreciate Who have you heard speaking particularly stupidly recently Bleep out that name
Thank you I think XXX has always been quite stupid, haha And consistently stupid, haha Could he possibly be the right person I think what he says is, in Pauli's words, not even wrong Because it's not well-defined It's hard to say whether what he says is right or wrong Right, like one day Maybe a different paradigm happens
He can jump out and say hey I said this this this this back then But then you discover Maybe if the paradigm were another state He could also say the same thing This is why I hate this kind of very vague Very vague people Because a thing being vague is meaningless Why do you think he speaks very vaguely No correct definition Like It's kind of ambiguous If it has a proper definition
I can explain why it's properly defined But if it doesn't have a proper definition I have no way to explain Why it isn't properly defined Because it really isn't properly defined Hahaha What about XXX I think at least I think XXX is still a well-defined thing Like, it's trying to do XXX And their approach might lean more toward this More traditional kind of This neural network model approach Rather than a more end-to-end approach
I think at least it's well-defined As for whether it's right or wrong I think that's something the future will test Most old geezers are actually fine I think I think when people get old They don't necessarily turn into old geezers When people get old, they split into two types One type is the venerable elder They might stop nitpicking so much And actually put effort into mentoring young people The other type is the old geezer
They don't know what they're talking about Yet love to nitpick and boss people around Yeah, so getting old doesn't necessarily make you an old geezer Hey, who got you all riled up I don't even know who got me riled up But I've definitely met plenty of old geezers Hahaha When did you change Like, becoming so direct when you speak You stopped holding back—you've always thought this way But you didn't say it
I think in the past I might have been pretty direct too But not this direct But after getting into AI, I became even more direct So it's like nothing holding you back, right One, there's nothing holding me back Two, this field is objective enough Like You don't really have to worry too much About offending people with your opinions As long as your views are internally consistent Like, you have a coherent framework for your views
You're not just randomly trashing people That would definitely offend people You have your own understanding of things I think people will actually respect you for it Because ultimately, how well you do in this field Is judged by objective standards Every guest we have recommends a life-changing book It has to be a book that genuinely had a major impact on you What book would you say This is the hardest question of the day
I feel like you're overestimating my cultural sophistication Hahahahaha Honestly, I don't really have a life-changing book Okay, I read a book recently Recently Last time Ji Yichao mentioned 'The Line Puppy' The book I recently read is Yukawa's autobiography Hideki Yukawa's (1949 Nobel Prize in Physics winner) autobiography 'Tabibito' (The Traveler) And then If I had to say, books that left an impression First of all, I genuinely don't like reading
I feel like I'm not very well-read And the books I read Other than professional ones All feel like leisure reading to me Like Yukawa's autobiography It's essentially leisure reading too But I found it quite interesting Like You get to see A scientist who later seemed so successful Struggling in his youth Very authentic
And then maybe some other leisure reads Like novels and stuff There's a novel I really like 'From the New World', a Japanese novel Yeah, if you really force me to recommend some leisure reading I could recommend that one Have you watched any movies or anything lately TV shows, or played any games Nothing at all Hahaha
A favorite food from anywhere in the world Sushi probably A favorite place anywhere in the world A favorite place anywhere in the world I—I think if you really force me to choose I'd probably choose Hawaii Because I really love the ocean Yeah, but it's hard to say for sure Because after I visit more coastal places I might have a new favorite Hahaha Something not many people know But probably should
Don't trust old timers, does that count? Hahaha
Have you ever been superstitious?
Hmm I I haven't, fundamentally But I think Sometimes superstition can be a way to comfort yourself I meant, have you ever been superstitious about old timers?
Oh, superstitious about old timers Never?
Really never But I probably didn't hate old timers this much before Then I started hating them more and more Why?
Maybe it's just that When you develop more judgment of your own Stupid people just look even stupider But they haven't hurt you So why hate them?
It's just stupidity intolerance Everyone has stupidity intolerance Hey, what's your MBTI?
No idea Why, in recent years, has such an unfriendly term for older people emerged among young people?
Where does it come from?
No idea No no no Haven't looked into it Could ask Gemini Have it do a Deep Research See where the term "laodeng" (old geezer) comes from So what are the papers that have influenced AI progress the most, in your mind?
Sequence-to-sequence is one And then I think language models At the peak of the feature engineering era And then Scaling Laws is one The Scaling Laws paper by Jared Kaplan and them at OpenAI It's a paper that introduced this systematic research methodology Into the field
Of course, the actual methods in Scaling Laws May not have been exactly right But it was the first To introduce this idea I think that's crucial Based on your current understanding What's a key important bet?
Long horizon (long-horizon tasks) Hahaha Our studio is called Language is World Studio When you first heard this name What were you thinking?
I think this name is a bit...
Too normal, too mediocre Hahahaha, fair enough, hahahaha I think this name is something that Maybe ten years ago Was a very unique perspective But now there's just too much consensus I think ten years ago it really was Maybe it's been more than ten years now Sorry, I feel like I'm getting old too Maybe it's been more than ten years Like around 2014, 2015
Everyone thought vision was the most important thing Back then I think realizing That language is an important carrier of intelligence Was probably something different But I don't think our name Was meant in an AI context Hmm Hmm Hahaha Well then that's worth deep thought, hahaha