
Jeff Dean Says AI’s Biggest Opportunity Is Still Largely Untouched

By Laude Institute

Topics Covered

  • TPUs: From Internal Need to Global Accelerator Market
  • Co-designing Hardware and AI Models for Future Innovation
  • The Crucial Role of Academic Research in AI Advancements
  • Moonshot Grants: Targeting AI's Societal Impact in 3-5 Years
  • AI in Healthcare: Learning from Past Decisions for Future Care

Full Transcript

[music] Jeff Dean, thank you for joining us here in sunny San Diego, right in front of the NeurIPS conference center. You are the chief scientist at Google and co-tech lead, and you all recently made an announcement about a new generation of the TPU chip.

>> Yeah.

>> Let's talk about it. The seventh generation of TPUs.

>> Yeah.

>> What's special about it? I mean, like every next generation of TPU, it's better than the previous one. It has quite a lot of new capabilities. It's connected together into these very large configurations that we call pods, I think it's 9,216 chips or something like that per pod, and it has much higher performance, especially for lower-precision floating-point formats like FP4. So that's going to be really useful for training large models, for inference, for a lot of things like that. So we're pretty excited about it.
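Low-precision formats like the FP4 Dean mentions trade accuracy for throughput by representing each value with only a handful of representable levels. As a rough illustration (this is a generic sketch of 4-bit E2M1-style quantization, not a description of Ironwood's actual numerics), round-to-nearest quantization onto an FP4 value grid looks like:

```python
# Sketch: round-to-nearest quantization onto an FP4 (E2M1-style) value grid.
# The representable values are the positive magnitudes below plus their
# negatives; this is illustrative, not any TPU's exact numerics.

FP4_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_fp4(x: float) -> float:
    """Round x to the nearest representable FP4 value, saturating at +/-6."""
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 6.0)  # saturate at the largest representable magnitude
    return sign * min(FP4_MAGNITUDES, key=lambda m: abs(m - mag))

def quantize_vector(xs):
    return [quantize_fp4(x) for x in xs]

if __name__ == "__main__":
    print(quantize_vector([0.3, 1.2, -1.2, 10.0]))
```

With only 16 levels, each multiply-accumulate moves far fewer bits, which is where the throughput and energy wins for training and inference come from.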

>> Nice. Zooming out: Google started building TPUs for its own internal needs. Google is the preeminent AI applications company and AI research organization in the world, and the need to control the full, vertically integrated stack was the original motivation, as I understand it from what I've read. Then you eventually externalized access to TPUs to become a global competitor in the ecosystem of people who build and sell accelerators. And now a lot of people are excited about the opportunity for there to be a massive market for TPUs. In your role at Google, how do you relate the objectives of Google's internal use of TPUs to the marketplace you're competing in, to enable millions and billions of people outside of Google to get those advantages through TPUs sold in the competitive space?

>> Yeah, I mean, the origin of the TPU program was really our own internal needs, initially focused on inference.

Even as far back as 2013, we saw that deep learning methods were going to be very successful: every time we trained a slightly larger model with more data, the results got better in things like speech and vision. And I started to do some back-of-the-envelope calculations of what would happen if we actually wanted to serve this much better, more compute-intensive speech model to, say, 100 million users for a few minutes a day. And the compute requirements got quite scary. We would actually have needed to double the number of computers Google had overall just to roll out this improved speech model.

>> Wow.

>> If we wanted to do it on CPUs. And so that was really the genesis of: hey, if we build specialized hardware that is tailored for these kinds of ML computations, essentially dense, low-precision linear algebra, we could be way more efficient. And that was borne out: the first TPU ended up being 30 to 70 times more energy-efficient than contemporary CPUs and GPUs, and 15 to 30 times faster.

>> And that was 2015, you said?

>> Yeah. The thought experiment was 2013. The chips landed in our data centers in 2015, and we wrote a paper about it.

>> Pre-transformer architecture?

>> Pre-transformer, yeah. So we were actually focused on speech recognition and vision convolutional models at the time. We squeezed a little bit of design change into TPUv1 at the last minute to make it support LSTMs as well, which were in vogue at the time for language modeling, and that also enabled us to support language translation tasks. Subsequent versions of TPUs have focused much more on much larger-scale systems that are not just a single PCIe card but a whole machine learning supercomputer, including the latest Ironwood one. Every generation has been a big improvement in energy efficiency, performance per dollar, all these things we care about, and enables us to scale much larger training jobs and serve many more requests to lots of users.

>> And of course the transformer architecture itself was born at Google on a pretty similar timeline, but with the TPU invented before it. Do you think there was serendipity in the co-design between the applications of the transformer architecture, as they've grown up to change the world as we know it now, and Google's access to this vertically integrated hardware stack?

>> Yeah, I mean, with every generation of TPU we really try to take advantage of the co-design opportunities we have, with a lot of researchers thinking about what ML computations we're going to want to run two and a half to six years from now. The exercise you have as a hardware designer is trying to predict a very fast-moving field. It's not an easy thing, but it helps to have a lot of people seeing where the field is going: this kind of thing might be interesting, we're not quite sure yet, but we could put in this hardware feature or this particular capability, and if it turned out to be important, we'd have the hardware support ready when that thing hopefully bears out as important. And if it doesn't pay off, then sometimes you've just devoted a small piece of the chip area to something that turned out to be less important than you thought. But you really do want to be prepared so that if this thing matters a lot, your hardware can support it.

>> Yeah.

>> So it's an interesting forecasting exercise: forecasting the whole ML field and trying to guess what we'll want.

>> Well, if we could let one person do it, the Chuck Norris of computer science would get my vote, and he has obviously gotten enough votes that you are doing it at Google. And with your track record at Google, there's a legacy of inventing things for Google's internal needs. Google is the world's best systems-building company for the applications Google has built, which have now become many things: MapReduce, the Google File System, things you invented or co-invented inside of Google. And eventually you've been able to witness what Google built, demonstrated to the world as valuable, and then published, with the TPU architecture, and obviously the ideas in the transformer paper itself. Do you think there's a tipping point with Ironwood for the rest of the world to have access to the advantages that Google has had? If I put myself in your shoes, the experience would feel like: that was awesome, we did it at Google, we paved the way, and now, holy moly, the rest of the world is also getting all the benefits. Researchers think about impact, and that feels like the moment we live for. Do you feel like you're at that tipping point for the TPU?

>> Yeah, I mean, I think obviously we've been using TPUs now for

more than a decade, or about a decade, and we've been really happy with them. The co-designed properties really make them useful for the kinds of machine learning computations we want to run. We've also been renting them externally through our Cloud TPU program for a number of years, and so many, many customers are using them for all kinds of things. We've built a bunch of software layers on top of TPUs that make them quite convenient and easy to use. The most well-worn path for TPUs is JAX on top of Pathways, which is an internal system we've built that we're working to give cloud customers access to, on top of XLA, which is an ML compiler with a TPU backend. What this tends to mean, at least for Pathways: all of our Gemini development and research and large-scale training jobs run on top of that stack. Pathways is this nice system we built, starting about seven years ago, that gives you the illusion of a single system image across thousands or tens of thousands of chips. So you can have a single Python process running your JAX code, and instead of it showing up as the four devices you'd see running on a single TPU node, your JAX process has access to 20,000 devices. It just naturally works and figures out, underneath the covers, exactly which transfer mechanisms and which network to use: within a TPU pod it should use the high-speed interconnect, across pod boundaries it'll use the data center network, and across metropolitan areas it'll use long-distance links, and so on. So we actually run very large-scale training jobs where we have a single Python process driving multiple TPU pods in multiple cities.
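The tiered transfer choice Dean describes (high-speed interconnect within a pod, data-center network across pods, long-haul links across metros) can be caricatured in a few lines. Everything here, the names and the flat topology model, is hypothetical; the real Pathways runtime is far more sophisticated:

```python
# Toy sketch of the transfer-tier choice Dean describes: pick the network
# based on where two devices sit in the topology. Names and structure are
# hypothetical, not the actual Pathways placement logic.

from dataclasses import dataclass

@dataclass(frozen=True)
class Device:
    metro: str  # metropolitan area the data center is in
    pod: str    # TPU pod identifier within that metro

def transfer_tier(a: Device, b: Device) -> str:
    """Choose the interconnect tier for a transfer between two devices."""
    if a.metro != b.metro:
        return "long-distance link"   # across metropolitan areas
    if a.pod != b.pod:
        return "data-center network"  # across pod boundaries, same metro
    return "high-speed ICI"           # within a single TPU pod

if __name__ == "__main__":
    d1 = Device("metro-a", "pod-1")
    print(transfer_tier(d1, Device("metro-a", "pod-1")))  # high-speed ICI
    print(transfer_tier(d1, Device("metro-a", "pod-2")))  # data-center network
    print(transfer_tier(d1, Device("metro-b", "pod-9")))  # long-distance link
```

The point of the single-system-image abstraction is that user code never makes this choice: the runtime does, underneath the covers.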

>> Nice. Great. Well, maybe we can shift topics.

>> Sure.

>> You've been talking a lot, I think, lately about the state of funding for academic research. What's your message?

>> Yeah, actually my colleagues Urs Hölzle and Partha Ranganathan and I, along with Magda Balazinska at the University of Washington, recently published an article in a whole special issue of Communications of the ACM devoted to the impact of academic research. In our section, we discussed all the academic research that Google as a company was built on: all the things we relied on, like TCP/IP, advanced RISC processors, the internet, and the Stanford Digital Library Project, which is what provided the funding for the original version of PageRank at Stanford.

>> Oh yeah. And my colleague Dave Patterson also had an article in that issue about all the amazing things that have come out of his and his Berkeley colleagues' many different five-year labs. And so I feel it's just really important to have a vibrant academic research ecosystem in the US and in the world, because those early-stage creative ideas are often the things that lead to major breakthroughs and innovations. The whole deep learning revolution was actually built on academic research from 30 or 40 years ago: the inventions of neural networks and backpropagation and things like that are all central to what we're doing even today and have been really important in the world. So I advocate that we should have a vibrant funding model for academic research, because the returns to society are quite large.

>> Yeah. Excellent. And you and I and Dave Patterson and Joelle Pineau are on the board of Laude Institute, which was born in part out of a paper that you and Dave and I and seven other authors published, called "Shaping AI's Impact on Billions of Lives," where we advocated for ways that AI research might impact society in areas like civic discourse, healthcare, science, job reskilling, journalism, and policy. And we also advocated that, in addition to things like 10x-ing down on NSF-style funding, we can explore and prototype other types of funding. So Laude Institute raises money from successful technologists who donate to Laude Institute, which is a nonprofit 501(c)(3), which in turn is running a moonshot grant program specifically dedicated to funding 3-to-5-year research labs, with 3 to 5 PIs and 30 to 50 PhD students, targeting AI's impact on society in those areas I just mentioned: scientific progress, healthcare, job reskilling, and civic discourse. And you've been an advocate for these alternative funding models as well, in addition to the traditional ones.

>> It was a lot of fun working on that paper with you and Dave and the many other co-authors we had. I think the thing I liked about that paper is that we looked at a bunch of different areas where AI would have an impact. In some of them, if we get it right, the impact will be amazingly positive; in other areas it's a little less clear, and there might be some negative consequences of AI. What can we do overall, across all these different areas, to maximize the potential upside of AI, both from a technical computer science and ML research perspective but also in conjunction with policymakers and with people in those fields, like health or education, or scientists? And then we also looked at the ways we could all work together to maximize those benefits and minimize the downsides.

>> And specifically with research efforts in the 3-to-5-year time horizon that fit into a lab, which is in contrast to a lot of the hype we hear in AI right now, like pursuing AGI or superintelligence, contrasted with trying to help with frontline healthcare: can you mitigate the drudgery that typical doctors feel, or eliminate obstacles that keep radiologists from actually using the technology that already exists? I think it made it all feel much more real and specific and achievable.

>> I really like the 3-to-5-year time horizon, with an ambitious set of people organized around a particular thing they're trying to achieve, because that often gets lots of different people with a mix of skills working together to really push something forward. It's not so distant that it won't have impact, but it's not so short a time period that you can't conceive of doing something ambitious, right? Even in my own career, when I start on a new project, I've tended to think about what we could do in 3 to 5 years. And I think that's a delightful time range to consider.

>> Nice. Yeah. And I'm wondering if you could share maybe some of your favorites. One thing I found while working with you on that paper, which was always delightful, is just how well connected you are to what seems like dozens and dozens of bleeding-edge projects by some of the most innovative thinkers and researchers and builders in the world. You both angel invest in them and are generous in donating your time and energy to advise ambitious, impactful research projects that want to go make a difference, from climate to science to civic discourse to healthcare. I think healthcare is one of your passions on the program committee we built for the moonshot grant program, where we now have all of our applications in, including Turing Award winners and Nobel laureates and coverage from the top universities. So everything's working according to plan so far for funding research that actually moves the needle in these areas of society. I'm curious, with your background and exposure to so many active projects, if you could share just one or two of your favorites.

>> Yeah, I

mean, I think I am quite passionate about the application of AI to health in various ways, and I think the moonshot, if you like, would be: how can we as a society use every past decision that's been made in health to inform every future decision? And that's a super hard goal, because there are all kinds of impediments to doing that. There are very real privacy concerns.

There are complicated regulatory requirements that differ for every jurisdiction. But I think if we aspirationally try to say: what could we do so that we can learn from every past decision that's been made, in a way that helps every clinician and every person themselves be informed and make better decisions in the future? That would be an awesome, amazing goal. And I think a three-to-five-year moonshot around that might be able to make some progress toward it. It probably can't get all the way there, but it would be pretty amazing even if it made it partway.

>> Is your sense that, with the current capabilities of AI

systems, the challenge would be more in adapting the existing medical health records, the legal considerations, and what the lawyers for insurance providers and the hospitals themselves will allow? That all sounds very hard, like more of an implementation challenge than a capabilities challenge. Or do you think the capabilities have a ways to go before we'd get the benefits?

>> Yeah, I mean, I think there are a bunch of interesting technical, researchy questions in there, but also a bunch of kind of grungy ones: how would you get the data in the right form to be able to learn from it, given that it's in every different healthcare system in slightly different forms, and so on? You probably have to use things like privacy-preserving machine learning or federated learning. So how would you make that work from a technical perspective? Because you're not going to be able to move healthcare data from where it sits. Instead, you're going to need to be able to learn on the data in a privacy-preserving way in a whole bunch of different environments. So there are real technical challenges, but there are also, as you say, legal and regulatory kinds of challenges as well. But I think that's part of why you want a whole group of people thinking about these issues with different kinds of expertise, right? You need some people with machine learning expertise and computer-systems-building expertise, as well as legal and policy and regulatory expertise.
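One standard technique in the family Dean names, federated learning, keeps the raw records in place and shares only model updates. A minimal federated-averaging sketch (purely illustrative, with made-up site data; real systems add secure aggregation, differential privacy, and far more engineering):

```python
# Minimal federated-averaging sketch: each site computes a model update on
# its own data, and only the updates (never the records) are aggregated.
# Illustrative only; real deployments add secure aggregation, differential
# privacy, and heavy engineering.

def local_update(weight, examples, lr=0.1):
    """One step of least-squares gradient descent on a site's private data.

    Model: y_hat = weight * x. Each example is an (x, y) pair.
    """
    grad = sum(2 * (weight * x - y) * x for x, y in examples) / len(examples)
    return weight - lr * grad

def federated_round(global_w, sites):
    """Each site trains locally; the server averages the resulting weights."""
    local_ws = [local_update(global_w, examples) for examples in sites]
    return sum(local_ws) / len(local_ws)

if __name__ == "__main__":
    # Two hypothetical sites whose data follows y = 2x; records never leave.
    site_a = [(1.0, 2.0), (2.0, 4.0)]
    site_b = [(3.0, 6.0)]
    w = 0.0
    for _ in range(50):
        w = federated_round(w, [site_a, site_b])
    print(round(w, 2))  # converges toward the shared slope of 2.0
```

The design point this illustrates is exactly the constraint Dean states: learning happens in each environment, and only compact derived quantities cross institutional boundaries.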

>> Yeah, makes a lot of sense. Any other projects that come to mind as a favorite?

>> You know, I'm kind of enamored these days with how we can make our computing systems even more efficient than the latest cutting-edge TPUs or GPUs. I feel like there's room there for interesting and innovative approaches to, say, much lower-cost inference, which seems like it's going to be a major thing in the world, even more than it already is.

>> Going back even to the original 2013 napkin sketch for why TPUs should be born in the first place.

>> Yeah. I mean, if you redo that napkin sketch now, you're going to realize that we want, first, much lower-latency systems than we have today, as well as much more throughput, and performance per watt is going to be a really important thing. So what can we do that would make way lower-energy systems that still provide the same quality and performance?

>> Mhm.
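The napkin math behind both the 2013 story and this redo is simple multiplication: users, times usage per user, times compute per second of usage, divided by what one machine can deliver. The numbers below are invented placeholders, not Google's actual figures:

```python
# Back-of-the-envelope serving math in the spirit of the 2013 napkin sketch.
# Every number used here is an invented placeholder, not a real Google figure.

def machines_needed(users, seconds_per_user_per_day,
                    flops_per_second_of_audio, flops_per_machine):
    """Machines required to serve a speech model to a user population."""
    daily_seconds = users * seconds_per_user_per_day
    # Average compute demand, assuming load spreads evenly over 86,400 s/day.
    avg_flops = daily_seconds * flops_per_second_of_audio / 86_400
    return avg_flops / flops_per_machine

if __name__ == "__main__":
    # 100M users, 3 minutes/day, hypothetical model and machine throughput.
    cpu_fleet = machines_needed(100e6, 180, 1e10, 1e11)
    # A specialized accelerator ~30x more efficient shrinks the fleet ~30x.
    tpu_fleet = machines_needed(100e6, 180, 1e10, 30 * 1e11)
    print(f"{cpu_fleet:,.0f} CPU machines vs {tpu_fleet:,.0f} accelerators")
```

Swapping in different placeholder values is the whole exercise: the redo Dean describes just moves the question from raw machine count to latency and performance per watt.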

>> How do you see the relationship between the massive amount of research happening inside the Gemini team and inside DeepMind at large and, zooming out one more layer, the AI ecosystem beyond Google: the relationship between academic research and research happening beyond Google's bounds and what happens inside Google? Traditionally, with things like the transformer paper and MapReduce, you've had these channels for exporting innovation outside of Google. I imagine you have also imported innovation and built on the shoulders of the giants outside, and you gave examples already. Do you see that evolving these days, with Google making such a massive investment and holding such a leadership position in Gemini and in the hardware, up and down the entire stack? Has it evolved, and does it need to continue to evolve, especially as we continue to face, well, we're trying to innovate on funding models for everyone else, but it's not looking good [laughter], in my opinion, and I can't speak for you. I'm curious about your thoughts on that dynamic, that relationship at the boundary of Google and innovation happening through traditional mechanisms.

>> I mean, I think there's obviously continual evolution in publishing models and publishing characteristics. In the current competitive dynamic, we tend not to publish the secret sauce inside the architecture of our Gemini model, say, but we do publish a lot of stuff in the earlier-stage research aspects: here are interesting new kinds of model architectures that we haven't proven out but have experimented with at small scale, published so the rest of the ecosystem can pick up those ideas and explore them or build on them. And we also look at the broader publishing happening in the rest of the community and consider how we could adapt some of those ideas to problems we're seeing. I also don't think publishing has to be a we-publish-it-or-we-don't kind of thing. There's really a continuum there of when we publish and what we publish. So I'll give you an example in

publish. So I'll give you an example in the computational photography work that Google research has been doing for many many years. We have awesome researchers

many years. We have awesome researchers in that field. Uh they often well almost annually come up with a really cool new thing that can go into the pixel camera

pip software pipeline. So things like night sight or astrophotography or magic eraser where you can erase like that person who wandered in front of your photo that you didn't want in the photo

>> in the first place. Um, and so what we tend to do there is we put it out into the Pixel the next Pixel N plus1 phone

that's coming out and then we sort of wait a little while and then we submit a SIGRAPH paper about the innovations that went into that feature. So it's sort of a little bit of a delay. We take

advantage of it in our products first and then we sort of let the rest of the community know about, you know, what is happening underneath the covers um and they can build on it. So I think that's a pretty nice thing and there's this

nice continuum of you know not just the end points being being choices. Can you

think of any examples, off the top of your head, of papers or ideas in that category you just mentioned, the earlier, more experimental stuff, published either here at NeurIPS or recently, that you're finding exciting and that people outside Google have been engaging with? And I'd love to talk a little more about how challenging it has been, or hasn't been, to organize this inside Google, because it's an extremely impressive, large organization, and I can imagine internal conferences even happening that would be a tiny version of NeurIPS or something like that. But before you answer that one, I'm curious if you have any specific examples.

>> Yeah, just one off the top of my head: there was a paper published by some Google researchers here on kind of a hybrid between transformer and recurrent models. Titans, I think, is the name. It's looking at how you can have much longer context by using a recurrence relation, but over chunks of tokens rather than individual tokens, learning to compress the very porky representation of every token into something a little more compact, and then having a whole sequence of those that you run recurrent steps on. That's just a good example of something that is not in our Gemini models. It could be in the future, but it does seem like an interesting idea to explore.
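The idea Dean sketches, recurrence over compressed chunks of tokens rather than token by token, can be caricatured in a few lines. This is a hypothetical toy, not the actual Titans architecture: each chunk of token vectors is squashed into one compact summary (here, simply a mean), and a recurrent state is updated once per chunk:

```python
# Toy sketch of recurrence over compressed chunks of tokens, in the spirit
# of the hybrid Dean describes. Hypothetical; not the Titans architecture.

def compress_chunk(chunk):
    """Squash a chunk of token vectors into one compact summary (a mean)."""
    dim = len(chunk[0])
    return [sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)]

def recurrent_over_chunks(tokens, chunk_size, decay=0.9):
    """Update a recurrent state once per compressed chunk, not per token."""
    dim = len(tokens[0])
    state = [0.0] * dim
    for i in range(0, len(tokens), chunk_size):
        summary = compress_chunk(tokens[i:i + chunk_size])
        # One recurrent step per chunk: exponential-moving-average update.
        state = [decay * s + (1 - decay) * c for s, c in zip(state, summary)]
    return state

if __name__ == "__main__":
    # Six 2-dimensional "token embeddings", processed in chunks of 3:
    toks = [[1.0, 0.0], [1.0, 2.0], [1.0, 4.0],
            [3.0, 0.0], [3.0, 2.0], [3.0, 4.0]]
    print(recurrent_over_chunks(toks, chunk_size=3))
```

The payoff for long context is that the number of recurrent steps scales with the number of chunks, not the number of tokens, at the price of compressing each chunk into a smaller representation.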

>> One more thing on the internal side. Somebody told me on a call about a conference. They were going to a Google conference, and I was like, "Oh, that sounds so cool."

>> We have a Google research conference. It has like 6,000 attendees every year.

>> And I know there's a sentiment, if you talk to the PhD students here, that the Google research conference might have papers that feel a year ahead of the papers you're seeing at NeurIPS, just because there is a gap between what's happening in the open and what's happening at Google. So I'm wondering, besides making a conference, how have you found it to be able

to build an organization that is so innovative and able to generate the frontier, state-of-the-art progress that we're all seeing?

>> Yeah. I mean, I think one of the reasons the internal research conference might feel a little bit like that is that for an external venue you often have to be quite far along in your research idea to get it accepted and published, whereas at the internal conference there's a whole range of maturity of the work. People are perfectly willing to have lightning sessions of cool early-stage results that aren't really fully baked yet, and you get like 10 of those in an hour session. So part of it is, yes, it hasn't been published externally, but part of it is also just trying to circulate some of the ideas being explored with your colleagues, and it can be a little less fully polished.

>> Yeah. No, I'm inspired by that. I feel like NeurIPS is really impressive, very large, and maybe there's room for that architecture of a conference to be exported as well, in terms of innovation.

>> I mean, the workshop days here feel a little bit more like that, because it's earlier-stage work and so on, but it's still a fairly traditional thing, like a PDF of some paper-like artifact. And there, these things are often just talks with a few slides, not necessarily a full paper that someone had to write up.

>> Cool. Okay. Well, I think that's a wrap for us. Thanks for taking the time. Appreciate all your thoughts.

>> Thank you. Appreciate it.

>> Enjoy the rest of NeurIPS.

>> It's beautiful here.

>> It is.
