Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote

By Snowflake Inc.

Summary

## Key takeaways

- **AI is the new electricity for builders**: Andrew Ng likens AI to electricity, a general-purpose technology that, while hard to define specific uses for, unlocks countless new applications and opportunities for builders. [00:28]
- **Generative AI accelerates application development**: Generative AI drastically reduces development time for certain AI applications, enabling teams to prototype and deploy in days instead of months, shifting the path to invention towards rapid experimentation. [02:01]
- **Agentic workflows outperform zero-shot prompting**: Agentic workflows, which involve iterative steps like outlining, research, drafting, and critiquing, deliver significantly better results than simple zero-shot prompting, as demonstrated by improved performance on coding benchmarks. [07:19]
- **Four agentic workflow design patterns**: Key design patterns for agentic workflows include reflection (self-critique and improvement), tool use (API calls, web search), planning (sequencing actions), and multi-agent collaboration (simulating diverse roles). [10:03]
- **Visual AI agents unlock unstructured data**: Agentic workflows are being extended to visual AI, enabling agents to process images and videos to extract information, generate metadata, and build applications that unlock value from previously difficult-to-access unstructured visual data. [15:15]
- **Data engineering for unstructured data is crucial**: The rise of generative AI in processing text, images, and video increases the importance of data engineering, especially for managing unstructured data and its metadata to create value. [24:58]

Topics Covered

AI's Biggest Opportunity: Applications, Not Just Foundation Models.
Generative AI Accelerates Invention Through Fast Experimentation.
Move Fast, Be Responsible: The New AI Development Mantra.
Agentic AI is the Most Important Breakthrough.
Visual AI Agents Unlock Value from Unstructured Data.

Full Transcript

Please welcome Andrew.

[Applause]

Thank you. It's such a good time to be a builder. I'm excited to be back here at Snowflake BUILD. What I'd like to do today is share with you where I think some of AI's biggest opportunities are. You may have heard me say that I think AI is the new electricity. That's because AI is a general-purpose technology, like electricity. If I ask you what electricity is good for, it's always hard to answer, because it's good for so many different things, and new AI technology is creating a huge set of opportunities for us to build new applications that weren't possible before. People often ask me, "Hey Andrew, where are the biggest AI opportunities?"

This is what I think of as the AI stack. At the lowest level are the semiconductors. On top of that is a lot of the cloud infrastructure, including of course Snowflake, and on top of that are many of the foundation model trainers and models. It turns out that a lot of the media hype, excitement, and social media buzz has been on these layers of the stack, the new technology layers. Whenever there's a new technology like generative AI, the buzz is on the technology layers, and there's nothing wrong with that. But I think that, almost by definition, there's another layer of the stack that has to work out even better, and that's the application layer, because we need the applications to generate even more value and even more revenue so that they can afford to pay the technology providers below. So I spend a lot of my time thinking about AI applications, and I think that's where a lot of the best opportunities will be to build new things.

One of the trends that has been growing for the last couple of years, in no small part because of generative AI, is faster and faster machine learning model development. In particular, generative AI is letting us build things faster than ever before. Take the problem of, say, building a sentiment classifier: taking text and deciding whether it expresses positive or negative sentiment, for reputation monitoring, say. A typical workflow using supervised learning might be that it takes a month to get some labeled data, then training the AI model might take a few months, and then finding a cloud service or something to deploy on takes another few months. So for a long time, very valuable AI systems might take good AI teams six to 12 months to build, right?

And there's nothing wrong with that; I think many people create very valuable AI systems this way. But with generative AI, there are certain classes of applications where you can write a prompt in days and then deploy it in, again, maybe days. What this means is that there are a lot of applications that used to take me, and used to take very good AI teams, months to build that today you can build in maybe 10 days or so. This opens up the opportunity to experiment, build new prototypes, and ship new AI products. That's certainly the prototyping aspect of it.

These are some of the consequences of this trend. Fast experimentation is becoming a more promising path to invention. Previously, if it took six months to build something, then we had better study it, make sure there's user demand, have product managers look at it, document it, and then spend all that effort to build it, and hopefully it turns out to be worthwhile. But now, for fast-moving AI teams, I see a design pattern where you can say, "You know what, it takes us a weekend to throw together a prototype. Let's build 20 prototypes and see what sticks, and if 18 of them don't work out, we'll just ditch them and stick with what works." So fast iteration and fast experimentation are becoming a new path to inventing new user experiences.

One interesting implication is that evaluations, or evals for short, are becoming a bigger bottleneck for how we build things. Back in the supervised learning world, if you were collecting 10,000 data points anyway to train a model, then needing to collect an extra 1,000 data points for testing was fine: it was an extra 10% increase in cost. But for a lot of large language model-based apps, where there's no need for any training data, if you made me slow down to collect a thousand test examples, boy, that seems like a huge bottleneck.
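A minimal Python sketch of the two ideas above, under stated assumptions: the prompt-based sentiment classifier that replaces months of supervised-learning work, and a tiny labeled eval set of the kind that starts small and grows alongside the prototype. The `llm` argument is a hypothetical prompt-in, text-out function (swap in a real model client), `fake_llm` is a toy stand-in so the sketch runs offline, and the prompt wording is illustrative rather than anything shown in the talk.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical LLM interface: prompt in, completion text out.
# Replace with a call to whatever model API you actually use.
LLM = Callable[[str], str]

PROMPT_TEMPLATE = (
    "Classify the sentiment of the following text as 'positive' or 'negative'.\n"
    "Answer with a single word.\n\n"
    "Text: {text}\n"
    "Sentiment:"
)


def classify_sentiment(text: str, llm: LLM) -> str:
    """Zero-shot sentiment classification with a single prompt."""
    reply = llm(PROMPT_TEMPLATE.format(text=text)).strip().lower()
    # Normalize free-form replies; anything else is flagged rather than guessed.
    if reply.startswith("pos"):
        return "positive"
    if reply.startswith("neg"):
        return "negative"
    return "unknown"


def evaluate(test_set: list[tuple[str, str]], llm: LLM) -> float:
    """Accuracy on a small labeled test set that grows as the prototype matures."""
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(classify_sentiment(text, llm) == label for text, label in test_set)
    return correct / len(test_set)


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model: calls anything containing 'love' positive."""
    text = prompt.split("Text:", 1)[1]
    return "positive" if "love" in text.lower() else "negative"


if __name__ == "__main__":
    tiny_eval = [
        ("I love this brand", "positive"),
        ("Terrible service, never again", "negative"),
        ("Love it", "positive"),
        ("Not bad at all", "positive"),  # the toy model gets this one wrong
    ]
    print(evaluate(tiny_eval, fake_llm))  # 0.75
```

Because the classifier is just a prompt, most of the remaining effort shifts to `evaluate`: each labeled example added to the test set makes the accuracy number a little more trustworthy.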

And so the new development workflow often feels as if we're building and collecting data in parallel rather than sequentially: we build a prototype, and then, as it becomes more important and as robustness and reliability become more important, we gradually build up that test set in parallel. I think there are still exciting innovations to be had in how we build evals.

What I'm also seeing is that the prototyping of machine learning has become much faster, but building a software application has lots of steps: the product work, the design work, the software integration work, a lot of plumbing work, and then, after deployment, DevOps and LLMOps. Some of those other pieces are becoming faster, but they haven't become faster at the same rate that the machine learning modeling part has. So you take a process and one piece of it becomes much faster. What I'm seeing is that prototyping is now really, really fast, but when you take a prototype into robust, reliable production with guardrails and so on, those other steps still take some time.

But the interesting dynamic I'm seeing is that the machine learning part being so fast is putting a lot of pressure on organizations to speed up all of those other parts as well. So that's been exciting progress for our field, in terms of how machine learning development is speeding things up.

I think the mantra "move fast and break things" got a bad rep because, you know, it broke things. Some people interpret this to mean we shouldn't move fast, but I disagree with that. I think the better mantra is "move fast and be responsible." I'm seeing a lot of teams able to prototype quickly and evaluate and test robustly, without shipping anything out to the wider world that could cause damage or meaningful harm. I'm finding smart teams able to build really quickly and move really fast, but also do this in a very responsible way, and I find it exhilarating that you can build and ship things in a responsible way much faster than ever before.

Now, there's a lot going on in AI, and of all the technical trends, the one I'm most excited about is agentic AI workflows. If you were to ask me what's the one most important AI technology to pay attention to, I would say it's agentic AI. When I started saying this, near the beginning of this year, it was a bit of a controversial statement, but now the term "AI agents" has become so widely used by technical and non-technical people that it's become a little bit of a hype term. So let me just share with you how I view AI agents and why I think they're important, approaching it just from a technical perspective.

The way most of us use large language models today is with what is sometimes called zero-shot prompting. That roughly means we give it a prompt and ask it to write an essay or some other output for us. It's a bit like going to a person, or in this case going to an AI, and asking it to type out an essay by writing from the first word to the last word all in one go, without ever using backspace, just writing from start to finish like that. It turns out that we people don't do our best writing this way, but despite the difficulty of being forced to write this way, large language models do, you know, not bad; pretty well.

Here's what an agentic workflow is like. To generate an essay, we ask an AI to first write an essay outline and ask, "Do you need to do some web research?" If so, let's download some web pages and put them into the context of the large language model. Then let's write the first draft, then read the first draft and critique it, and revise the draft, and so on. This workflow looks more like doing some thinking or some research, then some revision, then going back to do more thinking and more research. Going around this loop over and over takes longer, but it results in a much better work output. In some teams I work with, we apply this agentic workflow to processing complex, tricky legal documents, to healthcare diagnosis assistance, or to very complex compliance with government paperwork, and many times I'm seeing this drive much better results than was ever possible. One thing I want to focus on in this presentation, which I'll talk about later, is the rise of visual AI, where agentic workflows are letting us process image and video data. But I'll get back to that later.
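The essay workflow described above (outline, optional web research, draft, then critique and revise in a loop) can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the talk: `llm` and `search` are hypothetical callables you would back with a real model and a real search tool, the prompts are made up, and `scripted_llm` only returns canned replies so the control flow can be traced.

```python
from __future__ import annotations

from typing import Callable, List, Optional

LLM = Callable[[str], str]             # hypothetical: prompt -> completion
Search = Callable[[str], List[str]]    # hypothetical: query -> page texts


def agentic_essay(topic: str, llm: LLM, search: Optional[Search] = None,
                  rounds: int = 2) -> str:
    """Outline -> optional research -> draft -> (critique -> revise) x rounds."""
    outline = llm(f"Write an outline for an essay on: {topic}")

    # Let the model decide whether it needs web research.
    decision = llm(f"Outline:\n{outline}\nDo you need web research? Answer yes or no.")
    sources = ""
    if search is not None and decision.strip().lower().startswith("yes"):
        sources = "\n\n".join(search(topic))  # downloaded pages go into the context

    draft = llm(f"Using this outline:\n{outline}\nand these sources:\n{sources}\n"
                "Write a first draft.")

    # The loop that distinguishes this from one-shot, no-backspace generation.
    for _ in range(rounds):
        critique = llm(f"Critique this draft and list concrete improvements:\n{draft}")
        draft = llm(f"Revise the draft using the critique.\nDraft:\n{draft}\n"
                    f"Critique:\n{critique}")
    return draft


def scripted_llm(prompt: str) -> str:
    """Toy stand-in that returns canned replies so the control flow can be traced."""
    if prompt.startswith("Write an outline"):
        return "1. Intro 2. Body 3. Conclusion"
    if "Do you need web research" in prompt:
        return "yes"
    if prompt.startswith("Using this outline"):
        return "draft"
    if prompt.startswith("Critique"):
        return "add examples"
    # Revision step: append a marker to the previous draft.
    previous = prompt.split("Draft:\n", 1)[1].split("\nCritique:", 1)[0]
    return previous + " (revised)"


if __name__ == "__main__":
    print(agentic_essay("AI agents", scripted_llm, search=lambda q: [f"page about {q}"]))
    # draft (revised) (revised)
```

The `rounds` parameter makes the trade-off explicit: each extra round costs two more model calls in exchange for another chance to improve the draft.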

It turns out that there are benchmarks that seem to show agentic workflows deliver much better results. This is the HumanEval benchmark, a benchmark from OpenAI that measures a large language model's ability to solve coding puzzles like this one. My team collected some data, and it turns out that on this benchmark, using, I think it was the pass@k metric, GPT-3.5 got 48% right. GPT-4 was a huge improvement, at 67%. But the improvement from GPT-3.5 to GPT-4 is dwarfed by the improvement from GPT-3.5 to GPT-3.5 using an agentic workflow, which gets up to about 95%, and GPT-4 with an agentic workflow also does much better.
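For readers unfamiliar with pass@k: HumanEval was introduced in OpenAI's Codex paper (Chen et al., 2021, "Evaluating Large Language Models Trained on Code"), which defines an unbiased estimator. Generate n samples per problem, count the c that pass the unit tests, and estimate the probability that at least one of k samples passes as 1 - C(n-c, k) / C(n, k), averaged over problems. The talk doesn't say which n or k were used, so the numbers below are arbitrary illustrations and do not reproduce the 48%, 67%, or 95% figures.

```python
from __future__ import annotations

from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass the tests.

    Probability that at least one of k samples (drawn without replacement
    from the n) is correct: 1 - C(n - c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:  # every size-k subset must contain a passing sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    print(pass_at_k(10, 5, 1))  # 0.5, i.e. the raw pass rate when k = 1
    print(benchmark_pass_at_k([(10, 5), (10, 0), (10, 10)], k=1))  # 0.5
```

With k = 1 the estimator reduces to the fraction of samples that pass, which is why pass@1 is often read simply as "percent of problems solved."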

And so it turns out that, in the way builders build agentic reasoning or agentic workflows into their applications, there are, I want to say, four major design patterns: reflection, tool use, planning, and multi-agent collaboration. To demystify agentic workflows a little bit, let me quickly step through what these workflows mean. I find that agentic workflows sometimes seem a little bit mysterious until you actually read through the code for one or two of them and go, "Oh, that's it? That's really cool, but that's all it takes?"

But let me just step through, for concreteness, what reflection with LLMs looks like. I might start off prompting an LLM, a coder agent LLM, maybe with a system message telling it its role is to be a coder and write code. So you can tell it, "Please write code for a certain task," and the LLM may generate code. It then turns out that you can construct a prompt that takes the code that was just generated, copy-pastes it back into the prompt, and asks, "Here's some code intended for a task. Examine this code and critique it." If you prompt the same LLM this way, it may sometimes find some problems with the code or make some useful suggestions to improve it. Then you prompt the same LLM with the feedback and ask it to improve the code and come up with a new version. And, maybe foreshadowing tool use, you can have the LLM run some unit tests and give the results of the unit tests back to the LLM; that can be additional feedback to help it iterate further and further improve the code. This type of reflection workflow is not magic and doesn't solve all problems, but it will often take the baseline level of performance and lift it to a better level of performance.

It also turns out that this type of workflow, where we prompt an LLM to critique its own output and use its own criticism to improve it, maybe also foreshadows multi-agent planning or multi-agent workflows, where you can prompt an LLM to sometimes play the role of a coder and sometimes play the role of a critic to review the code. So it's the same conversation, but we can prompt the LLM differently, telling it sometimes to work on the code and sometimes to try to make helpful suggestions, and this also results in improved performance. So this is the reflection design pattern.
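The reflection loop just described (generate, critique, revise, with unit-test output as extra feedback) might look like this in Python. It is a sketch under assumptions: `llm` is a hypothetical prompt-to-text function, the prompts paraphrase the talk rather than quote real code, and `scripted_llm` is a toy model whose first answer is deliberately buggy. Running model-generated code with `exec` is only acceptable inside a sandbox.

```python
from __future__ import annotations

from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt -> completion


def run_tests(code: str, tests: str) -> str:
    """Run candidate code plus assert-style tests; return '' on success, else an error.

    WARNING: exec() on model-generated code is unsafe outside a sandbox.
    """
    namespace: dict = {}
    try:
        exec(code, namespace)
        exec(tests, namespace)
    except Exception as exc:  # any failure becomes feedback for the model
        return f"{type(exc).__name__}: {exc}"
    return ""


def reflect_and_fix(task: str, tests: str, llm: LLM,
                    max_rounds: int = 3) -> tuple[str, bool]:
    """Generate code, then loop: test -> self-critique -> revise."""
    code = llm(f"You are an expert coder. Write Python code for this task:\n{task}")
    for _ in range(max_rounds):
        test_output = run_tests(code, tests)
        if not test_output:
            return code, True
        critique = llm(f"Here's code intended for this task: {task}\n{code}\n"
                       "Examine this code and critique it.")
        code = llm(f"Task: {task}\nCode:\n{code}\nCritique:\n{critique}\n"
                   f"Unit test output:\n{test_output}\n"
                   "Write an improved version. Return only code.")
    return code, not run_tests(code, tests)


def scripted_llm(prompt: str) -> str:
    """Toy stand-in: first answer is buggy, the revision fixes it."""
    if "Write an improved version" in prompt:
        return "def add(a, b):\n    return a + b\n"
    if "critique it" in prompt:
        return "add() subtracts instead of adding."
    return "def add(a, b):\n    return a - b\n"


if __name__ == "__main__":
    code, passed = reflect_and_fix("add two numbers", "assert add(2, 3) == 5", scripted_llm)
    print(passed)  # True
    print(code)
```

The same model plays writer and reviewer here; only the prompt changes, which is the bridge to the multi-agent pattern mentioned above.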

The second major design pattern is tool use, in which a large language model can be prompted to generate a request for an API call, letting it decide when it needs to search the web, execute code, or take an action like issuing a customer refund, sending an email, or pulling up a calendar entry. Tool use is a major design pattern that lets large language models make function calls, and I think this is expanding what we can do with these agentic workflows.
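On the application side, tool use boils down to parsing the model's request and dispatching it to a real function. The Python sketch below assumes a made-up JSON request shape, `{"tool": ..., "args": {...}}`; real model APIs with function calling define their own structured formats, and `web_search` and `issue_refund` are hypothetical stubs rather than real services.

```python
from __future__ import annotations

import json
from typing import Any, Callable


# Hypothetical tools. Real versions would call a search API, a payments system, etc.
def web_search(query: str) -> str:
    return f"(stub) top results for {query!r}"


def issue_refund(order_id: str, amount: float) -> str:
    return f"(stub) refunded {amount:.2f} on order {order_id}"


TOOLS: dict[str, Callable[..., Any]] = {
    "web_search": web_search,
    "issue_refund": issue_refund,
}


def dispatch(model_output: str) -> str:
    """Run a tool if the model asked for one, else pass its text through.

    Assumed request format: {"tool": "<name>", "args": {...}}.
    """
    try:
        request = json.loads(model_output)
    except json.JSONDecodeError:
        return model_output  # plain text: treat as the final answer
    if not isinstance(request, dict) or "tool" not in request:
        return model_output
    tool = TOOLS.get(request["tool"])
    if tool is None:
        return f"error: unknown tool {request['tool']!r}"
    return str(tool(**request.get("args", {})))
```

In an agentic loop, the string returned by `dispatch` would be appended to the conversation and sent back to the model, which then decides whether to call another tool or answer.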

Real quick, here's the planning, or reasoning, design pattern. If you give a fairly complex request, such as generating an image where a girl is reading a book, and so on, then an LLM (this example is adapted from the HuggingGPT paper) can look at the picture and decide to first use an OpenPose model to detect the pose, then generate a picture of a girl, after that describe the image, and after that use text-to-speech, or TTS, to generate the audio.
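The pose-detection, image-generation, captioning, and text-to-speech chain above can be sketched in Python as a plan (what an LLM planner might emit) plus a small executor that runs the steps in order and feeds each step an earlier result. Everything here is an assumption for illustration: the JSON plan format, the field names, and the tool names are hypothetical, and each "model" is a placeholder lambda that only records what it was given.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    tool: str    # which model/tool to call
    source: str  # name of an earlier result to feed in ("request" = user input)
    output: str  # name under which to store this step's result


# Placeholder "models" named after the stages in the example; not real APIs.
TOOLS: dict[str, Callable[[str], str]] = {
    "pose_detection": lambda x: f"pose({x})",
    "pose_to_image": lambda x: f"girl_reading_image({x})",
    "image_captioning": lambda x: f"caption({x})",
    "text_to_speech": lambda x: f"audio({x})",
}


def parse_plan(plan_json: str) -> list[Step]:
    """Turn an LLM-written plan (assumed to be a JSON list) into steps."""
    steps = [Step(**item) for item in json.loads(plan_json)]
    for step in steps:
        if step.tool not in TOOLS:
            raise ValueError(f"plan uses unknown tool {step.tool!r}")
    return steps


def execute_plan(steps: list[Step], request: str) -> dict[str, str]:
    """Run the steps in order, feeding each one an earlier step's output."""
    results = {"request": request}
    for step in steps:
        results[step.output] = TOOLS[step.tool](results[step.source])
    return results


# What an LLM planner might return for the girl-reading-a-book request.
PLAN = """[
  {"tool": "pose_detection",   "source": "request", "output": "pose"},
  {"tool": "pose_to_image",    "source": "pose",    "output": "image"},
  {"tool": "image_captioning", "source": "image",   "output": "caption"},
  {"tool": "text_to_speech",   "source": "caption", "output": "audio"}
]"""

if __name__ == "__main__":
    print(execute_plan(parse_plan(PLAN), "example.jpg")["audio"])
    # audio(caption(girl_reading_image(pose(example.jpg))))
```

Validating the plan before executing it (as `parse_plan` does) is one simple guard against a planner that invents tools that don't exist.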

So in planning, an LLM looks at a complex request and picks a sequence of actions to execute in order to deliver on a complex task.

And lastly, multi-agent collaboration is the design pattern I alluded to, where instead of prompting an LLM to just do one thing, you prompt the LLM to play different roles at different points in time, so the different simulated agents interact with each other and come together to solve a task. I know some people may wonder: if you're using one LLM, why do you need to make this one LLM play the roles of multiple agents? But many teams have demonstrated significantly improved performance for a variety of tasks using this design pattern. It turns out that if you have an LLM sometimes specialize on different tasks, maybe one at a time, and have the roles interact, many teams seem to get much better results. I feel like there's maybe an analogy to running jobs on a processor, on a CPU: why do we need multiple processes? It's all the same processor at the end of the day. But we found that having multiple threads or processes is a useful abstraction for developers to take a task and break it down into subtasks. I think multi-agent collaboration is a bit like that too. If you have a big task, then thinking of it as hiring a bunch of agents to do different pieces of the task and then interact sometimes helps the developer build complex systems that deliver a good result.

So I think these four major agentic reasoning workflow design patterns give us a huge space to play with to build rich agents that do things that, frankly, were just not possible even a year ago.
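To make the multi-agent pattern concrete, here is a minimal Python sketch in which one shared model is cast as a coder and a critic purely through different role prompts, and the two exchange messages until the critic approves. The `(role, message) -> reply` interface, the "APPROVED" convention, and `toy_llm` are all hypothetical; a real system would send the role as a system prompt to an actual model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable

# Hypothetical LLM interface: (role/system prompt, message) -> reply.
LLM = Callable[[str, str], str]


@dataclass
class Agent:
    name: str
    role: str  # system prompt that casts the same LLM into a specific role
    llm: LLM

    def respond(self, message: str) -> str:
        return self.llm(self.role, message)


def collaborate(task: str, coder: Agent, critic: Agent,
                rounds: int = 3) -> tuple[str, list[tuple[str, str]]]:
    """Coder drafts, critic reviews, coder revises, until approval or rounds run out."""
    log: list[tuple[str, str]] = []
    work = coder.respond(f"Task: {task}")
    log.append((coder.name, work))
    for _ in range(rounds):
        feedback = critic.respond(f"Review this work for the task '{task}':\n{work}")
        log.append((critic.name, feedback))
        if feedback.strip().upper() == "APPROVED":
            break
        work = coder.respond(f"Task: {task}\nYour previous work:\n{work}\n"
                             f"Reviewer feedback:\n{feedback}")
        log.append((coder.name, work))
    return work, log


def toy_llm(role: str, message: str) -> str:
    """Stand-in for one shared model that behaves differently per role."""
    if "critic" in role:
        return "APPROVED" if "v2" in message else "Handle the empty-string case."
    return "v2" if "previous work" in message else "v1"


if __name__ == "__main__":
    coder = Agent("coder", "You are an expert coder.", toy_llm)
    critic = Agent("critic", "You are a careful code critic.", toy_llm)
    final, log = collaborate("reverse a string", coder, critic)
    print(final)                             # v2
    print([speaker for speaker, _ in log])   # ['coder', 'critic', 'coder', 'critic']
```

As with processes on a CPU, the `Agent` objects are an abstraction for the developer: nothing about the underlying model changes, only how the work is divided.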

One aspect of this I'm particularly excited about is the rise of not just large language model-based agents but large multimodal model-based agents. Given an image like this, if you wanted to use an LMM, a large multimodal model, you could actually do zero-shot prompting, and that's a bit like telling it, "Take a glance at the image and just tell me the output." For simple image tasks that's okay; you can actually have it look at the image and, right, give you the numbers on the runners or something. But it turns out that, just as with large language model-based agents, large multimodal model-based agents can do better with an iterative workflow where you approach the problem step by step: detect the faces, detect the numbers, and put it together. With this more iterative workflow, you can actually get an agent to do some planning, testing, and code writing, then plan, test, and write code again, and come up with a more complex plan, articulated and expressed in code, to deliver on more complex tasks. So what I'd like to do is show you a demo of some work that Dan Maloney and I and the Landing AI team have been doing on building agentic workflows for visual AI tasks. So if we switch to my laptop.

Let me have an image here of a soccer game, or football game, and I'm going to say, let's see, "Count the players on the field." Oh, and just so you know, if you're not sure how to prompt it, after uploading an image, this little light bulb here gives some suggested prompts you may ask for. But let me run this: "Count players on the field." What this kicks off is a process that actually runs for a couple of minutes to think through how to write code, in order to come up with a plan that gives an accurate result for counting the number of players on the field. This is actually a little bit complex, because you don't want the people in the background, just the players on the field. I already ran this earlier, so let's just jump to the result. It says the code has detected seven players on the field, and I think that should be right: one, two, three, four, five, six, seven. And if I zoom in to the model output now: one, two, three, four, five, six, seven. I think that's actually right. Part of the output is that it has also generated code, actually Python code, that, if you want, you can run over and over on a large collection of images.
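We don't know what code the Vision Agent actually generated for this request. As a hedged illustration of why "just the players on the field" is harder than counting people, here is one plausible approach in Python: take person detections from some detector (`detect_people` is a hypothetical stand-in) and keep only confident detections whose bottom-center point, roughly where a person's feet are, falls inside a polygon outlining the pitch. The field polygon, the score threshold, and the feet heuristic are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Box:
    """A person detection in pixel coordinates (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray-casting test: count edge crossings of a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        xa, ya = polygon[i]
        xb, yb = polygon[(i + 1) % n]
        if (ya > y) != (yb > y):  # edge straddles the ray's height
            x_cross = xa + (y - ya) * (xb - xa) / (yb - ya)
            if x < x_cross:
                inside = not inside
    return inside


def count_players_on_field(detections: Iterable[Box],
                           field: list[tuple[float, float]],
                           min_score: float = 0.5) -> int:
    """Count confident detections whose feet (bottom-center) are inside the field."""
    count = 0
    for box in detections:
        feet_x, feet_y = (box.x1 + box.x2) / 2, box.y2
        if box.score >= min_score and point_in_polygon(feet_x, feet_y, field):
            count += 1
    return count


def count_over_collection(images: Iterable[tuple[str, object]],
                          detect_people: Callable[[object], list[Box]],
                          field: list[tuple[float, float]]) -> dict[str, int]:
    """Apply the same counting logic to every image in a collection."""
    return {name: count_players_on_field(detect_people(img), field)
            for name, img in images}
```

`count_over_collection` is the "run it over and over" part: once the logic exists as code, applying it to thousands of stored images is a loop rather than thousands of separate prompts.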

And I think this is exciting because there are a lot of companies and teams that actually have a lot of visual AI data, a lot of images and a lot of videos stored somewhere, and until now it's been really difficult to get value out of this data. So for a lot of small teams or large businesses with a lot of visual data, visual AI capabilities like the Vision Agent let you take all this data previously shoved somewhere in blob storage and get real value out of it. I think this is a big transformation for AI.

Here's another example. Given a video, this is another soccer, or football, game, it says: "Split the video into clips of 5 seconds, find the clip where a goal is being scored, and display a frame of the output." I ran this already because it takes a little time to run. This will generate code and evaluate code for a while, and this is the output: it says "True, 10-15," so it thinks there's a goal, you know, around here, between 10 and 15 seconds. Right, and there you go, that's the goal. And also, as instructed, it extracted some of the frames associated with this. So, really useful for processing video data.
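The structure of that request, split a video into fixed-length clips and then scan for the first clip where an event happens, is easy to sketch in Python. This is an assumption-laden illustration, not the Vision Agent's output: `has_event` stands in for whatever per-clip check the agent writes (for instance, asking a multimodal model whether a goal is being scored in that clip), and video decoding and frame extraction are omitted.

```python
from __future__ import annotations

from typing import Callable, Optional


def split_into_clips(duration_s: float, clip_len_s: float) -> list[tuple[float, float]]:
    """Fixed-length (start, end) windows covering [0, duration_s); the last may be shorter."""
    if clip_len_s <= 0:
        raise ValueError("clip length must be positive")
    clips = []
    i = 0
    while i * clip_len_s < duration_s:
        start = i * clip_len_s
        clips.append((start, min(start + clip_len_s, duration_s)))
        i += 1
    return clips


def find_event_clip(duration_s: float, clip_len_s: float,
                    has_event: Callable[[float, float], bool]) -> Optional[tuple[float, float]]:
    """Return the first clip the (hypothetical) per-clip classifier flags, or None."""
    for start, end in split_into_clips(duration_s, clip_len_s):
        if has_event(start, end):
            return start, end
    return None
```

Computing each start as `i * clip_len_s`, rather than repeatedly adding the clip length, avoids accumulating floating-point drift on long videos.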

Here's one last example of the Vision Agent: you can also ask it for a program to "split the input video into small video chunks every 6 seconds, describe each chunk, and store the information in a pandas DataFrame along with the clip name, start time, and end time; return the pandas DataFrame." So this is a way to look at video data that you may have and generate metadata for it, which you can then store in Snowflake or somewhere to build other applications on top of. Just to show you the output of this: clip name, start time, end time, and then it has actually written code here, code that you can then run elsewhere if you want. Let me put it in a Streamlit app or something that you can then use to write a lot of text descriptions for this.
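A hedged Python sketch of the metadata table that prompt asks for: one row per 6-second chunk with clip name, start time, end time, and a description. `describe` is a hypothetical captioning function (in practice, a multimodal model looking at the chunk's frames), the clip-naming scheme is made up, and the CSV export is just one assumed way to get the rows into a warehouse table. To keep the sketch dependency-free it returns plain dictionaries; with pandas installed, `pandas.DataFrame(rows)` gives the DataFrame shape the prompt requested.

```python
from __future__ import annotations

import csv
import io
import math
from typing import Callable

# Hypothetical captioner: (video name, start s, end s) -> text description.
Describe = Callable[[str, float, float], str]


def build_clip_metadata(video_name: str, duration_s: float, describe: Describe,
                        chunk_s: float = 6.0) -> list[dict]:
    """One metadata row per fixed-length chunk: clip name, start/end time, description."""
    rows = []
    for i in range(math.ceil(duration_s / chunk_s)):
        start = i * chunk_s
        end = min(start + chunk_s, duration_s)
        rows.append({
            "clip_name": f"{video_name}_clip_{i:03d}",
            "start_time": start,
            "end_time": end,
            "description": describe(video_name, start, end),
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text, e.g. for loading into a warehouse table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Keeping the descriptions as ordinary rows is what makes the next step possible: once video content is text metadata in a table, it can be indexed and searched like any other data.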

Using this capability of the Vision Agent to help write code, my team at Landing AI actually built this little demo app that uses code from the Vision Agent. Instead of us having to write the code, we had the Vision Agent write the code to build this metadata, and then the app indexes a bunch of videos. So let's see, I'll search for "skier airborne." I actually ran this earlier; I hope it works. What this demo shows is that we already ran the code to take each video, split it into chunks, and store the metadata, and then when I do a search for "skier airborne," it shows the clips that have high similarity, marked here in green. Well, this is getting my heart rate up, seeing them do that. Oh, here's another one. Whoa. All right. The green parts of the timeline show where the skier is airborne. Let's see, "gray wolf at night." I actually find it pretty fun, when you have a collection of video, to index it and then just browse through. Right, here's a gray wolf at night, and this timeline in green shows where the gray wolf at night is. If I jump to a different part of the video, there's a bunch of other stuff as well, right there, that's not a gray wolf at night. So I think that's pretty cool.
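The search side of that demo ranks clips by how similar their stored descriptions are to a text query and highlights the ones above a threshold (the green parts of the timeline). We don't know how the demo computes similarity; a real app would most likely use a learned text embedding model. The Python sketch below swaps in bag-of-words cosine similarity purely so it runs without dependencies, and the `0.3` threshold and the clip records are illustrative assumptions.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in 'embedding': word counts. A real app would use a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_clips(query: str, clips: list[dict],
                 threshold: float = 0.3) -> list[tuple[str, float]]:
    """Rank clips by similarity of their description to the query; keep those above threshold."""
    q = embed(query)
    scored = [(clip["clip_name"], cosine(q, embed(clip["description"]))) for clip in clips]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

Because each clip carries its start and end time in the metadata, the clips returned by `search_clips` map directly onto highlighted spans of a video timeline.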

Let's see, just one last example. I've actually been on the road a lot, but if you search for your luggage, this black luggage, right, it turns out there's actually a lot of black luggage. So if you want your luggage, let's say "black luggage with rainbow strap" (there's a lot of black luggage out there), then, you know, there, right: black luggage with a rainbow strap. So there are a lot of fun things to do, and I think the nice thing about this is that the work needed to build applications like this is lower than ever before. So let's go back to the slides.

In terms of AI opportunities, I spoke a bit about agentic workflows, and how that is changing the AI stack is as follows. It turns out that, in addition to the stack I showed, there's actually a new, emerging agentic orchestration layer. There are orchestration layers like LangChain that have been around for a while and are also becoming increasingly agentic, through LangGraph, for example. This new agentic orchestration layer is also making it easier for developers to build applications on top, and I hope that Landing AI's Vision Agent is another contribution to this, making it easier for you to build visual AI applications to process all the image and video data that you may have had but that was really hard to get value out of until more recently.

So before I wrap up, let me share with you what I think are maybe four of the most important AI trends. There's a lot going on in AI, and it's impossible to summarize everything in one slide. If you made me pick the one most important trend, I would say it's agentic AI, but here are four things I think are worth paying attention to.

First, it turns out agentic workflows need to read a lot of text or images and generate a lot of text, so we say they generate a lot of tokens, and there are exciting efforts to speed up token generation, including semiconductor work by SambaNova, Cerebras, Groq, and others, as well as a lot of software and other types of hardware work. This will make agentic workflows work much better.

The second trend I'm excited about: today's large language models started off being optimized to answer human questions and follow human-generated instructions, things like "Why did Shakespeare write Macbeth?" or "Explain why Shakespeare wrote Macbeth." These are the types of questions that large language models are often asked on the internet. But agentic workflows call for other operations, like tool use. So large language models are now often tuned explicitly to support tool use, and just a couple of weeks ago Anthropic released a model that can support computer use. I think these exciting developments create a lot of lift, a much higher ceiling for what we can now get agentic workflows to do, with large language models tuned not just to answer human queries but tuned explicitly to fit into these iterative agentic workflows.

Third, data engineering's importance is rising, particularly with unstructured data. It turns out that a lot of the value of machine learning was on structured data, kind of tables of numbers, but with GenAI we're much better than ever before at processing text, images, video, and maybe audio. So the importance of data engineering is increasing, in terms of how to manage your unstructured data and the metadata for it, and deployment to get the unstructured data where it needs to go to create value. That will be a major effort for a lot of large businesses.

And then lastly, I think we've all seen that the text processing revolution has already arrived. The image processing revolution is in a slightly earlier phase, but it is coming, and as it comes, many people and many businesses will be able to get a lot more value out of their visual data than was ever possible before. I'm excited because I think that will significantly increase the space of applications we can build as well.

So, just to wrap up: this is a great time to be a builder. GenAI is letting us experiment faster than ever, agentic AI is expanding the set of things that are now possible, and there are just so many new applications that we can now build, in visual AI or not in visual AI, that just weren't possible before. If you're interested in checking out the visual AI demos that I ran, please go to va.landing.ai. You can try out the exact demos I ran yourself online, get the code, and run it yourself in your own applications. So with that, let me say thank you all very much, and please also join me in welcoming Elsa back onto the stage. Thank you.
