Andrew Ng Explores The Rise Of AI Agents And Agentic Reasoning | BUILD 2024 Keynote
By Snowflake Inc.
Summary
## Key takeaways
- **AI is the new electricity for builders**: Andrew Ng likens AI to electricity, a general-purpose technology that, while hard to define specific uses for, unlocks countless new applications and opportunities for builders. [00:28]
- **Generative AI accelerates application development**: Generative AI drastically reduces development time for certain AI applications, enabling teams to prototype and deploy in days instead of months, shifting the path to invention towards rapid experimentation. [02:01]
- **Agentic workflows outperform zero-shot prompting**: Agentic workflows, which involve iterative steps like outlining, research, drafting, and critiquing, deliver significantly better results than simple zero-shot prompting, as demonstrated by improved performance on coding benchmarks. [07:19]
- **Four agentic workflow design patterns**: Key design patterns for agentic workflows include reflection (self-critique and improvement), tool use (API calls, web search), planning (sequencing actions), and multi-agent collaboration (simulating diverse roles). [10:03]
- **Visual AI agents unlock unstructured data**: Agentic workflows are being extended to visual AI, enabling agents to process images and videos to extract information, generate metadata, and build applications that unlock value from previously difficult-to-access unstructured visual data. [15:15]
- **Data engineering for unstructured data is crucial**: The rise of generative AI in processing text, images, and video increases the importance of data engineering, especially for managing unstructured data and its metadata to create value. [24:58]
Topics Covered
- AI's Biggest Opportunity: Applications, Not Just Foundation Models.
- Generative AI Accelerates Invention Through Fast Experimentation.
- Move Fast, Be Responsible: The New AI Development Mantra.
- Agentic AI is the Most Important Breakthrough.
- Visual AI Agents Unlock Value from Unstructured Data.
Full Transcript
please welcome Andrew Ng
[Applause]
thank you it's such a good time to be
a builder I'm excited to be back here at
Snowflake
BUILD what I'd like to do today is share with
you where I think are some of AI's
biggest
opportunities you may have heard me say
that I think AI is the new electricity
that's because AI is a general purpose
technology like electricity if I ask you
what is electricity good for it's always
hard to answer because it's good for so
many different things and new AI
technology is creating a huge set of
opportunities for us to build new
applications that weren't possible
before people often ask me hey Andrew
where are the biggest AI opportunities
this is what I think of as the AI stack
at the lowest level is the
semiconductors and then on top of that
is a lot of the cloud infrastructure including of
course Snowflake and then on top of that
are many of the foundation model
trainers and models and it turns out
that a lot of the media hype and
excitement and social media Buzz has
been on these layers of the stack kind
of the new technology layers whenever
there's a new technology like generative
AI a lot of the buzz is on these technology
layers and there's nothing wrong with
that but I think that almost by
definition there's another layer of the
stack that has to work out even better
and that's the application layer
because we need the applications to
generate even more value and even more
Revenue so that you know to really
afford to pay the technology providers
below so I spend a lot of my time
thinking about AI applications and I
think that's where a lot of the best
opportunities will be to build new
things one of the trends that has been
growing for the last couple years in no
small part because of generative AI is
faster and faster machine learning model
development um and in particular
generative AI is letting us build things
faster than ever before take the problem
of say building a sentiment classifier
taking text and deciding is this a
positive or negative sentiment for
reputation monitoring say typical
workflow using supervised learning might
be that we'll take a month to get some
labeled data and then you know train an AI
model that might take a few months and
then find a cloud service or something
to deploy on that'll take another few
months and so for a long time very
valuable AI systems might take good AI
teams six to 12 months to build right
and there's nothing wrong with that I
think many people create very valuable
AI systems this way but with generative
AI there's certain classes of applications
where you can write a prompt in days and
then deploy it in you know again maybe
days and what this means is there are a
lot of applications that used to take me
and used to take very good AI teams
months to build that today you can build
in maybe 10 days or so and this opens up
the opportunity to experiment with build
new prototypes and ship new AI
products that's certainly the
prototyping aspect of it and these are
some of the consequences of this trend
which is fast experimentation is
becoming a more promising path to
invention previously if it took six
months to build something then you know
we better study it make sure there's user
demand have product managers review
it document it and then spend all
that effort to build it and hopefully it
turns out to be
worthwhile but now for fast moving AI
teams I see a design pattern where you
can say you know what it takes us a
weekend to throw together a prototype
let's build 20 prototypes and see what
sticks and if 18 of them don't work out
we'll just ditch them and stick with
what works so fast iteration and fast
experimentation is becoming a new path
to inventing new user
experiences um one interesting
implication is that evaluations or evals
for short are becoming a bigger
bottleneck for how we build things so it
turns out back in supervised learning
world if you're collecting 10,000 data
points anyway to train a model then you
know if you needed to collect an extra
1,000 data points for testing it was
fine it was an extra 10% increase in cost
but for a lot of large language model
based apps if there's no need to have
any training data if you made me slow
down to collect a thousand test examples
boy that seems like a huge bottleneck
and so the new development workflow
often feels as if we're building and
collecting data more in parallel rather
than sequentially um in which we build a
prototype and then as it becomes
more important and as robustness and
reliability becomes more important then
we gradually build up that test set
in parallel but I see exciting
Innovations to be had still in how we
build evals um and then what I'm seeing
as well is the prototyping of machine
learning has become much faster but
building a software application has lots
of steps there's the product work you know
the design work there's the software
integration work a lot of plumbing work
um then after deployment DevOps and
LLMOps so some of those other pieces are
becoming faster but they haven't become
faster at the same rate that the machine
learning modeling part has become faster
so you take a process and one piece of
it becomes much faster um what I'm
seeing is prototyping is now really
really fast but sometimes to take a
prototype into robust reliable
production with guard rails and so on
those other steps still take some time
but the interesting Dynamic I'm seeing
is the fact that the machine learning part
is so fast is putting a lot of pressure
on organizations to speed up all of
those other parts as well so that's been
exciting progress for our field and in
terms of how machine learning
development um is speeding things up I
think the mantra move fast and break
things got a bad rep because you know it
broke things um I think some people
interpret this to mean we shouldn't move
fast but I disagree with that I think
the better mantra is move fast and be
responsible I'm seeing a lot of teams
able to prototype quickly evaluate and
test robustly so without shipping
anything out to The Wider world that
could you know cause damage or cause um
meaningful harm I'm finding smart teams
able to build really quickly and move
really fast but also do this in a very
responsible way and I find this
exhilarating that you can build things
and ship things in a responsible way much
faster than ever
before now there's a lot going on in AI
and of all the things going on in AI um in
terms of technical trends the one trend
I'm most excited about is agentic AI
workflows and so if you were to ask what's
the one most important AI technology to
pay attention to I would say it's agentic
AI um I think when I started saying this
you know near the beginning of this year
it was a bit of a controversial
statement but now the term AI agents has
become so widely used uh by
technical and non-technical people it's
become you know a little bit of a hype
term uh but so let me just share with
you how I view AI agents and why I think
they're important approaching just from
a technical
perspective the way that most of us use
large language models today is with
something called zero-shot prompting
and that roughly means we would ask it
to uh give it a prompt write an essay or
write an output for us and it's a bit
like if we're going to a person or in
this case going to an AI and asking it
to type out an essay for us by going
from the first word writing from the
first word to the last word all in one
go without ever using backspace just
write from start to finish like that and
it turns out people you know we don't do
our best writing this way uh but despite
the difficulty of being forced to write
this way LLMs do you know not
bad pretty
well here's what an agentic workflow
is like uh to generate an essay we ask an
AI to first write an essay outline and
ask it do you need to do some web
research if so let's download some web
pages and put into the context of the
large language model then let's write the first
draft and then let's read the first
draft and critique it and revise the
draft and so on and this workflow looks
more like um doing some thinking or some
research and then some revision and then
going back to do more thinking and more
research and by going round this Loop
over and over um it takes longer but
this results in a much better work
output so in some teams I work with we
apply this agentic workflow to
processing complex tricky legal
documents or to um do Health Care
diagnosis Assistance or to do very
complex compliance with government
paperwork so many times I'm seeing this
drive much better results than was ever
possible and one thing I want to focus
on in this presentation I'll talk about
later is the rise of visual AI where
agentic workflows are letting us process
image and video data
but to get back to that later um it
turns out that there are benchmarks that
seem to show agentic workflows
deliver much better results um this is
the HumanEval benchmark which is a
benchmark from OpenAI that measures
a large language model's ability to
solve coding puzzles like this one and
um my team collected some data turns out
that um on this benchmark I think it was
pass@k benchmark pass@k metric GPT-3.5 got
48% right on this coding benchmark GPT-4
huge improvement you know
67% but the improvement from GPT-3.5 to
GPT-4 is dwarfed by the improvement from
GPT-3.5 to GPT-3.5 using an agentic
workflow um which gets up to about
95% and GPT-4 with an agentic workflow
also does much better um and so it turns
out that in the way builders build
agentic reasoning or agentic workflows
in their applications there are I want
to say four major design patterns which
are reflection tool use planning and
multi-agent collaboration and to
demystify agentic workflows a little bit
let me quickly step through what these
workflows mean um and I find that
agentic workflows sometimes seem a
little bit mysterious until you actually
read through the code for one or two of
these go oh that's it you know that's
really cool but oh that's all it takes
but let me just step through
um for concreteness what
reflection with LLMs looks like so I might
start off uh prompting an LLM that's a
coder agent LLM so maybe a system
message saying your role is to be a coder and
write code um so you can tell it you know
please write code for a certain task and
the LLM may generate code and then it
turns out that you can construct a
prompt that takes the code that was just
generated and copy paste the code back
into the prompt and ask it you know here's
some code intended for a task examine
this code and critique it right and it
turns out if you prompt the same LLM this
way it may sometimes um find some
problems with it or make some useful
suggestions on how to improve the code then you
prompt the same LLM with the feedback and
ask it to improve the code and come up
with a new version and uh maybe
foreshadowing tool use you can have the
LLM run some unit tests and give the
feedback from the unit tests back to the LLM
then that can be additional feedback to
help it iterate further to further
improve the code and it turns out that
this type of reflection workflow is not
magic doesn't solve all problems um but
it will often take the Baseline level
performance and lift it uh to to better
level performance and it turns out also
with this type of workflow where we're
thinking of prompting an LLM to critique its
own output and use its own criticism to
improve it this maybe also foreshadows
multi-agent planning or multi-agent
workflows where you can
prompt an LLM to sometimes play the role
of a coder and sometimes prompt it to play
the role of a critic um to
review the code so it's the same
conversation but we can prompt the LLM
you know differently to tell it sometimes to
work on the code sometimes try to make
helpful suggestions and this
results in improved performance so this
is the reflection design pattern
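The reflection loop just described can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the talk: the `LLM` callable type, the prompt wording, and the offline `fake_llm` stand-in are all assumptions, and a real application would pass in a function that calls an actual model.

```python
from typing import Callable

# Hypothetical interface (an assumption): prompt text in, completion text out.
LLM = Callable[[str], str]


def reflect(llm: LLM, task: str, rounds: int = 2) -> str:
    """Generate code once, then alternate critique and revision `rounds` times."""
    code = llm(f"You are a coder. Write code for this task:\n{task}")
    for _ in range(rounds):
        # Paste the generated code back into a prompt and ask for a critique.
        critique = llm(
            f"Here is some code intended for this task:\n{task}\n\n"
            f"{code}\n\nExamine this code and critique it."
        )
        # Feed the critique back to the same model and ask for a new version.
        code = llm(
            f"Task:\n{task}\n\nCurrent code:\n{code}\n\n"
            f"Feedback:\n{critique}\n\nImprove the code using the feedback."
        )
    return code


def fake_llm(prompt: str) -> str:
    """Offline stand-in so the sketch runs without a model; not a real LLM."""
    if "critique it" in prompt:
        return "Add a docstring."
    if "Improve the code" in prompt:
        return 'def add(a, b):\n    """Return a + b."""\n    return a + b'
    return "def add(a, b):\n    return a + b"


if __name__ == "__main__":
    print(reflect(fake_llm, "Add two numbers.", rounds=1))
```

Unit test output, as mentioned above, could be appended to the feedback string in the same way as the critique.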
the second major design pattern is tool use
in which a large language model can be
prompted to generate a request for an
API call to have it decide when it needs
to uh search the web or execute code or
take an action like um issue a customer
refund or send an email or pull up a
calendar entry so tool use is a major
design pattern that is letting large
language models make function calls and
I think this is expanding what we can do
with these agentic workflows
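As a rough sketch of tool use, the application can ask the model to emit a structured request and then dispatch it to ordinary functions. Everything below is an assumption made for illustration: the JSON shape with `tool` and `arguments` keys, the `issue_refund` and `send_email` helpers, and the hard-coded model output. Real model providers each define their own function-calling format.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical tools; a real application would wrap actual APIs here.
def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount:.2f} for order {order_id}"


def send_email(to: str, subject: str) -> str:
    return f"email '{subject}' sent to {to}"


# Registry mapping the names the model may request to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {
    "issue_refund": issue_refund,
    "send_email": send_email,
}


def run_tool_request(llm_output: str) -> Any:
    """Parse a model-generated JSON request and call the matching function."""
    request = json.loads(llm_output)
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**request.get("arguments", {}))


if __name__ == "__main__":
    # Pretend the model produced this text in response to a user message.
    model_text = (
        '{"tool": "issue_refund", '
        '"arguments": {"order_id": "A17", "amount": 19.5}}'
    )
    print(run_tool_request(model_text))
```

Keeping an explicit registry means the model can only trigger functions the developer has chosen to expose, which matters for actions like refunds.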
quick here's a planning or reasoning
design pattern in which if you were to
give a fairly complex request you know
generate an image where a girl is reading a
book and so on then an LLM this example
adapted from the HuggingGPT paper an LLM
can look at the picture and decide to
first use a um OpenPose model to detect
the pose and then after that generate a
picture of a girl um after that you'll
describe the image and after that use
text to speech or TTS to generate the audio
but so in planning you have an LLM look at a
complex request and pick a sequence of
actions to execute in order to deliver on a
complex task um and lastly multi-agent
collaboration is that design pattern I
alluded to where instead of prompting an
LLM to just do one thing you prompt the
LLM to play different roles at different
points in time so the different
simulated agents interact with each other
and come together to solve a task and I
know that some people may wonder you
know if you're using one LLM why do you need
to make this one LLM play the role of
multiple agents um many teams
have demonstrated significantly improved
performance for a variety of tasks using
this design pattern and it turns out
that if you have an LLM sometimes
specialize on different tasks maybe one
at a time have it interact many teams
seem to really get much better results
using this I feel like maybe um there's
an analogy to if you're running jobs on
a processor on a CPU you know why do we need
multiple processes it's all the same
processor there you know at the end of the
day but we found that having multiple threads
or processes is a useful abstraction for
developers to take a task and break it
down into subtasks and I think multi-agent
collaboration is a bit like that too if
you have a big task then if you think of
hiring a bunch of agents to do different
pieces of task then interact sometimes
that helps the developer um build
complex systems to deliver a good
result so I think with these four major
agentic design patterns agentic
reasoning workflow design patterns um it
gives us a huge space to play with to
build Rich agents to do things that
frankly were just not possible you know
even a year ago um and one
aspect of this I'm particularly excited
about is the rise of not just large
language model based agents but
large multimodal
model based agents so um given an image
like this if you wanted to uh use an
LMM large multimodal model you could
actually do zero-shot prompting and that's a
bit like telling it you know take a
glance at the image and just tell me the
output and for simple image tasks
that's okay you can actually have it you
know look at the image and uh give
you the numbers of the runners or
something but it turns out just as with
large language model based agents large
multimodal model based agents can
do better with an iterative workflow where
you can approach this problem step by
step so detect the faces detect the
numbers put it together and so with this
more iterative workflow uh you can actually
get an agent to do some planning testing
write code plan test write code and come
up with a more complex plan as
articulated expressed in code to deliver
on more complex tasks so what I'd like
to do is um show you a demo of some work
that uh Dan Maloney and I and the Landing AI
team have been working on building
agentic workflows for visual AI
tasks so if we switch to my
laptop
um let me have an image here of a uh
soccer game or football game and um I'm
going to say let's see count the
players in the field oh and just for fun if
you're not sure how to prompt it after
uploading an image This little light
bulb here you know gives some suggested
prompts you may ask for this uh but let
me run this so count players on the
field right and what this kicks off is a
process that actually runs for a couple
minutes um to Think Through how to write
code uh in order to come up a plan to
give an accurate result for uh counting
the number of players in the field this is
actually a little bit complex because
you don't want the players in the
background just the ones in the field I already
ran this earlier so we'll just jump to
the result um but it says the code has
detected seven players on the field and
I think that should be right 1 2 3 4 5 six
seven
um and if I were to zoom in to the model
output Now 1 2 3 4 five six seven I
think that's actually right and the part
of the output of this is that um it has
also generated code uh that you can run
over and over um actually generated
python code uh
that if you want you can run over and
over on the large collection of images
and I think this is exciting because
there are a lot of companies um and
teams that actually have a lot of visual
AI data have a lot of images um have a
lot of videos kind of stored somewhere
and until now it's been really difficult
to get value out of this data so for a
lot of the you know small teams or large
businesses with a lot of visual data
visual AI capabilities like the vision
agent lets you take all this data
previously shoved somewhere in blob storage
and and you know get real value out of
this I think this is a big
transformation for AI um here's another
example you know this says um given a
video split this is another soccer game or
football
game so given a video split the video into
clips of 5 seconds find the clip where a
goal is being scored display a frame so
output so I ran this already because it takes
a little time to run then this will
generate code evaluate code for a while
and this is the output and it says true
10 15 so it thinks there's a goal scored you know
around here around between
the right and there you go that's the goal
and also as instructed you know
extracted some of the frames associated
with this so really useful for
processing um video data and maybe
here's one last example uh of of of the
vision agent which is um you can also
ask it for a program to split the input
video into small video chunks every 6
seconds describe each chunk and store the
information in a pandas DataFrame along
with clip name start and end time return the
pandas DataFrame so this is a way to
look at video data that you may have and
generate metadata for this uh that you
can then store you know in Snowflake or
somewhere uh to then build other
applications on top of but just to show
you the output of this um so you know
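To make the shape of that metadata concrete, here is a minimal sketch of the chunking step only. It is not the code the Vision Agent generated in the demo. The function names, the `describe` callback (a placeholder for whatever model captions each chunk), and the column names are assumptions chosen to match the clip name, start time, and end time fields mentioned here.

```python
from typing import Callable, Dict, List, Tuple


def chunk_bounds(duration_s: float, chunk_s: float = 6.0) -> List[Tuple[float, float]]:
    """Return (start, end) pairs that cover the video in chunk_s-second steps."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)  # last chunk may be shorter
        bounds.append((start, end))
        start = end
    return bounds


def build_metadata(
    video_name: str,
    duration_s: float,
    describe: Callable[[str, float, float], str],
    chunk_s: float = 6.0,
) -> List[Dict[str, object]]:
    """Build one metadata row per chunk; `describe` is a hypothetical captioner."""
    rows = []
    for i, (start, end) in enumerate(chunk_bounds(duration_s, chunk_s)):
        rows.append(
            {
                "clip_name": f"{video_name}_clip_{i:03d}",
                "start_time": start,
                "end_time": end,
                "description": describe(video_name, start, end),
            }
        )
    return rows


if __name__ == "__main__":
    # Stub captioner so the sketch runs without any video or model.
    def stub(name: str, start: float, end: float) -> str:
        return f"(description of {name} from {start:.0f}s to {end:.0f}s)"

    for row in build_metadata("ski_trip", 20, stub):
        print(row)
```

Passing the returned list to `pandas.DataFrame(rows)` gives a DataFrame with one row per clip, which could then be written to a warehouse table.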
clip name start time end time and then
it's actually written code um here
right wrote code that you can then run
elsewhere if you want uh put it in a
Streamlit app or something that you can
then use to then write a lot of you know
text descriptions for this um and using
this capability of the vision agent to
help write code my team at Landing AI
actually built this little demo app that
um uses code from the vision agent so
instead of us needing to write code have
the Vision agent write the code to build
this metadata and then um indexes a
bunch of videos so let's see I say
browsing so skier airborne right I
actually ran this earlier hope it works
so what this demo shows is um we already
ran the code to take the video split in
chunks store the metadata and then when
I do a search for skier Airborne you
know it shows the clips uh that have
high
similarity right right oh marked here
with the green has high similarity well
this is getting my heart rate up seeing
them do that oh here's another one whoa all
right all right and and the green parts
of the timeline show where the skier is
Airborne let's see gray wolf at night I
actually find it pretty fun yeah when
when you have a collection of video to
index it and then just browse through
right here's a gray wolf at night and
this timeline in green shows where a gray
wolf at night is and if I actually
jump to a different part of the video
there's a bunch of other stuff as well
right there that's not a gray wolf at night
so I think that's pretty cool
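The talk does not say how the demo app computes similarity, so the following is only a toy sketch of the general idea of ranking stored clip descriptions against a text query. The word-count `embed` function is a deliberately simple stand-in (an assumption) for a learned embedding model, and the clip records and threshold are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(
    query: str, clips: List[Dict[str, str]], threshold: float = 0.3
) -> List[Tuple[float, Dict[str, str]]]:
    """Return (score, clip) pairs at or above threshold, best match first."""
    q = embed(query)
    scored = [(cosine(q, embed(c["description"])), c) for c in clips]
    hits = [(s, c) for s, c in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up metadata rows of the kind produced by the chunking step.
    clips = [
        {"clip_name": "ski_000", "description": "skier airborne over a jump"},
        {"clip_name": "ski_001", "description": "skier riding a chairlift"},
        {"clip_name": "wolf_000", "description": "gray wolf at night"},
    ]
    for score, clip in search("skier airborne", clips):
        print(f"{score:.2f}  {clip['clip_name']}")
```

Clips whose score clears the threshold would correspond to the green segments on the timeline in the demo.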
um let's see just one last example so
um yeah I've actually been on the road a
lot uh but if you search for your luggage this
black luggage right
um there's this but it turns out
there's actually a lot of black luggage so
if you want your luggage let's say black
luggage with
rainbow strap there's a lot of black
luggage out
there
then you know there right black luggage
with rainbow strap so a lot of fun
things to do um and I think the nice
thing about this is uh the work needed
to build applications like this is lower
than ever before so let's go back to the
slides
um
and in terms of AI opportunities I spoke
a bit about agentic workflows and um how
that is changing the AI stack is as
follows it turns out that in addition to
this stack I showed there's actually a new
emerging um agentic orchestration layer
and there are orchestration layers
like LangChain that have been around for a
while that are also becoming
increasingly agentic through LangGraph for
example and this new agentic
orchestration layer is also making it
easier for developers to build
applications on top uh and I hope that
Landing AI's Vision Agent is another
contribution to this that makes it easier
for you to build visual AI applications
to process all this image and video data
that possibly you had but that was
really hard to get value out of um until
more recently so but let me tell
you what I think are maybe four of the
most important AI Trends there's a lot
going on in AI it's impossible to
summarize everything in one slide if you
had to make me pick what's the one most
important trend I would say it's agentic
AI but here are four things I think
are worth paying attention to first um
turns out agentic workflows need to read
a lot of text or images and generate a
lot of text so we say that generates a
lot of tokens and there are exciting efforts
to speed up token generation including
semiconductor work by SambaNova Cerebras Groq
and others a lot of software and other
types of Hardware work as well this will
make agentic workflows work much better
second trend I'm excited about
today's large language models
started off being optimized to answer
human questions and human generated
instructions things like you know why
did Shakespeare write Macbeth or explain
why Shakespeare wrote Macbeth these
are the types of questions that large
language models are often asked to answer on
the internet but agentic workflows call
for other operations like tool use so the
fact that large language models are
often now tuned explicitly to support
tool use or just a couple weeks ago um
Anthropic released a model that can
support computer use I think these
exciting developments will create a lot
of lift create a much higher
ceiling for what we can now get agentic
workflows to do with large language models
that are tuned not just to answer human
queries but tuned explicitly to
fit into these iterative agentic workflows
um third
data engineering's importance is rising
particularly with unstructured data it
turns out that a lot of the value of
machine learning was from structured data
kind of tables of numbers but with gen AI
we're much better than ever before at
processing text and images and video and
maybe audio and so the importance of
data engineering is increasing in terms
of how to manage your unstructured data
and the metadata for that and
deployment to get the unstructured data
where it needs to go to create value so
that that would be a major effort for a
lot of large businesses and then lastly
um I think we've all seen that the text
processing revolution has already
arrived the image processing Revolution
is in a slightly early phase but it is
coming and as it comes many people many
businesses um will be able to get a lot
more value out of the visual data than
was possible ever before and I'm excited
because I think that will significantly
increase the space of applications we
can build as well so just wrap up this
is a great time to be a builder uh gen AI
is letting us experiment faster than
ever agentic AI is expanding the set of
things that are now possible and there are just
so many new applications that we can now
build in visual AI or not in visual AI
that just weren't possible ever before
if you're interested in checking out the
uh visual AI demos that I ran uh please
go to va.landing.ai the exact demos
that I ran you'll be able to try out yourself
online and get the code and uh run code
yourself in your own applications so
with that let me say thank you all very
much and please also join me in
welcoming Elsa back onto the stage thank
you