CS50 Fall 2025 - Artificial Intelligence (live, unedited)
By CS50
Summary
## Key takeaways

- **AI can be a virtual rubber duck**: The CS50 duck, initially responding with quacks, now uses AI to guide students to solutions rather than providing direct answers, mimicking a less helpful version of ChatGPT. [02:54], [03:29]
- **Prompt engineering is asking good questions**: Prompt engineering is less about engineering and more about asking detailed questions with context to guide AI responses, utilizing system prompts for personality and user prompts for specific queries. [08:44], [09:12]
- **AI amplifies programmer capabilities**: Tools like GitHub Copilot can significantly speed up coding by suggesting solutions, allowing programmers to focus on overarching problems rather than tedious details. [12:54], [16:14]
- **Machine learning learns from data patterns**: Instead of direct solutions, machine learning models learn from vast amounts of data to identify patterns and apply them to new problems, as seen in reinforcement learning where rewards and penalties guide behavior. [30:16], [31:31]
- **Large Language Models use attention and statistics**: LLMs like GPT break down sentences into word representations and use 'attention' to determine word relationships, statistically predicting the most probable next word to generate coherent text. [45:21], [46:37]
- **AI can 'hallucinate' incorrect information**: Despite advanced training, AI can still generate incorrect answers, known as hallucinations, due to flawed training data or probabilistic errors in the model's predictions. [46:53], [47:05]
Topics Covered
- The Evolution of CS50's Rubber Duck Debugging Tool
- AI's Uncanny Ability to Mimic Reality
- AI's Uncanny Ability to Mimic Human Text
- AI Learns to Flip Pancakes Through Reinforcement Learning
- The Explore vs. Exploit Dilemma in AI Decision Making
Full Transcript
All
right,
this is CS 50,
and this is our lecture on artificial intelligence or AI,
particularly for all of those family members who are here
in the audience with us for the first time.
In fact,
uh,
for those students among us,
maybe a round of applause for all of the family
members who have come here today to join you.
Nice.
So nice to see everyone.
And as CS 50 students already know,
it's sort of a thing in programming circles to have a rubber duck on your desk.
Indeed,
a few weeks back we gave one to all CS 50 students,
and the motivation is to have someone,
something to talk to in the presence of a bug or mistake in
your code or confusion you're having when it comes to solving some problem.
And the idea is that
in the absence of having a friend,
family member,
or TA of whom you can ask questions,
you literally verbalize your confusion,
your question, to this
object on your desk, and in that process
of verbalizing your own confusion and explaining yourself,
quite often does that proverbial light bulb go off over your head and, voila,
problem is solved.
Now as CS50 students also know,
we sort of virtualized that rubber duck over the past few years and most recently
in the form of this guy here.
So in students' programming environment within CS50,
a tool called Visual Studio Code at a URL of cs50.dev,
they have a virtual rubber duck
available to them at all times. And early on,
in the very first version of this rubber duck,
it was a chat window that looked like this and if students had a question,
they could simply type into the chat window something like,
I'm hoping you can help me solve a problem.
And for multiple years,
all the CS50 duck did
was respond with 1, 2, or 3 quacks.
We have anecdotal evidence to suggest that that
alone was enough for answering students' questions because
it was in that process of like actually typing out the confusion that you realize,
oh,
I'm doing something silly, and you figure it out on your own.
But of course now that we live in an age of ChatGPT
and Claude and Gemini and all of these other AI-based tools,
it came as no surprise perhaps when in 2023
the same duck started responding to students in English, and
that now is the tool that they have available,
which is in effect meant to be a less helpful version of ChatGPT,
one that doesn't just spoil answers outright,
but tries to guide them to solutions akin to any good teacher or tutor.
And so today's lecture
is indeed on just that and the underlying building blocks that make possible
that there rubber duck and all of the AI with which we're all increasingly familiar,
namely generative artificial intelligence using this technology known as AI
to generate something,
whether that's images or sounds or video or text.
And in fact,
what we thought we'd do to get everyone involved early on
is if you have a phone by your side,
if you'd like to go ahead and scan this QR code here,
and that's going to lead you.
To a polling station where you can buzz in with some answers.
CS 50's preceptor Kelly is going to kindly join
me here on stage to help run the keyboard,
and what we're about to do is play a little game and
see just how good we humans are right now at distinguishing AI from
reality.
And so we'll borrow some data from The New York Times,
which a couple of years back actually published some examples of AI and not AI,
and we'll see just how good this technology has gotten.
So here we have two photographs on the screen.
In a moment,
you'll be asked on your phone if you were successful in scanning that code,
which one of these is AI?
Left
or right?
So hopefully on your phone here,
if you want to go ahead and swipe to the next screen,
we'll activate the poll here.
In a moment,
you should see on your phone a prompt
inviting you to select left
or
right.
And
Feel free to raise your hand if you're not seeing that,
but it looks like the responses are coming in,
and at the risk of spoiling,
it looks like 70%+ of you think it is the answer on the right.
And if Kelly,
maybe we could swipe back to the two photographs in this particular case,
yes,
it was in fact the one on the right.
Maybe it looked a little too good or maybe a little too unreal,
maybe let's see,
maybe a couple of other examples.
So same QR code,
no need to re-scan.
Let's go ahead and pull up these two examples now 2 photographs,
same question.
Which of these is AI?
Left
or right?
Left
Or right
All
right,
I want to take a look at the chart,
see what the responses are; they're coming in a little closer in this case,
but a majority of you think the answer is in fact left here,
though 5% of you are truthfully admitting that you're unsure,
but Kelly,
if you want to swipe back to the photos,
the answer this time was in fact a trick question.
They were both in fact AI,
which
perhaps speaks to just how good this technology is already getting.
Neither of these faces exists in the real world.
It was synthesized based on lots of training data,
so two photographs that look like humans but do not in fact exist.
How about one more this time focusing on text,
which will be the focus of course underlying our duck.
Did a 4th grader write this or the new chatbot?
Here are two final examples same code as before,
so no need to re-scan.
And here are the texts.
Essay 1:
I'd like to bring a yummy sandwich and a cold juice box for lunch.
Sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips.
As we eat,
we chat and laugh and catch up on each other's day. Essay 2:
My mother packs me a sandwich,
a drink,
fruit,
and a treat.
When I get into a lunchroom,
I find an empty table and sit there and eat my lunch.
My friends come and sit down with me.
The question now lastly is which of these is AI?
1 or 2?
Essay 1 or 2?
The bars here are duking it out.
Looks like a majority of you say Essay 1.
Let's go back to the text, and one of you
who says Essay 1, why, if you want to raise a quick hand?
Why Essay 1?
Yeah.
OK,
and, uh, so Essay 2 looks more like what you would write. And may I ask what grade you are in?
A 5th grader.
So,
is this a new 5th grader or not?
The answer here,
in fact,
is that Essay 1 is the AI, because indeed Essay 2 is more akin to what a 4th grader or, if I may,
a 5th grader would write,
and I dare say there are maybe some telltale signs.
I'm not sure a typical 4th grader or 5th grader would "catch up
on each other's day" in the vernacular that we see in Essay 1.
But suffice it to say this game is not something we can play in the
years to come, because it's just going to get too hard to discern something that's
AI-generated or not.
And so among our goals for today is really to give you a better sense of not just
how technologies like this duck and these games that
we've played here with images and text work,
but really what are the underlying principles of artificial
intelligence that frankly have been with us and have been
developing for decades and have really now come to a
head in recent years thanks to advances in research,
thanks to all the more cloud computing,
thanks to all the more memory and disk space and information,
the sheer volume thereof that we have at our disposal that
can be used to train all of these here technologies.
So the duck is built on a fairly complicated architecture that looks a
little something like this where here's a student using one of CS 50's tools.
Here's a website with which CS50 students are familiar called cs50.ai, where we,
the staff,
wrote a bunch of code that actually talks to what are called APIs,
application programming interfaces,
third-party services by companies like Microsoft and OpenAI that
really have been doing the hard work of developing
these models, as well as some special sauce,
some secret sauce that we at CS50 add into the mix to
make the duck's answers specific to CS50 itself.
But what we've essentially been doing is something
with which you might be familiar in part:
prompt engineering,
which has started popping up for better
or for worse on LinkedIn profiles everywhere,
and prompt engineering really is not so much a form of
engineering as it is a form of asking good questions
and being detailed in your question, giving context to the underlying AI
so that the answer with high probability is what you want back.
And so there's two terms in this world
of prompt engineering that are worth knowing about.
CS50 has leveraged both of these to implement that duck.
We for instance wrote what's called a system prompt,
which are instructions written by us humans,
often in English,
that sort of nudge the underlying AI technology to
have a certain personality or a specific domain
of expertise.
For instance,
we CS 50 have written a system prompt essentially that looks like this.
In reality,
it's like a lot of lines long nowadays,
but the essence of it
is this.
You are a friendly and supportive teaching assistant for CS 50.
You are also a rubber duck,
and that is sufficient to turn an AI into a rubber duck.
It turns out.
Answer student questions only about CS50 in the field of computer science.
Do not answer questions about unrelated topics.
Do not provide full answers to problem sets,
as this would violate academic honesty.
Answer this question colon.
And after that preamble,
if you will,
aka system prompt,
we effectively copy paste whatever question a student has typed in,
otherwise known as a user prompt,
and that is why the duck
behaves like a duck in our case and not a cat or a dog or a PhD,
but rather something that's been attenuated to the
particular goals we have pedagogically in the course.
And in fact,
those of you who are CS 50 students might recall from quite some weeks
ago in week zero when we first introduced the course uh to the class,
we had code that we whipped up that day
that ultimately looked a little something like this,
and I'll walk through it briefly line by line,
but now, on the heels of having studied some Python in CS50 this year, the code
that I whipped up in the first lecture might now make a bit more sense.
In that first lecture,
we imported OpenAI's own library code that a third party company wrote to make
it possible for us to implement code on top of theirs.
We created a variable called client in week zero,
and this gave us access to the OpenAI client,
that is software that they wrote for us.
We then defined in week 0 a user prompt which came
from the user using the input function with which CS50
students are now familiar and then we defined this system prompt
that day where I said limit your answer to one sentence,
pretend you're a cat,
I think was the persona of the day.
And then we used some bit more arcane code here,
but in essence we created a variable called response which
was meant to represent the response from OpenAI server.
We used client.responses.create,
which is a function or method that OpenAI gives us that allows
us to pass in three arguments: the input from the user,
that is, the user prompt;
the instructions from us,
that is, the system prompt;
and then the specific model or version of AI that we wanted to
use; and the last thing we did that day was print out
response.output_text, and that's how we were able to answer
questions like what is CS50 or the like.
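The week-0 code described above can be sketched roughly as follows in Python, using OpenAI's Responses API; note that the exact prompt wording and the model name here are illustrative assumptions, not CS50's actual source.

```python
# A rough sketch of the week-0 duck code described above, using
# OpenAI's Responses API (requires the third-party `openai` package
# and an OPENAI_API_KEY). The prompt wording and the model name
# below are illustrative assumptions, not CS50's actual source.

SYSTEM_PROMPT = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a rubber duck. Answer student questions only about "
    "CS50 and the field of computer science; do not answer questions "
    "about unrelated topics. Do not provide full answers to problem "
    "sets, as this would violate academic honesty."
)

def ask_duck(client, user_prompt, model="gpt-4o"):
    """Send one student question to the model and return its reply."""
    response = client.responses.create(
        model=model,                 # which version of the AI to use
        instructions=SYSTEM_PROMPT,  # the system prompt
        input=user_prompt,           # the user prompt, e.g. from input()
    )
    return response.output_text

# Usage (commented out, since it needs a real API key):
# from openai import OpenAI
# print(ask_duck(OpenAI(), input("Question: ")))
```

The three arguments mirror the lecture's description exactly: the user's input, our instructions, and the model to use.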
So we've seen all of that before,
but we didn't talk about that week exactly how it was
working or what more we could actually do with it.
And so in fact what I thought we'd do today
is peel back a layer that we've not allowed into
the course up until now and indeed you still cannot
use this feature until the very end of the class
in CS 50 when you get to your final projects,
at which point you are welcome and encouraged to use VS code in this particular way.
So here again is VS code for those unfamiliar,
this is the programming environment we use here with students,
and let me open up some code that was assigned to students a couple of weeks back,
namely a spell checker that they had to implement in C.
So I came in advance
with a folder called speller,
and inside of this folder I had code that day and all students had that week
called dictionary.c.
And in this file,
which will not look familiar to many of you if
you've not taken week 0 through 7 up until now,
we did have some placeholders for students.
So long story short,
students had to answer a few questions, that is, write code for this to-do,
this to-do,
this to-do,
and one more.
There were 4 functions or blanks that students needed to fill in with code.
And I dare say it took most students
5 hours,
10 hours,
15 hours,
something in that very broad range.
Let me show you now how using AI
you, the aspiring programmers, can soon start to write code all the more quickly,
not by just choosing a different language,
but by using these AI based technologies beyond the duck itself.
So what I've done here on the right-hand side of VS Code is enable a
feature that CS50 disables for all students
from the start of the course called Copilot.
This is very similar in spirit to products from
Google and Anthropic and other companies as well,
but this is the one that comes from Microsoft and in turn GitHub here,
and it too gives us me sort of a chat window here and this is just one of its features.
For instance,
if I wanted to implement
to get started the check function,
I could just ask it to do that,
Implement
the check function.
And
uh how about using a hash table in C?
I'm going to go ahead and click Enter.
Now it's going to work.
It's using as reference,
that is context,
the very file that I've opened,
which is dictionary.c here,
copilot in general,
as well as a lot of AI tools are familiar with CS50
itself because it's been freely available as open courseware for years.
What you see
here doing is essentially thinking,
though that's a bit of an overstatement.
It's not really thinking,
it's trying to find patterns in what the problem is I
want to solve among all of the training data it's
seen before and come up with a pretty good answer.
So for today's purposes,
I'm going to wave my hand at the ChatGPT-like explanation of what to do
that appeared at right,
but what's juiciest to look at here is on the left,
if I now scroll down,
what's highlighted in green is all of the
suggested code for implementing this here check function.
Now it might not be the way you implemented it yourself,
but I do dare say this has hints of exactly
what you probably did when it came to implementing a
hash table, and in fact I can go ahead and keep all of this code if I like how it looks.
Let's assume that's all correct there.
It might be the case that I want to now implement the load function.
So how about now implement load
function enter
as simple as that,
and what data is being used?
Well,
a few different things.
It says one reference,
so it's indeed using this one file,
but there's also what are called comments in the
code, with which all students are now familiar, these
slash-slash comments in gray that are giving English hints
as to what this function is supposed to do.
There's implicit information as to what the inputs to these functions,
otherwise known as arguments, are meant to be,
what the outputs are meant to be.
So the underlying AI
called Copilot here kind of has a decent number of
hints,
and much like a good TA or a good software engineer,
that's enough context to figure out how to fill in those blanks.
And so here too,
if I scroll down now,
we'll see in green
some suggested code via which
it could uh solve that same problem as well.
The load function,
and I dare say I've been talking for far fewer minutes
than CS 50 students spent actually coding the solution from scratch
to this here problem.
So I'll go ahead and click keep.
I'll assume that it's correct,
but that's actually quite a big assumption.
And those of you wondering, like, why have we been learning all of this
if I could just ask it in English to do my homework for me,
I mean there's a lot to be said for the muscle memory
that hopefully you feel you've been developing over the past several weeks.
The reality is if you don't have an eye for what you're looking at,
there's no way you're going to be
able to troubleshoot an issue in here,
explain it to someone else,
make marginal changes or the like,
and yet what's incredibly exciting even to someone like me,
all of the staff,
friends of mine in the industry is that this kind of functionality
in AI amplifies your capabilities as a programmer sort of overnight.
Once you have that vocabulary,
that muscle memory for doing it yourself,
the AI can just take it from there and get rid of all of the tedium,
allow you to focus at the whiteboard with the other humans on sort of
overarching problems that you want to solve, and leave it to this AI to actually solve
problems for you.
A fun exercise too might be to go back at term's
end and try solving any number of the course's assignments.
For instance,
let me go ahead and do this.
In my terminal window here.
I'm going to go back to my main directory.
I'm going to create an empty file called Mario.c
that has nothing in it,
and I'm going to go ahead in my chat window
here and say please implement a program in C
that prints a left-aligned pyramid
of bricks using hash symbols for bricks, and use the CS50 library to
ask the user for a non-negative height as an integer, period.
I dare say that's essentially the English description of what was for CS
50 this year problem set one to implement a program called Mario.c.
This too is sort of doing its thing.
It's using one reference.
It's working,
it knows,
as a hint, that this file is called Mario.c and it's
seen a lot of those in its training data over time.
There's an English explanation of what I should do and those
CS 50 students in the room probably recognize the sort of
basic structure here of using a do while loop to prompt
the user for a height using the CS 50 library,
which has been included,
print a left-aligned pyramid using some kind of loop,
and
boom,
we are done.
And these are fairly bite-sized problems as you'll see as
you get to term's end with your final project,
which is a fairly open-ended opportunity
to apply your newfound knowledge and savvy with
programming itself to a problem of interest;
it will allow you to implement far grander projects,
far greater projects than have been possible to date,
certainly in just the few weeks we have to do
it, because of this amplification of your own abilities.
So with that promise,
let's talk about how in the heck any of this is actually working.
I clearly just generated a whole lot of stuff and
that's how we began the story, with the generation of
those images and those two essays by kids,
but what is generative artificial intelligence or really what is AI itself?
And these are some of the underlying
building blocks that aren't going anywhere anytime soon
and indeed have led us as a progression to the capabilities you just saw.
So spam,
we sort of take for granted now that in our Gmail inboxes,
our Outlook inboxes,
most of the spam just ends up in a folder.
Well,
there's not some human at Microsoft or Google sort of manually labeling
the messages as they come in deciding spam or not spam.
They're figuring out, using code and nowadays using AI,
that this looks like spam and therefore I'm going to put it in the spam folder,
which is probably correct 99% of the time,
but indeed there's potentially a failure rate.
Other applications might include handwriting recognition.
Certainly Microsoft and Google don't know the handwriting style
of all of us here in this room,
but it's been trained on enough other humans'
handwriting styles that odds are your handwriting and mine
looks similar to someone else's.
And so with very high probability they could recognize something
like Hello world here as indeed that same digital text.
All of us are into streaming services nowadays,
Netflix and the like.
Well,
they're getting pretty darn good at knowing if I watched X,
I might also like Y.
Why?
Well,
because of other things I've watched before and maybe upvoted and downvoted,
maybe because of other things people have watched who
like similar movies or TV shows to me,
so that too is
AI.
There's no if,
else if,
else if construct for every movie or TV show in their database.
It's sort of figuring out much more organically,
dynamically what you and I might like.
And then all these voice assistants today Siri,
Alexa,
Google Assistant,
and the like,
those too don't recognize your voice or necessarily
know what questions you're going to ask it.
There's no massive if else if that has all possible questions in
the world just waiting for you or me to ask it.
That too,
of course,
is dynamically generated.
But that's getting a bit ahead of ourselves.
Let's like rewind in time and some of the
parents in the audience might remember this here game,
among the first arcade games in the world,
namely Pong.
And so this was a black and white game whereby there's two players,
a paddle on the left,
a paddle on the right,
and then using some kind of joystick or track
ball they can move their paddles up and down,
and the goal is to bounce the ball back and forth and ideally catch it every time,
otherwise you lose a point.
This is just an
animated GIF,
so there's nothing really dramatic to watch.
It's going to stay at 15 against 12,
just looping again and again.
Nothing interesting is going to happen,
but this is a nice example
of a game that lends itself to solving it with code.
And indeed it's been in our vernacular for
years to play against not just the computer,
but the CPU,
the central processing unit,
or really the AI.
And yet AI does not need to be nearly as
sophisticated as the tools we now see.
For instance,
here's a successor to Pong known as Breakout,
similar in spirit,
but there's just one paddle and one ball,
and the goal is to bounce the ball off of these colorful bricks and you
get more and more points depending on how high up you can get the ball.
All of us as humans,
even if you've never played this old school game,
probably have an instinct as to where we should move the
paddle if the ball just left it going this way,
which direction should I move the paddle?
Probably to the left, and indeed that'll catch it on the way down.
So you and I just made a decision that's fairly instinctive,
but it's been ingrained in us,
but we could sort of take all the fun out of the game and
start to quantify it or describe it a little more algorithmically step by step.
In fact,
decision trees are a concept from economics,
strategic thinking,
computer science as well.
That's one way of solving this problem
in such a way that you will always
play this game
well if you just follow this algorithm.
So for instance,
how might we implement code
or decision making process for something like breakout?
Well,
you ask yourself first,
is the ball to the left of the paddle?
If so,
you know where we're going,
then go ahead and move the paddle left.
But what if the answer were no,
in fact?
Well,
you don't just blindly move the paddle to the right,
probably,
what should you then ask?
Are you right below the ball?
The ball's coming right at you.
You don't want to just naively go to the right and then risk missing it.
So there's another question to ask.
Is the ball to the right of the paddle?
And that's a yes no question.
If yes,
well then,
OK,
move it to the right,
but if not,
you should probably stay exactly where you are and don't move the paddle.
All right,
so that's fairly deterministic,
if you will,
um,
and we can map it to code using pseudo code in,
uh,
say a class like CS 50.
We can say in a loop,
well,
while the game is ongoing,
if the ball is to the left of the paddle,
then move the paddle left,
uh,
else if the ball is to the right of the paddle,
sorry for the typo there,
move the paddle right,
uh, else just don't move the paddle.
And so these decision trees as we drew it,
have a perfect mapping to code or really pseudo code in this particular case,
which is to say that's how people who implemented the breakout game or
the Pong game, who implemented a computer player, surely coded it up.
It was as straightforward as that.
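That decision tree maps onto real code just as directly; here's a minimal sketch in Python, where ball_x and paddle_x are hypothetical horizontal coordinates, not names from any actual Breakout implementation.

```python
# A minimal sketch of the Breakout paddle logic from the decision tree
# above; `ball_x` and `paddle_x` are hypothetical horizontal coordinates.
def move_paddle(ball_x, paddle_x):
    """Decide which way to move the paddle: 'left', 'right', or 'stay'."""
    if ball_x < paddle_x:    # is the ball to the left of the paddle?
        return "left"
    elif ball_x > paddle_x:  # is the ball to the right of the paddle?
        return "right"
    else:                    # the ball is coming right at you: don't move
        return "stay"
```

In a real game this function would be called once per frame, inside the "while the game is ongoing" loop from the pseudocode.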
But how about something like tic tac toe,
which some of you might have played on the way
in for just a moment on the scraps of paper,
um.
That you might have had,
uh,
here we have a tic tac toe board with two O's and two Xs.
For those unfamiliar,
this game tic tac toe,
otherwise known as Noughts and Crosses,
is a matter of going back and forth X's and O's between two people,
and the goal is to get 3 O's in a row or 3 X's in a row,
either vertically,
horizontally,
or diagonally.
So this is a game
in mid-progress.
Well,
let's consider how you could solve the game of tic tac toe like a computer,
like an AI might.
Well,
you could ask yourself,
can I get 3 in a row on this turn?
Well,
if yes,
we'll play in the square to get 3 in a row.
It's as straightforward as that.
If you can't though,
what should you ask?
Well,
can my opponent get 3 in a row on their next turn?
Because if so,
you should probably at least block
their move next so at least you don't lose now.
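Those first two questions, take a winning square if you can, otherwise block your opponent's, can be sketched like so in Python; the board here is assumed to be a list of 9 cells holding "X", "O", or None, an illustrative representation rather than anything from the lecture's code.

```python
# A sketch of the first two checks in the decision tree above: take a
# winning square if one exists, otherwise block the opponent's win.
# The board is assumed to be a list of 9 cells: "X", "O", or None.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winning_square(board, player):
    """Return a square where `player` completes 3 in a row, or None."""
    for a, b, c in LINES:
        line = [board[a], board[b], board[c]]
        if line.count(player) == 2 and line.count(None) == 1:
            return (a, b, c)[line.index(None)]
    return None

def choose_move(board, me, opponent):
    # Can I get 3 in a row on this turn? Then play there.
    move = winning_square(board, me)
    if move is not None:
        return move
    # Can my opponent get 3 in a row next turn? Then block it.
    return winning_square(board, opponent)  # None if neither applies
```

When neither check applies, this sketch returns None, which is exactly the "question mark" case the lecture turns to next.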
But this game tic tac toe is relatively simple as it is,
gets a little harder to play
when it's not obvious where you should go.
Now all of us as humans,
if you grew up playing this game,
probably had heuristics you use,
like you really like the middle or you like the top corner or something like that,
so we probably
can make our next move quickly.
But is it optimal?
And I dare say if back in childhood or more
recently you've ever lost a game of tic tac toe,
like you're just bad at tic tac toe because logically there's no reason you
should ever lose a game of tic tac toe if you're playing optimally.
At worst you should force a tie,
but at best you should win the game.
So think of that the next time you play tic
tac toe and lose like you're doing something wrong,
but
in your defense it's because
that question mark is sort of not obvious.
Like how do I answer it when the answer is not right in
front of me to move for the win or move for the block?
Well,
one algorithm you could have been using all of these years is called Mini Max,
and as the name suggests,
it's all about minimizing something and or maximizing something else.
So here too,
let's take a bit of fun out of the game and turn it into some math,
but relatively simple math.
So here we have 3 representative
tic-tac-toe boards: O has won here,
X has won here,
and the middle is a tie.
It doesn't matter how we score these boards,
but we need a consistent system.
So I'm going to propose that anytime O wins,
the score of the game is -1.
Anytime X wins,
the score of the game is a positive one,
and anytime nobody wins,
the score is 0.
So at this point,
each of these boards have these values -1,
0,
and 1.
So
the goal therefore,
in this game of tic tac toe now is for X to maximize its score because
one is the biggest value available and O's goal in life is to minimize its score.
So that's how we take the fun out of the game.
We turn it into math where one
player just wants to maximize,
one player just wants to minimize their score.
All right,
so a quick,
uh,
sanity check here.
Here's a board.
It's not color coded.
What is the value of this board?
One,
because X
has in fact won
straight there down the middle.
So X winning is 1, O winning is -1,
otherwise a tie is 0.
So now let's see how we go about with those principles in place,
figuring out where we should play in tic tac toe.
Now here's a fairly easy configuration.
There's only 2 moves left.
It's not hard to figure out how to win or tie this game,
but let's use it
for simplicity.
It's O's turn, for instance.
So where can O go?
Well,
that invites the question,
Well,
what is the value of the board, or how
do we minimize the value of the board for O to win?
Well,
O can go in one of two places top left or bottom middle.
Which way should O go?
Well,
if O goes in top left,
we should consider what's the value of this board.
Is it minimal?
Well,
let's see.
Uh,
if O goes here,
X is obviously going to
go here; X is therefore going to win,
so the value of this board is going to be a 1.
Now since there's only one way logically to get from this configuration to this one,
we might as well call the value of this board by transitivity 1.
And so O probably doesn't want to go there because that's a
pretty maximal score and O wants to minimize over here though,
if O goes bottom middle,
well then X is gonna go top left and now no one has won,
so the value of this board is thus
0;
we might as well treat this as 0 because that's the only way to get there logically.
So now O more mathematically and logically can decide do I
want an end point of 1 or an endpoint of 0.
Well,
0 is probably the better option because that's less than 1,
and thus it's the minimal possibility.
So O
is going to go ahead in the bottom middle and at least force a tie.
And so there's
the evidence: if you humans are ever losing the game of tic tac toe,
you have not followed that logic,
but you could probably do it if there's just 2 moves left.
But the catch is,
let's go ahead and sort of rewind to 3 moves left.
Here there are 3 blanks,
and I've kind of zoomed out.
The catch is that the decision tree gets a lot
bigger the more and more moves that are left.
It gets sort of bigger and bushier in that it's essentially doubling in size.
And
that's great if you have the luxury of writing it down on a piece of paper,
but if you're doing this in your head while playing against a 5th grader,
if I may,
you're probably not drawing out all of the various
boards and configurations trying to play it optimally.
You're going with some instinct,
and your instincts might not be aligned with an algorithm
that is tried and true mini Max that will ideally get you to win the game,
but at least will get you to force a tie if you can't win.
But
tic tac toe is not that hard.
I mean,
how many different ways are there to play Tic Tac Toe?
We could write a computer program to pretty much play Tic Tac Toe optimally.
Um,
we could use code like this if the player is X for each possible move,
calculate the score for the board at that point in time,
and then choose the move with the highest score.
So you just try all possibilities mathematically and then you make the decision.
Most of us in our heads are not doing that,
but we could,
else if the player is O, though,
essentially do the same thing but choose the minimal
possible score.
So that's the code for implementing tic tac toe.
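The pseudocode above can be sketched in a few lines of Python; note that the board representation, the function names, and the -1/0/1 scoring convention here are my own assumptions for illustration, not CS50's actual code:

```python
# A minimal minimax sketch for tic tac toe. The board is a list of 9 cells,
# each "X", "O", or None; X tries to maximize the score, O to minimize it.

WINS = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
        (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
        (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in WINS:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Value of the board: 1 if X can force a win, -1 if O can, 0 if a tie."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0  # board full: a tie
    scores = []
    for i in moves:
        board[i] = player                      # try the move...
        scores.append(minimax(board, "O" if player == "X" else "X"))
        board[i] = None                        # ...then undo it
    return max(scores) if player == "X" else min(scores)  # X max, O min

# X has two in a row on top; with X to move, X can force a win.
board = ["X", "X", None, "O", "O", None, None, None, None]
print(minimax(board, "X"))  # 1
```

And if you run the same recursion from an empty board, it comes back with 0: perfect play by both sides forces a tie.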
How many ways are there to play tic tac toe though?
Well,
255,168,
which means if we were to draw that tree,
it would be pretty darn big, and it would
take you quite a bit of time to sort of think through all those possibilities.
So in your defense,
you're maybe not that bad at tic tac toe,
it's just harder than you thought as a game.
But what about games with which we might as adults be more familiar?
Well,
what about the game of chess,
which is often used as a measure of like how smart a computer is,
whether it's Watson back in the day playing against it or something else?
Well,
if we consider even just the 1st 4 moves of chess,
whereby I mean black goes and white goes,
and then they each go 3 more times,
so 4 pair wise moves,
how many different ways are there to play chess?
Well,
it turns out
85 billion just to get the game started,
and that's a lot of decisions to consider and then make.
How about the game of Go, if familiar?
Consider the 1st 4 moves: 266 quintillion possibilities.
And this is where we sort of as humans and
even with our modern PCs and Macs and phones kind
of have to throw up our hands because I don't
have this many bits of memory in my computer.
I don't have this many hours in my life left to actually
crunch all of those numbers and figure out the solution.
So where AI comes in is where it's no longer as simple as just writing
if else's and loops and no longer as simple as just trying all possibilities.
You instead need to write code that doesn't solve
the problem directly but in some sense indirectly.
You write code so that the computer figures out how to win,
perhaps by showing it configurations of the board that are a
good place to be in that is promising and maybe showing
it boards that it doesn't want to find itself in the
configuration of because that's going to lead it to lose.
In other words,
you train it,
but not necessarily exhaustively,
and this is what we mean nowadays by machine learning,
writing code
via which machines
learn how to solve problems generally by
being trained on massive amounts of data and
then in new problems looking for patterns by
which they can apply those past training data
to the problem at hand.
And reinforcement learning is one way to think about this.
In fact,
we as humans use
reinforcement learning,
which is a type of machine learning,
sort of all of the time.
A fun demonstration to watch here involves these pancakes.
So in fact,
let me go ahead and pull up a short recording here of an actual researcher in a lab
who's trying to teach a robot how to flip pancakes.
So we'll see here in this video
that there's a robot that has an arm that can go up,
down,
left,
right.
This of course is the human,
the researcher,
and he's just going to show the robot one or more times like how to flip a pancake.
And crosses his fingers and OK,
seems to have done it well,
does it again,
not quite the same.
But pretty good.
And now he's going to let the robot just try to figure out how
to flip that pancake after having just trained it a few different times.
The first few times,
odds are
the robot's not going to do super well because it really doesn't
understand what the human just did or what the whole purpose is.
But,
and here's the key detail with reinforcement learning behind the scenes,
the human is probably rewarding the robot when it does a good job,
like the better and better
it flips, the more it gets rewarded, as by like hitting a key and giving it a point,
for instance,
or giving it the digital equivalent of a cookie
or conversely,
every time the robot screws up and drops the pancake on the floor,
sort of a proverbial slap on the wrist,
a punishment so that it does less of that behavior the next time.
And any of you who are parents,
which by definition today many of you are,
odds are,
whether it's with this or maybe just verbal approval or reprimands,
you have probably trained children at some point to do more of one thing
and less of another.
And what you're seeing in the backdrop there is
now just a quantization of the movements X,
Y,
and Z coordinates so that it can do more of the X's and the Y's and the Z's that led it to
some kind of reward.
And now, after it's up to some 50 trials,
the robot seems to be getting better and better
such that like a good human,
we'll see if I can do this without embarrassing myself,
can flip the thing.
That's pretty good.
That's pretty,
I've been doing this a long time,
OK.
So
we've seen then how you might use reinforcement
learning in that kind of domain.
Let's take an example that's familiar to those of you who are gamers.
Any time you've played a game where there's some kind of
map or a world that you need to explore up,
down,
left,
right,
maybe you're trying to get to the exit.
So
here simplistically is the player at the yellow dot.
Here,
for instance,
in green is the exit of the map and you want to get to that
point and maybe somewhere else in this world there's a lot of like lava pits.
You don't want to fall into the lava pit because you lose a life
or you lose a point or there's some penalty or punishment associated with that.
Well,
we,
with this bird's eye view can obviously see how to get to the green dot,
but if you're playing a game like Zelda or something like that,
all you can do is move up,
down,
left,
right and sort of hope for the best.
So let's do just that.
Suppose the yellow dot just randomly chooses a direction and goes to the right.
Well,
now we can sort of take away a life,
take away a point or effectively punish it
so that it knows don't do that.
And so long as the player has a bit of memory,
either the human player or the code that's implementing this,
just with a dark red line,
that means don't do that again because that didn't lead to a good outcome.
So maybe the next time the yellow dot goes this way and this way,
and then,
ah,
I didn't realize that that's actually the same lava pit,
but that's fine.
Use a little bit more memory and remind me,
don't do that because I just lost a second life in this story.
And maybe it goes this way next time,
ah,
now I need to remember,
don't do that.
But effectively I'm either being punished for
doing the wrong thing,
ah,
or as we'll soon see,
being rewarded for doing more of the successful
thing and just by chance maybe I finally make
my way to the exit in this way and so I can be rewarded for that.
Now I got 100 points or whatever it is,
the high score.
So now as for these green lines,
I can just follow that path again and again and I can always win this game,
kind of like me nowadays,
like 30 years later playing Super Mario Brothers
because they can get through all the warp levels
because I know where everything is because for
some reason that's still stored in my brain.
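That remember-your-mistakes loop can be sketched as code. Everything concrete here, the 4-by-4 grid, the lava locations, the exit, is invented for illustration:

```python
# A toy version of the maze story: the agent wanders randomly from (0, 0),
# remembers every (position, move) pair that led into lava, and never
# repeats a remembered mistake, until it stumbles onto the exit.

import random

GRID = 4                               # a 4x4 world, coordinates (x, y)
LAVA = {(1, 1), (2, 1)}                # hypothetical lava pits
EXIT = (3, 3)                          # the exit (the green dot)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def play_once(bad_moves):
    """Walk randomly toward EXIT, skipping moves remembered as bad.
    Hitting lava adds that (position, move) pair to bad_moves: a punishment."""
    pos, path = (0, 0), [(0, 0)]
    while pos != EXIT:
        options = []
        for name, (dx, dy) in MOVES.items():
            nxt = (pos[0] + dx, pos[1] + dy)
            in_bounds = 0 <= nxt[0] < GRID and 0 <= nxt[1] < GRID
            if in_bounds and (pos, name) not in bad_moves:
                options.append((name, nxt))
        name, nxt = random.choice(options)
        if nxt in LAVA:
            bad_moves.add((pos, name))  # remember: don't do that again
            continue
        pos = nxt
        path.append(pos)
    return path

bad_moves = set()
path = play_once(bad_moves)
print("reached the exit in", len(path) - 1, "moves")
```

Run it a few times and the `bad_moves` set grows, exactly the "bit of memory" described above, though the path it finds is rarely the shortest one.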
Is this the best way to play?
Am I as good at Super Mario Brothers as I might think?
What's bad about this solution?
Yeah.
Yeah.
Exactly,
yeah,
I've moved many more times than I need to and just for fun today.
What grade are you in?
7th grade.
Wonderful.
So now the 7th grader's observation is like exactly that,
that we
could have taken a shorter path,
which is essentially that way,
albeit
uh making some straight moves.
And so we're never gonna find that shorter path.
We're never going to get the highest score possible if
I just keep naively following my well trodden path.
And so how do we break out of that mold?
And you can see this even in the real world.
One example: I'm the type of person for some reason
where if I go to a restaurant for the first time,
I choose a dish off the menu and I really like it,
I will never again order anything else off that menu
other than that dish because I know it is good,
but there could be something even better on the menu,
but I'm never going to explore that because I'm sort of fixed in my ways,
as some of you from the smiles might be too.
But what if
we took advantage of exploring just a little bit?
And there's this principle of exploring versus exploiting when
it comes to using artificial intelligence to solve problems.
Up until now,
I've just been exploiting knowledge I already have.
Don't go through the red walls,
do go through the green walls.
Exploit,
exploit,
exploit,
and I will get to a final solution.
But what if I just sprinkle in a little bit of randomness along the
way and maybe 10% of the time as represented by this epsilon variable.
I,
as the computer in the story,
generate a random number between 0 and 1,
and if it's less than that,
which is going to happen 10% of the time,
I'm going to make a random move instead of one
that I know will get me closer to the exit.
Otherwise I'll indeed make the move with the highest value.
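That explore-versus-exploit coin flip can be sketched in a few lines of Python; the move names and their learned values here are made up for illustration:

```python
# An epsilon-greedy sketch: with probability epsilon (10% here), make a
# random move (explore); otherwise take the best-known move (exploit).

import random

def choose_move(values, epsilon=0.1):
    """values: dict mapping each legal move to its learned value."""
    if random.random() < epsilon:
        # Explore: pick any move at random.
        return random.choice(list(values))
    # Exploit: pick the move with the highest known value.
    return max(values, key=values.get)

# Hypothetical learned values for the four directions.
values = {"up": 0.2, "down": -1.0, "left": 0.0, "right": 0.9}
print(choose_move(values))  # usually "right", occasionally something random
```

Setting epsilon to 0 gives the pure exploiter who always orders the same dish; setting it to 1 gives a player who only ever explores.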
Now this isn't going to necessarily win me the game that first time,
but if I play it enough and enough and enough and insert some of this randomness,
I might very well find a better solution.
And therefore be a better player,
a better winner
overall.
If I just 10% of the time ordered something else off the menu,
I might find that there's an amazing dish out there
that otherwise I wouldn't have discovered.
And so indeed using that approach,
can we finally find a more optimal path through the maze,
as it was shorter there,
presumably therefore maximizing our score and doing
even better than we might have
by just exploiting the same knowledge,
so you can see this even in the game of Breakout,
especially if you write a solution in code to play this game for you.
Let me go ahead and pull up another video recording of an AI playing Breakout,
and what this AI is doing is essentially figuring out,
maybe more intelligently than you or I could,
how to play this game
optimally.
And what we'll see here
is that
it's just like uh the pancake flipping robot.
There's some notion of scoring and rewards and penalties here.
So like right now the paddle's just doing random stuff.
It doesn't really know how to play the game yet,
but it realizes after 200 episodes that oh my score goes up if
I hit the ball and it goes down equivalently if I miss it,
and it's still a little twitchy.
It doesn't quite understand what it's supposed to do and why,
but if you do it again.
and again and again and it's rewarded and or punished enough,
you'll see that it starts to get
pretty good and closer to what a good human might do.
But here's where the algorithm gets a little creepy.
If you let it play long enough or if you and I,
the humans play long enough,
you might find a certain trick to the game.
I dare say the AI becomes a bit scarily sentient.
in that it turns out
if you're smart enough to break through that top row,
you can let the game just play itself for you
and maximize your score without even touching the ball,
something that I do find a little creepy, that it
just figured out how to do that without being told,
but it's just a logical continuation of rewarding it for good behavior
and punishing it
for
bad behavior.
So that next time you have an occasion to play Breakout,
consider that kind of strategy as opposed to doing more of the work yourself,
let the computer do it for you instead.
Well,
what else is there to consider in this world of AI in the context of machine learning?
Well,
there's specifically a category of learning that's supervised,
and we've been using this for years.
And in fact,
our first example of spam early on was certainly supervised.
Why?
Because it was you and I who was like putting
the email into the spam folder and to this day,
maybe once a day I hit the keyboard shortcut in Gmail to say,
ah,
this is spam.
You should have caught this,
and that is training Google's algorithm further,
assuming it's not just little old me,
but maybe thousands of people tagging that same kind of email
as spam,
that's supervised learning, in that there's a human in the loop doing
at least something so spam detection might be one of those.
But the catch is that labeling data in
that way manually just doesn't scale very well.
That would be akin to having someone at Google or Microsoft labeling every email
or someone at Netflix doing the same for all of the videos out there.
It's expensive in terms of human power,
and there's certainly problems out there with so much data.
It's just not realistic for humans to label millions of pieces of data,
billions of pieces of data.
We've got to move to an
unsupervised model.
And so this is where the world starts to consider deep learning,
solving problems
using code whereby you don't even have humans in the loop in quite the same way.
And neural networks inspired by the world of biology are sort
of the inspiration for what is the state of the art,
even underlying today's rubber duck and more generally these things
called large language models like chat GPT and the like.
So here pictured somewhat abstractly is a neuron,
and it's something in the human body that transmits a signal,
say,
from left to right,
electrically. And if you have multiple neurons,
you can intercommunicate among them so that if I think a thought,
then I know how to raise my hand because some kind of
message electrically has gone from my head to this extremity here.
So that's in essence what I remember from 9th grade biology.
But as the computer scientists,
we sort of abstract all of this away.
So instead of calling these two neurons,
drawing them as neurons,
let's just start drawing neurons as these little circles,
and if they have connective tissue between them of sorts,
we'll just draw a straight line,
an edge between them.
So this is what a computer scientist would call
a graph.
If you have two such neurons over here leading to one neuron here,
you can think of this as being like maybe two inputs to a problem
and now one output there too.
We can represent the notion of problem solving,
which is what CS 50 and our courses more generally are all about.
So let's solve a problem with a neural
network without necessarily training it in advance,
just letting it figure out how to answer this question.
Here's a very simple two-dimensional world,
XY grid,
and here are two dots,
and the dots in this world are either blue or they are
red,
but I have no idea yet
what makes a dot blue or red.
However,
if you train me on those two dots,
I bet I could come up with predictions,
especially if you let me label this world
in terms of X coordinates on the horizontal,
Y coordinates on the vertical,
and then you know what we can think of this
neural network very simply as representing the X coordinate here,
the Y coordinate here,
and the answer I want to get is quote unquote red or blue or 0 or 1 or true or false,
however you want to think
of the representation.
So how do I get from a specific
XY coordinate to a prediction of color if I only know the coordinates?
Well,
from the get-go,
maybe the best I can do is just divide the world into
blue dots on the left and red dots on the right,
a best fit line,
if you will,
based on very minimal data.
Of course,
if you give me a third dot,
it's going to be pretty easy to realize that I was a little too hasty.
That line is not vertical,
so maybe we pivot the line this way and now I'm back in business.
Now I can predict with higher probability based on xy what color the next dot will be.
You give me enough of these dots.
I can come up with a pretty good best fit line.
It's not perfect,
but here's a hint at why AI is not perfect.
But 99% of the time maybe I'll be able to predict correctly,
and I can do even better if you let me squiggle the
line a little bit and maybe make it more than just a simple
slope.
So what is it we're really doing
with implementing this neural network,
albeit simplistically with just 3 neurons?
Well,
essentially we're trying to come up with 3 values,
3 parameters an A,
a B,
and a C.
And what do those represent?
Well,
really just a solution to this formula that their line we drew
can be represented if you think back to like high school math
with a formula along these lines whereby it's A times X plus B
times Y plus some constant C and we can just arbitrarily conclude that
if that value mathematically gives me a number greater than 0,
predict it's going to be blue.
otherwise predict it's going to be red.
We can sort of map our mathematics,
just like with tic tac toe,
to the actual problem we care about by defining the world in this way.
And so if you give me enough data points and enough data points,
I can come up with answers for that A,
B,
C,
the so-called parameters in neural networks.
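As a sketch, that single "neuron" might look like this in Python, where the particular values of A, B, and C are made up rather than actually learned from data:

```python
# A one-neuron classifier: predict "blue" when A*x + B*y + C > 0,
# otherwise "red", exactly the line-drawing rule described above.

def predict(x, y, a, b, c):
    """Return "blue" if the point (x, y) falls on one side of the line."""
    return "blue" if a * x + b * y + c > 0 else "red"

# Suppose training settled on the (hypothetical) vertical line x = 2,
# i.e., A = -1, B = 0, C = 2: blue to the left of it, red to the right.
a, b, c = -1.0, 0.0, 2.0
print(predict(1.0, 3.0, a, b, c))  # blue (left of the line)
print(predict(4.0, 1.0, a, b, c))  # red  (right of the line)
```

Training is then just the search for the A, B, and C that get the most dots on the correct side; squiggling the line corresponds to adding more neurons and parameters.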
Now in reality,
neural networks are not composed of like 3 neurons and a couple of edges.
They look a little something more like this,
and in practice they've got billions of these things here on the screen.
In which case pretty much every one of these edges represents some mathematical
value that was contrived based on lots and lots of training data,
and whereas I,
the computer scientist,
might know what these neurons over here represent because those are my inputs,
3 in this case,
and I,
the computer scientists know what this one represents at the end.
If you sort of took the hood off of this thing and looked inside the neural network,
even though there'd be millions,
billions of numbers going on there,
I can't tell you what this neuron represents or why this edge has this weight.
It's because of the massive amount of training data
that that's just how the math works out.
And if you feed me more data,
I might change some of those parameters more.
So the graph ultimately might look quite different,
but my inputs and my outputs are going to be what I use to solve that problem.
So if you want to predict like rainfall from humidity or pressure,
you can have two inputs giving that one output,
an advertising dollar spent in a given month that might predict
sales by just having trained again on such volumes of data.
And when we get now full circle to something like CS 50's rubber
duck and large language models like Claude and Gemini and Chat GPT,
what's really happening,
and this is all hot off the press in recent years,
screenshotted here are some of the recent research papers that
have driven a lot of this advancement in recent years.
You have, from OpenAI, say, a generative pre-trained transformer,
which is a lot to say,
but there's the GPT in chat GPT,
and essentially this is a neural network that's
been trained on large volumes of textual information
that gives us the interactive chat feature that we have in the class,
and we all have more generally in ChatGPT itself.
So an example of what
is actually happening underneath the hood of these GPTs.
Well,
here's a paragraph
that up until recent years was kind of a hard paragraph to end with the AI:
Massachusetts is a state in the New
England region of the northeastern United States.
It borders on the Atlantic Ocean to the east.
The state's capital is.
Now most anyone living in Massachusetts probably knows that answer,
but if this AI has just been trained on lots and lots of data,
there's probably a lot of people who say Massachusetts in part of a sentence,
and then
the answer,
which I won't say yet, is in uh the other part of the
sentence,
but in this example,
given that the question we're asking is sort of so far
from some of the useful keywords up until recently,
this was a hard problem to solve because there was so much distance.
Moreover,
there's these nouns that are being used to substitute for the
proper noun like we suddenly start calling it a state,
we call it a state down here and it
wasn't necessarily obvious to AIs that we're talking about
the same thing as if it were just city
comma state where you'd have much more proximity.
So,
in a nutshell,
what we now do,
especially to solve problems like these,
is we first break down a sentence or the training data or input
alike into like an array or a list of the words themselves.
We come up with a representation of each of these words.
For instance,
the word Massachusetts,
if you encode it in a certain way,
is going to be represented with an array or vector of numbers,
floating point values,
so many so that the word Massachusetts in one model would use these 1,536
floating point numbers to represent Massachusetts
essentially in an N-dimensional space,
so not just an XY plane but somewhere sort of virtually out there.
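A toy illustration of that representation step: each word maps to a vector of floating point numbers. The model mentioned above uses 1,536 dimensions; this sketch uses 4 made-up, randomly generated dimensions just to show the shape of the data:

```python
# Map each word in a tiny vocabulary to a vector of floats ("embeddings").
# In a real model these vectors are learned from data; here they're random.

import random

random.seed(0)  # make the made-up vectors reproducible
vocab = ["massachusetts", "state", "capital", "the", "is"]
embedding = {word: [random.uniform(-1, 1) for _ in range(4)] for word in vocab}

sentence = "the state capital is".split()
vectors = [embedding[w] for w in sentence]
print(len(vectors), "words, each a vector of", len(vectors[0]), "numbers")
```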
And then,
and this has been the key to these GPTs,
attention is calculated based on all of that data whereby in this picture
the thicker lines imply more of a relationship between those two words.
So Massachusetts and state is inferred as having a thicker line,
a higher attention from one word to the other,
whereas our
a's and our the's have thinner lines because they're just not as
much signal to the AI as to what the answer to this question is.
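One way to picture that attention calculation: score each pair of word vectors with a dot product and normalize with a softmax, so that "thicker lines" come out as higher weights. The 3-dimensional vectors here are invented, and real transformers learn separate query and key projections, which this sketch omits:

```python
# Compute how strongly one word "attends" to each other word:
# softmax over dot products of their vectors.

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    """Dot the query vector against each key vector, then normalize to sum to 1."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    return softmax(scores)

# Hypothetical vectors: "state" points in roughly the same direction
# as "Massachusetts", so it should attend to it more than to "the".
vectors = {
    "Massachusetts": [0.9, 0.8, 0.1],
    "the":           [0.0, 0.1, 0.0],
    "state":         [0.8, 0.9, 0.2],
}
weights = attention_weights(vectors["state"], list(vectors.values()))
for word, w in zip(vectors, weights):
    print(f"{word}: {w:.2f}")
```

The weight from "state" to "Massachusetts" comes out much higher than to "the": the thick versus thin lines in the picture.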
Meanwhile,
when you then feed that sentence like the
state's capital is one word per neuron here,
the goal is to get the answer to that question.
And even here this is way smaller of a
representation than the actual neural network would be.
But in effect,
all these LLMs,
large language models are,
are just statistical
models: like what is the highest probability word that it should spit out at the
end of this paragraph based on all of the Reddit posts and Google search results
and encyclopedias and Wikipedias that it's found and trained on online?
Well,
the answer hopefully will be
Boston.
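In the spirit of that "highest probability word," here's a back-of-the-envelope, count-based sketch; the tiny training text is made up, and real LLMs use neural networks rather than raw counts, but the statistical idea is the same:

```python
# Predict the next word by counting which word most often followed
# the previous word in some (tiny, invented) training text.

from collections import Counter

training_text = (
    "the capital of massachusetts is boston . "
    "massachusetts capital is boston . "
    "the state's capital is boston . "
    "some say the capital is springfield . "
).split()

def predict_next(prompt_word):
    """Return the word that most often followed prompt_word in training."""
    followers = Counter(
        nxt for word, nxt in zip(training_text, training_text[1:])
        if word == prompt_word
    )
    return followers.most_common(1)[0][0]

print(predict_next("is"))  # boston ("boston" follows "is" 3 times, "springfield" once)
```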
But of course 1% of the time,
maybe less than that,
the answer might not be correct and even CS 50's own duck is fallible,
even though we've written lots of code to
try to put downward pressure on those mistakes,
and those mistakes are what we'll call,
lastly,
hallucinations, where the AI just makes
something up, perhaps because some crazy human
on the internet made something up and it was interpreted as authoritative or
just by bad luck because of a bit of that exploration 10% of the time,
1% of the time,
the AI sort of veered this way in the large language model in the
neural network and spit out an answer that just in fact is not correct.
And so I thought I'd end for today on this final note.
A poem with which many of us might have grown up from Shel Silverstein here about
the homework machine which years ago somehow sort
of predicted the state we would be in
with these AI machines.
He said,
the homework machine,
oh,
the homework machine,
most perfect contraption that's ever been seen.
Just put in your homework,
then drop in a dime,
snap on the switch,
and in 10 seconds' time,
your homework comes out quick and clean as can be.
Here it is,
9 + 4,
and the answer is 3.
3?
Oh me,
I guess it's not as perfect.
As I thought it would be
This then was CS 50.
See you next time.