LongCut logo

CS50 Fall 2025 - Artificial Intelligence (live, unedited)

By CS50

Summary

Key takeaways

  • AI can be a virtual rubber duck: The CS50 duck, initially responding with quacks, now uses AI to guide students to solutions rather than providing direct answers, mimicking a less helpful version of ChatGPT. [02:54], [03:29]
  • Prompt engineering is asking good questions: Prompt engineering is less about engineering and more about asking detailed questions with context to guide AI responses, utilizing system prompts for personality and user prompts for specific queries. [08:44], [09:12]
  • AI amplifies programmer capabilities: Tools like GitHub Copilot can significantly speed up coding by suggesting solutions, allowing programmers to focus on overarching problems rather than tedious details. [12:54], [16:14]
  • Machine learning learns from data patterns: Instead of direct solutions, machine learning models learn from vast amounts of data to identify patterns and apply them to new problems, as seen in reinforcement learning where rewards and penalties guide behavior. [30:16], [31:31]
  • Large Language Models use attention and statistics: LLMs like GPT break down sentences into word representations and use 'attention' to determine word relationships, statistically predicting the most probable next word to generate coherent text. [45:21], [46:37]
  • AI can 'hallucinate' incorrect information: Despite advanced training, AI can still generate incorrect answers, known as hallucinations, due to flawed training data or probabilistic errors in the model's predictions. [46:53], [47:05]

Topics Covered

  • The Evolution of CS50's Rubber Duck Debugging Tool
  • AI's Uncanny Ability to Mimic Reality
  • AI's Uncanny Ability to Mimic Human Text
  • AI Learns to Flip Pancakes Through Reinforcement Learning
  • The Explore vs. Exploit Dilemma in AI Decision Making

Full Transcript

All

right,

this is CS 50,

and this is our lecture on artificial intelligence or AI,

particularly for all of those family members who are here

in the audience with us for the first time.

In fact,

uh,

for those students among us,

maybe a round of applause for all of the family

members who have come here today to join you.

Nice.

So nice to see everyone.

And as CS 50 students already know,

it's sort of a thing in programming circles to have a rubber duck on your desk.

Indeed,

a few weeks back we gave one to all CS 50 students,

and the motivation is to have someone,

something to talk to in the presence of a bug or mistake in

your code or confusion you're having when it comes to solving some problem.

And the idea is that

in the absence of having a friend,

family member,

TA of whom you can ask questions is to literally verbalize your confusion,

your question to this

object on your desk and in that process

of verbalizing your own confusion and explaining yourself,

quite often does that proverbial light bulb go off over your head and voila,

problem is solved.

Now as CS50 students also know,

we sort of virtualized that rubber duck over the past few years and most recently

in the form of this guy here.

So in student's programming environment within CS 50,

a tool called Visual Studio Code at a URL of cs50.dev,

they have a virtual rubber duck available.

Available to them at all times and early on

in the very first version of this rubber duck,

it was a chat window that looked like this and if students had a question,

they could simply type into the chat window something like,

I'm hoping you can help me solve a problem.

And for multiple years,

all the CS 50 duck did

was respond with 1, 2, or 3 quacks.

We have anecdotal evidence to suggest that that

alone was enough for answering students' questions because

it was in that process of like actually typing out the confusion that you realize,

oh,

I'm

doing something silly and you figure it out on your own.

But of course now that we live in an age of ChatGPT

and Claude and Gemini and all of these other AI-based tools

came as no surprise perhaps when in 2023

the same duck started responding to students in English and

that now is the tool that they have available,

which is in effect meant to be a less helpful version of ChatGPT,

one that doesn't just spoil answers outright,

but tries to guide them to solutions akin to any good teacher or tutor.

And so today's lecture

is indeed on just that and the underlying building blocks that make possible

their rubber duck and all of the AI with which we're all increasingly familiar,

namely generative artificial intelligence using this technology known as AI

to generate something,

whether that's images or sounds or video or text.

And in fact,

what we thought we'd do to get everyone involved early on

is if you have a phone by your side,

if you'd like to go ahead and scan this QR code here,

and that's going to lead you.

To a polling station where you can buzz in with some answers.

CS 50's preceptor Kelly is going to kindly join

me here on stage to help run the keyboard,

and what we're about to do is play a little game and

see just how good we humans are right now at distinguishing AI from

reality.

And so we'll borrow some data from The New York Times,

which a couple of years back actually published some examples of AI and not AI,

and we'll see just how good this technology has gotten.

So here we have two photographs on the screen.

In a moment,

you'll be asked on your phone if you were successful in scanning that code,

which one of these is AI?

Left

or right?

So hopefully on your phone here,

if you want to go ahead and swipe to the next screen,

we'll activate the poll here.

In a moment,

you should see on your phone a prompt

inviting you to select left

or

right.

And

Feel free to raise your hand if you're not seeing that,

but it looks like the responses are coming in,

and at the risk of spoiling,

it looks like 70%+ of you think it is the answer on the right.

And if Kelly,

maybe we could swipe back to the two photographs in this particular case,

yes,

it was in fact the one on the right,

maybe.

Looked a little too good or maybe a little too unreal,

maybe let's see,

maybe a couple of other examples.

So same QR code,

no need to re-scan.

Let's go ahead and pull up these two examples now 2 photographs,

same question.

Which of these is AI?

Left

or right?

Left

Or right

All

right,

I want to take a look at the chart,

see what the responses are coming in a little closer in this case,

but a majority of you think the answer is in fact left here,

though 5% of you are truthfully admitting that you're unsure,

but Kelly,

if you want to swipe back to the photos,

the answer this time was in fact a trick question.

They were both in fact AI,

which

perhaps speaks to just how good this technology is already getting.

Neither of these faces exists in the real world.

It was synthesized based on lots of training data,

so two photographs that look like humans but do not in fact exist.

How about one more this time focusing on text,

which will be the focus of course underlying our duck.

Did a 4th grader write this or the new chatbot?

Here are two final examples same code as before,

so no need to re-scan.

And here are the texts.

Essay 1.

I'd like to bring a yummy sandwich and a cold juice box for lunch.

Sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips.

As we eat,

we chat and laugh and catch up on each other's day.

Essay 2.

My mother packs me a sandwich,

a drink,

fruit,

and a treat.

When I get into a lunchroom,

I find an empty table and sit there and eat my lunch.

My friends come and sit down with me.

The question now lastly is which of these is AI?

1 or 2?

Essay 1 or 2.

The bars here are duking themselves out.

Looks like a majority of you say Essay 1.

Let's go back to the text, and one of you

who says Essay 1, if you want to raise a quick hand,

why Essay 1,

yeah.

OK,

and uh so Essay 2 looks more like what you would write, and can I ask what grade you are in?

A 5th grader.

So,

is this then a 5th grader or not?

The answer here,

in fact,

is that Essay 1 is the AI because indeed Essay 2 is more akin to what a 4th or, if I may,

a 5th grader would write,

and I dare say there are maybe some telltale signs.

I'm not sure a typical 4th grader or 5th grader would catch up

on each other's day in the vernacular that we see in Essay 1.

But suffice it to say this game is not something we can play in the

years to come because it's just going to get too hard to discern something that's

AI generated or not.

And so among our goals for today is really to give you a better sense of not just

how technologies like this duck and these games that

we've played here with images and text work,

but really what are the underlying principles of artificial

intelligence that frankly have been with us and have been

developing for decades and have really now come to a

head in recent years thanks to advances in research,

thanks to all the more cloud computing,

thanks to all the more memory and disk space and information,

the sheer

volume thereof that we have at our disposal that

can be used to train all of these here technologies.

So their duck is built on a fairly complicated architecture that looks a

little something like this where here's a student using one of CS 50's tools.

Here's a website with which CS 50 students are familiar called CS50.AI where we,

the staff,

wrote a bunch of code that actually talks to what are called APIs,

application programming interfaces,

third party services by companies like Microsoft and OpenAI

who really have been doing the hard work of developing

these models, as well as some special sauce,

some local sauce that we CS50 add into the mix to

make the duck's answers specific to CS 50 itself.

But what we've essentially been doing is something that

with which you might be familiar in part,

prompt engineering,

which has started popping up for better

or for worse on LinkedIn profiles everywhere,

and prompt engineering really it's not so much a form of

engineering as it is a form of asking good questions.

And being detailed in your question giving context to the underlying AI

so that the answer with high probability is what you want back.

And so there's two terms in this world

of prompt engineering that are worth knowing about.

CS 50 has leveraged both of these to implement that duck.

We for instance wrote what's called a system prompt,

which are instructions written by us humans,

often in English,

that sort of nudge the underlying AI technology to

have a certain personality or a specific domain

of expertise.

For instance,

we CS 50 have written a system prompt essentially that looks like this.

In reality,

it's like a lot of lines long nowadays,

but the essence of it

is this.

You are a friendly and supportive teaching assistant for CS 50.

You are also a rubber duck,

and that is sufficient to turn an AI into a rubber duck.

It turns out.

Answer student questions only about CS50 in the field of computer science.

Do not answer questions about unrelated topics.

Do not provide full answers to problem sets,

as this would violate academic honesty.

Answer this question colon.

And after that preamble,

if you will,

aka system prompt,

we effectively copy paste whatever question a student has typed in,

otherwise known as a user prompt,

and that is why the duck

behaves like a duck in our case and not a cat or a dog or a PhD,

but rather something that's been attenuated to the

particular goals we have pedagogically in the course.

And in fact,

those of you who are CS 50 students might recall from quite some weeks

ago in week zero when we first introduced the course uh to the class,

we had code that we whipped up that day

that ultimately looked a little something like this,

and I'll walk through it briefly line by line,

but now on the heels of having studied some Python in CS50 this year, code

that I whipped up in the first lecture might make now a bit more sense.

In that first lecture,

we imported OpenAI's own library code that a third party company wrote to make

it possible for us to implement code on top of theirs.

We created a variable called client in week zero,

and this gave us access to the OpenAI client,

that is software that they wrote for us.

We then defined in week 0 a user prompt which came

from the user using the input function with which CS 50

students are now familiar and then we defined this system prompt

that day where I said limit your answer to one sentence,

pretend you're a cat,

I think was the persona of the day.

And then we used some bit more arcane code here,

but in essence we created a variable called response which

was meant to represent the response from OpenAI's server.

We used client.responses.create,

which is a function or method that OpenAI gives us that allows

us to pass in three arguments the input from the user,

that is the

user prompt, the instructions from us,

that is the system prompt,

and then the specific model or version of AI that we wanted to

use and the last thing we did that day was print out

response.output_text and that's how we were able to answer

questions like what is CS 50 or the like.
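As a refresher, the week-zero program described here might look something like this sketch in Python; it assumes OpenAI's `openai` package is installed and an `OPENAI_API_KEY` environment variable is set, and the model name is illustrative rather than the exact one used in lecture.

```python
# A sketch of the week-zero program: a system prompt sets the persona,
# a user prompt carries the question, and OpenAI's Responses API does
# the rest. Assumes `pip install openai` and OPENAI_API_KEY being set;
# the model name below is illustrative.

SYSTEM_PROMPT = "Limit your answer to one sentence. Pretend you're a cat."


def build_request(user_prompt: str) -> dict:
    # The three arguments described in lecture: input, instructions, model.
    return {
        "input": user_prompt,
        "instructions": SYSTEM_PROMPT,
        "model": "gpt-4o-mini",
    }


def main() -> None:
    from openai import OpenAI  # imported here so the sketch stays testable offline

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.responses.create(**build_request(input("Question: ")))
    print(response.output_text)
```

Calling `main()` then behaves like the week-zero program, prompting for a question and printing the model's one-sentence, cat-flavored answer.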

So we've seen all of that before,

but we didn't talk about that week exactly how it was

working or what more we could actually do with it.

And so in fact what I thought we'd do today

is peel back a layer that we've not allowed into

the course up until now and indeed you still cannot

use this feature until the very end of the class

in CS 50 when you get to your final projects,

at which point you are welcome and encouraged to use VS code in this particular way.

So here again is VS code for those unfamiliar,

this is the programming environment we use here with students,

and let me open up some code that was assigned to students a couple of weeks back,

namely a spell checker that they had to implement in C.

So I came in advance.

with a folder called speller,

and inside of this folder I had code that day and all students had that week

called dictionary.c.

And in this file,

which will not look familiar to many of you if

you've not taken week 0 through 7 up until now,

we did have some placeholders for students.

So long story short,

students had to answer a few questions, that is, write code to do this,

to do this,

to do this,

and one more.

There were 4 functions or blanks that students needed to fill in with code.

And I dare say it took most students

5 hours,

10 hours,

15 hours,

something in that very broad range.

Let me show you now how using AI

you soon the aspiring programmers can start to write code all the more quickly,

not by just choosing a different language,

but by using these AI based technologies beyond the duck itself.

So what I've done here on the right hand side of VS code is enable a

feature that CS 50 disables for all students

from the start of the course called co-pilot.

This is very similar in spirit to products from

Google and Anthropic and other companies as well,

but this is the one that comes from Microsoft and in turn GitHub here,

and it too gives me sort of a chat window here and this is just one of its features.

For instance,

if I wanted to implement

to get started the check function,

I could just ask it to do that,

Implement

the check function.

And

uh how about using a hash table in C?

I'm going to go ahead and click Enter.

Now it's going to work.

It's using as reference,

that is context,

the very file that I've opened,

which is dictionary.c here,

copilot in general,

as well as a lot of AI tools are familiar with CS50

itself because it's been freely available as open courseware for years.

What you see it

doing here is essentially thinking,

though that's a bit of an overstatement.

It's not really thinking,

it's trying to find patterns in what the problem is I

want to solve among all of its training data that it's

seen before and come up with a pretty good answer.

So for today's purposes,

I'm going to wave my hand at the ChatGPT-like explanation of what to do

that appeared at right,

but what's juiciest to look at here is on the left,

if I now scroll down,

what's highlighted in green is all of the

suggested code for implementing this here check function.

Now it might not be the way you implemented it yourself,

but I do dare say this has hints of exactly

what you probably did when it came to implementing a hash

table and in fact I can go ahead and keep all of this code if I like how it looks.

Let's assume that's all correct there.

It might be the case that I want to now implement the load function.

So how about now implement load

function enter

as simple as that,

and what data is being used?

Well,

a few different things.

It says one reference,

so it's indeed using this one file,

but there's also what are called comments in the

code with which all students are now familiar, these

slash slash comments in gray that are giving English hints

as to what this function is supposed to do.

There's implicit information as to what the inputs to these functions,

otherwise known as arguments are meant to be,

what the outputs are meant to be.

So the underlying AI

called copilot here kind of has a decent number of

hints,

and much like a good TA or a good software engineer,

that's enough context to figure out how to fill in those blanks.

And so here too,

if I scroll down now,

we'll see in green

some suggested code via which

it could uh solve that same problem as well.

The load function,

and I dare say I've been talking for far fewer minutes

than CS 50 students spent actually coding the solution from scratch

to this here problem.

So I'll go ahead and click keep.

I'll assume that it's correct,

but that's actually quite a big assumption.

And those of you wondering like why have we been learning off all this

if I could just ask in English it to do my homework for me,

I mean there's a lot to be said for the muscle memory

that hopefully you feel you've been developing over the past several weeks.

The reality is if you don't have an eye for what you're looking at,

there's no way you're going to be

able to troubleshoot an issue in here,

explain it to someone else,

make marginal changes or the like,

and yet what's incredibly exciting even to someone like me,

all of the staff,

friends of mine in the industry is that this kind of functionality

in AI amplifies your capabilities as a programmer sort of overnight.

Once you have that vocabulary,

that muscle memory for doing it yourself,

the AI can just take it from there and get rid of all of the tedium,

allow you to focus at the whiteboard with the other humans on sort of

overarching problems that you want to solve and leave it to this AI to actually solve

problems for you.

A fun exercise too might be to go back at term's

end and try solving any number of the course's assignments.

For instance,

let me go ahead and do this.

In my terminal window here.

I'm going to go back to my main directory.

I'm going to create an empty file called Mario.c

that has nothing in it,

and I'm going to go ahead in my chat window

here and say please implement a program in C

that prints a left-aligned pyramid

of bricks using hash symbols for bricks and use the CS 50 library to

ask the user for a non-negative height as an integer period.

I dare say that's essentially the English description of what was for CS

50 this year problem set one to implement a program called Mario.c.

This too is sort of doing its thing.

It's using one reference.

It's working,

it knows

as a hint that this file is called Mario.c and it's

seen a lot of those in its training data over time.

There's an English explanation of what I should do and those

CS 50 students in the room probably recognize the sort of

basic structure here of using a do while loop to prompt

the user for a height using the CS 50 library,

which has been included,

print a left-aligned pyramid using some kind of loop,

and

boom,

we are done.

And these are fairly bite-sized problems as you'll see as

you get to term's end with your final project,

which is a fairly open-ended opportunity

to apply your newfound knowledge and savvy with

programming itself to a problem of interest,

it will allow you to implement far grander projects,

far greater projects than has been possible to date,

certainly in just the few weeks we have to do

it, because of this amplification of your own abilities.

So with that promise,

let's talk about how in the heck any of this is actually working.

I clearly just generated a whole lot of stuff and

that's how we began the story, with the generation of

those images and those two essays by kids,

but what is generative artificial intelligence or really what is AI itself?

And these are some of the underlying

building blocks that aren't going anywhere anytime soon

and indeed have led us as a progression to the capabilities you just saw.

So spam,

we sort of take for granted now that in our Gmail inboxes,

our Outlook inboxes,

most of the spam just ends up in a folder.

Well,

there's not some human at Microsoft or Google sort of manually labeling

the messages as they come in deciding spam or not spam.

They're figuring out using code and nowadays using AI

that it looks like spam and therefore I'm going to put it in the spam folder,

which is probably correct 99% of the time,

but indeed there's potentially a failure rate.

Other applications might include handwriting recognition.

Certainly Microsoft and Google don't know the handwriting style

of all of us here in this room,

but it's been trained on enough other humans'

handwriting styles that odds are your handwriting and mine

looks similar to someone else's.

And so with very high probability they could recognize something

like Hello world here as indeed that same digital text.

All of us are into streaming services nowadays,

Netflix and the like.

Well,

they're getting pretty darn good at knowing if I watched X,

I might also like Y.

Why?

Well,

because of other things I've watched before and maybe upvoted and downvoted,

maybe because of other things people have watched who

like similar movies or TV shows to me,

so that too is

AI.

There's no if,

else if,

else if construct for every movie or TV show in their database.

It's sort of figuring out much more organically,

dynamically what you and I might like.

And then all these voice assistants today Siri,

Alexa,

Google Assistant,

and the like,

those two don't recognize your voice or necessarily

know what questions you're going to ask it.

There's no massive if else if that has all possible questions in

the world just waiting for you or me to ask it.

That too,

of course,

is dynamically generated.

But that's getting a bit ahead of ourselves.

Let's like rewind in time and some of the

parents in the audience might remember this here game,

among the first arcade games in the world,

namely Pong.

And so this was a black and white game whereby there's two players,

a paddle on the left,

a paddle on the right,

and then using some kind of joystick or track

ball they can move their paddles up and down,

and the goal is to bounce the ball back and forth and ideally catch it every time,

otherwise you lose a point.

This is just an

animated GIF,

so there's nothing really dramatic to watch.

It's going to stay at 15 against 12,

just looping again and again.

Nothing interesting is going to happen,

but this is a nice example

of a game that lends itself to solving it with code.

And indeed it's been in our vernacular for

years to play against not just the computer,

but the CPU,

the central processing unit,

or really the AI.

And yet AI does not need to be nearly as

sophisticated as the tools we now see.

For instance,

here's a successor to Pong known as Breakout,

similar in spirit,

but there's just one paddle and one ball,

and the goal is to bounce the ball off of these colorful bricks and you

get more and more points depending on how high up you can get the ball.

All of us as humans,

even if you've never played this old school game,

probably have an instinct as to where we should move the

paddle if the ball just left it going this way,

which direction should I move the paddle.

Probably to the left and indeed that'll catch it on the way down.

So you and I just made a decision that's fairly instinctive,

but it's been ingrained in us,

but we could sort of take all the fun out of the game and

start to quantify it or describe it a little more algorithmically step by step.

In fact,

decision trees are a concept from economics,

strategic thinking,

computer science as well.

That's one way of solving this problem

in such a way that you will always

play this game

well if you just follow this algorithm.

So for instance,

how might we implement code

or decision making process for something like breakout?

Well,

you ask yourself first,

is the ball to the left of the paddle?

If so,

you know where we're going,

then go ahead and move the paddle left.

But what if the answer were no,

in fact?

Well,

you don't just blindly move the paddle to the right,

probably,

what should you then ask?

Are you right below the ball?

The ball's coming right at you.

You don't want to just naively go to the right and then risk missing it.

So there's another question to ask.

Is the ball to the right of the paddle?

And that's a yes no question.

If yes,

well then,

OK,

move it to the right,

but if not,

you should probably stay exactly where you are and don't move the paddle.

All right,

so that's fairly deterministic,

if you will,

um,

and we can map it to code using pseudo code in,

uh,

say a class like CS 50.

We can say in a loop,

well,

while the game is ongoing,

if the ball is to the left of the paddle,

then move the paddle left,

uh,

if the ball's to the right of the paddle,

sorry for the typo there,

move the paddle right.

else just don't move the paddle.

And so these decision trees as we drew it,

have a perfect mapping to code or really pseudo code in this particular case,

which is to say that's how people who implemented the breakout game or

the Pong game who implemented a computer player surely coded it up.

It was as straightforward as that.
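That decision tree maps to code almost mechanically. Here's a minimal sketch in Python; the function and argument names are illustrative, not anything from the actual game's source.

```python
# The breakout decision tree as a function: compare the ball's x
# position to the paddle's and move left, move right, or stay put.
# (Names here are illustrative.)

def decide_move(ball_x: float, paddle_x: float) -> str:
    if ball_x < paddle_x:      # Is the ball to the left of the paddle?
        return "left"          # then move the paddle left
    elif ball_x > paddle_x:    # Is the ball to the right of the paddle?
        return "right"         # then move the paddle right
    else:
        return "stay"          # ball is right above you: don't move

print(decide_move(100, 150))   # left
```

Wrapped in a `while game_is_ongoing:` loop, this is essentially the whole computer player.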

But how about something like tic tac toe,

which some of you might have played on the way

in for just a moment on the scraps of paper,

um.

That you might have had,

uh,

here we have a tic tac toe board with two O's and two Xs.

For those unfamiliar,

this game tic tac toe,

otherwise known as noughts and crosses,

is a matter of going back and forth X's and O's between two people,

and the goal is to get 3 O's in a row or 3 X's in a row,

either vertically,

horizontally,

or diagonally.

So this is a game

in mid progress.

Well,

let's consider how you could solve the game of tic tac toe like a computer,

like an AI might.

Well,

you could ask yourself,

can I get 3 in a row on this turn?

Well,

if yes,

we'll play in the square to get 3 in a row.

It's as straightforward as that.

If you can't though,

what should you ask?

Well,

can my opponent get 3 in a row on their next turn?

Because if so,

you should probably at least block

their move next so at least you don't lose now.

But this game tic tac toe is relatively simple as it is,

gets a little harder to play

when it's not obvious where you should go.

Now all of us as humans,

if you grew up playing this game,

probably had heuristics you use,

like you really like the middle or you like the top corner or something like that,

so we probably

can make our next move quickly.

But is it optimal?

And I dare say if back in childhood or more

recently you've ever lost a game of tic tac toe,

like you're just bad at tic tac toe because logically there's no reason you

should ever lose a game of tic tac toe if you're playing optimally.

At worst you should force a tie,

but at best you should win the game.

So think of that the next time you play tic

tac toe and lose like you're doing something wrong,

but

in your defense it's because

that question mark is sort of not obvious.

Like how do I answer it when the answer is not right in

front of me to move for the win or move for the block?

Well,

one algorithm you could have been using all of these years is called Mini Max,

and as the name suggests,

it's all about minimizing something and or maximizing something else.

So here too,

let's take a bit of fun out of the game and turn it into some math,

but relatively simple math.

So here we have 3 representative

tic-tac-toe boards. O has won here,

X has won here,

and the middle is a tie.

It doesn't matter how we score these boards,

but we need a consistent system.

So I'm going to propose that anytime O wins,

the score of the game is -1.

Anytime X wins,

the score of the game is a positive one,

and anytime nobody wins,

the score is 0.

So at this point,

each of these boards have these values -1,

0,

and 1.

So

the goal therefore,

in this game of tic tac toe now is for X to maximize its score because

one is the biggest value available and O's goal in life is to minimize its score.

So that's how we take the fun out of the game.

We turn it into math where one

player just wants to maximize,

one player just wants to minimize their score.

All right,

so a quick,

uh,

sanity check here.

Here's a board.

It's not color coded.

What is the value of this board?

One,

because X

has in fact won

straight there down the middle.

So X is 1, O is -1,

otherwise a tie.

So now let's see how we go about with those principles in place,

figuring out where we should play in tic tac toe.

Now here's a fairly easy configuration.

There's only 2 moves left.

It's not hard to figure out how to win or tie this game,

but let's use it

for simplicity.

O's turn for instance.

So where can O go?

Well,

that invites the question,

Well,

what is the value of the board, or how

do we minimize the value of the board for O to win?

Well,

O can go in one of two places top left or bottom middle.

Which way should O go?

Well,

if O goes in top left,

we should consider what's the value of this board.

Is it minimal?

Well,

let's see.

Uh,

if O goes here,

X is obviously going to

go here, X is therefore going to win,

so the value of this board is going to be a 1.

Now since there's only one way logically to get from this configuration to this one,

we might as well call the value of this board by transitivity 1.

And so O probably doesn't want to go there because that's a

pretty maximal score and O wants to minimize over here though,

if O goes bottom middle,

well then X is gonna go top left and now no one has won,

so the value of this board is thus

0,

we might as well treat this as 0 because that's the only way to get there logically.

So now O more mathematically and logically can decide do I

want an end point of 1 or an endpoint of 0.

Well,

0 is probably the better option because that's less than 1,

and thus it's the minimal possibility.

So O

is going to go ahead in the bottom middle and at least force a tie.

And so there's

the evidence: if you humans are ever losing the game of tic tac toe,

you have not followed this here logic,

but you could probably do it if there's just 2 moves left.

But the catch is,

let's go ahead and sort of rewind to 3 moves left.

Here there are 3 blanks,

and I've kind of zoomed out.

The catch is that the decision tree gets a lot

bigger the more and more moves that are left.

It gets sort of bigger and bushier in that it's essentially doubling in size.

and that's great if you have the luxury of writing it down on a piece of paper,

but if you're doing this on your head while playing against 1/5 grader,

if I may,

you're probably not drawing out all of the various

boards and configurations trying to play it optimally.

You're going with some instinct,

and your instincts might not be aligned with an algorithm

that is tried and true, minimax, that will ideally get you to win the game,

but at least will get you to force a tie if you can't win.

But

Tic Tac Toe is not that hard.

I mean,

how many different ways are there to play Tic Tac Toe?

We could write a computer program to pretty much play Tic Tac Toe optimally.

Um,

we could use code like this: if the player is X, for each possible move,

calculate the score for the board at that point in time,

and then choose the move with the highest score.

So you just try all possibilities mathematically and then you make the decision.

Most of us in our heads are not doing that,

but we could,

else if the player is O,

essentially do the same thing but choose the minimal

possible score.

So that's the code for implementing tic tac toe.
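The pseudocode above can be sketched in Python as a small minimax function. This is a minimal illustration, not the course's actual code: the list-of-9-cells board representation and the function names here are my own.

```python
# Minimax for tic tac toe: for each possible move, compute the value of the
# resulting board; X chooses the maximal score and O the minimal one.

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def winner(board):
    """Return "X" or "O" if someone has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None

def value(board, player):
    """+1 if X can force a win, -1 if O can, 0 for a forced tie."""
    w = winner(board)
    if w == "X":
        return 1
    if w == "O":
        return -1
    moves = [i for i, cell in enumerate(board) if cell is None]
    if not moves:
        return 0  # full board, no winner: a tie
    scores = []
    for i in moves:
        board[i] = player
        scores.append(value(board, "O" if player == "X" else "X"))
        board[i] = None  # undo the move
    return max(scores) if player == "X" else min(scores)
```

For instance, on a board where X already has two in a row with the third cell open, `value` with X to move reports +1; and `value` of the empty board is 0, since tic tac toe played optimally is a forced tie.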

How many ways are there to play tic tac toe though?

Well,

255,168,

which means if we were to draw that tree,

it would be pretty darn big and it.

Take you quite a bit of time to sort of think through all those possibilities.

So in your defense,

you're maybe not that bad at tic tac toe,

it's just harder than you thought as a game.

But what about games with which we might as adults be more familiar?

Well,

what about the game of chess,

which is often used as a measure of like how smart a computer is,

whether it's Watson back in the day playing against it or something else?

Well,

if we consider even just the first 4 moves of chess,

whereby I mean black goes and white goes,

and then they each go 3 more times,

so 4 pairwise moves,

how many different ways are there to play chess?

Well,

it turns out

85 billion just to get the game started,

and that's a lot of decisions to consider and then make.

How about the game of Go? If familiar,

consider the first 4 moves: 266 quintillion possibilities.

And this is where we sort of as humans and

even with our modern PCs and Macs and phones kind

of have to throw up our hands because I don't

have this many bits of memory in my computer.

I don't have this many hours in my life left to actually

crunch all of those numbers and figure out the solution and so.

So where AI comes in is where it's no longer as simple as just writing

if else's and loops and no longer as simple as just trying all possibilities.

You instead need to write code that doesn't solve

the problem directly but in some sense indirectly.

You write code so that the computer figures out how to win,

perhaps by showing it configurations of the board that are a

good place to be in that is promising and maybe showing

it boards that it doesn't want to find itself in the

configuration of because that's going to lead it to lose.

In other words,

you train it,

but not necessarily as exhaustively,

and this is what we mean nowadays by machine learning,

writing code

via which machines

learn how to solve problems generally by

being trained on massive amounts of data and

then in new problems looking for patterns by

which they can apply those past training data

to the problem at hand.

And reinforcement learning is one way to think about this.

In fact,

we as humans use

reinforcement learning,

which is a type of machine learning,

all of the time.

A fun demonstration to watch here involves these pancakes.

So in fact,

let me go ahead and pull up a short recording here of an actual researcher in a lab

who's trying to teach a robot how to flip pancakes.

So we'll see here in this video

that there's a robot that has an arm that can go up,

down,

left,

right.

This of course is the human,

the researcher,

and he's just going to show the robot one or more times like how to flip a pancake.

And crosses his fingers and OK,

seems to have done it well,

does it again,

not quite the same.

But pretty good.

And now he's going to let the robot just try to figure out how

to flip that pancake after having just trained it a few different times.

The first few times,

odds are

the robot's not going to do super well because it really doesn't

understand what the human just did or what the whole purpose is.

But,

and here's the key detail with reinforcement learning behind the scenes,

the human is probably rewarding the robot when it does a good job,

like the better and better it flips,

the more it gets rewarded, as by hitting a key and giving it a point,

for instance,

or giving it the digital equivalent of a cookie

or conversely,

every time the robot screws up and drops the pancake on the floor,

sort of a proverbial slap on the wrist,

a punishment so that it does less of that behavior the next time.

And any of you who are parents,

which by definition today many of you are,

odds are,

whether it's not this or maybe just verbal approval or reprimands,

you have probably trained children at some point to do more of one thing

and less of another.

And what you're seeing in the backdrop there is

now just a quantization of the movements X,

Y,

and Z coordinates so that it can do more of the X's and the Y's and the Z that led it to

some kind of reward.

And now after you're up to some 50 trials,

the robot seems to be getting better and better

such that like a good human,

we'll see if I can do this without embarrassing myself,

can flip the thing.

That's pretty good.

That's pretty,

I've been doing this a long time,

OK.

So

we've seen then how you might use reinforcement

learning in that kind of domain.

Let's take an example that's familiar to those of you who are gamers.

Any time you've played a game where there's some kind of

map or a world that you need to explore up,

down,

left,

right,

maybe you're trying to get to the exit.

So

here simplistically is the player at the yellow dot.

Here,

for instance,

in green is the exit of the map and you want to get to that

point and maybe somewhere else in this world there's a lot of like lava pits.

You don't want to fall into the lava pit because you lose a life

or you lose a point or there's some penalty or punishment associated with that.

Well,

we,

with this bird's eye view can obviously see how to get to the green dot,

but if you're playing a game like Zelda or something like that,

all you can do is move up,

down,

left,

right and sort of hope for the best.

So let's do just that.

Suppose the yellow dot just randomly chooses a direction and goes to the right.

Well,

now we can sort of take away a life,

take away a point or effectively punish it

so that it knows don't do that.

And so long as the player has a bit of memory,

either the human player or the code that's implementing this,

just with a dark red line,

that means don't do that again because that didn't lead to a good outcome.

So maybe the next time the yellow dot goes this way and this way,

and then,

ah,

I didn't realize that that's actually the same lava pit,

but that's fine.

Use a little bit more memory and remind me,

don't do that because I just lost a second life in this story.

And maybe it goes this way next time,

ah,

now I need to remember,

don't do that.

But effectively I'm either being punished for

doing the wrong thing,

ah,

or as we'll soon see,

being rewarded for doing more of the successful

thing and just by chance maybe I finally make

my way to the exit in this way and so I can be rewarded for that.

Now I got 100 points or whatever it is,

the high score.
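The reward-and-punishment bookkeeping described above can be sketched very simply: keep a remembered value per move, subtract when a move led to a lava pit, add when it led to the exit. All the names and numbers here are illustrative, not from the lecture's code.

```python
# Remembered value per move; a punishment is just a negative reward.
values = {"up": 0.0, "down": 0.0, "left": 0.0, "right": 0.0}

def reinforce(move, reward):
    """Nudge the remembered value of a move up or down."""
    values[move] += reward

reinforce("right", -1)              # stepped into a lava pit: punish
reinforce("up", 100)                # reached the exit: reward
best = max(values, key=values.get)  # "up" is now the preferred move
```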

So now as for these green lines,

I can just follow that path again and again and I can always win this game,

kind of like me nowadays,

like 30 years later playing Super Mario Brothers

because I can get through all the warp levels

because I know where everything is because for

some reason that's still stored in my brain.

Is this the best way to play?

Am I as good at Super Mario Brothers as I might think?

What's bad about this solution?

Yeah.

Yeah.

Exactly,

yeah,

I've moved many more times than I need to and just for fun today.

What grade are you in?

7th grade.

Wonderful.

So now the 7th grader's observation is exactly that,

that we

could have taken a shorter path,

which is essentially that way,

albeit

making some straight moves.

And so we're never gonna find that shorter path.

We're never going to get the highest score possible if

I just keep naively following my well trodden path.

And so how do we break out of that mold?

And you can see this even in the real world,

in other sorts of domains.

One example: I'm the type of person for some reason

where if I go to a restaurant for the first time,

I choose a dish off the menu and I really like it,

I will never again order anything else off that menu

other than that dish because I know it is good,

but there could be something even better on the menu,

but I'm never going to explore that because I'm sort of fixed in my ways,

as some of you, judging from the smiles, might be too.

But what if

we took advantage of exploring just a little bit?

And there's this principle of exploring versus exploiting when

it comes to using artificial intelligence to solve problems.

Up until now,

I've just been exploiting knowledge I already have.

Don't go through the red walls,

do go through the green walls.

Exploit,

exploit,

exploit,

and I will get to a final solution.

But what if I just sprinkle in a little bit of randomness along the

way and maybe 10% of the time as represented by this epsilon variable.

I,

as the computer in the story,

generate a random number between 0 and 1,

and if it's less than that,

which is going to happen 10% of the time,

I'm going to make a random move instead of one

that I know will get me closer to the exit.

Otherwise I'll indeed make the move with the highest value.
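The epsilon rule just described can be sketched in a few lines of Python. This is an illustrative sketch: the `q_values` dictionary mapping each move to its learned value, and the function name, are my own.

```python
import random

def choose_move(q_values, epsilon=0.10, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:                # random number in [0, 1)
        return rng.choice(list(q_values))     # explore: a random move
    return max(q_values, key=q_values.get)    # exploit: best-known move

moves = {"up": 0.2, "right": 0.9, "down": -1.0}
# With epsilon = 0 we always exploit and pick "right";
# with epsilon = 1 we always explore, picking uniformly at random.
```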

Now this isn't going to necessarily win me the game that first time,

but if I play it enough and enough and enough and insert some of this randomness,

I might very well find a better solution.

And therefore be a better player,

a better winner

overall.

If I just 10% of the time ordered something else off the menu,

I might find that there's an amazing dish out there

that otherwise I wouldn't have discovered.

And so indeed using that approach,

can we finally find a more optimal path through the maze,

as was shorter there,

presumably therefore maximizing our score and doing

even better than we might have

by just exploiting the same knowledge.

So you can see this even in the game of Breakout,

especially if you write a solution in code to play this game for you.

Let me go ahead and pull up another video recording of an AI playing Breakout,

and what this AI is doing is essentially figuring out,

maybe more intelligently than you or I could,

how to play this game

optimally.

And what we'll see here

is that

it's just like uh the pancake flipping robot.

There's some notion of scoring and rewards and penalties here.

So like right now the paddle's just doing random stuff.

It doesn't really know how to play the game yet,

but it realizes after 200 episodes that oh my score goes up if

I hit the ball and it goes down equivalently if I miss it,

and it's still a little twitchy.

It doesn't quite understand what it's supposed to do and why,

but if you do it again.

and again and again and it's rewarded and or punished enough,

you'll see that it starts to get

pretty good and closer to what a good human might do.

But here's where the algorithm gets a little creepy.

If you let it play long enough or if you and I,

the humans play long enough,

you might find a certain trick to the game.

I dare say the AI becomes a bit scarily sentient.

In that it turns out

if you're smart enough to break through that top row,

you can let the game just play itself for you

and maximize your score without even touching the ball,

something that I do find a little creepy that I

just figured out how to do that without being told,

but it's just a logical continuation of rewarding it for good behavior

and punishing it

for

bad behavior.

So that next time you have an occasion to play Breakout,

consider that kind of strategy as opposed to doing more of the work yourself,

let the computer do it for you instead.

Well,

what else is there to consider in this world of AI in the context of machine learning?

Well,

there's specifically a category of learning that's supervised,

and we've been using this for years.

And in fact,

our first example of spam early on was certainly supervised.

Why?

Because it was you and I who were putting

the email into the spam folder and to this day,

maybe once a day I hit the keyboard shortcut in Gmail to say,

ah,

this is spam.

You should have caught this,

and that is training Google's algorithm further,

assuming it's not just little old me,

but maybe thousands of people tagging that same kind of email

as spam.

That's supervised learning, in that there's a human in the loop doing

at least something, so spam detection might be one of those.

But the catch is that labeling data in

that way manually just doesn't scale very well.

That would be akin to having someone at Google or Microsoft labeling every email

or someone at Netflix doing the same for all of the videos out there.

It's expensive in terms of human power,

and there's certainly problems out there with so much data.

It's just not realistic for humans to label millions of pieces of data,

billions of pieces of data.

We've got to move to an

unsupervised model.

And so this is where the world starts to consider deep learning,

solving problems

using code whereby you don't even have humans in the loop in quite the same way.

And neural networks inspired by the world of biology are sort

of the inspiration for what is the state of the art,

even underlying today's rubber duck and more generally these things

called large language models like chat GPT and the like.

So here pictured somewhat abstractly is a neuron,

and it's something in the human body that transmits a signal,

say,

from left to right,

electrically. And if you have multiple neurons,

you can intercommunicate among them so that if I think a thought,

then I know how to raise my hand because some kind of

message electrically has gone from my head to this extremity here.

So that's in essence what I remember from 9th grade biology.

But as the computer scientists,

we sort of abstract all of this away.

So instead of calling these two neurons,

drawing them as neurons,

let's just start drawing neurons as these little circles,

and if they have connective tissue between them of sorts,

we'll just draw a straight line,

an edge between them.

So this is what a computer scientist would call

a graph.

If you have two such neurons over here leading to one neuron here,

you can think of this as being like maybe two inputs to a problem

and now one output there too.

We can represent the notion of problem solving,

which is what CS 50 and our other courses more generally are all about.

So let's solve a problem with a neural

network without necessarily training it in advance,

just letting it figure out how to answer this question.

Here's a very simple two-dimensional world,

XY grid,

and here are two dots,

and the dots in this world are either blue or they are

red,

but I have no idea yet

what makes a dot blue or red.

However,

if you train me on those two dots,

I bet I could come up with predictions,

especially if you let me label this world

in terms of X coordinates on the horizontal,

Y coordinates on the vertical,

and then you know what we can think of this

neural network very simply as representing the X coordinate here,

the Y coordinate here,

and the answer I want to get is quote unquote red or blue or 0 or 1 or true or false,

however you want to think

of the representation.

So how do I get from a specific

XY coordinate to a prediction of color if I only know the coordinates?

Well,

from the get-go,

maybe the best I can do is just divide the world into

blue dots on the left and red dots on the right,

a best fit line,

if you will,

based on very minimal data.

Of course,

if you give me a third dot,

it's going to be pretty easy to realize that I was a little too hasty.

That line is not vertical,

so maybe we pivot the line this way and now I'm back in business.

Now I can predict with higher probability based on xy what color the next dot will be.

You give me enough of these dots.

I can come up with a pretty good best fit line.

It's not perfect,

but here's a hint at why AI is not perfect.

But 99% of the time maybe I'll be able to predict correctly,

and I can do even better if you let me squiggle the

line a little bit and maybe make it more than just a simple

slope.

So what is it we're really doing

with implementing this neural network,

albeit simplistically with just 3 neurons?

Well,

essentially we're trying to come up with 3 values,

3 parameters an A,

a B,

and a C.

And what do those represent?

Well,

really just a solution to this formula that their line we drew

can be represented if you think back to like high school math

with a formula along these lines whereby it's A times X plus B

times Y plus some constant C, and we can just arbitrarily conclude that

if that value mathematically gives me a number greater than 0,

predict it's going to be blue.

otherwise predict it's going to be red.

We can sort of map our mathematics,

just like with tic tac toe,

to the actual problem we care about by defining the world in this way.

And so if you give me enough data points,

I can come up with answers for that A,

B,

C,

the so-called parameters in neural networks.
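That tiny network reduces to a single linear unit: predict blue when a*x + b*y + c is greater than 0, and red otherwise. A minimal sketch, with parameter values made up purely for illustration:

```python
def predict(x, y, a, b, c):
    """One linear unit: blue if a*x + b*y + c > 0, else red."""
    return "blue" if a * x + b * y + c > 0 else "red"

# With a = -1, b = 0, c = 5, the decision boundary is the vertical
# line x = 5: dots to its left come out blue, dots to its right red.
# Training is just the search for a, b, c that fit the labeled dots.
```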

Now in reality,

neural networks are not composed of like 3 neurons and a couple of edges.

They look a little something more like this,

and in practice they've got billions of these things here on the screen.

In which case pretty much every one of these edges represents some mathematical

value that was contrived based on lots and lots of training data,

and whereas I,

the computer scientist,

might know what these neurons over here represent because those are my inputs,

3 in this case,

and I,

the computer scientists know what this one represents at the end.

If you sort of took the hood off of this thing and looked inside the neural network,

even though there'd be millions,

billions of numbers going on there,

I can't tell you what this neuron represents or why this edge has this weight.

It's because of the massive amount of training data

that that's just how the math works out.

And if you feed me more data,

I might change some of those parameters more.

So the graph ultimately might look quite different,

but my inputs and my outputs are going to be what I use to solve that problem.

So if you want to predict like rainfall from humidity or pressure,

you can have two inputs giving that one output,

an advertising dollar spent in a given month that might predict

sales by just having trained again on such volumes of data.

And when we get now full circle to something like CS 50's rubber

duck and large language models like Claude and Gemini and ChatGPT,

what's really happening,

and this is all hot off the press in recent years,

screenshotted here are some of the recent research papers that

have driven a lot of this advancement in recent years.

You have, from OpenAI, say, a generative pre-trained transformer,

which is a lot to say,

but there's the GPT in ChatGPT,

and essentially this is a neural network that's

been trained on large volumes of textual information

that gives us the interactive chat feature that we have in the class,

and we all have more generally in ChatGPT itself.

So an example of what

is actually happening underneath the hood of these GPTs.

Well,

here's a paragraph

that up until recent years was kind of a hard paragraph to complete:

Massachusetts is a state in the New

England region of the northeastern United States.

It borders on the Atlantic Ocean to the east.

The state's capital is.

Now most anyone living in Massachusetts probably knows that answer,

but if this AI has just been trained on lots and lots of data,

there's probably a lot of people who say Massachusetts in part of a sentence,

and then

the answer,

which I won't say yet, is in the other part of the

sentence.

But in this example,

given that the question we're asking is sort of so far

from some of the useful keywords up until recently,

this was a hard problem to solve because there was so much distance.

Moreover,

there's these nouns that are being used to substitute for the

proper noun like we suddenly start calling it a state,

we call it a state down here and it

wasn't necessarily obvious to AIs that we're talking about

the same thing as if it were just city

comma state where you'd have much more proximity.

So.

In a nutshell,

what we now do,

especially to solve problems like these,

is we first break down a sentence or the training data or input

alike into an array or a list of the words themselves.

We come up with a representation of each of these words.

For instance,

the word Massachusetts,

if you encode it in a certain way,

is going to be represented with an array or vector of numbers,

floating point values,

so many that the word Massachusetts in one model would use 1,536

floating point numbers to represent Massachusetts

essentially in an N-dimensional space,

so not just an XY plane but somewhere sort of virtually out there.
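A toy sketch of that embedding idea, with made-up 3-dimensional vectors standing in for the 1,536-dimensional ones just mentioned: each word maps to a vector of floats, and related words end up pointing in similar directions, which we can measure with cosine similarity.

```python
import math

# Made-up vectors for illustration; real embeddings are learned.
embedding = {
    "Massachusetts": [0.9, 0.1, 0.3],
    "state":         [0.8, 0.2, 0.3],
    "the":           [0.0, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity: near 1.0 means similar direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# In this toy space, "Massachusetts" sits much closer to "state"
# than to a function word like "the".
```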

And then,

and this has been the key to these GPTs,

attention is calculated based on all of that data whereby in this picture

the thicker lines imply more of a relationship between those two words.

So Massachusetts and state is inferred as having a thicker line,

a higher attention from one word to the other,

whereas our

a's and our the's have thinner lines because there's just not as

much signal to the AI as to what the answer to this question is.

Meanwhile,

when you then feed that sentence like the

state's capital is one word per neuron here,

the goal is to get the answer to that question.

And even here this is way smaller of a

representation than the actual neural network would be.

But in effect,

all these LLMs,

large language models,

are just statistical

models: what is the highest probability word that it should spit out at the

end of this paragraph based on all of the Reddit posts and Google search results

and encyclopedias and Wikipedias that it's found and trained on online?

Well,

the answer hopefully will be

Boston.
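The "highest probability next word" idea can be illustrated with nothing more than counting: which word most often follows a given context in the training text? Real LLMs use neural networks and attention rather than raw counts, and this tiny corpus is made up, but the statistical spirit is the same.

```python
from collections import Counter

# A made-up training corpus of (context, next word) pairs.
corpus = [
    ("the state's capital is", "Boston"),
    ("the state's capital is", "Boston"),
    ("the state's capital is", "Springfield"),
]

def most_probable_next(context):
    """Predict the word that most often followed this context."""
    counts = Counter(word for ctx, word in corpus if ctx == context)
    return counts.most_common(1)[0][0]
```

Note that the less frequent "Springfield" still appears in the counts: that leftover probability mass is one intuition for how a probabilistic model can occasionally emit a wrong answer.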

But of course 1% of the time,

maybe less than that,

the answer might not be correct and even CS 50's own duck is fallible,

even though we've written lots of code to

try to put downward pressure on those mistakes,

and those mistakes are what we'll call,

lastly,

hallucinations, where the AI just makes

something up perhaps because some crazy human

on the internet made something up and it was interpreted as authoritative or

just by bad luck because of a bit of that exploration 10% of the time,

1% of the time,

the AI sort of veered this way in the large language model in the

neural network and spit out an answer that just in fact is not correct.

And so I thought I'd end for today on this final note.

A poem with which many of us might have grown up from Shel Silverstein here about

the homework machine which years ago somehow sort

of predicted the state we would be in

with these AI machines.

He said,

the homework machine,

oh,

the homework machine,

most perfect contraption that's ever been seen.

Just put in your homework,

then drop in a dime,

snap on the switch,

and in 10 seconds' time,

your homework comes out quick and clean as can be.

Here it is,

9 + 4,

and the answer is 3. 3?

Oh me,

I guess it's not as perfect

as I thought it would be.

This then was CS 50.

See you next time.
