CS50x 2026 - Lecture 2 - Arrays
By CS50
Summary
Topics Covered
- Reading Levels Quantify Text Complexity
- Debug50 Reveals Execution Flow
- Compilation Has Four Hidden Steps
- Strings Are Null-Terminated Arrays
- Caesar Cipher Breaks on Brute Force
Full Transcript
1 fish, 2 fish, red fish, blue fish.
Congratulations.
Today is your day.
You're off to great places, you're off and away.
It was a bright cold day in April, and the clocks were striking 13.
Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him.
Alright, this is CS50, and this is week 2, and if we could, after this dramatic reading, a round of applause for our volunteers.
So we can now take for granted from week 1 that we now have a new way to express some of the ideas that we first explored in week 0, like functions and conditionals and variables and the like, and now we're doing in C what we used to do in Scratch.
Today what we're going to start to focus on is some real world problems so that we can take for granted that we have that expressiveness.
We have some tools in our toolkit and can actually start to solve some real-world problems, or problems representative thereof.
In particular, the real world problem that we're going to start today and this week with is that of reading levels.
Odds are when growing up you read at a certain level based on your age.
Maybe it was 1st grade level or 5th grade level or 10th grade level or the like, and that was a function of just how comfortable you were with the words in the book or words on the screen that you were reading.
What you've just heard, thanks to our volunteers, are 3 different reading levels, one for each of these 3 volunteers.
And in fact, why don't we go ahead and hear them again and be a little more thoughtful this time, so as to assess at what reading level your classmate is reading.
So let's start with Leah if you'd like to introduce yourself first.
Hi, I'm Leah.
I'm a 1st year in Holworthy, and here's my little thing.
1 fish, 2 fish, red fish, blue fish.
So at what reading level would you say Leah reads, based on her recitation thereof? Yeah, in the front.
Kindergarten, OK, so a fairly young age, and what makes you say kindergarten?
She is speaking in very short phrases without much complexity.
OK, very short phrases without much complexity, and indeed, according to one scientific measure that we'll explore in this week's problem set, we would say that Leah reads before grade 1, so kindergarten would indeed be apt.
But welcome to the stage here.
Let's move on now to Maria if you'd like to introduce yourself.
Yeah, hi, I'm Maria.
I'm in Stoughton thinking of applied math.
Um, congratulations.
Today is your day.
You're off to great places.
You're off and away.
Another familiar phrase, perhaps. At what reading level would you say Maria is?
Yeah, over here.
And what makes you say 2nd or 3rd grade?
OK.
So now we're starting to introduce complexities like rhyming and a bit more substance to the quote.
And indeed, based on that reading, that same measure that I described earlier, which will involve a mathematical function that somehow analyzes what it is Maria just said, would conclude that she read at a 3rd grade level, or grade 3.
Finally, Omar, if you'd like to introduce yourself and read once more yours.
OK, um, so hi everyone, I'm Omar.
Um, I'm a freshman in Hurlbut thinking of doing comp sci, and this is my reading.
Um, it was a bright cold day in April, and the clocks were striking 13.
Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent the swirl of gritty dust from entering along with him.
All right, sort of escalated quickly.
What reading level is Omar at, would you say?
Someone else.
What might you say or estimate?
Yes, right here in the front.
OK, 8th grade, and what made you say that?
More complex sentences, more complex words, and indeed, according to that same measure, this full paragraph of text, which has even more grammar when you see it there on the screen, would be said to be at grade 10 because of that added complexity.
So with that said, we're going to need to be able to somehow crunch these numbers to determine, given a body of text, at what reading level someone is. But in order to do that and apply any metrics to a body of text, we're going to need to represent that text in memory using something like strings, as we did last week.
But last week, with strings, we could really just print them out or display them wholesale on the screen.
I think we're going to need to break down these various texts and others like them at a finer-grained level.
And indeed, among the goals for today is to explore exactly that, and also to take the proverbial hood off of the car to look underneath at how the computer is actually working and how things like strings are actually functioning.
So if you could join me one last time in a round of applause for our volunteers, thank you so much for helping out.
Thank you guys.
Thank you.
Thank you to Maria as well.
So among the goals for today beyond exploring a representative problem like this of reading levels is going to be another one which is even more important and more omnipresent than reading levels, namely cryptography, the art of scrambling information or specifically encrypting it so you can send secure communications.
Now you sort of take for granted, increasingly nowadays, that when you send a text message or perhaps an email, or check out online with a credit card, somehow or other your information is secure. Over the coming weeks we're going to explore to what extent that is actually true and why or why not.
Now with cryptography, similarly, if I want to send a message to you securely, such that no one else in the room can figure out what I have said even if they physically intercept that message, which is all too possible in the digital world, we're going to need to come up with metrics and mechanisms for actually scrambling information in a reversible way, so that I can write my message and somehow scramble it.
You can receive that message, even after it's passed through many other hands, and you can descramble or decrypt that same message.
So for instance, here on the screen is a message, a fairly simplistic one, that has somehow been encrypted, and we'll see by the end of today and by the end of this week that this encrypted message, and there's a bit of a tell on the end there, actually will be said to decrypt to "This is CS50."
But why is going to be the underlying question, and what additional tools do we need in our toolkit in order to do that?
Another word on tools.
So up until now you've probably experienced some bugs, whether it was in Scratch or ever more so in C.
In fact, don't feel too bad if like the very first program you wrote in C like didn't even work.
You couldn't even make it or compile it until you went back and fixed some of the code that you had written.
Well, it turns out that bugs, mistakes in programs, are ever so commonplace. And even though we've already provided you with tools like the virtual rubber duck at cs50.ai, also embedded into VS Code at cs50.dev, of whom you can ask questions along the way, among the goals today is to give you some lifelong tools for debugging software yourself when you don't have a duck nearby, when you don't have a TA nearby, let alone any humans at all.
So with debugging, there's going to be a number of techniques that we can use, all toward the end of finding and removing bugs or mistakes from our software.
Perhaps the person best known for having popularized the term bug is Dr. Grace Hopper, pictured here, who was a rear admiral in the Navy and one of the original programmers of the so-called Harvard Mark I, a very early mainframe computer. If you wander across the Charles River over to the Science and Engineering Complex here at Harvard, you can actually see part of it on display still in the lobby.
It was succeeded by the Harvard Mark II, and on the Harvard Mark II, Hopper and her team were known for having put this note in their logbook after having done some number crunching on the system there.
And if we zoom in, they had found a problem with the computer this one day whereby there was literally a bug, a moth, inside of the circuitry of the computer, and as was written here, "first actual case of bug being found."
And ever since then do we say ever more so the phrase bug and debugging when it comes to finding and eliminating problems in our code.
So let's start with just that.
In fact, let me go over to VS Code and let's deliberately make some mistakes together that might very well be reminiscent of some of the mistakes you've accidentally made thus far, but along the way give you all the more tools for finding these mistakes in your own code, as opposed to having to ask someone else, be it virtual or physical, for help.
Let me go ahead and consciously, in VS Code, create a program known to be buggy called buggy.c.
And in this program, let's go ahead and write some fairly familiar code initially.
I'm going to go ahead and start just like we did last week with int main(void).
More on that today before long. Inside of my curly braces, I'm going to say printf("hello, world").
That's it.
Now I'm going to go back to my terminal window here.
I'm going to go ahead and do make buggy to make a program from that source code, but before I do, odds are even after just a week of this stuff, you can probably spot a few mistakes I've made, a few bugs.
What do you see wrong already, yeah.
I didn't include stdio.h, that so-called header file, which is important because it tells the compiler that I plan to use functions therein like printf, which clearly I'm doing.
So let me go in and include stdio.h.
What else seems to be wrong here?
Yeah.
I'm missing a semicolon at the end of line 5 here, so I'm gonna go ahead and add that in, and this is subtle and arguably not a bug, but maybe an aesthetic detail.
What else have I done arguably wrong? Yeah, in back.
Yeah, I forgot my backslash n, the newline character, just to move the cursor to the next line so that when I get a new prompt, it's on a fresh line of its own.
Again, more of an aesthetic, but certainly a pretty reasonable thing to do.
So let me go ahead now and actually, in my terminal window, run make buggy, and indeed it compiled.
But up until then, had I not fixed those mistakes, I would have triggered a whole bunch of error messages as a result.
In fact, let's rewind in time, undo the fixes I just made, go back to the original form here, and try running make buggy again. Enter, and we'll see some scary-looking messages up here.
Let me scroll up to the top of the output here where we see buggy.c colon 3, which means line 3.
That's where the problem is right now.
Error: call to undeclared library function 'printf' with type, and then it starts to get a little more complicated, but I do see clearly that it's calling my attention to printf.
So hopefully at some point, if not last week, hopefully this week onward, your instinct will be, oh, all right, I'm an idiot.
I forgot the header file in which printf is actually declared.
It's not a huge deal.
It's going to come with practice, so that's how I might know more intuitively what in fact the solution here might be.
Now here's another common mistake that I've just gone in and fixed, but I did do something wrong and hopefully none of you actually did this because it's an annual FAQ.
What did I just do accidentally wrong?
So it's not studio.h, it's stdio.h.
So do kind of ingrain that one, for standard input/output.
The next bug, though, that I haven't yet fixed is that semicolon.
So let me clear my screen and rerun make buggy.
I should no longer see that first error message, but I now do see another error message, on line 5: expected ';' after expression.
All right, that one's pretty explicit, so I'm going to go ahead and fix this, but notice that up until now my code wouldn't have been able to compile because of those two error messages.
It stopped me by showing me these errors, but at this point, if I run make buggy, Enter, it did in fact compile, and yet it's arguably still buggy, because when I run ./buggy I get my prompt on the wrong line.
So this is a distinction now between a syntax error, a programming error that outright stops my program from compiling, sort of a deal breaker, versus something that's maybe more of a logical error.
I actually meant to move the cursor to the next line.
And so there are different types of errors in the world, as we're seeing here. Of course, once I fix that and rerun make buggy and again ./buggy, now we're back in business, hopefully with the intention of having this display exactly that.
All right, well, let's modify to look a little more like something else from last week.
Recall that last week I started to get someone's name more dynamically, so I said something like name = get_string.
And that was a function we introduced, and I might have said something like this: "What's your name? "
That is, a question mark with a space just to move the cursor over.
I know now I definitely need to end my thought with a semicolon.
I could try and compile this.
Make buggy now, and I'm seeing a different error message altogether that you might not have seen yet.
So on buggy.c, line 5, error: use of undeclared identifier 'name'.
What now is the mistake that I've made?
Why does it not know?
Yeah, I forgot to declare the type of this variable, which, for those of you with prior programming experience, is not something you have to do in some languages like Python, for instance. But in languages like C, C++, Java, and others, you do in fact need to explicitly tell the compiler that you want to create a variable in the computer's memory by telling it its type, and it's not going to be an int, because I don't want an integer, of course. In this case I want text, which we now know to be called string instead.
All right, I think this fixes that bug, so let me do make buggy again and hopefully...
Huh, a fatal error this time, again indicating that my code did not compile. On line 5 still, I have an error, but this time it says use of undeclared identifier 'string'; did I mean stdin?
So this is a bit of a red herring.
The compiler's trying to be helpful in saying did I mean stdin, but I don't think I actually do.
That just is the most similar looking word in the compiler's own memory.
What's the actual mistake that I've made here?
Yeah.
Yeah, I didn't include the CS50 header file, because string, recall, is a feature of the CS50 library, as are get_string and get_int and others.
So the solution here is indeed to go up here and just to be nitpicky, I tend to alphabetize my header files.
It's not strictly required technically, but stylistically.
I find it nice to be able to skim the header files alphabetically to see if something is there.
Now I can include cs50.h in addition to stdio.h, and it's in that file, cs50.h, that not only is get_string declared so that the compiler knows that it exists; it turns out so is the word string.
So this is a bit of a white lie, and this is something we do in the early weeks of the class.
We dug up these old training wheels from a bicycle, the whole idea being to sort of keep you up and keep you from having to deal with too much complexity early on.
The point of these training wheels in the form of the CS50 library is to let us kind of ignore what a string really is for just another week or two, after which we will then peel back that layer, take off those training wheels, and reveal to you what is actually going on.
So for now, strings exist, but they exist because of the CS50 library.
In a couple of weeks, they're still going to exist, but we're going to call them by a different name, as we'll eventually see.
But everyone in the real world, every software developer uses the phrase string.
So this is a concept that exists.
It is not CS50-specific at all.
It's just that in C, the word string doesn't typically exist unless you make it so, as we have.
All right, so I think now if I clear my terminal window and rerun make buggy, it should in fact compile, and if I run ./buggy, Enter, I should be able to type in my name, and now, voilà, hello, world.
So this is now not a syntax error because I didn't screw up my code per se like it compiled.
Everything is grammatically correct, so to speak, but logically, intellectually, this is not what I wanted, right?
I wanted it presumably to say hello, David, so let's fix one final bug here.
How do I fix this?
On what line?
How do I get it to say yeah hello David.
Yeah, on line 7 I need to use the string placeholder, the format code, so to speak, %s, and then one more thing. Someone else?
What do I do after this?
Yeah, in back. Yeah, comma, and then add the variable name that contains the value I want to substitute in there, which is indeed name, though I could have called it anything I want.
All right, so now make buggy, Enter. It seems to have compiled. Again, ./buggy.
Now I type in my name once more, and now we're back in business.
So over the course of these few exercises, clearly I meant to make most if not all of these bugs, these mistakes, but they demonstrate not only syntax errors, which are just going to stop the compiler in its tracks.
Like you won't even be able to compile your code until you fix those things.
But even after that there could be these latent bugs that seem to not be there until you actually provide input and see what's actually happening at so-called run time when you're running the actual code.
And so here's where it's no longer as easy as just reading the error message and figuring out what it means, because there is no error message that appeared on the screen when it said hello, world.
We had to use our own human intellect and realize, OK, that's clearly not what I wanted.
Had you run CS50's own check50 program on something like that, we could have told you that that's not correct by automatically assessing the correctness of it, but the compiler has no idea what you are trying to achieve logically.
It only knows about the language C itself and the requisite syntax for actually writing and compiling code.
So how could we go about solving logical problems in code?
So I would propose that we start to consider this list here, whereby when you want to find a logical problem in your code and better understand what's going on, or really what's going wrong, printf is going to be your friend.
Up until now we've used printf to literally print on the screen.
Hello, David.
Hello, Kelly, or anything else on the screen, but you can certainly use printf temporarily to just print stuff out inside of your program that you might want to better understand.
And then once you understand it and once you've solved some problem, fine, then you can delete those temporary lines of code, recompile, and move on.
So let's use printf as a debugging tool in that sense.
Let me go back over to VS Code here and, in this same program, buggy.c, sort of delete everything and start over with a different sort of bug.
I'm going to include stdio.h at the top.
I'm going to do int main(void) after that.
And then inside main I'm going to do a simple for loop that just prints out a stack of 3 bricks, like we saw in the world of Mario when Mario needed to, we claimed, jump over a stack of bricks.
We want to print out just 3 of those at the moment.
So I'm going to go ahead and say for, int i = 0; i <= 3, because I want 3 of these; i++.
Then inside of this for loop I'm going to go ahead and quite simply do printf, a hash symbol to represent the brick, followed by a newline to move the cursor to the next line, and a semicolon to complete the thought.
Now I've deliberately made a stupid mistake here, but in the context of a simple enough program that we can focus on the debugging technique, not on the obscurity of the bug in question.
Hopefully you'll spot the bug in just a moment if not already.
When I do make buggy now and do ./buggy, I don't get 3 bricks.
I of course get 1, 2, 3, 4 total.
So there's a logical bug in this program, and odds are you can already spot what it is.
But let me propose that this program is representative of a type of problem that you can solve a little more diagnostically by poking around and really asking the computer, via printf, to show you what's really going on.
And I would propose one of the most helpful techniques in a situation like this. If you're trying to wrap your mind around why there are 4 bricks instead of 3, well, clearly this is related to the loop somehow.
So let's look a little more thoughtfully at what the value of i is before we print out each of those bricks. I might literally do something like this temporarily: printf, quote unquote, i is %i, backslash n, close quote, and then I could just print, right here and now, the value of i, just so that I can actually see it.
Let me now go down into my terminal window, make buggy, again ./buggy, and now, and I'll full-screen my terminal, I'll get some diagnostic information at the same time.
So when i is 0, I get a brick.
When i is 1, I get another brick.
When i is 2, I get another brick.
When i is 3, I get a 4th brick.
So now I can kind of see that, OK, my loop is working, but I'm going too far.
I'm going too long.
Now, I can do this even more succinctly, for what it's worth.
I don't need a whole new printf statement.
I could just go into my existing printf, put my %i there, and then maybe just a space to scooch things over, and then print out i in that same line.
If I now do make buggy, ./buggy, OK, now I'm seeing that I'm printing a hash, a brick, for each value of i, from i equals 0, 1, 2, and also 3.
So the solution of course is that I shouldn't be starting at 0 and iterating less than or equal to 3.
The solution is like, oh, I'm an idiot.
I should have said less than 3, or if I prefer to count starting at 1 like a normal person, I could have said i equal to 1 and then go up to and through 3.
But as I claimed last week, the canonical way, the most common way to do this, is to start counting at 0 and go up to, but not through, the total value that you have in mind.
But there's going to be another technique that's worth knowing here.
Let me go ahead and sort of abstract this away by whipping up a slightly better variant of this as follows.
Let me go ahead and delete this for loop.
Let me assume for the moment that inside of main, I'm going to ask the user now for the height of a pyramid, and I'm going to do something like this.
int h = get_int, and let's prompt the user for the height value of this pyramid or this wall, and then let's go ahead and assume there exists a function called print_column, which takes as input a number h, which is how many bricks you want to print.
Now this function, print_column, does not exist yet. get_int does exist, but I don't have access to it, so let me not make the same mistake twice.
What do I need to add at the top of this file? Yeah.
I need the CS50 header file because I'm using the get_int function now, which again comes from our library, not C.
So let me go ahead and include cs50.h.
But now print_column, I can invent this function myself.
So let me go ahead and say void print_column(int height).
More on that in just a moment.
And then I'm going to recreate the loop from before: for, int i = 0; i is less than or equal to the height, so I'm going to deliberately, for now, make that same mistake as before; i++. And then inside of this for loop I'm going to go ahead and print out a single hash and a newline to represent that there brick.
So now main can use a function called print_column.
It's going to pass in the value of h, and then this for loop in the print_column function is going to take care of printing this thing for me.
So let me do this again.
Make buggy, Enter.
So far so good.
./buggy.
Let's put in a height.
I'm going to say manually a height of 3, and I should see 3 bricks, but of course I'm still seeing 4.
Now before we move on, let me hide my terminal and propose that it's just kind of stylistically bad to put anything other than your main function at the top. But recall that if I move my helper function, print_column, below main, and it's a helper function insofar as I made it to help me solve another problem, I can't recompile and run my code now.
Why?
The compiler won't let me, yeah.
Exactly. When the compiler gets to line 7 of my code, it's going to abort compilation because it doesn't know what print_column is.
Why?
Because I don't tell it what it is until line 10, and this was the only time I proposed that copy paste is reasonable is to highlight and copy the very first line of that function, paste it above main with a semicolon, and that's a so-called function prototype.
It specifies what the name of it is, what its inputs are, if any, and what its output is, if any.
And more on these inputs and outputs later on.
But now this is just a more complicated but more modularized version of the same program.
Let me do make buggy, which still compiles, then ./buggy, type in 3, and I still have that same bug.
But the catch now is that my code has gotten more complicated, and the point of my having abstracted away this idea of printing a column into a new function is that there's just more code now to debug.
I could certainly go in there and start adding printfs.
But at some point printf is going to be a very primitive tool, and you're going to waste more time adding printfs, recompiling your code, running your code, changing the printf, recompiling your code, running your code.
It's going to get very tedious quickly when you have lots of lines of code on the screen.
So can I actually step through my code line by line, maybe like your TA would in a section or a small class, line by line walking through the code?
You can, because another tool that you have access to is called debug50.
So this is a CS50 command that will start an industry-standard debugger, and a debugger is a piece of software used in the real world that literally lets you do that: debug your code by letting you slow down or even pause execution and walk through your code line by line.
The only reason we call it debug50 is because in VS Code it's a little annoying to start the debugger, and so we automated the process of starting it, but everything thereafter has nothing to do with CS50 and everything to do with real-world software engineering techniques.
So how do we use this?
Let me go back to VS code here and let me propose that I want to step through this code line by line just like we might at a whiteboard in a smaller class to figure out why I'm getting 4 instead of 3 hashes.
Well, in my terminal window, what I'm going to go ahead and do is this: debug50, space, ./buggy.
So debug50 is the command.
It needs to know what program I want to debug.
So I'm specifying ./buggy, which is the name of the program I just compiled.
I'm going to get an error, though, the first time I run this, as will you if you make the same mistake.
I'm about to see this message here.
Looks like you haven't set any breakpoints.
Set at least one breakpoint by clicking to the left of a line number and then rerun debug50.
So what is this really telling me?
Well, the debugger has no idea when and where I want to pause execution so as to start walking through my code line by line.
It wants me to tell it where to break, that is where to pause by clicking on a line number.
So let me hide my terminal for just a moment, and you've probably never done this intentionally.
But if you hover over the space to the left of your program's line numbers, you'll see a little red dot, a little stop sign of sorts.
If you actually click on a line number, that red dot will stay there, and you can see the hover here saying click to add breakpoint.
What I'm going to go ahead and do is say click to add a breakpoint at main.
Main is the entry point to my program.
It's the default function that gets called.
Let's break right away so I can step through this code line by line.
All right, let me reopen my terminal window, clear it, and then run debug50 again with ./buggy, Enter. Now a whole bunch of stuff is going to happen quickly on the screen, and then it's going to clean itself up, because once the debugger is running and ready to go, it's going to allow me to start stepping through my code line by line.
So what is going on?
Well, notice nothing has happened in the terminal yet.
Why?
Because my code has been paused inside of main.
In particular, it's been paused in the first real line of code.
So the curly brace is uninteresting.
The first line is just the function's name essentially.
So line 8 is the first juicy line of code that could possibly do anything useful.
It's been highlighted here in yellow, and the fact that this cursor is here means that we have broken execution on this line, but we have not yet executed this line, which is why in the terminal I don't see anything yet.
I definitely don't see height followed by a colon.
Notice what else has happened here.
All of a sudden, on the left-hand side of the screen, where your File Explorer typically is or where the CS50 duck typically is, we see mention of variables.
You can actually see inside of the debugger what the value of any variable in the computer's memory happens to be.
Now I don't quite understand this right now.
We'll come back to this over time, but weirdly, before line 8 even executes, it seems that h has a default value of 32,764, which seems to have come from nowhere.
As an aside, this is going to be what's called a garbage value, and this is actually why we have Oscar so omnipresently here.
A garbage value tends to be a default value inside of a variable that's the result of that memory having been used previously for something else.
Inside of your computer, you've got all of this memory, random access memory or RAM.
More on that today, and it stands to reason that my computer or whatever cloud server we're using has been running for some time.
So the bits that H is going to use might already have some random switches on and off, some random pattern of bits that happens to give me 32,764.
But the moment this line of code executes, that value is going to get changed to what I actually want it to be, which is what the human is going to type in.
Meanwhile, at the bottom here you'll see a so-called call stack.
More on this too in the weeks to come, but you'll see that we've paused on the function called main in the file called buggy.c.
So how do I do something useful?
Well, at the very top of the debugger, you'll see a whole bunch of color coded icons.
One looks like a play button, and if I click that, it's just going to continue execution of my code as though I don't want to step through it anymore.
So I'm not going to click that just yet.
The second icon, which is a little curved arrow over a dot, is the so-called step over button, which will mean step over this line and execute it, but only one line at a time.
Let's go ahead and do exactly that.
So I'm going to click the step over icon, the second one, which is the curved arrow with the dot under it.
Click.
Now I see in my terminal window height being prompted.
All right, let's go ahead and type in 3 just like I did before and hit enter.
Now notice what happens.
Execution has paused on line 9 instead of 8.
And you'll see that my variable, a so-called local variable, has the value of 3 as intended.
All right, so far this isn't all that enlightening other than demonstrative of the fact that I can pause execution of my program anytime I want.
So let's now click that step over button again so that we actually print this column.
Click.
And there we have it, 4 hashes at the bottom of the screen.
Now execution is paused at the end of the function.
This is just my opportunity to either stop or restart or continue.
I'm just going to go ahead and click the play button and let it finish executing.
Unfortunately, that wasn't really at all enlightening, except to confirm for me that I typed in 3 and 3 is what is in the computer's memory.
Not that interesting though yet.
So let's do this.
Let's leave the break point on line 6 as before.
Let's rerun the debugger by running debug50 ./buggy.
Let's let it do its startup thing, which looks a little messy at first, but now we've highlighted line 8 again.
I'm going to go ahead and step over this line because I do want to get an int.
I'm going to type in 3 again, Enter, but this time, instead of stepping over line 9 and just letting print column happen, this is where the debugger gets powerful.
Let me step into line 9 and walk through the print column function itself line by line.
So let me go ahead and click not this button, which is the curved arrow over the dot, but the next one, which is the step into button.
Click.
And now you'll see that execution has jumped inside the print column and paused on line 14, at which point I can see at top left what the default value of I is, and this is some crazy garbage value because whatever bits are being used to store I's value have some random garbage from some previous use of that memory.
But as soon as line 14 executes once, I bet I is going to take on a value of 0.
So let's do that.
I'm going to go ahead and click step over, because I don't need to step into this; there are no other functions there.
Step over it and immediately at top left I is now 0.
Now line 16 is highlighted.
Let's step over this.
OK, now in the terminal window, what do you see?
The first of our hashes.
Let's step over, step over, second hash, and I is now 1.
Step over, step over.
Now we see a third hash and I is now 2.
Step over, step over.
OK, there's the symptom of the bug.
4 hashes, and yet I is 3. But wait a minute, this is going to draw my attention now to line 14 before I continue on. Oh, wait a minute.
3 is of course less than or equal to 3, which is why I got that 4th hash on the screen.
So at the end of the day, you still need to exercise some of your own human intellect to figure out and understand what's going on, but the value of this here debugger is that you can pause and work through things at your own pace and poke around inside of your own code and better understand what's happening, as opposed to compiling the program, running it, and just having to infer from the symptoms alone what the source of the problem might be.
So that was a lot.
Let me go ahead here and just let it continue to the end because I know what the problem is now.
I need to change the less than or equal to sign to a simple less than instead.
Questions, though, on debug50 or any of these steps? Yeah.
Sure, um, could you go over what the break point.
And then my second.
Mhm.
Correct.
So in order of your questions, what again are these break points? The break point, or the little red stop sign here, just tells the debugger where to pause execution.
So frankly, I didn't have to pause execution at main.
If I really care about debugging print column, I could have clicked down here instead, and then it would have just run main automatically and only paused once print column gets called.
So a break point is where your code will break, the point at which it will break.
As for the garbage values, I'm oversimplifying exactly what's going on inside of the computer's memory, and it's not necessarily using exactly the same memory as before, but the operating system will govern exactly how the memory is laid out.
This is actually a significant problem, long story short, in a lot of today's systems, because it's not that interesting to me to know that there was 32,000, whatever that number is, or a negative number, but suppose that that revealed the password of someone, another program or function that had some information there.
It seems all too easy with the debugger, let alone C, to actually poke around the computer's memory, and we're going to come back to that in a couple of weeks.
But for now it's a garbage value insofar as you didn't put the value there, it somehow got there on its own for now.
Other questions.
Some like when it's 0 goes to 1 at the end of the 4, but when it goes the 4 the next time because.
After you Correct.
So the question is about the order of operations for a for loop.
So the first time you go through a for loop, the initialization happens, the stuff before the first semicolon, and the condition is actually checked, the Boolean expression.
Then everything inside of the curly braces is executed, then the incrementation or update happens, which in this case is i++, and then the condition is again checked, the Boolean expression, the code is executed, the update happens, the condition again, the code is executed, and so it starts to loop like this.
The debugger's graphics are fairly simplistic and it just highlights the whole line without making super clear what's happening, but that's just the definition of a for loop.
Good question.
Others about debug50 or printf?
All right, yeah.
Can you change the position of i++ and height?
Short answer no.
The first thing is the initialization, the variable you want to create and initialize.
The second thing is the actual condition, the so-called boolean expression.
The third thing is always the update.
So it must come in this order.
What you're not seeing is that you can actually have multiple boolean expressions, you can have multiple initializations, you can have multiple updates, but we're keeping it simple for now.
And this is canonical.
All right, so to make clear, assuming that either printf or debug50 helped me figure out where the illogic was in my thoughts, I now know that the fix here is to just go and change the less than or equal to to a simple less than, and if I run the program again, of course it's going to give me the 3 bricks that I always wanted instead.
But there's other techniques we can use too.
So besides printf and debug50, you might wonder why we have a 7 ft duck behind me here, and all of these little rubber ducks on the floor.
So rubber duck debugging per week zero is actually a thing.
This was popularized in a book some years ago, and the idea is that when you are facing some bug, some mistake in your program, or you're just confused on some concept, there is anecdotal evidence to suggest that just talking out the problem with an inanimate object like a rubber duck on your desk is often enough for that proverbial light bulb to go off over your head, because you hear in your own words what confusion you're having, what illogical thoughts you're having, and you don't even need another human or TA or AI in the room to answer the problem for you.
So in fact, on the way out today at the end of class, we've got hundreds of ducks, enough for everyone to take home with you if you'd like to use that as another debugging technique, whether in 50 or something else, but of course now in the age of AI, you also have the AI powered virtual duck at CS50.AI and also in VS Code at CS50.dev, which really is a mechanism for asking questions that you don't think you can solve on your own.
So it might be reasonable to ask the duck what does this error message mean if you're having trouble wrapping your mind around it, but it's less reasonable to, say, copy paste your code into the duck and say, what's wrong with my code?
You should really be meeting the AI halfway.
After all, the point of actually doing this or any other class is to develop that muscle memory, develop those mental models, get some practical skills.
So try hard to walk that line between asking the duck too much versus deploying some of these same tools yourself, printf, debug50, even a physical rubber duck on your desk, before you resort to escalating it to human or duck help.
All right, so with those tools added to one's tool kit, let's actually consider and reveal what's been going on underneath the hood since last week.
So this was the mental model we proposed for last week, whereby when you write source code in a language like C, it's not something that the computer itself understands natively because computers we saw only understand zeros and ones, AKA machine code.
So the compiler is the program that we use to convert your source code to machine code, from C to zeros and ones in this case.
More generally, a compiler is just a program that translates one language to another, and in this case we're going from source code to machine code.
So let's consider what's really happening.
And indeed this is among the goals of this week is to take a look at a lower level so that when you encounter more interesting, more challenging problems, you'll understand from so-called first principles what the computer is actually doing and supposed to do.
So you can deductively figure things out for yourself and generally not view computers as like magic or I don't know how this works.
You'll have a fairly bottom up sense of how everything works by term's end inside of any computer, laptop, desktop, phone, or the like these days.
So here's the simplest of programs that we wrote last week, even though there's a lot of syntactic complexity as we've seen.
The goal is to get it to machine code.
These here zeros and ones.
So how has that been happening when you just run make since last week?
Well, these are the two commands that we've typically run after creating a file like hello.c.
We then compile with make hello and then we run it with ./hello.
So let's give ourselves this starting point real quick just so that we have an example in mind of exactly what it is we're compiling.
So let me go back to VS code here.
Close out buggy.c.
And let's create a new file just like last week called hello.c, inside of which is our old friend stdio.h, int main(void), and then inside of this we'll keep it simple, just printing out hello, world, which again is my source code in C.
How do I now actually compile that?
Well, of course I can go down to my terminal window, make hello, ./hello, and we're off and running.
So it was a bit of a white lie for me to let you think last week, though, that the compiler itself is called make.
Make is a command that literally makes your program.
It makes it by compiling it, but make is not technically the compiler.
If we really want to get nitpicky, the compiler you've been using is actually called clang, for C language, and this is a very popular compiler, freely available, open source, so to speak.
You can even look at the code other humans wrote to create the compiler online, and what make is really doing for us is essentially automating this command.
So all this time I could have just run clang hello.c, but the default file name from clang, the compiler, for historical reasons is not going to be hello, as you would hope. It's going to be a.out, for assembler output, and we don't do this in week one of the class because this just makes things unnecessarily complex, in that we're adding some random name that you just have to know to type.
However, we can do this now as follows.
Let me go back to VS code here.
Let me clear my terminal and type ls, and we'll see everything we've created thus far: buggy.c, which when I compiled it, I got buggy, and hello.c, which I just wrote, and when I compiled it, I got hello.
Let's do this command now manually though.
Let's use clang on hello.c and hit enter.
That too seems to work, but if I now type ls, you'll see a third program specifically called a.out, which happens to be the same as hello. It's just using the default name instead of my custom name hello, but if I do ./a.out, indeed that too will work.
But the reason we don't do that certainly in the first week of the course is that things get a little annoying or sort of escalate quickly thereafter.
So let me go ahead and change this program as we've done a few times already.
Let me include cs50.h so that we get access to get_string.
Let me do string name = get_string, quote unquote, What's your name? question mark, close quote.
And then down here, just like before, let me add my %s and add in name.
So I did that super quickly, but it's the same program we wrote a few minutes ago and it's the same one we wrote last week.
What happens now, though, is as follows. If I now try to do clang hello.c, Enter, I actually get an error message, this one perhaps more cryptic than most.
Somehow or other I have this error: linker command failed with exit code 1, because of an undefined reference to get_string.
Now in the past when we've seen undefined or really undeclared mentions of get_string, the problem was just with missing this line.
This line is clearly here.
But the catch is I'm getting this error message now because when I run clang hello.c, I'm just assuming that clang knows where to find the CS50 version of get_string, and that is not the case.
Technically, if I want the compiler to compile this code for me, what I'm actually going to have to do is this.
Let me go back to my terminal window here and I'm going to say clang hello.c, but I'm then going to specify -lcs50, which is cryptic at first glance, but this is telling the compiler to link in the CS50 library so that it knows what the zeros and ones are that belong to the get_string function.
Long story short, if I hit enter now, the error message has gone away.
If I type ls, I've still got a.out, but it's a new version thereof, and if I do ./a.out now, I see the new behavior where I can type in my name and see hello, David.
Now this is getting a little stupid that I keep using a.out.
We can change that as well.
In fact, these commands, as we're starting to see, support what are called command line arguments, and a lot of the programs we've run already take command line arguments. When we run code hello.c, the so-called command line argument to code is hello.c.
When I run make hello, the command line argument to make is hello.
In other words, the command line arguments to a program are all of the words you're typing in your terminal after the name of the program itself, whether it's make or whether it's code or anything else.
So this is to say, when I just ran clang hello.c -lcs50, I was passing in two command line arguments: hello.c, which is the code I want to compile, and -lcs50, which means use the CS50 library, please.
But I can add another to the mix.
I can actually do something like this, whereby I do clang -o hello, then I can do hello.c, and then -lcs50, Enter.
Now that too seems to work, and if I type LS, well, I've got all the same programs as before.
So let's go ahead and get rid of those to make clear what's going on.
I'm going to remove a.out.
I'm going to remove hello, and just for good measure, I'll remove buggy as well, so that all I have left in this folder is source code.
So if I type LS, there's my two files.
Let's do this again: clang -o hello hello.c -lcs50, Enter.
Now if I type ls, I don't see a.out anymore because, according to the documentation for clang, the actual compiler, if you pass -o as a command line argument followed by another word of your choice, you can name the program anything you want without having to resort to mv or clicking on it and typing a new name in manually.
So if I now do ./hello, I see the exact same version where it's just asking me for my name and then printing it out.
But long story short, the whole point of this exercise is that running commands like this quickly gets very tedious.
You have to remember the order in which to do it, what the command line arguments are.
I mean, this is just a stupid waste of time, certainly in week one of the course, to have to memorize these kinds of magical commands to get things working.
But for now, know that when you run make, it's essentially automating all of that for you and making it as simple semantically as make hello or make buggy, but what's really happening is the make command, because of the way we've configured CS50.dev for you, is doing all of this behind the scenes.
It's not that magical.
This just means change the file name to hello when you compile it.
This just means compile this code, and this just means use the CS 50 library.
That's all. But that message about linking something in, there's something juicy going on there, such that make is in fact helping us solve a whole bunch of problems when we compile.
And in fact, let me propose that we take a step back, look at some of the actual code that we're compiling, and consider what we actually mean by compiling.
Yes, it's the case that to compile code means to go from source code to machine code, but technically there's a few more steps involved.
Technically, when you compile your code, that's become the industry term of art that really is referring to 4 separate processes, all of which are happening in succession automatically, but each of which is doing a different thing.
So just once, let's walk through these several steps.
So what is this pre-processing step?
So consider this program here which we wrote in brief last week.
We've got include stdio.h, which is there because we want to be able to use printf ultimately.
We've then got a prototype for this meow function, and the meow function does this.
All it does is print out, quote unquote, meow followed by a new line, takes no input, returns no return values.
The main function now has a for loop, iterates 3 times, each time calling the meow function, and we saw this already earlier today.
This line of code here, the so-called prototype, is necessary because we need to tell the compiler that meow exists before we actually use it here, especially if I don't get around to implementing it until later, so this copy paste of that first line of code, a so-called prototype, solved that problem.
This is what the header files are essentially doing for us.
Before I use printf down here, the compiler needs to know what it is, what its inputs are, what its outputs are.
Turns out the prototype for printf is going to be in stdio.h, and that's what that line of code has been doing for us all this time.
In fact, let's take a simpler example that we keep using here, whereby I'm including cs50.h and stdio.h, and I'm using the CS50 get_string function to get someone's name and put it in a variable called name, and then I'm printing out hello, such and such.
What's going on now when I preprocess this file by running make, which in turn runs clang?
Well, the compiler finds on the server's hard drive the file called cs50.h, goes inside, and essentially copies and pastes its contents into my own code, such that we get the prototype there for get_string.
And we haven't seen this yet, but it stands to reason that all this time using get_string we've been passing in a prompt like what's your name? and we've been getting back a string.
What's inside the parentheses is the input; what's before the function name is the output, the so-called return value.
What about stdio.h?
It's in that file that printf's prototype is.
So essentially what the compiler does when preprocessing this file is it finds stdio.h somewhere on the server's hard drive, goes inside, and copies and pastes those relevant lines of code into my code as well.
It's to avoid me having to do all of that myself, find the file, copy, paste it, or manually type out the prototype.
These preprocessor directives just automate all of that tedium.
So effectively, at the top of my code after the file has been preprocessed, all of those hash symbols followed by include are changed to contain the actual contents of those header files.
Now the compiler knows what get_string is all about and what printf is all about.
That then is the preprocessing step.
What does compiling technically mean?
Compiling means taking that pre-processed code, which again looks a little something like this, and converting it into something called assembly code, and we won't spend much time in this class on assembly code, but this is how programmers used to write code before there was C, before there was Python and Java and all of these other modern languages.
Programmers were writing code like this.
Before this existed, they were programming zeros and ones into the earliest of mainframe computers using punch cards and other technologies like literally sheets of paper with holes in them.
Not very fun, very tedious, so the world invented this.
Also not very fun, very tedious, so the world invented C.
Not that much fun, so the world invented Python and so forth, and we continue to evolve as a species with code. But the compiler technically takes your preprocessed source code and converts it into something that looks like this. It's cryptic, and that's to be expected, but there are some familiar phrases.
There's mention of main, there's mention of get_string, there's mention of printf, and there's a bunch of other things, mov and push and xor and call and these other commands here.
These are the assembly instructions.
Those are the lowest level instructions that the CPU inside of a computer understands.
CPU is the central processing unit, the thing by Intel or AMD or Apple or other companies.
Those are the lowest level commands that the actual hardware inside of the computer understands.
It's just nicer to be able to write words like main and for and printf than it would be to write these much more arcane commands that you'd have to look up in a manual.
So compiling just takes C code and makes it a lower level type of code called assembly.
When I said a.out means assembler output, that's why: inside of that file is essentially the output of an assembler.
All right, we're almost there.
What does it mean to assemble a program, which is step 3 of the compilation process?
That means converting assembly code to the actual zeros and ones we keep talking about.
So if the file is called hello.c, when that file is assembled, the assembly code becomes the zeros and ones for your code in hello.c, but your code is not everything that composes your final program.
Your code from hello.c has to be combined with code from CS50's library, from the standard I/O library that other humans wrote.
I and the team wrote the CS50 code.
Other humans in the world wrote the printf code in standard I/O.
So essentially, the fourth and final step is to link all of those zeros and ones together.
Somewhere on the server there is not just the header files cs50.h and stdio.h, but your code, hello.c, our code, cs50.c, and the code that contains printf's own implementation, stdio.c.
Bit of a white lie: it's technically not called stdio.c, but the point remains ultimately the same.
So these files have already been compiled for you in advance.
This is your code.
What the assembling process does is convert all of that into zeros and ones, and then all three chunks of zeros and ones are linked together.
So if you think back to when I tried compiling the code without -lcs50, there was some mention of a linker.
That error just means the computer did not know how to link your code with CS50's code because we were missing -lcs50, which tells the compiler to go find it somewhere on the hard drive.
And the final step then of linking is to combine all of those zeros and ones into one bigger blob of zeros and ones, and that's what's inside your hello program that you can execute.
So long story short, these 4 steps are what's been happening ever since the start of last week pre-processing, compiling, assembling, and linking.
But thankfully the world of programmers generally just treats all four of these steps as what we know now as compiling. It's just a lot easier to say compile and not worry about those lower level details, but that might reveal better to you what all these error messages mean when you see hints of this kind of terminology.
Questions on any and all of that? From here on out, we're going to go higher level rather than lower. Yeah.
I, I, I don't get the part with the like when we're talking about um well I think it's the assembly process when you basically convert into zeros and ones um doesn't like across the multiple like the three different ones so the zeros and ones signify different things like one can signify text and the other can signify something else how does the computer know like what part what 8 bit corresponds to like which part?
Really good question.
How does the computer know which of those zeros and ones corresponds to data like numbers or strings of text or commands?
We're going to come back to that in week 4 of the class, but long story short, what we just saw on the screen, a big blob of zeros and ones, actually follows some pattern where the bits up top represent a certain functionality.
The bits on the bottom represent something else, and they're organized into patterns.
So long story short, we'll come back to that, but they follow conventions.
It's not just a hot mess of like zeros and ones.
Other questions.
Correct, the pre-processing step goes into the header file and essentially copies and pastes the contents of it into your own code so you don't have to waste time doing that manually yourself.
Other questions?
Just curiosity, when you're talking about how it converts it to assembly code, and you're saying those are the CPU's commands, is it the CPU that converts that into the binary?
No. Let me pull up the chart again.
When you compile your code, you're going from the C code to the assembly code, and the patterns you get when you see the assembly code are specific to a certain CPU.
So long story short, if you're designing software for iPhones or for Android devices or Macs or PCs, you're going to necessarily use a different compiler because given the same C code, you will get different assembly instructions in the output, and this is why you can't just take, back in the day, a CD containing a program from a Mac and run it on a PC or vice versa, because it's the wrong patterns of instructions.
But the reason why we have all of these annoying layers of complexity is because four different people can now implement the notion of compiling.
Someone can implement the preprocessor.
Someone can implement the compiler, the assembler, the linker, and you can actually collaborate by breaking things down into these quantized steps. But also you can do this step, this step, and then two different people can write compilers to output assembly code for iPhones over here and Android devices over here, but all of us can still enjoy using the same language up here.
So there's a lot of reasons for this complexity.
Just understanding it is useful, but you're not going to need to use this sort of knowledge day to day.
But it's what enables so much of today's complexity nonetheless.
All right, so a bit of a flourish now as to what we've been doing with compiling.
Well, compiling is going ultimately from source code to machine code.
Couldn't you just kind of reverse the process, right?
If someone wrote really interesting software like Microsoft Word or Excel or something like that, well, when I buy it or download it, I literally have a copy of all of those zeros and ones. Couldn't I just kind of reverse this process and reverse engineer someone else's code by decompiling it?
And this is genuinely a threat, and this comes up in matters of law and intellectual property because the zeros and ones have to be accessible to you and to your computer, so it's not a great feeling if someone with enough time and enough savvy could reinvent Microsoft Word by just figuring out what all those zeros and ones mean.
However, it's sort of easier said than done to reverse engineer code from these zeros and ones.
For instance, this pattern of bits on the screen here did what, did we say last week?
Silly question, no normal person should be able to answer this, but I did say it before.
These zeros and ones print what?
It just prints out hello, world, and I cannot glance at that and figure it out off the top of my head, but if I know what architecture, what CPU this code has been compiled for, and I pay attention in week 4 and know what the layout of the zeros and ones is, I could painstakingly figure out what each of those patterns of zeros and ones means by breaking them into chunks of 8 or 16 or 32 or 64, which are common units of measure that I alluded to last week.
Now that's going to take a crazy amount of time and the sort of presumption is that if you are smart enough and capable enough and have enough free time to do that, it would probably take you less time to just implement Microsoft Word the normal way and just rebuild the software.
It's going to take you more time to go in reverse than it would in the so-called forward direction, but there's other subtleties as well.
Inside of this code are not only functions like printf, but suppose that it contained a loop, for instance, to print meow, meow, meow.
Well, we know already that you can use a for loop sometimes, or you can use a while loop, but they're functionally equivalent.
It's sort of a stylistic decision which one you use, whichever one you're more comfortable with, or maybe feels a little better design.
But you can't figure out from the zeros and ones whether or not it was a while loop or a for loop because it just results in the same pattern of zeros and ones.
It's just a programmer's choice, which is to say you can't even perfectly reverse engineer everything because it's not going to be obvious from the zeros and ones what the source code originally looked like.
But again, the bigger deal breaker is if you have that much time and energy and savvy, just implement Microsoft Word itself.
Don't try to reverse the whole process, which is going to be much more painstaking and time consuming.
Now this is not true for all languages, and just as a teaser, in a few weeks' time when we talk about web programming in another language called JavaScript, it turns out that JavaScript source code is actually sent from web servers to web browsers, and you can look at the source code of any website on the internet, Harvard.edu, Book.com, gmail.com, it's going to be there. So not all languages, it turns out, are even compiled typically.
Sometimes the source code is just executed by the underlying computer.
So we're just scratching the surface of some of the implications of all of this. In a little bit of time, let's take a look further under the hood at the actual memory, solve some other problems, but I think it's now time for Cheez-Its.
So let's go ahead and take a 10 minute break.
Snacks are now served.
See you in 10.
All right, we are back and up until now when we've been writing code, recall that we have to specify like what type of value you want to put in a variable.
Like that's why I had to go in and add string before the word name in my first bug today.
But it turns out C, as we've kind of seen already, has a whole bunch of these data types.
I rattled these off last week, bool, int, long, float, double, char, string, but we'll consider for a moment just how much space each of these things takes up and see if we can help you see what the debugger was seeing earlier, that is, what is in memory.
So a bool, it turns out, actually takes up one byte, which is kind of stupid because technically a bool, true or false, really only needs one bit.
It just turns out that it's more efficient and easier to just use a whole byte, 8 bits, even though 7 of them are effectively unused.
So a bool will take up 1 byte even though it's just true and false.
An int recall uses 4 bytes.
So if you want to count really high with an int, the highest you can go is roughly 4 billion we've claimed, unless you want to represent negative numbers, in which case the highest is like 2 billion, because if Want to be able to count all the way down to -2 billion.
You've got to kind of split the difference.
A long, meanwhile, is twice that.
It uses 8 bytes, which lets you count as high as roughly 9 quintillion, which is quite a bit more than 4 billion.
That is, if you want to include negative numbers as well.
Then we had floats which were real numbers with decimal points which speak to just how precise you can be with significant digits.
A float is 4 bytes by default, but a double gives you twice as many bits to play with, which lets you be more precise.
Even though at the end of the day whether you're using floats or doubles, floating point imprecision as we've seen is a fundamental problem for scientific, financial, and other types of computing where precision is ever so important.
A char, meanwhile, at least as we've seen it, is a single byte, using ASCII characters specifically.
And then string I'll put as a question mark because a string totally depends on its length.
If you're storing hi, that's like 2 bytes.
If you're storing hello, that's like 5 bytes, and so forth.
So strings depend on how many characters you actually want to store inside of them.
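As a rough way to check those sizes yourself, here is a small sketch that uses C's sizeof operator and the %zu format code, neither of which the lecture has introduced. The function name print_sizes is hypothetical, and the numbers it prints can differ from system to system; only a char is guaranteed to be 1 byte.

```c
#include <stdbool.h>
#include <stdio.h>

// Print how many bytes each type takes up on the machine running this code.
void print_sizes(void)
{
    printf("bool:   %zu\n", sizeof(bool));
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("float:  %zu\n", sizeof(float));
    printf("double: %zu\n", sizeof(double));
    printf("char:   %zu\n", sizeof(char));
}
```

Calling print_sizes from main on a typical modern system would likely show the 1, 4, 8, 4, 8, 1 pattern described above, though that is not guaranteed everywhere.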
So where does this go?
Well, here's a picture of a stick of memory, a DIMM, so to speak, whereby on this stick of memory, which is slid into your computer, your laptop, your desktop, or some other device, there are all these little black chips that essentially contain lots of room for zeros and ones.
It's somehow electronic, but inside of there are all of the zeros and ones that we can store data in.
So if we kind of zoom in on this, for the sake of discussion, if this one chip represents like 1 gigabyte, 1 billion bytes, it stands to reason that we could slap some addresses on these bytes, whereby we could say this is the first byte and this is the last byte, or more precisely, this is byte 0, 1, 2, 3, all the way up to roughly byte 1 billion, and it doesn't matter if it's top down, left to right, or any other order; we're just talking about this conceptually at the moment.
So in fact, let's go ahead and draw this really as a grid of memory, a sort of canvas that we can just use to store types of data like bools and ints and chars and floats and everything else.
If we are going to use 1 byte to store, like, a char, well, you might use just these 8 bits up here, 1 byte up here.
If you want to store an int, well, that's 4, so you might use all 4 of these bytes, necessarily contiguous.
You can't just choose random bits all over the place; when you have a 4-byte value like an int, they're all going to be contiguous, back to back to back in memory like this.
But if you've got a long or a double, you might use 8 bytes instead.
So truly when you store a value in memory, whether it's a little number or a big number, all you're doing is using some of the zeros and ones physically in the computer's hardware somewhere and letting it permute them, turn them on and off to represent that value you're trying to store.
All right, so let's go ahead and abstract away from the hardware though, and let's just start to think of this grid of memory sort of in zoomed-in form and consider at a lower level what is actually being stored inside of here.
For instance, suppose that we've got some code like this containing 3 scores on like problem sets.
You've got a 72 on one of them, a 73 on another, and a 33 on the third.
I've deliberately chosen our old friends 72, 73, 33, which, recall, spell HI! or, together in the context of colors, are like a shade of yellow, just so that we're not adding some new random numbers to the mix.
These are our old friends, three integers.
Well, let's use these in a program.
Let me go over to VS Code here and let me create, with code, a program called scores.c that's just going to let me quickly calculate my average score on my problem sets.
I'm going to go ahead and include, as we often do, stdio.h at the top.
I'm going to do int main(void) after that, and then inside of my curly braces, I'm going to do exactly those sample lines of code.
My first score was, let's say, a 72.
My second score was 73, and my third score was 33.
So I've declared three variables, one for each of my problem set scores.
Now let's calculate the average.
So printf, quote unquote, Average, colon, just so I know what I'm printing.
And now I'm going to go ahead and use maybe %i, backslash n, and then what I'm going to pass in is a bit of math.
So to compute an average, it's just score1 plus score2 plus score3 divided by 3, and I put the scores, the numerator, in parentheses, just like in grade school, because I need to do that operation first before doing the division. So, just like math class, semicolon at the end to finish my thought.
Let's see how this goes.
make scores, Enter, ./scores, and it would seem that my average across these three problem sets is 72, which is great, but I don't think that's actually what I want here.
What have I done wrong?
It's unintentional, yeah.
Yeah, I'm kind of being a little generous with myself here.
I didn't really factor in my worst score, so that was accidental.
So now let me do this correctly.
make scores, ./scores, and now, OK, my average is 59, but I beg to differ.
I'd like to quibble.
My score technically, I think mathematically, should really be 59 and 1/3.
I'm kind of being cheated out of that third of a point.
So what's going on here?
Why am I only seeing 59 and not my full grade?
Perfect, because I'm using integers.
When I divide by 3, it's going to truncate everything after the decimal point, which we touched on at the very end of week one, which is an issue with just truncation in general.
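The truncation described here can be sketched in a few lines. This assumes the three sample scores from above; average_int is a made-up name, not part of the program being typed in lecture.

```c
// Average three ints using only integer arithmetic.
// The sum is an int and 3 is an int, so the division throws away
// everything after the decimal point.
int average_int(int score1, int score2, int score3)
{
    return (score1 + score2 + score3) / 3;
}
```

With 72, 73, and 33 the sum is 178, and 178 divided by 3 in integer arithmetic is 59, not 59 and 1/3.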
So one approach to fix this: I could change my %i to %f, which is the format code, it turns out, for a float, and that is what I want to print.
So let's see if that fix alone is enough.
make scores, oops, it's not.
I got ahead of myself there, and let me scroll up to the error.
Format specifies type double, but the argument has type int.
Turns out you can use %f for doubles as well, so that's why I'm seeing double, even though I intended a float in this case.
So there's a problem here.
The argument has type int even though I'm passing in %f. You're seeing mention of %d here, which is an alternative to %i.
We typically encourage you to use %i because i is for integer, but that is not the solution to this problem, because I want my third of a point back.
So how could I go about fixing this?
Well, the fundamental problem here is that I'm trying to format an integer as a float or even as a double.
Well, I need to convert these scores to floats instead, so I could go in and change this to float, this to float, this to float, and heck, just to be super precise, I could add a .0 on the end of each of them, just to make super clear these are floats.
But there's another way: I could, for instance, simply convert my denominator to 3.0, because it turns out so long as you involve, like, one float in your math, the whole thing is going to get promoted, so to speak, to floating point values instead of integers.
I don't have to convert all of them.
So I think now if I do make scores, ./scores, ah, there's my third of a point back.
There's another way to do this, just as an aside, and we'll see this again down the line, if you really want to stick with 3, because it's a little weird just semantically to divide by 3.0; like, that's an implementation detail, but you're truly computing an average of 3 things.
You can technically cast the 3 to a float: in parentheses, you can specify the data type that you want to convert another data type to, and this too should make the compiler happy.
Aha, ./scores.
I get roughly the same answer.
We're seeing some floating point imprecision though, nonetheless, but that too would achieve the goal here.
But in short, that's all just a function of floating point arithmetic there.
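Both fixes can be sketched side by side. This is an illustration using the same sample scores, with hypothetical function names, rather than the exact program on screen.

```c
// Fix 1: divide by 3.0 so the whole expression is promoted
// to floating point before the division happens.
float average_with_literal(int score1, int score2, int score3)
{
    return (score1 + score2 + score3) / 3.0;
}

// Fix 2: keep the 3 but cast it to a float, which has the same effect.
float average_with_cast(int score1, int score2, int score3)
{
    return (score1 + score2 + score3) / (float) 3;
}
```

Printed with %f, either result for 72, 73, and 33 should come out at roughly 59.333, possibly with a little floating point imprecision in the later digits.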
So what's going on now actually in the computer's memory?
Let me revert back to the simpler one with just 3.0 there, and let me propose that we consider where these three things are in memory.
Well, if we treat this as my grid or canvas of memory, who knows where they'll end up, but for the sake of discussion, let's assume that 72 ended up in the top left of my computer's memory.
I've drawn it to scale, so to speak, in that this score1 variable is clearly taking up 4 bytes of memory because it's an int, and that's typically how many bytes are used on systems. Technically it depends on the exact system you're using, but nowadays it's pretty reasonable to assume that an integer will be 32 bits on most modern systems.
score2 is probably over there, and score3 is probably over there, so I'm using 12 bytes total, 4 bytes for each of these values.
All right, so that's really all that's going on underneath the hood.
I don't have to worry about this.
The compiler essentially figured out for me where to put all of these things in memory, but what really is in memory? Well, technically each of these variables, if it's composed of 32 bits, is really just a pattern of literally 32 0s and 1s, and I figured out the pattern here.
I crammed them all into the space there, but you see here 3 patterns of 32 bits which collectively compose those numbers there.
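For the curious, here is one hedged way to see such a pattern of 32 zeros and ones for yourself. It leans on the bit-shift and bitwise-and operators, which the lecture has not covered, and it assumes an unsigned int is at least 32 bits wide, as it is on most modern systems. The name to_bits is made up for this sketch.

```c
// Write the 32-bit binary pattern of value into out as '0' and '1'
// characters, most significant bit first.
// Assumption: unsigned int has at least 32 bits on this system.
// out must have room for 33 chars: 32 digits plus a terminator.
void to_bits(unsigned int value, char out[])
{
    for (int i = 0; i < 32; i++)
    {
        // Position 0 in out holds bit 31; position 31 holds bit 0.
        int bit = (value >> (31 - i)) & 1;
        out[i] = bit ? '1' : '0';
    }
    out[32] = '\0';
}
```

For 72, which is 64 plus 8, the pattern is 24 zeros followed by 01001000.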
But let's consider design now.
In terms of my code, this gets the job done.
It's not that bad or big of a deal for just calculating the average of 3 scores, but this should also start to rub you the wrong way this week onward when it comes to design.
Like, this is correct, especially now that I clawed back my third of a point.
But this is bad design using the variables in this way.
Why might you think?
Yeah.
Yeah, I'm going to have to type in each score manually with each passing week when I get the 4th problem set and the 5th.
I mean, surely people who came before us came up with a better way to solve this problem than, like, manually creating 10 variables, 20 variables, whatever it is by the end of the semester.
It just feels a little sloppy, and indeed that's often the way to think about how well something is designed.
Think about the extreme.
If you don't have 3 scores, but 30 or 300, is this really going to be the best way to do it?
And if you feel like, no, no, there's got to be a better way, odds are there is, certainly if the language itself is well designed.
So let's consider how else we might go about solving this.
Well, it turns out we can carve up our canvas of memory, that grid of bytes, into chunks of memory known as arrays.
An array is a chunk of contiguous memory back to back to back, whereby if you want to store 3 things, you ask the computer for a chunk of memory for 3 things.
If you want 30, you ask for one chunk of size 30.
If you want even more, you ask for a chunk of size 300.
Chunk is not a term of art.
I'm just using it to colloquially explain what an array actually is.
It's a chunk or a block of memory that is back to back to back to back.
So what does this mean in practice?
Well, it means that we can introduce a little bit of new syntax in C.
If I want to create one variable instead of 3, and certainly 1 variable instead of 30, I can use syntax like this.
Hey compiler, give me a variable called scores plural.
Give me room for 3 integers therein.
So it's a little bit of a weird syntax, but you specify the type of all of the values in the array, you specify the name of the array, scores in this case, which I pluralized just semantically because it makes more sense than calling it score now, and then in square brackets, so to speak, you specify how many integers you want to put into that chunk of memory.
So this one line of code now will essentially give me 12 bytes automatically, but they'll all be referable by the name scores plural.
So let's go ahead and weave this into some code as follows.
Let me go back to VS code here, clear my terminal, and now let's just whip up the same kind of program but get rid of these three independent variables.
And instead let's go ahead and just say int scores[3], with scores plural and the 3 in square brackets.
Now I need a way to initialize the three values, but this I can do too.
It turns out that if I want to put 3 values in this, I just need slightly new syntax.
I can say scores[0] equals 72, scores[1] equals 73, scores[2] equals 33.
So it's not all that different from having three variables, but now I technically have one variable and I am indexing into it at different locations, locations 0, 1, and 2, and it starts at 0 because we always in computing start counting from 0.
So scores[0] is going to be my 72 problem set.
scores[1] is my 73 problem set, and scores[2] is my weakest, my 33 pset.
Now my syntax down here has to change because there are no more score1, score2, score3 variables, but there are scores[0] plus scores[1] plus, and notice what VS Code is trying to do for me.
It's saving me some keystrokes: as I type in scores and type one single bracket, notice it finishes my thought for me and magically puts the cursor where I want it, so I can put the 2 right there and generally save on keystrokes, but that has nothing to do with C.
It just has to do with VS Code trying to be helpful.
So I think now if I go down here and do make scores, ./scores, we get the same answer, but it's arguably better designed because I now have one variable instead of 3, let alone many more.
And in fact, if I wanted to change the total number of scores, I can just change what's in that initial square bracket.
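Here is a minimal sketch of that array version, wrapped in a hypothetical function called average_from_array so it can stand on its own; in lecture these lines live directly inside main.

```c
// Store three scores in one array instead of three separate variables.
float average_from_array(void)
{
    int scores[3];      // room for 3 ints, back to back in memory

    scores[0] = 72;     // indices start at 0
    scores[1] = 73;
    scores[2] = 33;

    // Dividing by 3.0 keeps the third of a point.
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

The result is the same as before; only the bookkeeping has changed from three names to one name plus an index.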
So if we consider what's going on now, if we look at the computer's memory, it's the same exact layout, but there are no longer three variable names.
There's one, with scores[0], scores[1], and scores[2].
And notice here, ever more important, in arrays, values are indeed contiguous, back to back to back.
Now the screen is only so wide, so they kind of wrap around to the next row of bytes, but the computer has no notion of up, down, left, right.
I mean, it's just a piece of hardware that's got lots of bytes available that can be addressed from the first byte all the way down to the last byte.
The wrapping is just a visual artifact on this here screen.
All right, so if I've done this now, maybe we can make this program a little more dynamic than just hard coding in my scores.
Let me go in and add the CS50 library's header so that we could also use, for instance, get_int and start getting these scores dynamically.
So I could do get_int and I could prompt the user for a score.
I could use get_int again and I can prompt the user for another pset score.
I can use get_int a third time and prompt the user for a third such score, and then pretty much the rest of my code can stay the same.
Let's do make scores again, ./scores, 72, 73, 33, and now my program's a little more interactive.
Like this doesn't work for just my 3 scores.
It can work for anyone's scores in the class.
Now this too hints of bad design.
I like my introduction of the array because I now have one variable instead of 3, but what now might rub you the wrong way among lines 7, 8, and 9?
Yeah.
It's repetitive.
I mean, I typed it manually, but I might as well have just copied and pasted like literally the same thing.
So what's a candidate for fixing this?
Like what programming construct might clean this up, yeah.
Yeah, we could use a for loop or a while loop or whatever, but a for loop would get the job done, and that's often my go-to.
So let's do that instead.
Let's go under my declaration of the array and do for int i equals 0, i less than 3, i++, which we keep seeing again and again.
Uh, now how do I index into the array at the right location?
Well, here's where the square brackets are kind of powerful.
I can just say my scores array at location i should get an int from the user, as follows.
So now I'm using get_int once inside of a loop, but because i keeps getting incremented, as we've done many a time now for meowing and other goals, I'm putting the first one at location 0.
Why?
Because i is initialized to 0.
I'm putting the second one at location 1.
Why?
Because I'm going to ++, or increment, i on the next iteration, then the next iteration.
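The loop being described might look roughly like the sketch below. Because get_int comes from the CS50 library, this version swaps in a hypothetical stand-in, canned_score, that hands back fixed values instead of prompting; the array initializer with curly braces inside it is also syntax the lecture has not shown.

```c
// Hypothetical stand-in for CS50's get_int: instead of prompting a human,
// it hands back the i-th of three fixed sample scores.
int canned_score(int i)
{
    int canned[] = {72, 73, 33};
    return canned[i];
}

// Fill the array with a loop rather than three near-identical lines.
float average_from_loop(void)
{
    int scores[3];
    for (int i = 0; i < 3; i++)
    {
        // i is 0, then 1, then 2, so each score lands in its own slot.
        scores[i] = canned_score(i);
    }
    return (scores[0] + scores[1] + scores[2]) / 3.0;
}
```

In the real program, the line inside the loop would call get_int with a prompt instead of canned_score.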
So this has the ultimate effect of putting these three scores at locations 0, 1, and 2 instead of me having to type all of that out manually.
Now, I don't love how I've done this still.
If we really want to nitpick, this solves the problem correctly, but it's kind of got a poor design decision still.
It's got a magic number, as people say.
What is the magic number here and why is it bad?
Yeah, over here.
Yeah, it was a little soft, but I think the number 3 is hard coded in two places.
We've got it on line 6, which is the size of the array, and then again on line 7, which is how many times I want to iterate.
But those are the exact same concepts, but it's on the honor system that I typed the number 3 correctly both times.
So I think we can fix this a little better.
I could do something like int n equals 3, and then I could use n here and then I could use n here so that now I only change it in one place.
If your eyes are wandering to the bottom of the program, there's still a problem here because I've still hard coded 0, 1, and 2, but we'll come back to that.
But this is arguably a little better.
But let's talk a little bit about style.
Typically when you've got a variable that should not change its value, we saw last week that we should declare it as a constant, and the trick there is to literally just write const, for short, in front of the type of the variable, and now it should not be changeable by you, by a colleague, a collaborator, or the like.
But typically too, by convention, stylistically, to make visually clear to another programmer that this is a constant, it's convention also to capitalize constants, so to actually use like a capital N here in all places just to make clear visually that there's something interesting about this variable, and indeed it is a constant that cannot be changed.
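As a sketch of that refinement, the constant N below replaces the magic number in both the array's size and the loop's condition. The function name and the placeholder values 72, 73, 74 are assumptions made so the example runs without user input; the sum still hard codes indices 0, 1, and 2, mirroring the state of the program at this point.

```c
// Use one capitalized constant for both the array's size and the loop bound.
float average_with_constant(void)
{
    const int N = 3;    // cannot be changed after this line
    int scores[N];

    for (int i = 0; i < N; i++)
    {
        // Placeholder values 72, 73, 74 standing in for user input.
        scores[i] = 72 + i;
    }

    // Still hard codes 0, 1, and 2, which the lecture fixes later.
    return (scores[0] + scores[1] + scores[2]) / (float) N;
}
```

Changing the number of scores now means editing the 3 in one place, apart from the hard-coded sum.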
All right, with that refinement, I don't think we've really improved the program fundamentally.
I think we're going to need to do a bit more work to do this really well.
So I'm going to do this a little quickly, but mostly to make the point that we can make this indeed more dynamic.
So let me hide my terminal window there.
Let me go ahead now and get the scores, as I already am, as follows here, and let me go ahead and assume for the sake of time that we have a function that exists already called average, and I simply want to pass in to that average function the scores whose average I want to calculate.
So average does not exist off the shelf.
Like I can't just use an existing library for it.
I'm going to have to implement this thing myself, but how?
All right, well, let's go ahead and do this at the top of my file.
I'm going to go ahead and define a function called average that takes in, what, an array of numbers.
So this syntax is going to be a bit new, but the way I do this is, say, int array with square brackets, or, array sounds a little too generic.
Let's just call it numbers, for instance, here.
So that says my average function is going to take as an argument an array of numbers.
This average function, though, should return a value too, and it should return what type of value, from what we've seen thus far?
A number, a float specifically.
It could be an int, but then I'm going to get shortchanged by a third of a point potentially.
So I think I want it to return a float, or if you really want precision, then you could return a double just to be really nitpicky, but that seems excessive here.
All right, well now inside of my average function, how can I calculate the average?
Well, this is just kind of like a math thing, so I could declare a variable called sum and set it equal to 0.
I could then have a for loop inside of this function: for int i gets 0, i less than, huh, I'm going to come back to this, the number of numbers in the array, and then I'm going to do i++, and then on each iteration I'm going to do sum equals whatever the current sum is plus whatever is in the numbers array at that location.
So I'm going a little quickly, but again I'm just applying the same lesson learned: numbers is my array.
numbers[i] means go to the i-th location in there, but if my loop starts at 0, that means go to location 0 and then 1 and then 2, and heck, if there's more scores in this array, it's just going to keep going on up from there because of the ++.
But I hesitated here for a couple of reasons, so I put a to-do here, which is not a C thing; it's just a note to self.
How far do I iterate?
Well, if you've come into CS50 with programming experience before, you can usually just ask an array, AKA a vector, what its length is in Java and in Python and the like.
You can't do that in C.
So if I want to know what the length is of this array, I've got to have the function tell me.
So I'm going to additionally propose that this average function can't just take the array; it's also going to have to take another argument, a second input, for instance called length, that tells me how long it is.
And then down here, which is where we started the story when I used this so-called average function, I'm going to have to tell the average function, by passing in N, how many numbers are in that array. This is annoying, that you have to pass in not only the array but also its size separately.
That's the way it's done in C.
More recent languages have improved upon this so you can just figure out what the length of the array is, as we'll see in a few weeks in Python.
All right, back to the average function at hand.
I think we're almost there.
This is a little unnecessarily verbose.
Recall that we can tighten this up by just doing plus equals whatever is in numbers[i].
That's just tightening it up.
It's syntactic sugar, so to speak.
And then the last thing I'm going to do in my average function is, what, actually calculate the average.
So what is the average?
It's just the numerator, like the sum of all of the scores, divided by the total number of all of the scores.
Well, I've got the sum, so I think I just want to do sum divided by what to get the actual average now?
Yeah.
Exactly sum divided by length will give me the average because the sum is the numerator effectively all of the scores added together and the denominator is the length.
How many numbers were there actually?
Now I can't just write this math expression here.
If this is going to be my function's return value, and we've done this once or twice before, I literally say, in my average function, return this value, so it hands back the work.
I could use printf and just print it on the screen, but I don't want that visual side effect.
I want to hand it back so that on line 23 I can simply calculate the average of those N scores and let printf use it as the value for that format code, %f.
All right, uh, I think we are in reasonably good shape.
Let me cross my fingers now and hope I didn't screw this up.
Make scores.
OK, ./scores, how many do we want to do?
So we'll do 72, 73, 33, enter, and there is, oh, so close, average.
I've had a regression.
I've made the same mistake again just in a different way.
I think I saw your hand go up.
Why am I getting 59 and I'm not getting my third of a point?
Yeah, in this return line on line 11 right now, I'm again stupidly doing integer divided by integer, and that will make us suffer from integer truncation, because when you divide an integer by an integer you get an integer, and there's no room for the decimal point or any numbers thereafter.
So how do we fix this?
Well, I could change the sum to a float; that would be reasonable.
So then I'd do a float divided by the length.
I could do my casting trick, like convert the length to a float just for the sake of floating point arithmetic.
There's a bunch of ways to solve this, but I think I'll go with this one now.
Let me now do make scores again, ./scores, 72, 73, 33.
And now I've got, albeit with some imprecision, I think enough precision certainly for like a college grade in this case 59.333 and so forth.
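The regression and its fix can be sketched as two versions of the function. The parameter order, length first and then the array, and the name average_truncated are assumptions for illustration; the transcript does not pin down exactly what was on screen.

```c
// Buggy version: sum and length are both ints, so the division truncates
// before the result is ever converted to the float return type.
float average_truncated(int length, int numbers[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += numbers[i];
    }
    return sum / length;
}

// Fixed version: casting length to a float forces floating point division.
float average(int length, int numbers[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += numbers[i];
    }
    return sum / (float) length;
}
```

For the scores 72, 73, and 33, the first version hands back 59.0 and the second roughly 59.333.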
OK, so what are the things to actually care about here?
So there's a decent amount of code here.
Most of it is sort of stuff we've seen before, but the interesting parts I would propose are this.
When you create your own function that takes an array as input, you have to take as input the length of the array.
You're not going to be able to figure it out from the array itself, as you can in more modern languages.
You also need, of course, to pass in the array itself.
How do you pass in an array?
Well, when you're defining the function, you specify the type of values in the array, whatever you want to name the array inside of this function, and then you use empty square brackets like this.
You don't have to put N or some other number there.
All you need to tell the compiler is that my average function is going to take some array of values, specifically this many.
You don't put it inside the square brackets there.
Then when I use it, now it's just the now familiar syntax: when you want to index into your array, that is, go to location 0 or 1 or 2, you just use square bracket notation here.
But the array itself, recall, was actually created in main when I did this line of code here, where I said give me an array called scores, each of whose values is going to be an int, and I want this many of them.
And so maybe the final flourish that I'll add here, just to be sort of nitpicky, is, I keep saying that main should really go at the top; fine, no big deal.
Let me highlight my average function, move it to the bottom of my file just because, and then, and only then, I'll copy and paste that first line, the so-called prototype, so that clang doesn't freak out by not knowing what the average function is.
So in short, there's seemingly a bunch of complexity here, but the only thing that's really new in this one example is how you pass to a function an array that already exists elsewhere: not just by its name, but with the square brackets there.
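The file layout just described, with the prototype up top, the caller in the middle, and the definition at the bottom, can be sketched as follows. The function run_scores is a hypothetical stand-in for main so that the sketch can be dropped into another program, and the scores are hard coded rather than read with get_int.

```c
// Prototype: tells the compiler that average exists, what it takes,
// and what it returns, before any code below tries to call it.
float average(int length, int numbers[]);

// Hypothetical stand-in for main: creates the array and passes both
// the array and its length to average.
float run_scores(void)
{
    const int N = 3;
    int scores[N];
    scores[0] = 72;
    scores[1] = 73;
    scores[2] = 33;

    // Pass the array by name alone; no square brackets at the call site.
    return average(N, scores);
}

// Definition, now at the bottom of the file. The empty square brackets
// say that numbers is an array; its size arrives separately as length.
float average(int length, int numbers[])
{
    int sum = 0;
    for (int i = 0; i < length; i++)
    {
        sum += numbers[i];
    }
    return sum / (float) length;
}
```

Without the prototype line, the call inside run_scores would come before the compiler had seen average at all.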
OK.
Questions on arrays or any of this new syntax, yeah.
When you did the whole, like, OK.
You said that we could store it as a float, and instead of saying 3.0 is a float, you just wrote 3.0.
How does it know it's not a double?
Oh, how does it know it's not a double?
So by default, if you just type a number like 3.0 into your code, it will be assumed to be a double, just because raw values, literal numbers with a decimal point, will be treated by the compiler as doubles and be allocated 64 bits.
And %f works for both just because, like, the world did not need to create a new format code. %d is not double.
%d is decimal integer, but don't worry about that.
We tend not to talk about it too much in class.
%i is integer, %f is float, but %f is also double.
And this is not consistent, because what's a long? %li, what did I say last week?
%li gives you a long integer.
It's just a mess; there's no good reason for this other than historical baggage.
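A small sketch of those two points, with made-up function names: a literal like 3.0 has the size of a double, and printf uses %i, %f, and %li as described. The f suffix on 3.0f, which asks for a float literal instead, is not something the lecture mentions.

```c
#include <stdio.h>

// Returns 1 if the literal 3.0 takes up as many bytes as a double.
int literal_matches_double(void)
{
    return sizeof(3.0) == sizeof(double);
}

// Returns 1 if 3.0f, with the f suffix, takes up as many bytes as a float.
int suffixed_literal_matches_float(void)
{
    return sizeof(3.0f) == sizeof(float);
}

// One value of each type, each printed with its format code.
void print_with_format_codes(void)
{
    int i = 59;
    float f = 59.5;
    double d = 59.25;
    long l = 2000000000;

    // %f handles both float and double; %li is for long.
    printf("%i %f %f %li\n", i, f, d, l);
}
```

Both size checks hold on any conforming compiler, since the type of an unsuffixed decimal literal is double by definition.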
Sure, I'm not sure if that's reassuring, but all right, let's use this knowledge for something useful now and actually tease apart how we can use these skills for good and to better understand what's going on inside of the computer, as follows.
Let me go over to our grid of memory and this time let's not store some numbers, but let's store like these three lines of code.
These three variables, so 3 chars, even though you know where this is going, like this is not good design because I've got 3 stupidly named variables C1, C2, C3.
But let's make a point first.
The first variable's value is quote unquote H.
Second is I.
Third is exclamation point.
Why though am I using single quotes suddenly instead of double quotes?
It's a character. Chars use single quotes, strings use double quotes, and we'll see the distinction why in a moment.
So for instance, if this is my grid of memory and this program contains just 3 variables, each of them a char, odds are they'll end up like this in memory: c1, c2, c3, holding H, I, exclamation point.
Assuming there's nothing else going on in my program, they're just going to end up being back to back to back in this way, even though they might not necessarily.
So what does this really mean is going on?
Well, let's go ahead and poke around.
Let me go back to VS Code here.
Let's close scores.c, reopen my terminal, and let's create a new program called hi.c and just do something playful.
So let me include stdio.h at the top.
Let me do int main(void) after that, and inside of my curly braces, let's just repeat this: char c1 equals H in caps, char c2 equals I in caps, and then char c3 equals an exclamation point, that's all.
Now let's actually poke around and see what's inside the computer's memory.
So I could do something like this.
I could printf, for instance, %c, space, %c, space, %c, backslash n, and %c, it turns out, means character.
So what do I want to plug in? c1, c2, and c3, semicolon.
So let's go ahead and do this.
make hi, Enter, ./hi, and voila, there's my H, I, exclamation point.
There's no magic here.
Like, I'm literally just printing out three char variables.
I don't need the spaces.
If I want to get rid of those spaces between the letters, I can remake this, make hi, ./hi, and now we're back in business.
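A minimal sketch of that program's core, placed in a hypothetical function print_hi rather than main. Returning printf's own return value, the number of characters it printed, is an addition made here so the result is easy to check; the lecture's version simply prints.

```c
#include <stdio.h>

// Print three separate char variables with no spaces between them.
// Returns how many characters printf wrote, including the newline.
int print_hi(void)
{
    char c1 = 'H';      // single quotes for a single character
    char c2 = 'I';
    char c3 = '!';

    return printf("%c%c%c\n", c1, c2, c3);
}
```

Calling it prints HI! on one line.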
But here's where an understanding of types can give you a bit of power and sort of satiate some curiosity.
What if I change my %c's to %i, %i, %i?
So int, int, int.
Well, turns out that a char is really just a number, because it's an ASCII value stored in one byte, which can hold 0 to 255.
So there's nothing stopping me from telling the compiler, don't print these as chars, print them as integers.
So let's do make hi, ./hi, Enter.
And that's a little cryptic.
It looks like it's saying 727333, but no, let me add those spaces back in between each of those placeholders.
make hi again, ./hi.
There are our old friends 72, 73, 33.
It is not necessary in this case to say int, int, int in parentheses, because the compiler is smart enough and printf is smart enough that if you hand it a value that happens to be a char, it knows already it's going to be an integer essentially, so you don't even need to bother explicitly casting it this way.
We're essentially implicitly casting it to an integer by using those format codes as such.
All right.
So that just proves what I've claimed, that this equivalence between characters and numbers is actually the case inside of the computer's memory.
So even though you're storing HI exclamation point, technically you're storing three patterns of 8 bits each that give you these decimal numbers 72, 73, and 33 or specifically these patterns here.
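The equivalence can be sketched directly. The helper names below are made up, and the specific numbers assume the system uses ASCII, as essentially all modern ones do.

```c
#include <stdio.h>

// A char converts to an int implicitly, no cast needed.
int code_of(char c)
{
    return c;
}

// The explicit cast gives the same answer; it just spells out the intent.
int code_of_cast(char c)
{
    return (int) c;
}

// Print the same three chars as integers instead of characters.
void print_codes(void)
{
    char c1 = 'H';
    char c2 = 'I';
    char c3 = '!';

    printf("%i %i %i\n", c1, c2, c3);
}
```

On an ASCII system, print_codes prints 72 73 33.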
All right then, what is a string?
And this is where things get a little more interesting.
A string, as we've used it, is like a whole word or a phrase or when we started class today, like a whole paragraph of text.
So that's multiple values.
Now why is that interesting for us potentially?
Well, let's go ahead and write one line of code as a string.
So here for instance is one line of code with a string.
Let's go ahead and put that into my program.
So I'm going to go back to VS code here and clear my terminal, and I'm going to go ahead and delete all of this code here for a moment.
And I'm going to do something like this: string s equals, quote unquote, HI!, with double quotes now.
And now, just like in week one, I'm going to print out %s, backslash n, and print out the value of s.
Per earlier, because string is technically one of our training wheels for just a few weeks, I'm going to additionally include cs50.h at the top so that the compiler knows what this word, string, is.
All right, let's go into the terminal.
make hi, ./hi, Enter, and we're back in business, printing that out now as an entire string.
Well, what's going on inside of the computer's memory this time?
Well, I still have HI exclamation point, but it's a string now.
Well, it turns out the way that's going to be laid out in the computer's memory is exactly like before.
There's no mention of C1, C2, C3 because those variables don't exist.
There's just one variable, s, but it's referring to 3 bytes of memory, it would seem: H, I, exclamation point. And you can kind of see where this is going: a string, as a spoiler, turns out to be just what?
It's just going to be an array of characters, hence the dots we're trying to connect today.
So at the moment, though, this is a single variable, s, a string, the value of which is HI!.
But you know what, if it is in fact an array, I bet we can start playing around with our new square bracket notation and see as much in our actual code.
So in fact, let me go ahead and do this in VS Code now.
Let's not use %s.
Let's use %c, %c, and %c, 3 times.
Then instead of just s, let's print it out like it is an array: s[0], s[1], s[2].
Let's go back to my terminal in VS Code, make hi, ./hi, and nothing has changed, but I'm printing it out now one character at a time because I understand what's going on underneath the hood in this case.
I can actually see these values now.
Let's go ahead and change the %c to %i and add a space, just so it's easier to read: %i, space, %i, space, %i.
Don't need my casts in parentheses because printf is smart enough to do this for me.
make hi again, ./hi.
There again is my 72, 73, 33.
However, that came from the mere fact that I put HI! in double quotes.
So what's really happening here is it seems that a string is indeed just an array of characters.
But how does the computer know, when doing %s, what to actually print?
In other words, it stands to reason that eventually, if I've got more variables, more code, there's going to be other stuff in the computer's memory.
Why does printf know, when using %s, to stop here and not just keep printing characters that are over here, especially if I did have more variables and more stuff in memory?
Well, let's take a look at what's just past the end of this array.
Let's go back to VS Code and now let's get a little crazy and add in a 4th %i.
And even though this shouldn't exist, let's do s[3], which, even though it's the number 3, is the 4th location, but HI! is only 3 values.
So let's look 1 location past the end of this array.
Make.
Interesting.
It seems, and maybe it's just luck, good or bad, that the 4th byte in the computer's memory seems to be a 0.
Well, that's actually very much by design.
It turns out, if we look a little further, by convention, what the compiler will do for us automatically is terminate, that is, end any string we put in double quotes with a pattern of eight 0 bits.
More succinctly, it's just the number 0, because if you do the math where you've got eight 0s, it gives you zero in decimal. Or, more technically, the way it's typically written is this, because it's not like the number 0 that we want to see on the screen.
Backslash 0, similar to backslash n, is sort of a special escape character.
This just means literally eight 0 bits, not the number 0 that you might see in a phone number or something like that.
So even though we said string s equals HI! with an exclamation point, seemingly 3 characters, how many bytes does a string of length 3 actually seem to take up in memory?
It's actually gonna be 4, and this happens automatically.
That's what the double quotes are doing for you.
They're telling the compiler this is not just a single character, this is a sequence of characters.
Please be sure to terminate it for me automatically with a special pattern of eight 0 bits, and that special pattern of eight 0 bits actually has a name.
It's the so-called null character, or NUL for short.
The null character is just a byte of all 0 bits, and it represents the end of a string.
You've actually seen it before, if super briefly, two weeks ago.
Here was our ASCII chart, and we focused mostly on, like, this column here and this column here, and then we looked at the exclamation point over here, but all this time, over here, ASCII character 0 is null, NUL, which is just how you pronounce all eight 0 bits.
It's been there this whole time.
So why is it done this way?
Well, how is the computer actually printing something out in memory?
Well, it needs to know where to stop.
printf is pretty stupid.
Odds are, inside of printf, there's just a loop that starts printing the first character, the next character, the next character, and it's looking for the end of the string.
Why?
Well, consider what might happen.
Suppose you've got a program that has not just one string, but 2, for instance, two strings like this.
So in fact, let me go back to VS Code here, clear my terminal, and let's just make this program a little more interesting for a moment.
string t equals BYE!, for instance, and then down here let's do two printfs: %s, backslash n, and print out s.
printf, %s, backslash n, print out t.
Now to be clear, %s means string placeholder.
t and s are just also the names of the variables.
There's no %t that we want to use here.
All right, let me go down to my terminal, make hi, ./hi, and voila, I get HI! and BYE!, just like you would have expected last week.
But what's going on inside of the computer's memory?
Well, insofar as I've asked it to create two variables S and T like this.
Odds are what's happening in the computer's memory is HI! is ending up here, AKA s.
t, because there's nothing else in this program, is probably going to end up here, B, Y, E, exclamation point, but it wraps on this particular screen.
t is taking up 1, 2, 3, 4, 5 bytes total, just as HI! is taking up 4 bytes total, because the compiler is automatically adding for me the backslash zero, the null character, to make clear to other functions where this string ends.
So what does this mean in real terms and why is it 0?
Why is it 0?
Like, uh, just because like at the end of the day all we have is bits.
We've got 8 bits to work with for chars.
You've got to pick some pattern.
We could have chosen all ones.
We could have chosen all zeros.
We could have chosen something arbitrary.
A bunch of humans in a room years ago decided.
8 zeros will mean the null character.
That's the special character we will use to terminate strings in this way.
Well, what does that mean with our new syntax?
Well, it means we can poke around with strings as well.
So even though that first variable is s and that second one is t, you could technically poke around and access s[0] and s[1] and s[2] and s[3], t[0], t[1], t[2], t[3], and t[4], and so forth.
So in fact, suppose I wanted to dive in deeply there and actually see that.
Well, let me go ahead and do this back in VS code here.
Let me make a refinement here.
I've now got my two strings here.
I could go and, for instance, down here, just like before, do %c, %c, %c, then %c, %c, %c, %c, and if I then do s[0], s[1], s[2], oops, 2, and then down here t[0], t[1], t[2], t[3], and I'm doing that only because the word BYE! is longer than the word HI!.
If I do make hi, ./hi, the same principles work even in this context here.
But let's add an interesting twist, just because I have these values in memory here as follows.
Well, if I've got two words in memory, I could use them in an array too, instead of having, like, s and t, or word1 and word2.
I can actually put strings in an array too.
So let's go ahead and do this.
Let me go back to VS code and just for fun now, let's go ahead and do this.
Give me an array called words that's going to fit two strings.
Then in the first word, words[0], put HI!; then in words[1], put BYE!.
The only thing new here is that I'm making an array of strings now instead of an array of ints, but all of the syntax is exactly the same.
How can I go about printing these things?
Well, just as before, I can do printf, %s, backslash n, and print out words[0].
Then I can do printf, %s, backslash n, words[1].
Again, I'm just sort of applying the same simple syntax that we saw before. This is, again, like the 6th version of this program, right?
I'm just sort of jumping through hoops syntactically to demonstrate that these are just different lenses through which to look at the exact same idea.
And while a normal person would not do this, we could think about what's really going on in memory with arrays of words when those words themselves are arrays of characters because a word is just a string.
So this code here gives us something like this in memory. In that program a moment ago, this is words[0], this is words[1].
The only thing that's different is I'm not calling them s and t.
I've given them one name with two locations, 0 and 1.
Well, if each of these values is itself a string, and we said earlier that a string is just an array, we can actually think of these two strings, even though the syntax is getting a little crazy, using two sets of square bracket notation, where I can index into my array of words and then index into the individual letters of that word by just using more square brackets.
And again this is just to demonstrate a point not because a normal person would do this, but if I go back to VS code instead of printing out these two strings, why don't I do something like this?
printf, %c, %c, %c, backslash n.
Then let's print out the first word, but the first character therein.
Let's print out the first word, but the second character therein, the first word, but the third character therein, and even though I'm saying 3rd and 2nd and 1st, it's 2, 1, and 0, respectively, because we start counting at 0.
And then lastly here we can print out the second word: %c, %c, %c, %c, backslash n, then words bracket, how do I get to the second word in this array?
words bracket 1, the first character therein.
words bracket 1, the second character therein.
words bracket 1, the third character therein.
words bracket 1, the last character therein. And again, this is just to demonstrate a point, but if I do make hi now, ./hi, we have full control over everything that's going on.
If you now do agree and understand that an array can be indexed into with square bracket notation, as can a string because a string is itself just an array.
Strings are arrays for today's purposes then.
Questions on any and all of these tricks.
No?
All right, yeah, in front.
Like, how do you establish an array?
How do you establish or create an array?
Well, in the context of this program, if I go back to VS code, line 6 here gives me an array of size 2.
An array of two strings if you will.
The previous example we were playing with, which was my scores, oops, wrong program, wrong file.
If I open up scores.c as before, this line here, line 9, gives me an array of N integers.
So that is what establishes or creates the array in memory.
You specify a name, the size, and the type.
That's all.
And the only thing that's new today again is the square bracket notation, which in this context creates an array of that size, but once it exists, you can then access that chunk of memory by using square brackets as well.
Other questions on arrays? Yeah, in front.
Do you need to go index by index?
Good question.
Do you need to go index by index to put things inside of an array?
Short answer no.
So let me open up again scores.c from before, and what I could have done in an earlier version of my program would be something like this.
I could have done 72, 73, 33, and I deliberately didn't show this because I didn't wanna add too much complexity, but.
You can use curly braces in this new way and initialize the array in one line, and in that case you don't even need to specify the size because the compiler is not an idiot.
It can figure out that if you've got 3 numbers on the right, it knows that it only needs 3 elements on the left to put them into.
But let me undo that and leave it just as I did, but short answer, yes, you can statically initialize an array if you know all of the values up front, and not when using get_int, in that case.
All right, so if you're on board with the idea that all a string is, is an array, and that array is always null terminated, we can now use that knowledge to solve some simple problems, and problems that others have already solved before us.
So let me go ahead and close that file in VS code.
Let me go ahead and open up another program here called length.c, and let's just play around with the length of strings as follows.
Let me include the CS50 library at the top.
Let me include stdio.h after that.
Let me do int main(void) after that, and then inside of main, let's prompt the user for their name by using get_string, and just say name, colon, today. And then after that, let's go ahead and figure out the length of the person's name. Like D-A-V-I-D, I should get the answer of 5, and K-E-L-L-Y, we should get the answer of 5, and hopefully for a longer or shorter name, we'll get the correct answer as well.
So how can I go about counting the number of characters in a string?
Well, the string is just an array, and that array ends with the null character.
There's a bunch of ways we can do this, but let me go ahead and do this.
Let me create a variable called n, which eventually will contain the length of the name, and I'm going to set it equal to 0 because I don't know anything yet about the length.
Then I can do this with a for loop, but I prefer this time to use a while loop.
I'm gonna say the following: while the person's name at location n does not equal backslash zero, go ahead and add 1 to the value of n.
And then after all of this, go ahead and print out with %i, backslash n, the value of n.
So what's going on here, this is easier said when you know already where you want to go with it, but with practice, you too can bang this out pretty quickly.
N is going to contain the length of my string.
I have in my loop here a boolean expression that's just asking the question, Does name at the current value of N not equal the null character?
In other words, you're asking yourself, Is this character null?
Is this character null?
Is this character null, is this character null?
And if not, you keep going, you keep going.
And this is kind of a clever trick because I'm using n and incrementing it inside the loop.
So when I look at D, that's not equal to backslash zero, so I increment n.
Now n is 1, so I look at name bracket 1.
What's at name bracket 1?
If it's my name, A. A does not equal backslash 0, so it increments n.
What's at location 2?
In D-A-V-I-D, V. V does not equal backslash 0.
So we repeat with I, we repeat with D, and then we get to the end of my name, which is the null character, because the get_string function in C put it there automatically for me.
The null character does equal backslash zero, so n does not get incremented any more times.
So at this point in the story, on line 13, n is still 5 because I have not counted the null character.
So I hope I will see 5 on the screen.
This is just kind of a very mechanical way of checking, checking, checking, checking, trying to figure out through inference how long the string is because it's as long as it takes to get to that backslash zero, the null character.
So let's do make length, Enter, ./length.
Type in my name, David, and I indeed get 5.
Let's go ahead and do ./length.
Kelly, I indeed get 5, and hopefully for shorter and longer names I'm going to get the exact same thing too.
In fact, we can try a corner case: ./length, Enter.
Let's not give it a name at all.
If I just hit enter here, what should the length of the person's name be?
0, which is not incorrect.
It's literally true, but that's because we're going to get back essentially quote unquote, the empty string, but even though it's quote unquote, in the computer's memory it's still going to take up one byte, because the get_string function will still put NUL at the end of the string even if it's got no characters therein.
So it turns out this is not something you need to do frequently, like initializing a variable, using a loop like this.
It turns out there are better solutions to this problem.
You do not need to reinvent this wheel yourself, because it turns out, in addition to stdio.h and cs50.h and, as you probably saw on Problem Set 1, math.h, and perhaps others, there are other libraries out there, namely the string library itself.
In fact, if you go into the CS50 manual, you can look up the documentation for a header file called string.h, which contains declarations for, that is, prototypes for, a whole bunch of helpful functions.
In fact, the manual pages for it are at this URL here.
The most important function, and the one we're going to use so often for the next few weeks, is wonderfully called strlen, for string length.
Someone else literally decades ago wrote the code that essentially looks quite like this but packaged it up in a function that you and I can use so we don't have to jump through these stupid hoops just to count the length of a string.
We can just ask the string length function what the length of a string is, but odds are, if we looked at the C code that someone wrote decades ago, it would look indeed quite like this.
So how can I simplify this program?
Well, I can get rid of all of this code here.
I can include string.h at the top of my file, and then I quite simply do something like this: int length equals strlen of name.
That's going to put it in the variable length. Actually, let's be consistent: int n equals strlen of name, and then on line 9, let's print it out.
Let's try this.
make length, ./length, David.
OK. Kelly. OK. And no one.
And 0. It seems to now be working.
So this is a wheel we do not need to reinvent and frankly now in a matter of design I don't really need the variable N anymore.
Recall that we can nest our functions just like we did with average before.
So let me get rid of that line and just say strlen of name, which is actually perfectly reasonable here.
All right, well, what more can we do with this?
Well, let's consider some other matters of design.
Let me close out length.c and let's create another program of our own called string.c, in which we'll play around now with this library and others.
Let me go ahead and include cs50.h.
Let me go ahead and include stdio.h.
Let me go ahead and include also string.h.
All right, what do I want to now do?
Well, int main(void), and inside of main, let's go ahead and write a program that prints a string character by character, just to demonstrate these mechanics.
So string s equals get_string, and I'm going to ask the user for some input, because I just want to play around with any old string.
I'm going to go ahead and proactively say output here, and I'm going to go ahead and not use a new line character there, deliberately.
Below this now I'm going to have a for loop, though I could use a while loop, that says int i equals 0; i is less than strlen of the string I just got from the human; and increment i on each iteration. And on each iteration, print out just one character in that string, specifically at s location i. And then at the very bottom of this program, let's just print a single backslash n to move the cursor onto a new line.
Long story short, what have I done?
I wrote a stupid little program that prompts the user for a string, prints the word output thereafter, and then just prints the word that they typed in, character by character by character by character, until it reaches the end of the string, based on the length returned by strlen.
So let's go ahead and run this in my terminal window.
I'm going to do make string, ./string, and I'll type in my own name as before.
This was a subtlety.
I deliberately wrote two spaces here because I just, um, to be nitpicky, I wanted input and output to line up perfectly so you can see what's happening.
Indeed, if I do enter here, now I see input is David, the output is David as well.
So that was just a formatting trick that I foresaw.
Why is this program correct but not arguably well designed?
It's pretty good in that it's using the strlen function.
I didn't reinvent the wheel unnecessarily, but there's an inefficiency that's kind of subtle.
And it relates to how a for loop works.
Any thoughts?
This program I claim is doing unnecessary work somewhere.
Yeah.
OK, that's definitely stupid.
Um, you don't have to output a character by character, that's just my pedagogical decision here.
So correct but not the question we're fishing for.
There's a second stupid thing, yeah.
Yes, every time through this loop, and this isn't so much my conscious choice as my mistake, I'm checking the length of s again and again.
Why?
Because recall how a for loop works.
The initialization happens once at the very beginning.
Then you check the Boolean expression.
Then if it's true, you do the code.
Then you do the update, then you check the boolean expression, then you do the code.
Update Boolean expression, you do the code.
But every time you evaluate this Boolean expression, you're asking, is i less than the strlen of s?
But this is a function call. Like, you are literally using strlen again and again and again, and like a crazy person you're asking the computer, what's the length of s, what's the length of s?
What's the length of S?
It's not going to change.
It's going to be the same no matter what.
So how can we fix this?
Well, I could solve this in a couple of ways. Like, I could, for instance, down here do int n equals strlen of s and store it in a variable n.
And just do that.
I think that eliminates the inefficiency because now I calculate the length of s once.
It's not going to change, nor is my variable, so I can now use and reuse that variable.
It's just saving me a little bit of time, you know, microseconds maybe, but when you're writing bigger programs and you're doing things in loops, if that loop is running not 3 times or 5 times, but a million times, millions of times, all of those microseconds, milliseconds might very well add up.
But it turns out there's some syntactic tricks we can do too.
I alluded to this earlier.
If you want to initialize not one variable, but 2, you can actually do it all before the first semicolon like that.
So now on line 9 I'm declaring a variable called i and setting it equal to 0, and I'm declaring a second variable called n, also of the same type, int, and setting it equal to the length of s.
And now I can use that again and again.
Now, as an aside, this is a little bit of a white lie, because smart compilers nowadays are so advanced that they will notice that you're calling strlen again and again inside of a loop, and they will just fix this for you, unbeknownst to you, but it's representative of a class of problems that you should be able to spot with your own human eyes and avoid altogether, so that you don't waste more time and more compute and more money, in some sense, than you might otherwise need to in this case.
Any questions on that there, optimization, yeah.
You do not say it again.
The constraint is that you have to use the same data type for all of your initializations, so you'd better hope that you only want ints in this case.
Otherwise, you got to pull it out and do what I did earlier.
Good question.
Others on this.
Yeah.
When does it account for spaces?
Uh, a space is just ASCII character number 32, so there's nothing special about it.
It's sort of invisible, but it is there.
It is treated like any other character.
There's no special accounting whatsoever.
The null character, which is also invisible, is special because printf and strlen know to look for it to find the end of that variable, the end of that value, as such.
All right, let's try one other demonstration of some of these ideas here.
Let me go into another file that we'll create called, how about, uppercase.c.
Let's write a super simple program that like uppercases a string that the human types in and see how we can do this sort of good, better, and best.
So I'm going to call this file uppercase.c.
Inside of this file, let's use our now friends: include cs50.h. Let's do include stdio.h.
Let's then include, lastly, how about string.h.
And the goal here inside of main is going to be to get a string from the user, so string s equals get_string, and we're gonna ask the user for a before string, representing what it is they typed before we uppercase everything.
Then I'm going to go ahead after that and print out, just as a placeholder, after, and two spaces, just to be nitpicky, so that the text lines up vertically on the screen.
Now I'm going to do the following: for, int i equals 0, n equals strlen of s; i less than n, just like before; i++.
So I'm just kicking off a loop that's going to iterate over the string the human typed in.
Now if my goal in life is to change the user's input from lowercase, if indeed in lowercase, to uppercase, let's just express that literally: if the current character in the string, so s[i], is greater than or equal to 'a' and s[i] is less than or equal to 'z', using single quotes. This is arguably a very clever way of expressing the question, is it lowercase?
We know from our ASCII chart from week 0 that the ASCII chart has not only numbers representing all the uppercase letters but also numbers representing all the lowercase letters.
Lowercase a for instance, is 97, and they are all contiguous thereafter.
So we can actually treat, just like we did before, chars as ints and ints as chars, and sort of ask mathematical questions about these chars and say, is s[i] between 'a' and 'z', inclusive?
So if it is lowercase, and I'll add a comment here for clarity, if s[i] is lowercase, what do we want to do?
We want to force it to uppercase.
So this is a little trick I can do as follows.
printf the current character, but let's do some math on it.
Let's change s[i] by subtracting some value.
Well, what might that value be?
Well, recall from week 0 our ASCII chart here, and let's focus, for instance, on the lowercase letters here and the uppercase letters here.
What's the distance between all upper and lower case letters?
It's 32, right, and the lowercase letters are bigger, so it stands to reason.
If I just subtract 32 from the lower case letter, it's going to immediately get me to the uppercase version thereof.
So this is kind of cool.
So I can actually go back to VS Code and I can literally subtract the number 32 in this case, because ASCII is a standard; it's not going to change.
Else, if the letter is not lowercase, I'm just going to go ahead and print it out unchanged, without doing any mathematics at all to it, and I'll make clear with the comment, uh, else if not lowercase, what's going on there.
All right, let me go ahead and make uppercase in my terminal window, ./uppercase.
Let's type in my name all lowercase and I get back DAVID, uh, minor bug, couple bugs actually.
Let me fix my spacing.
I think I want another space after the word after, and at the very bottom of my program, I think I want a backslash n.
Now let's rerun make on uppercase, ./uppercase, Enter, DAVID, and now it's forcing it all to uppercase.
Meanwhile, if I do it once more and type in my name capitalized, it's still gonna force everything else to uppercase. Questions?
Oh, I'm an idiot.
OK, thank you.
Yes, uh, I misspelled after, otherwise my aligning, my alignment would have worked.
So let's do this again, make uppercase, if only so that we can prove it's the same, DAVID and all lower case, and there we go.
That was thank you, the intent.
Alright, so it's kind of a little trick, but this is kind of tedious, right?
Like Microsoft Word, Google Docs all have the ability to toggle case from uppercase to lower case or lower case to uppercase.
It's kind of annoying that you have to write this much code to achieve something so simple, seemingly and so commonplace.
Well, it turns out there's a better approach here.
In addition to there being the string library, there's also the ctype library.
In ctype.h, another header file, there's a whole bunch of other functions that are useful that relate to characters, uh, in ASCII.
So for instance, if we go ahead and use this as follows, I'm going to go ahead at the top of my file here and include now ctype.h.
It turns out there's going to be functions via which I can actually ask these questions myself.
For instance, in this next version of the program, I don't need to do any of this clever but pretty verbose math.
I can just say, if the islower function, which comes from the ctype library, passing in s[i], returns true, we'll then convert the letter to uppercase by subtracting 32. But, you know, I don't even need to do this mental math or math in code.
I can also, from the ctype library, use a function called toupper, which takes as input a character like s[i], and let someone else's function do the work for me.
So let me go back down to my terminal window here.
Let me make uppercase now, ./uppercase, Enter, before, DAVID.
This now works too.
But if I really dig into the documentation for the ctype library, you'll see that you can just use the toupper function on any character and it will very intelligently only uppercase it if it is actually lowercase.
So someone else years ago wrote the conditional code that checks if it's between little a and little Z.
So knowing this, and you would see that indeed in the documentation, I don't even need this else.
I can instead just get rid of this whole conditional, tighten my code up significantly here, and simply say.
printf using %c, the toupper version of that same letter, and let the function itself realize: if it's uppercase, pass it through unchanged; if it's lowercase, change it first and then return it.
So now if I open my terminal window again and clear it, make uppercase, ./uppercase, Enter, DAVID, and we're back in business.
So again, demonstrative of how, if you find that coding is becoming tedious, or you're solving a problem that, like, surely someone else has solved, odds are there is in fact a library function for it, whether it's from CS50 or from the standard library, that you yourselves can use.
And unlike the CS50 library, which is indeed CS50-specific, which is why clang needed to know about -lcs50, many of these libraries just automatically work.
You don't need to link in the ctype library.
You don't need to link in other libraries, but for non-standard libraries like CS50's training wheels for the first few weeks, we do need to do that.
But make is configured to do all of that automatically for you.
All right, in our final minutes together, let's go ahead now and reveal some of the details we've been sweeping under the rug about main.
I asked in week one that you just sort of take on faith that you've got to do the int, you've got to do the void, and all of that.
Well, let's see why that actually is.
So main is special insofar as, in C, it is the function that will be called automatically after you've compiled and then run your code, just because.
Not all languages standardize the name of the function, but C and C++ and Java and certain other ones do in this case.
Here is the most canonical, simple form of main.
We know that including stdio.h just gives us access to the prototypes for functions like printf.
But what's going on with int and what's going on with void?
Well, void in parentheses here just means that main, and in turn all of the programs we've written up until this moment, do not take command line arguments.
Literally every program we've written out, hello, scores, everything else, I have never once typed another word after the name of our programs that we've written in class.
That is because every program has void inside of these parentheses, telling the computer this program does not take command line arguments, words after the program's name. That is different from make and code and cd and other commands that you've typed with words after their names at the prompt.
But it turns out the other supported syntax for the main function in C can look like this too, which at a glance looks like kind of a mouthful, but it just means that main can take zero arguments or it can take 2.
If it takes 2, the first is an integer and the second is an array of strings.
By convention those inputs are called argc and argv.
argc is the count of arguments typed at the prompt, the program's name included.
argv is the argument vector, AKA array of actual words.
In other words, now that we have the ability to use arrays, we can get 0 or 1 or 2 or 3 or more words from users at the prompt when they run our own programs.
So what do I mean by this?
We can now write programs that actually have command line arguments as follows.
Let me go into VS Code here and close our old program, uppercase.
Let's write a new, simpler program here in my terminal called greet.c and just greet the user in a couple of different ways.
So I'm going to include initially cs50.h, and then I'm going to include stdio.h here.
Then I'm going to say int main(void), without introducing anything new just yet.
I'm going to ask the user, like we did last week, for a return value from get_string, asking them what's your name, as we've done so many times.
Then I'm going to say printf, "hello, %s\n", spitting out their answer as follows.
Same program as last week again.
I'm going to make greet.
I'm going to say ./greet, and I'm prompted now for my name.
I hit enter.
Notice that I did not take any command-line arguments.
The only command I ran was ./greet, no other words.
Let's now use this new trick and actually let the user type their name when they're running my program rather than waste their time by using get_string and prompting them.
Let me go into my editor here.
Let's get rid of the CS50 library.
Let's get rid of my use of get_string.
And let's simply change void to int argc, then string argv, open bracket, close bracket.
That's all.
Down here, let's simply print out argv[1], for reasons we'll soon see.
The only change that I'm making really is changing the prototype for main from the first version, which we've been using for like a week and a bit now, to the second version, which is the only other version supported.
I'm gonna go back to my terminal window now, make greet, and darn it, I was so close.
How do I fix the mistake I accidentally made?
Yeah, in back.
Oh, no, in front.
Yes, I should have kept the CS50 library because it's in the CS50 library that string is defined, so include cs50.h.
In week 4 we will delete that line for real and actually show you what string actually is.
I promised at the start of class that string is a term of art, but it's not a keyword in C, and we'll see what it means in a couple of weeks' time.
OK, let me fix this.
Make greet, ./greet, but now I'm going to type, before I even hit enter, my actual name, and when I hit enter now, I see hello, David.
If I instead do ./greet Kelly, enter, now I see hello, Kelly.
If I do nothing, like ./greet, enter, I just see hello, (null), which is not the same null as before, NUL.
This is NULL, for reasons we'll come back to before long, but clearly printf knows something's going on.
There's no actual word there.
Why, though, did I do argv[1]?
Well, it turns out that just as a feature of C, if I recompile this program to print argv[0] and do ./greet and type in nothing else, I'm gonna see something kind of curious: hello, ./greet, because the zeroth location in the argv variable will automatically contain the program's own name.
Why is this useful?
If you ever want to do something self-referential, like thanks for running my program, or you want to show documentation for your program along with its name, which depends on whatever the file itself is called, you can use argv[0], which will always contain the program's name no matter what the file has been named or renamed to. But we can fix that null issue now in a couple of ways.
So argc is the other input that I said now can exist, which is the count of arguments at the prompt.
So if I want to check if the user actually typed their name, I could say something like if argc equals equals 2, well, then and only then go ahead and print out their name; else let's just do some clever default like printf, quote unquote, hello, world, or heck, nothing at all.
This version of the program now is a little smarter, because when I run make greet and do ./greet with my name, it works exactly as intended.
But if I forget and only do ./greet, it's going to say hello, world.
Moreover, if I don't quite cooperate and I say David Malan, enter, it similarly just ignores me because argc is not 2 anymore.
It's now 3.
So argc contains the total number of words at the prompt, but the first one is always the program's name.
Question.
Sorry, can you say that once more, a little louder?
Why is it information that we just have?
Oh, so the short answer is just because. Like the definition of C, if you look at the documentation for C, you can either define main as taking no arguments with the word void, or you can specify that main can take two arguments, and the compiler and the operating system will just ensure that if you provide two, those two variables, argc and argv, will be filled with those two values automatically.
Someone else decided that, though; that's just the way it works.
You can't put 3 there, you can't put 4 there; you can change the names of those variables, but not the types, because of this convention.
So there's one last feature of main then.
It's the actual value it returns.
Up until now, every program I've written starts with int main something.
What is that int?
We have yet to use it.
Technically, the value that main returns is going to be called a so-called exit status, which is a numeric status that indicates success or failure.
Numbers are everywhere in the world of computing.
So for instance, here's a screenshot from Zoom whereby if something goes wrong with Zoom, like you have bad internet connectivity or something like that, you might see an error code like 1132.
That means nothing to normal people unless you Google it or look up the documentation, but it means something very much to the software engineers who wrote this code, because they know, oh shoot, 1132 means this error, and they probably have a spreadsheet or a cheat sheet somewhere that converts those codes to actually useful error messages. And frankly, in a better world, they would just tell you what the problem is rather than just say report the problem and mention this number.
That said, on the web, odds are you're familiar with this number 404, which is also a weird thing for so many normal people to know, but this generally means file not found.
It's a numeric code that signifies that something has gone wrong.
Exit status isn't quite this, but it's similar in spirit.
In main, you can return a value like 0 or 1 or 2 or something else to indicate whether something was successful or not.
By convention, a function like main returns 0 on success if all is well, and that leaves you then with several hundred possible things that can go wrong, because you could return 1 to signify one thing, 2 to signify another, 3 to signify another, and so long as you have a spreadsheet or a cheat sheet or something, you can just keep track as the programmer as to what error means what.
So what does this mean in real terms?
Well, if I go over to VS Code here, let me implement a relatively simple program, our last, called status.c.
So in status.c, I'm going to go ahead and use the CS50 library at the top, the standard I/O library at the top, and then inside of int main, and with our new format, int argc, string argv, square brackets, inside of main I'm going to now do the following.
If argc does not equal 2, then I'm going to go ahead and print out this time a warning.
I'm not going to have some silly default like hello, world.
Let's tell the user that they didn't use my program correctly, and I'm going to say printf, missing command-line argument.
And we'll assume they know what that means.
Then to signify an error, I'm going to say return 1.
It could be 2, it could be 3, but this is the first possible error, so I'm going to start simple with 1.
Otherwise, if argc does equal 2, and I get to this part of my code, I'm going to say hello, %s, backslash n, and pass in argv[1] just like before.
And just to be super specific, I'm going to return 0 to tell the computer, the operating system, that this is success.
0 signifies success.
Any other value signifies error.
Let's make status now.
Let's do ./status, and this is a little magical, but let me go ahead and cooperate initially.
I'm going to type in my name, David, and I'm going to see hello, David.
Most people wouldn't know this, but among the commands you can type at your terminal is this one here, and the TFs and the TAs and I would do something like this.
We, after running your code, can do echo, space, dollar sign, question mark, and we can see secretly the return value that your program returned: 0 in this case.
Meanwhile, if we do this again, ./status, let me not type my name this time.
When I do this, I see missing command-line argument.
What value should the code have returned then?
One, so let's see, echo, dollar sign, question mark: there's the 1.
So even after just one week of CS50, if you've ever wondered how check50 knows if your code was correct or not, among the ways we check for that is by checking the semi-secret status code, this exit status, which isn't really a secret.
It's just not displayed to normal people because it's not all that enlightening unless you're the software developer who wrote the code in question.
But this means we could return 1 in some cases or 2 in other cases or 3 or 4 in yet others.
And these command-line arguments are sort of everywhere, and in fact, a program I skipped over a moment ago was going to be this.
There's no academic value to what you're about to see, but another program that takes command-line arguments is known as cowsay, and this is sort of very famous in computing circles because it's been on systems for many years.
cowsay is a program that allows you to type in a word after the prompt, like moo, and it will print out what's called ASCII art, an adorable little cow with a speech bubble that says moo, so kind of evocative of Scratch.
But it takes other command-line arguments, not just the words that you want to come out of its mouth, but even the appearance that you want it to have.
So for instance, I can say dash f duck and run it again, enter, and now I have a little cute duck saying moo, which is a bit of a bug.
So let me change that to quack, for instance, instead.
And again, no academic value here.
It's just fun to now play with the various options.
But if we really want to have fun with this, we can do another one, so I say dash f dragon and we can say something like rawr, and now we have this crazy dragon appearing on the screen, which is to say again, no value here.
It's just fun to play with command-line arguments sometimes. And how is cowsay doing this?
Well, someone wrote code, maybe in C or some other language, using argc and argv and poking around at their values, and maybe a conditional that says if the f value is dragon, then print this graphic; else if the value is duck, then print this other one. It all boils down to the same fundamentals of week zero, of functions and conditionals and loops and Boolean expressions and the like.
It's just being composed into more and more interesting things.
And indeed, in closing among the other interesting things we'll play with this week to come full circle is that of cryptography, the art of scrambling information so as to have secure communication.
So important nowadays with passwords and credit card numbers and personal messages that you might want to send.
And we'll have you explore through code some of the algorithms via which you yourselves can encrypt information.
And there's a number of ways we can do this form of encryption, and they all boil down to this mental model.
You've got some input, like the message you want to send, and you want to encipher it somehow, encrypt it somehow, so that no one knows what message you've sent. So you want your plaintext, which is the human-readable version in English or any other language, to become ciphertext ultimately.
So the code you'll be writing this week is, inside of this black box, some kind of cipher, an algorithm that encrypts information so that you can do exactly this.
Now the catch is that you can't just give it plain text and run it through an algorithm and get ciphertext because you need to somehow have a secret typically for encryption to work.
Like if I'm going to send a message to someone in back, well, I could just randomize the letters that I'm writing down, but how would they know how to reverse that process?
Probably what we need to do is agree in advance that, you know what, I'm going to change every A to a B, and every B to a C, and a C to a D, and a Z to an A.
I'll wrap back around at the end of the alphabet.
It's not very sophisticated, but no middle school teacher, if they intercept two kids passing notes in class, is going to waste time trying to figure out this cipher. It does presuppose, though, that there's a secret between them, the number 1 in that case, because I'm changing every letter by one place.
So how might this work?
Well, if I want to encrypt the word HI! and my secret key, which I've agreed on with someone in advance, is 1, I should send the ciphertext IJ!
Now this is a simple cipher, so I'm not really encrypting the punctuation, which may or may not be a good thing, but I am encrypting at least the alphabetical letters.
But what does the recipient then have to do to decrypt this message? When they see on paper IJ, how do they know what I said?
Well, they use that same key, but subtract, so B becomes A, C becomes B, A becomes Z, and so forth, essentially inverting the key from positive 1 to -1.
Of course, slightly more secure than a key of 1 would be 13.
And in fact in computing circles, 13 has special significance.
ROT13, R-O-T 13, is an algorithm that's been used for many years online just to sort of avoid spoilers. Reddit might do this, or other websites where they want you to have to do some effort to see what the message says, but it's not all that hard.
You just have to click a button or write the code that actually does this. But if you use 13 instead, you wouldn't get IJ, you'd get UV, because U and V are 13 places away from H and I respectively.
But again, we're not touching the punctuation. Or we could send something more personal like I love you, and the message comes out like that.
Slightly more secure than that would be ROT26, no?
No, why?
Because it's the same thing.
It literally rotates all the way around: A becomes A, B becomes B.
So there's a limit to this.
But more seriously, that speaks to just how strong this encryption is or is not, because if you think about this now from an adversary's perspective, like the teacher in the room intercepting the slip of paper, how much work do they need to do?
Well, they just try all possibilities: key of 1, key of 2, key of 3, up through 25, and at some point they will see clearly that they guessed the key, which means that cipher is not very secure.
Nonetheless, what we're talking about is historically known as the Caesar cipher, because back in the day, when Caesar was communicating, as legend has it, with his generals, if you're the first human on Earth to come up with encryption or come up with this specific cipher, it doesn't really matter how complex it is if no one else knows what's going on. Nowadays it's not hard at all to write some code, in C or any other language, with which someone could just brute-force their way through this.
So there are much more sophisticated algorithms nowadays than simple rotations of letters of the alphabet, as we'll soon see.
But when it comes to decryption, it really is just a matter of reversing that process.
So this message here, if we rotate all the letters in the opposite direction by subtracting one, will be our final letters for today.
There's a bit of a hint there, which will reveal that this message, and our final words as the clock strikes 4:15, is going to be: the U becomes T and the I becomes H.
I'm the only one who finds this amusing.
T-H-I-S W-A-S C-S-50, and this was CS50.
We'll see you next time.