Random Variables and Probability Distributions
By Steve Brunton
Summary
## Key takeaways

- **Random Variable: Number of Heads**: A random variable X is the number of heads in four fair coin flips, taking integer values from 0 to 4, with probabilities 1/16, 4/16, 6/16, 4/16, 1/16 following the binomial distribution. [01:23], [04:49]
- **Plot as Histogram**: Plot the probability of X (number of heads out of four coin flips) as a histogram with bars at 0 to 4 heads of heights 1/16, 4/16, 6/16, 4/16, 1/16, concisely summarizing all 16 sample-space outcomes. [05:41], [06:49]
- **Discrete vs Continuous**: Discrete random variables, like the number of heads, take a discrete set of values; continuous ones, like the height of Americans, take any value in a continuum within some range. [02:46], [03:23]
- **Distributions: Binomial, Normal**: The number of heads in N coin flips follows a binomial distribution; the height of Americans follows a normal (Gaussian) distribution, with the binomial approximating the normal for large N. [05:10], [11:22]
- **Why Random Variables? No Counting**: Random variables provide a concise function for probabilities without enumerating huge sample spaces, e.g. 2^100 coin-flip sequences reduced to a binomial distribution. [14:26], [15:07]
- **PDF Models Real Processes**: The probability density function is a mathematical model approximating random processes, like the binomial for coin flips, despite minor real-world biases such as wind resistance. [15:40], [16:36]
Topics Covered
- Random Variables Compress Sample Spaces
- Binomial Converges to Normal
- PDFs Model Real Random Processes
- Distributions Enable Hypothesis Testing
Full Transcript
Welcome back. Today we're going to introduce a really major concept in probability: the concept of a random variable. Just like in algebra and calculus, where you can have a variable x that takes on values like two or three or pi, a random variable in probability and statistics is also a variable that can take on a given set of values — and then you can assign a probability to each of those values of X.

Let's start with an example and then define more formally what we mean by X. A good example: let's say I am flipping my fair coin four times. We've already talked about the sample space Omega of all of the things that could happen: you could have heads-heads-heads-heads, or heads-heads-heads-tails, etc. A random variable is a real-valued number that you associate to each of those experiments in the sample space. So if I flip a coin four times, my random variable X could be the number of heads that occur.

We know our sample space Omega for four coin flips — listing it out: HHHH, HHHT, HHTH, dot dot dot, all the way to TTTT. There are 16 elements of the sample space, 16 different sequences of heads and tails I can flip: 2 to the power 4. This random variable X is the number of heads that occur in these coin-flip sequences. So we would say that X takes values 0, 1, 2, 3, or 4 — an integer between 0 and 4. I could get zero heads, I could get four heads, I could get one, two, or three heads as well. So X is a variable, and it has this set of numbers it can take.

This is a discrete random variable. Random variables can be discrete, like the number of heads in N coin flips, or they can be continuous. A good example of a continuous random variable is the height of Americans — you could make it the height of Brazilians, the height of the French, the height of people worldwide, it doesn't matter. It could be in feet, in meters, in centimeters, but it's a variable that can take any value in a continuum within some reasonable range. Discrete random variables can only take on a discrete set of values.

Importantly, I can assign a probability to each of these values of X — that's really important. The probability that X = 0, that I get no heads: there's only one of these 16 cases with zero heads, so that's a 1/16 probability. For the probability that X = 1 (I'll use the shorthand P(X = 1)), we can use the binomial distribution — we know this follows the binomial distribution; remember Pascal's triangle and "n choose r". The probability that X = 1 is 4/16, the probability that X = 2 is 6/16, the probability that X = 3 is again 4/16, and the probability that X = 4 is 1/16. So this is a nice example of a binomial distribution.
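The five probabilities above can be checked with a short calculation. A minimal sketch (the helper name `binom_pmf` is just for illustration):

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    # P(exactly k heads in n independent flips of a coin with P(heads) = p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

# X = number of heads in 4 fair flips
probs = [binom_pmf(k, 4) for k in range(5)]
print(probs)  # [0.0625, 0.25, 0.375, 0.25, 0.0625], i.e. 1/16, 4/16, 6/16, 4/16, 1/16
```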
We would say that X is distributed as a binomial random variable: it is a random variable that adheres to the binomial distribution of probabilities that we can calculate here. One of the cool things you can do, once you have this random variable and can compute the probabilities of each of its states, is summarize those probabilities concisely in a plot — a histogram. So let's plot the probability of X, the number of heads out of four coin flips. Along the horizontal axis is X, taking values 0, 1, 2, 3, and 4; each vertical tick is one sixteenth of probability. The probability of X being 0 is 1/16; the probability of X being 1 — of having one head in these coin flips — is 4/16; the probability of X equaling 2 is 6/16; the probability of X equaling 3 is again 4/16; and the probability of X equaling 4 is 1/16. And you should always label your axes: this axis is 0 through 4 heads, and that axis is probability in sixteenths.

Good — this is a really nice, concise way of summarizing a relatively big set of things that could happen. All 16 possible cases of four coin flips can be summarized in this distribution. This is called a probability distribution over my random variable X.
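As a rough stand-in for the whiteboard drawing, the same histogram can be sketched in text, with bar lengths counting sixteenths (an illustrative sketch, not part of the lecture):

```python
from math import comb

# Text histogram of P(X = k) for X = number of heads in 4 fair coin flips
for k in range(5):
    sixteenths = comb(4, k)  # numerator of P(X = k) over 16
    print(f"{k} heads | {'#' * sixteenths} {sixteenths}/16")
```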
Now, when I was learning algebra, I remember the idea of a variable being kind of a tough abstraction. I think it's something kids have a hard time with — I remember my mom telling me she had a hard time with it, and I remember a day or two when my kids had a hard time with this idea of the abstraction of a variable. You're used to doing math with numbers — 3 + 5 is 8 — and then when you start introducing x + 5 = 8, what is x? That takes people some time to get used to. We're all super comfortable with that abstraction now, but this one might take you a little while, so it's okay if it takes a minute for the abstraction to sink in and for you to build intuition. We're going to do probably ten lectures just on examples of random variables, how to compute functions of random variables, and the most common examples of random variables, and you're going to build this intuition just like you did in algebra and calculus. It's going to be okay.

So a random variable is essentially just like a normal variable, except that I can assign a probability to X being in each of its possible states — that's really important. Maybe I'll write that down: P(X) is a function you can plot, called a distribution, or a probability distribution. Roughly speaking, you need two things: the values that X can take — the domain of this probability space — and the probability P of each possible value of X. Here we know our random variable is the number of heads in four coin flips, X can take five different values, and we have the probabilities of X taking each of those five values. So we have defined a probability distribution over our random variable X, and that's a really useful concept.

This is the kind of thing you're going to be really glad you have access to. For example, let's say I flip a coin a hundred times. I don't want to list the 2^100 possible coin-flip sequences — it's incalculably large, and I never want to write that down on a whiteboard or try to enumerate it. But the number of heads that occur in 100 coin flips follows a binomial distribution: it has a name, and there is a function for the probability of there being 20 heads, or 21 heads, or 22 heads. So to some extent you can write down the probability distribution — sometimes we call it a probability density function — for much more complicated random variables, for much more complicated processes, where you could never enumerate all of the possibilities.
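To make that concrete: the probability of exactly 50 heads in 100 fair flips is a one-line formula, with no enumeration of the 2^100 sequences (a minimal sketch):

```python
from math import comb

# P(exactly 50 heads in 100 fair flips), without listing 2**100 sequences
p50 = comb(100, 50) / 2**100
print(p50)  # roughly 0.0796
```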
So this becomes really useful. For the number of heads in N coin flips, we say that X follows a binomial distribution — we'll define exactly what that is in one of the next lectures; it's related to those binomial coefficients we looked at earlier. If we have a continuous variable like the height of Americans, this is going to follow a normal, or Gaussian, distribution. You've seen this before — it's the bell curve. And you'll notice that as N gets very large — if I have 100 coin flips, or a thousand coin flips — the binomial distribution starts to approximate, or converge to, a normal distribution.
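That convergence is easy to check numerically. A Binomial(n, 1/2) has mean n/2 and standard deviation sqrt(n)/2, so for large n the binomial probability at a point should be close to the normal density there (a sketch, with n = 1000 chosen for illustration):

```python
from math import comb, exp, pi, sqrt

def binom_pmf(k, n):
    # P(k heads in n fair flips)
    return comb(n, k) / 2**n

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

n = 1000
mu, sigma = n / 2, sqrt(n) / 2  # mean and std dev of Binomial(n, 1/2)
for k in (480, 500, 520):
    print(k, binom_pmf(k, n), normal_pdf(k, mu, sigma))  # nearly equal
```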
Really useful stuff. And you can do things like calculate probabilities from the distribution of heights. Say the axis is X in feet — I don't actually know the average height of Americans; let's say it's 5'8" or something like that, and maybe this point is six feet; I'm just making up numbers. You can compute the probability that someone is less than six feet tall just by integrating all of the probabilities up to that point, or the probability that someone is between seven and eight feet tall by adding up all of the probability between X = 7 and X = 8.
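Under a normal model, "integrate up to a point" is the normal CDF, which can be written with the error function. A sketch with made-up parameters (mean 68 inches, i.e. 5'8", standard deviation 4 inches — illustrative numbers, as in the lecture):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    # P(X <= x) for X ~ Normal(mu, sigma), via the error function
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 68, 4  # illustrative: mean height 68 in (5'8"), std dev 4 in
print(normal_cdf(72, mu, sigma))       # P(shorter than 6 ft), about 0.84
print(1 - normal_cdf(84, mu, sigma))   # P(taller than 7 ft), vanishingly small
```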
With these random variables, and these probability distributions on them, you can do all kinds of things. You can do calculus on those variables: you can integrate probabilities and find the probability of being within some range of values of X. I can compute the expected value — if I pick someone off the street, what is the expected height of that person? — and how much spread there is in that expectation: how surprised would I be if someone were two feet shorter or taller than that expected value? These distributions have all of that information, and you can define functions on these random variables. The probability is one function; you can also define the expected value of X, the variance of X, the standard deviation of X, and all kinds of other functions of this random variable. Okay, good.
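For the four-flip example, the expected value and variance come straight from the distribution. A minimal check (for a Binomial(4, 1/2), E[X] = np = 2 and Var[X] = np(1 − p) = 1):

```python
from math import comb

# P(X = k) for X = number of heads in 4 fair flips
pmf = {k: comb(4, k) / 16 for k in range(5)}

mean = sum(k * p for k, p in pmf.items())
var = sum((k - mean)**2 * p for k, p in pmf.items())
print(mean, var)  # 2.0 1.0
```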
We're going to have a bunch more examples. We're going to talk about the binomial, normal, Poisson, exponential, and gamma — a lot of the most useful random variables, useful for things like: how long do you expect to wait at the DMV if there are five people ahead of you and the average wait time is two minutes? How many emails do I expect to get in the next ten minutes, given a certain rate of emails, and would I be surprised if I got no emails in that time? These are the kinds of questions you can ask and answer now that we have this abstraction of random variables.
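The email question is a Poisson calculation. A sketch with an assumed rate of 3 emails per 10 minutes (the rate is made up for illustration):

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    # P(exactly k events) when events arrive at average rate lam per interval
    return lam**k * exp(-lam) / factorial(k)

rate = 3  # assumed: 3 emails expected in the next 10 minutes
print(poisson_pmf(0, rate))  # chance of no emails at all, about 0.05
```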
Maybe I'll just write down some of the why. I've already said most of this, but why do we want random variables?

First, it's a concise summary — a concise expression — of all of your probabilities. Sometimes it's hard to count these probabilities: remember, if I have 100 coin flips, I don't want to count all of the ways I can get 50 heads. The probability distribution over that random variable is a function I can write down, so I don't have to count all of those possibilities. Sometimes we get a function for P(X), so: no counting. And remember, the older I get, the harder it is for me to count — I have to have everyone be quiet while I'm counting to 20 — so if I have a function for these probabilities, I want to use that function.

Two: the probability density function — this probability distribution, sometimes we'll call it a PDF, and I usually think of probability density functions for continuous variables — is a model of a random process. This is a really important idea: the real world is never exactly Bernoulli, or binomial, or normal, or any of these distributions. They are mathematical approximations, just like when I throw a ball and write down F = ma: I might neglect wind resistance, I might neglect rotational effects. All of those approximations give me my simple ballistic trajectory, my simple F = ma description, my simple Galileo's-Tower-of-Pisa constant gravity. The same is true in probability — the PDF is just a model of a random process. If I flip this coin, really there's wind resistance; it's probably not perfectly 50/50; the way I flip the coin might be a tiny bit biased. All of these things make it not perfectly random, but the PDF is a good model of that random process.

An important point of having a model — we're talking mostly about probability here, but there's this dual notion of statistics — is that if I collect data, I should be able to test the hypothesis that my system is binomially distributed, or normally distributed. The model allows me to test hypotheses with data. For example, if I flip ten coins in a row and I get heads all ten times, do I actually believe my PDF is binomial with equal probabilities? That's a hypothesis-testing problem: I can compute how likely that sequence of ten heads is, given that my coin is a fair binomial, or Bernoulli, coin. So you can do hypothesis testing.
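Under the fair-coin hypothesis, the probability of that specific outcome is tiny, which is the kernel of the hypothesis test:

```python
# P(10 heads in a row) if the coin really is fair
p_ten_heads = 0.5**10
print(p_ten_heads)  # 1/1024, just under 0.1% -- strong evidence against fairness
```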
You can also do parameter estimation, again with data — a machine learning, or data-statistics, problem. If I know that my distribution of Americans' heights is normal, we know the normal distribution has two parameters that completely define it: the mean and the standard deviation. So I can take data — sample 100 people off the street — and get a really good estimate of that mean and standard deviation, and then from that small sample I can say something important about the much larger distribution of, you know, 300 million people. Good, okay.
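A Monte Carlo sketch of that estimate, assuming a hypothetical "true" population with mean 68 inches and standard deviation 4 inches (made-up numbers for illustration):

```python
import random
from math import sqrt

random.seed(0)
# Hypothetical population: heights ~ Normal(68 in, 4 in); sample 100 people
sample = [random.gauss(68, 4) for _ in range(100)]

mean_hat = sum(sample) / len(sample)
var_hat = sum((x - mean_hat)**2 for x in sample) / (len(sample) - 1)
print(mean_hat, sqrt(var_hat))  # estimates land close to the true 68 and 4
```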
Three: you can visualize and compute. We can compute our probability density functions, plot them, and compute things like the probability that X is greater than 4 and less than 6 — say, the probability that a random person is taller than 4 feet and shorter than 6 feet. That's something you can compute once you have this probability distribution. I can compute how unlikely an event is — how unlikely is it that I am more than 7 feet tall — all kinds of things like that. And you can do calculus: in this case the probability is the integral from 4 to 6 of my probability density function, call it the integral of p(x) dx from 4 to 6. So you can do calculus on these continuous distributions; if I had a discrete distribution, I would instead do a sum, say the sum from k = 4 to 6 of P(X = k), something like that. You can compute these things using sums or integrals on these distributions — you can do computations with them.
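Both computations are short. A sketch: the continuous case approximates the integral of a normal PDF with a midpoint Riemann sum, and the discrete case is a plain sum of binomial probabilities (the Normal(5, 1) model is made up for illustration):

```python
from math import comb, exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

# Continuous: P(4 < X < 6) for X ~ Normal(5, 1), midpoint Riemann sum of the PDF
n_steps = 20_000
dx = 2 / n_steps
p_cont = sum(normal_pdf(4 + (i + 0.5) * dx, 5, 1) for i in range(n_steps)) * dx
print(p_cont)  # about 0.683 (one standard deviation either side of the mean)

# Discrete: P(4 <= X <= 6) for X ~ Binomial(10, 1/2), a plain sum
p_disc = sum(comb(10, k) for k in range(4, 7)) / 2**10
print(p_disc)  # 0.65625
```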
There's a lot more — dot dot dot — but the last thing I'll mention is that you can build functions on X. You can compute the distribution of X squared; that can be useful sometimes, and it's a function of X. You can compute the expected value of X — another function — or the variance of X. You can take a normal, Gaussian-distributed uncertainty and propagate it through some engineering process, some manufacturing process, or some dynamical system. I can take an uncertainty in the initial condition of a differential equation and propagate that uncertainty and see how it spreads — in chaotic systems it gets spread around the chaotic attractor. All kinds of interesting things: you can build functions on X, you can propagate X through dynamical systems, you can propagate uncertainty through dynamics. That's what Gaussian processes do — they propagate uncertainty through dynamics. Super useful.
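A tiny Monte Carlo sketch of propagating a Gaussian uncertainty through a nonlinear function (f(x) = x², chosen for illustration): the output mean shifts to E[X²] = μ² + σ², not simply f(μ):

```python
import random

random.seed(1)
# Push X ~ Normal(1.0, 0.1) through the nonlinear map f(x) = x**2
samples = [random.gauss(1.0, 0.1) for _ in range(100_000)]
outputs = [x**2 for x in samples]

mean_out = sum(outputs) / len(outputs)
print(mean_out)  # close to mu**2 + sigma**2 = 1.01, not f(mu) = 1.0
```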
Okay, that's probably all I want to show you right now. One thing I'll very briefly mention: if we have this probability density function that tells me the probability of X being at a certain value, there is also the notion of a cumulative probability — the probability of X being less than a certain value. That would just be the integral of this probability distribution up to the point x. It's called the cumulative distribution function, the CDF; we'll talk about it later, but I wanted to mention that it's also a useful function of X.

Okay — a lot more coming up soon. I'm going to show you examples of the binomial, normal, Poisson, exponential, gamma, and a bunch more. We're going to work out examples and use these to compute really intuitive things that you'll be able to use both in your daily life and to do better engineering. Okay, thank you.