Two-dimensional continuous distributions: an introduction
By Ben Lambert
Summary
## Key takeaways
- **Continuous Outcomes Require Continuous Distributions**: When describing random processes with continuous outcomes, like the volume of beer consumed or grams of body fat, we must use continuous probability distributions, not discrete ones. [00:10]
- **Visualizing 2D Distributions: Surface vs. Contour Plot**: A two-dimensional continuous probability distribution can be visualized as a surface in three dimensions, but contour plots, whose lines mark constant probability density, are often preferred because they are easier to analyze. [02:15], [03:14]
- **Valid 2D Distribution Conditions**: For a two-dimensional continuous distribution to be valid, the probability density function must be non-negative for all outcomes, and the double integral over all possible values must equal one. [05:25], [06:03]
- **Integral Represents Total Probability**: The double integral of a two-dimensional probability density function over all possible values is the total probability, which must equal one because any sampled individual must fall somewhere within the outcome space. [06:34], [07:02]
Topics Covered
- Randomness means uncertainty, not chaos.
- Visualizing complex data with contour plots.
- Contour plots simplify 3D probability distributions.
- Valid probability distributions require non-negativity.
- The integral of probability density must equal one.
Full Transcript
in this video I want to further explain
what is meant by a 2-dimensional
probability distribution and in this
video we're going to be considering a 2d
probability distribution for continuous
outcomes so as before we're considering
two processes or we're considering the
outcome of two separate processes where
each of those processes is random in
nature and by random I don't mean the
colloquial use of random I mean that the
outcome of that process is uncertain and
unlike the discrete case the individual
outcome of one of these processes
belongs or can be seen as belonging to
one of a continuum of possible outcomes
and because the thing which we're
describing can be measured as being
continuous and because the outcome is
continuous in nature the distributions
that we use to describe it are
unsurprisingly continuous probability
distributions the example that I'm going
to be using here is going to be two
processes one of them is the volume of
beer that an individual drinks per week
and I'm going to use the random variable
B to represent that measurement of the
volume of beer that they drink on an
average week the other thing that we're
going to consider is the number of grams
of body fat that an individual has and
just for completeness I'm going to say
that we measure the volume of beer drunk
in terms of the number of liters so why
are we using probability distributions
to describe the outcomes of these two
processes well the idea is that what
we're imagining we're doing here is that
we are sampling an individual at random
from the population and because before
we actually measure their level of body
fat or ask them and they tell us their
level of alcohol that they drink a week
we are uncertain about both of those
outcomes and hence because of this
uncertainty we use probability
distributions so what might a
probability distribution function look
like in this case well we've got two
potential outcomes and so we've got on
our axes here B a random variable which
represents the volume of beer drunk per
week and F which represents an
individual's level of body fat
now the third axis as for the univariate
case represents a probability density so
what might our probability distribution
function look like in this case well it
would be a kind of surface so I might
sort of represent it in this sort of
form here
where we can sort of imagine that the
thing I'm drawing is actually kind of a
three dimensional surface or rather a
surface which exists in three dimensions
but this representation that I've shown
thus far is a bit difficult to analyze
and so what we typically do is we
represent the probability distribution
function in a slightly different form
here what we do is we use something
which is known as a contour plot so here
the bottom axis will be the volume of
beer drunk per week and then the
vertical axis will be an individual's
level of fat then what we do is we draw
contours in a graph and these contours
represent lines which have all got
constant probability density so here
this purple line which I've drawn here
might correspond in the left hand case
to a sort of path which is represented
by the purple line I've shown here on
the Left where all of the combinations
of body fat and alcohol drunk per week
that I've shown here by the sort of path
on the sort of horizontal plane
correspond to the same level of
probability density then I could draw a
contour of slightly lower levels of
probability density and represent that
by the Green Line and perhaps that green
line corresponds to a sort of Green Line
which I'm now writing over the orange
line on the left-hand side here and then
finally I might draw another contour
which corresponds
on the left here to the sort of
line of level of a sort of lower level
of probability density here and I
represent that in the right hand case
here by another sort of contour which I
draw here because humans tend to find
two dimensions easier to think in than
three dimensions we often use this kind
of thing on the right hand side here
this visualization trick which is known
as a contour plot but remember whenever
you see a contour plot what essentially
is happening here is that the
probability density is sort of coming
out of the page at you or out of the
screen in this case where in this case
the purple line here corresponds to a high
probability density the green to a
slightly lower one and the light blue to
a slightly lower one still okay so now
we've covered how we kind of visualize a
two-dimensional continuous probability
distribution but what are the conditions
under which that distribution is
actually a valid probability
distribution well much like the discrete
case we require that the values of our
function which is now a function of the
two random variables B and F must be
greater than or equal to zero for all
potential values of B and F and we see that
that's satisfied here because of the fact
that our probability density although I haven't
really shown it is always greater
than zero the second condition is
essentially the analogue of the two
dimensional discrete case where we have
to do two summations
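
Written out, the first condition and the double summation referred to here might look like the following. The notation is an assumption, since the transcript does not name the function: $p(b, f)$ stands for the joint density, and $X$, $Y$ are hypothetical discrete random variables standing in for the earlier discrete case.

```latex
\begin{aligned}
  &\text{Condition 1 (continuous case):} && p(b, f) \ge 0 \quad \text{for all } b \ge 0,\ f \ge 0, \\
  &\text{Condition 2 (discrete analogue):} && \sum_{i} \sum_{j} \Pr(X = x_i,\ Y = y_j) = 1 .
\end{aligned}
```
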
now because we're dealing with
continuous variables what we do is we do
two integrals we integrate the
probability distribution function of now
it's of two variables B and F between
B being between zero and infinity and F
between zero and infinity it's zero here
in both cases because you can't drink a
negative volume of beer nor can you have
a negative level of body fat and this
integral must be equal to one and the
intuition behind this condition is just
saying that any individual that we pick
from any population
must belong somewhere in this kind of
plane that we've drawn here where we've
got positive levels of body fat and
positive levels of beer drunk per week
or at least non-negative values of
both of those individual things so what
actually does this two-dimensional
integral kind of represent well we can
imagine it as being sort of working out
the volume which is contained under this
surface here and above the plane which corresponds
to a probability density of 0 so we're
kind of working out the the volume
underneath this particular surface that
I've drawn here
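
The two validity conditions, and the reading of the double integral as a volume under the surface, can be checked numerically. The sketch below is only an illustration under stated assumptions: the video does not give a formula for the density, so `joint_pdf` is a made-up example in which beer volume and body fat are treated as independent, each following a Gamma distribution with shape 2, and the scale values `BEER_SCALE` and `FAT_SCALE` are hypothetical. The integral from zero to infinity is approximated with a midpoint rule over a large but finite box.

```python
import math

# Hypothetical scale parameters, chosen only for illustration.
BEER_SCALE = 3.0     # litres of beer per week
FAT_SCALE = 8000.0   # grams of body fat


def gamma2_pdf(x, scale):
    """Density of a Gamma distribution with shape 2; zero for negative x."""
    if x < 0:
        return 0.0
    return (x / scale**2) * math.exp(-x / scale)


def joint_pdf(b, f):
    """Assumed joint density p(b, f), with B and F treated as independent."""
    return gamma2_pdf(b, BEER_SCALE) * gamma2_pdf(f, FAT_SCALE)


def volume_under_surface(pdf, b_max, f_max, n=400):
    """Midpoint-rule estimate of the double integral over [0, b_max] x [0, f_max].

    Returns the estimate together with the smallest density seen on the grid.
    """
    db = b_max / n
    df = f_max / n
    total = 0.0
    lowest = float("inf")
    for i in range(n):
        b = (i + 0.5) * db
        for j in range(n):
            f = (j + 0.5) * df
            density = pdf(b, f)
            lowest = min(lowest, density)
            # Each grid cell adds a thin column: height (density) times base area.
            total += density * db * df
    return total, lowest


# Truncate the infinite range at 20 scale lengths, where the tails are negligible.
total, lowest = volume_under_surface(joint_pdf, 20 * BEER_SCALE, 20 * FAT_SCALE)
print(f"smallest density on the grid: {lowest:.3e}")
print(f"approximate volume under the surface: {total:.4f}")
```

If the density is valid, the smallest value on the grid should be non-negative and the estimated volume should come out close to one. A small discrepancy is expected from the finite grid and the truncated range.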