Introduction to Statistics
By The Organic Chemistry Tutor
Summary
## Key takeaways
- **Calculate Mean, Median, Mode, and Range**: To find the mean, sum all numbers and divide by the count. The median is the middle number in an ordered set. The mode is the most frequent number, and the range is the difference between the highest and lowest values. [00:32], [01:45]
- **Median for Even Data Sets**: When a data set has an even number of values, the median is the average of the two middle numbers. For example, if 21 and 37 are the middle numbers, their average (29) is the median. [05:10], [05:30]
- **Bimodal Data Sets**: A data set can have more than one mode. If two numbers appear with the same highest frequency, both are considered the mode, resulting in a bimodal data set. [06:02], [06:15]
- **Identifying Outliers**: An outlier is a data point outside the range of Q1 - 1.5*IQR to Q3 + 1.5*IQR. For instance, if the range is -4 to 28, a value like 29 would be an outlier. [09:14], [13:52]
- **Skewness and Distribution Tails**: A distribution skewed to the right has a tail extending towards higher values, with the mean being greater than the median. Conversely, a left skew has a tail towards lower values, with the mean less than the median. [27:28], [30:01]
- **Creating Histograms**: Histograms represent data using adjacent bars, unlike bar graphs. They are constructed from frequency distribution tables that group data into classes or categories. [42:07], [43:40]
Topics Covered
- The Median: Finding the Middle Number
- The Mode: Identifying the Most Frequent Number
- Calculating Quartiles (Q1 and Q3) and the Interquartile Range (IQR)
- Building a Frequency Distribution Table for Histograms
- Calculating Relative and Cumulative Relative Frequency
Full Transcript
let's start with this problem
find the mean median mode
and range of the following data set
now the first thing that i would
recommend doing
is arranging the numbers in increasing
order
so the lowest number is seven and we
have two of them
after that the next number is 10 and
then
14 15
23 and 32.
now to calculate the mean what we need
to do
is we need to take the sum of the
numbers
and divide it by the seven numbers that
are in the data set so this is going to
be 7
plus 7 plus 10 and so forth
and then we're going to divide it by the
seven numbers
so the mean is basically the average of
those numbers
so the total sum that i got is a hundred
and eight if you take 108 and divide it
by seven
that's going to give you 15.428
i'm gonna round it so it's
approximately 15.43
so that is the mean that's how you can
find it
now what about the next thing how can we
find the median
of this data set
the median is basically the middle
number
so what i like to do is eliminate the
first
and the last number and then working
towards the middle i'm going to
eliminate the next two numbers
until i'm left with the middle number so
as we could see in this example
the median is equal to
14. it's simply the middle number of
the data set
now what about the mode what is the mode
in this problem the mode is simply the
number
that occurs the most frequently
i think i said that wrong it's basically
the most frequent number
in the data set and notice that 7
appears twice in this data set so 7
is the mode
now what about the range
the range is simply the difference
between
the highest number and the lowest number
so it's going to be 32 minus 7
which is 25. and so now you know how to
find the mean
median mode and range of a data set
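Here is a minimal sketch of those four calculations in Python, assuming the standard-library `statistics` module. The variable names are only for illustration and are not from the video.

```python
from statistics import mean, median, mode

# First data set from the video, already in increasing order.
data = [7, 7, 10, 14, 15, 23, 32]

data_mean = mean(data)                # sum of 108 divided by 7 values
data_median = median(data)            # middle value of the ordered data
data_mode = mode(data)                # most frequent value
data_range = max(data) - min(data)    # highest minus lowest

print(round(data_mean, 2), data_median, data_mode, data_range)
# prints: 15.43 14 7 25
```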
now let's move on to number two now this
is going to be a similar problem
but it's not exactly the same as you can
see
we have eight numbers in our data set
as opposed to the seven that we had
previously
so let's begin by putting these numbers
in order
so the lowest number is 11 and then the
next number is 15
and then we have 21
and then 37 41
and 59 so let's begin by calculating
the mean so the mean
represented by the symbol x bar is equal
to the sum
divided by the number of numbers in the
data set
so let's add up the eight numbers
this is going to take some time
and then divide the sum by those eight
numbers
so we have 11 plus 15 plus another 15
and then 21 37
41 and two 59s
so i got a sum of 258. if we divide that
by 8
you'll find that the mean is 32.25
so that's the answer for the first part
of the problem
now let's move on to the second part
let's calculate the median
as we said before the median
is simply the middle number
so let's eliminate the first and the
last numbers
and then the next two and the next two
notice that we don't have one number in
the middle but this time
we have two numbers in the middle so
what should we do
if we come across a situation
in this case what you need to do is take
the average
of those two middle numbers so you need
to add them up and divide
by two two plus three is five
seven plus one is eight so 21 plus 37 is
58
and if you divide it by two this gives
you 29 so 29 is the median
in this data set
now what about the mode
what is the mode in this example
now as we said before the mode is the
number
that is basically the most frequent
number in the data set
but this problem is different from the
last one because
we have two numbers that appear twice 15
and 59 so which of these is the mode
it turns out that they both represent
the mode
so the mode is 15 and 59
so what we have is something known as a
bimodal
data set because there's two modes
instead of one
it's not unimodal
now what about the range well there's
nothing different about the range
in this problem compared to the last
problem it's simply the highest number
minus the lowest number so you can
say h
minus l so the highest number is 59
as we could see here and the lowest
number is 11.
so 59 minus 11 is 48.
so now you know how to find the mean
median mode
and range of a data set
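The second problem can be checked the same way. This is a sketch that assumes Python 3.8 or later, where `statistics.multimode` returns every value tied for the highest frequency, which is what a bimodal data set needs.

```python
from statistics import mean, median, multimode

# Second data set from the video: eight values, with 15 and 59 each appearing twice.
data = [11, 15, 15, 21, 37, 41, 59, 59]

data_mean = mean(data)                # 258 / 8
data_median = median(data)            # average of the two middle values, 21 and 37
data_modes = multimode(data)          # all values tied for most frequent
data_range = max(data) - min(data)    # 59 - 11

print(data_mean, data_median, data_modes, data_range)
# prints: 32.25 29.0 [15, 59] 48
```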
now let's talk about finding the
quartiles
and the interquartile range
so what i'm going to do right now is i'm
going to
make basically a number line
with a beginning and an end and let's
say this number line represents
our data the lowest value
is known as the minimum the highest
value in our data set
is the maximum now we're going to break
this number line
into four equal parts
the first part is known as q1 this is
the first
quartile the second one is q2 the second
quartile
and then this is the third quartile
so let's say if the data was normally
distributed
this would be at a zero percent level
this would be 25
50 75
and then this will be 100 so you could
see
how the quartiles are related to each
other respectively
now how do we go about finding q1
q2 and q3 how do we do that
q2 is basically the median
of the entire data set
q1 is the median
of the lower half of the data set
and q3 is the median of
the upper half of the data set
now the interquartile range represented
by iqr
this is the difference between
q3 and q1 so once you find q1 and q3
you can now calculate the interquartile
range
now the next thing that i want to
mention is the ability to find
or identify if a number in a data set is
an
outlier
so here's what you need to know it's not
going to be an
outlier if it's within this range
if it's between q1 minus 1.5
times the iqr so that's the lowest that
it can be
or the highest it can be is q3 plus
1.5 times the iqr
so if you have a number that is within
this range
it is not an outlier but if you have a
number in the data set
that is outside of this range and then
that number
is an outlier so let's work on an
example
let's say we have the numbers 7
11 14 5
8 27 16
10 13 17 and 16.
go ahead and identify q1 q2 and q3
calculate the interquartile range the
iqr
and determine if there's any outliers
in this data set feel free to pause the
video and
use what you know to try
now as always the first thing we should
do is organize
the data the lowest number is 5
and then we have 7 8
10 11 perhaps it's uh best if we
cross it off as we go along and then the
next number
is uh 13 and then
14 16 there's two 16s
and then the 17 and then 27.
now what do you think is our next step
in order to find the interquartile range
and the three quartiles
what's our next step the best thing to
do
at this point is to determine the median
of the entire data set
which is going to be q2 so we could
eliminate the first two numbers
the first and the last number and then
the next two
until we get a number in the middle
so notice that 13 is in the middle
so therefore 13 is going to be
q2 now what i like to do is
i'm going to get rid of this number for
now
and i'm going to put a line between
the left side and the right side so i
want to separate
the lower half of the data set from the
upper half
of the data set but the 13 is still here
though
so just keep that in mind so that's our
q2 value
now q1 is the median of
the lower half of the data set so what
is the median
of those five numbers the median
is simply going to be the middle number
of those five numbers
so q1 is 8. now what is the median
of the upper half of the data set
notice that the middle number is 16 so
that is
q3 notice that we have a total of
11 data points and so that's why
13 is not included in the lower half or
the upper half
because if it was one side will have
five numbers the other side will have
six
and that's why i chose to write it up
here so that
the lower half is the same as the upper
half
they both contain five numbers
now let's go ahead and calculate the
interquartile range
iqr so we said it's the difference
between the third quartile
and the first quartile so it's going to
be 16 minus 8
which is 8. and so that's how you could
find the interquartile range of a data
set
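The median-of-halves approach used here can be sketched as a small Python function. The name `quartiles` is made up for this example. Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles, so this sketch follows the video's method instead.

```python
from statistics import median

def quartiles(values):
    """Return (q1, q2, q3) using the median-of-halves method (needs 2+ values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]     # values below the middle
    upper = ordered[-half:]    # values above the middle
    # With an odd count, the middle value is left out of both halves.
    return median(lower), median(ordered), median(upper)

data = [7, 11, 14, 5, 8, 27, 16, 10, 13, 17, 16]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)
# prints: 8 13 16 8
```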
now what about the presence of any
outliers
so looking at these numbers do you think
we have
a number that really stands out that
doesn't belong
right now 27 appears to be very far
off from all the other numbers so do you
think 27
is an outlier
well let's find out
so let's write down what we know
the presence of an outlier is based upon
this range
it's q1 minus 1.5 times the iqr
to q3 plus 1.5
times the iqr
so we know that q1 is 8
and the interquartile range is also
8. q3 is 16.
so this is going to be 16 plus 1.5 times
8.
what is 1.5 times 8 1 times 8 is 8 0.5
times 8 is 4
8 plus 4 is 12. so this is going to be 8
minus 12
and this is 16 plus 12.
now 8 minus 12 is negative 4 and
16 plus 12 is 28.
so now looking at what we have is 27
an outlier based on its
range because 27
is between negative 4 and 28 27
is not an outlier now if we had 29
that would be an outlier so now you know
how to determine
if a point is an outlier within a range
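The outlier check can be written as a short sketch in Python. The helper names `outlier_fences` and `is_outlier` are hypothetical. The rule itself is the one stated above: anything below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR counts as an outlier.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) limits of the non-outlier range."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def is_outlier(x, q1, q3):
    """True if x falls outside the fences built from q1 and q3."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high

low, high = outlier_fences(8, 16)    # Q1 = 8, Q3 = 16 from the example
print(low, high)                     # prints: -4.0 28.0
print(is_outlier(27, 8, 16))         # prints: False
print(is_outlier(29, 8, 16))         # prints: True
```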
now let's talk about how we can create a
box and whisker plot
the reason why we want to talk about
this now is because it's related
to the values of q1 q2 and q3
so typically a box and whisker plot looks
something like this
assuming there's no outliers
this is going to be the lowest value on
the right we're going to have the
highest value or the maximum
and then this line here represents the
value of the first quartile
which is the 25th percentile
in the middle we have q2 the second
quartile which is the 50th percentile
and then this is q3 the 75th percentile
and so that's the basic shape of a box
and whisker plot
now what about if we have an outlier
let's say if we have one to the right
then this will no longer be the maximum
the outlier is shown as a point it's
outside of the box and whisker plot
now if it's on the left side this will
no longer be the minimum
and that will be the outlier there so
let's work on an example
let's say we have the numbers 16
18 28
13 50 31
25 22
and let's say uh another 18
23 29
31 and 38 actually i wrote down 31
already
let's make this 38
so we have 12 numbers
using this data set go ahead and find
the interquartile range q1 q2
q3 identify the presence of any outliers
and then using all of that information
construct
a box and whisker plot feel free to
pause the video if you
want to try that yourself now the first
step
as always is to write the numbers in
increasing order
so we have 13
16 18
so let's get rid of those numbers
and then there's another 18 and then
22 23
next is 25
and then 28
29
and then the last three are 31
38 and 50.
now what i like to do is break up the
data
into four quarters or four sections
now because we have an even number of
data points we have a total of 12
we can put a line right in the middle so
now this is the lower half of the data
set
and here we have the upper half
so let's determine the median for the
entire data set
because it's even we can't just
immediately
identify the median the median is going
to be an average
of these two numbers if we eliminate the
first two
and then the next two and so forth
we will eventually get to these two
numbers so what is the median between
or what is the average between 23 and 25
so if you add up those two numbers and
divide by two
you're just going to get the midpoint of
23 and 25
which is 24.
so 24 is the second quartile
this is the q2 value
now let's focus on the lower half of the
data
what is the median of the lower half of
the data
so notice that we have six data points
in that section and because it's even
the median is going to be an average of
these two numbers
so let's put a line there the average of
18 and 18
is 18. so that's our q1 value
now what about the median of the upper
half
of the data so because we have six
numbers here we're gonna put a line
right in the middle
so we have 3 on the left 3 on the right
and the median is simply going to be the
average of these two numbers
the average of 29 and 31 is the number
in the middle
which is 30.
and so this is why i like to use these
lines here
so now we have three numbers
in each of the four sections of our data
so now that we know q1 q2 and q3
what is the interquartile range what's
our iqr value
the iqr is the difference between the
third quartile
and the first quartile so it's going to
be 30
minus 18 which is 12 in this example
so now that we have that
our next step is to determine if we have
any outliers
in this problem
so remember this is what we need we need
to create a range
the lowest point of it will be q1 minus
1.5
times the iqr and the highest point of
the range
will be q3 plus 1.5 times the iqr
now i'm going to have to get rid of a
few things
so let's write down the information that
is important
so q1 is 18
q2 is 24
q3 is 30. our minimum
the lowest value is 13
and the maximum our highest value is 50.
so let's keep that in mind so now i can
get rid of this
you may want to write that down just in
case we need to go back to it
and i can also get rid of this too
so let's plug in what we know into this
expression
so q1 is 18 and
the iqr i gotta write that down again
that was uh 12
it was 30 minus 18 which is 12.
so this is going to be 18 minus 1.5
times 12
and then q3 is 30 plus
1.5 times 12.
now what is 1.5 times 12 so it's
basically 12
plus half of 12 which is six twelve
plus six is eighteen
eighteen minus eighteen is zero thirty
plus eighteen
is forty eight so
do we have any outliers
are there any numbers outside of this
range
and it turns out that there is the
maximum 50
is not between 0 and 48 so this
is an outlier
now going back to our original data i'm
just going to rewrite it
so you could see everything
this is what we had
so now at this point let's go ahead and
let's make a number line
so 0 is going to be our lowest point and
we're going to go up to 50.
so let's go by tens
now let's put a mark in the middle to
represent
the fives
so let's begin by drawing a box
so we need to draw a box ranging from q1
to q3
so that's going to start at 18 and this
is basically a rough estimate
it's not going to be perfect and the end
of the box
will be at 30.
now 13 is the minimum
which is approximately around that area
so that's the left side of the box and
50 is the maximum but that's an
outlier so we're going to put a point at
50.
38 is the second highest
which is not an outlier so that's going
to be part of the box
and whisker plot and so 38 is in this
region
so we're going to say it's over there
now q1 is 18
q3 is 30 but we also need to write what
q4
i mean not q4 but where q2 is q2 is 24
which is around here
so this is q1 that's 18 q2
and q3
so the length of the box represents the
interquartile range
which is 12. that's 30 minus 18.
and so here is the minimum and this is
the maximum which is the outlier but
that's how you can construct a box and
whisker plot
given a data set
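The numbers needed to draw this box and whisker plot can be gathered in one place. This is a sketch only: it computes the values to plot rather than drawing anything, and the variable names are assumptions made for this example.

```python
from statistics import median

data = sorted([16, 18, 28, 13, 50, 31, 25, 22, 18, 23, 29, 38])
half = len(data) // 2

# Quartiles by the median-of-halves method used in the video.
q1, q2, q3 = median(data[:half]), median(data), median(data[-half:])
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Outliers are drawn as separate points; whiskers stop at the
# lowest and highest values that are still inside the fences.
outliers = [x for x in data if x < low_fence or x > high_fence]
inside = [x for x in data if low_fence <= x <= high_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(q1, q2, q3, iqr)               # prints: 18.0 24.0 30.0 12.0
print(low_fence, high_fence)         # prints: 0.0 48.0
print(whisker_low, whisker_high)     # prints: 13 38
print(outliers)                      # prints: [50]
```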
now the next topic we need to talk about
is skewness
so let's say we have
this representation of our data
and notice that it is symmetrical
this line represents the median
and if you have a data that is perfectly
symmetrical
the mean this is the sample mean
it's going to be equal to the median
now the box and whisker plot will look
something like this
q2 is going to be right in the middle of
q1 and q3
so notice that the box plot is evenly
distributed
the left side is the same as the right
side and left
and also these lines are equal in length
so that represents a symmetric
distribution
now what if it's not symmetrically
distributed
what's going to happen in that case
there are two possibilities the data can
be skewed to the right
or it can be skewed to the left
so which one would you say this
particular shape represents
would you say it's skewed to the right
or skewed to the left
notice that we have a tail
that extends towards the right so
this particular data or this graph
we say that it's skewed
to the right now what is the
relationship between
the sample mean and the median in this
case
so the median the middle portion of the
data will be somewhere in this region
and the sample mean will be to the right
of the median
since it's skewed to the right so the
mean
will be greater than the median in this
case
by the way whenever you have a shape
that's skewed to the right
some textbooks will refer to this as a
positive skew
and it makes sense because positive
numbers on a number line
will be on the right side
now you need to be familiar with the box
and whisker plots
for this type of distribution
so here's one example
notice that the right side of the box
is longer than the left side so this
tells us
that q3 minus q2
is greater than q2 minus q1
and so in that case looking at the box
and whisker plot
you can see that it's skewed to the
right now sometimes
these two boxes may be equal in length
nevertheless this side might be longer
than this side so even if the boxes are
of the same length if this side is longer
it will also be skewed
to the right
now what about if it's skewed to the
left
in this case the graph is going to look
something like this
let me try that again
so notice that the tail is on
the left side so in this case we have a
negative skew
where we can say that it is skewed to the
left
now the median will be somewhere in this
region
and the mean is going to be to the left
of the median since it's skewed to the left
so the sample mean is less than the median
now how can we represent this using a
box plot
well here's one possibility
so in this case q2 will be closer to q3
in the box plot so as you can see the
left side
is longer than the right side
so we could say that q2 minus q1
is greater than q3 minus q2 in this case
now if the boxes are equal in length
you can also tell that we have a
negative skew
if the left side of the box plot
is longer than the right side
so that's another indication that the
data is skewed to the left
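The mean-versus-median rule of thumb can be sketched in a few lines of Python. The function name `skew_direction` and the three small data sets are made up for this illustration and do not come from the video. The rule is only a rough guide and not a formal skewness measure.

```python
from statistics import mean, median

def skew_direction(values):
    """Label the skew using only the mean-versus-median rule of thumb."""
    m, med = mean(values), median(values)
    if m > med:
        return "right (positive) skew"
    if m < med:
        return "left (negative) skew"
    return "symmetric by this rule"

# Hypothetical data sets chosen to show each case.
tail_right = [1, 2, 2, 3, 3, 4, 20]        # mean 5, median 3
tail_left = [1, 17, 18, 18, 19, 19, 20]    # mean 16, median 18
balanced = [1, 2, 3, 4, 5]                 # mean 3, median 3

print(skew_direction(tail_right))   # prints: right (positive) skew
print(skew_direction(tail_left))    # prints: left (negative) skew
print(skew_direction(balanced))     # prints: symmetric by this rule
```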
now there's some other things that you
need to know if you're going to take
a statistics course and you need to be
able to create
a dot plot so let's say if you have the
numbers 5
8 3 7
1 5 3
2 3 3 8
5 with this information how can we
construct
a dot plot well we can begin by
drawing a number line so let's say this
is zero
one 2 3
4 and so forth
we could stop by 8 since 8 is the
highest number
now the first number is a 5. so all we
need to do
is draw a dot above the number five
and then let's put the dots one at a
time
so the next number is an eight so we're
going to draw a dot
at eight and then it's a three
so let's put the dot there next we have
a seven
and then it's a one
and then a five so notice that we have a
second five
all we need to do is draw another dot
above the first one
and then it's a three two
and then another three and one more three
and then eight and then five
so that's how you can make a dot plot
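A text version of the dot plot can be sketched in Python by counting each value and stacking one mark per occurrence. The function name `dot_plot` and the use of `*` for a dot are choices made for this example. The data is the list of twelve numbers read out at the start of the problem.

```python
from collections import Counter

data = [5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5]
counts = Counter(data)    # how many dots go above each number

def dot_plot(counts, low, high):
    """Build a text dot plot with one '*' per occurrence above each value."""
    rows = []
    # Start from the tallest stack and work down to the number line.
    for level in range(max(counts.values()), 0, -1):
        rows.append(" ".join("*" if counts.get(v, 0) >= level else " "
                             for v in range(low, high + 1)))
    rows.append(" ".join(str(v) for v in range(low, high + 1)))
    return "\n".join(rows)

plot = dot_plot(counts, 0, 8)
print(plot)
```

The tallest column in the printed plot sits above 3, which is the value with the most dots.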
now using this dot plot which
number is the mode what would you say
now if you recall the mode is the number
in the data set that occurs most
frequently
so in this case it's the number with the
most dots
so the mode for this data set is 3. by
the way
if you haven't done so already don't
forget to subscribe to this channel
and click on that notification bell now
let's talk about how we can make
a stem and leaf plot
so let's say we have the numbers 4
9 13 13
17 21 36
38 let's see
another 38
and then 56. how can we make a
stem-and-leaf plot
with this data so the first thing we
need to do
is we need to write two columns
on the left it's going to be the stem
and on the right the leaf
so the first number is four so for the
stem we're going to write
zero and for the leaf we're going to put
four the next one is nine
so we're going to write or represent
nine as zero nine
so we have zero on the left and then
we're going to put nine
on the right the next number
is thirteen
so the first digit is a one the second
digit we're going to put
in the second column now we have another
13.
so all we got to do is add another three
now for 17
we need one and seven we already have
the one but we need to write the 7
to the right side now for 21
we need to put a 2 in the stem column a
1 in the leaf column
next we have 36 so we need a 3
in the stem column a 6 in the leaf column
now notice that we have 3 38
so we got to add 3 8 to the leaf column
and finally 56 now we don't have
anything in the 40s
so we're going to write a 4 but we're
not going to put anything here
for 56 we're going to write a 5 in the
stem column
and a 6 in the leaf column and so
that's how you can make
a stem and leaf plot now it's always
good to have a key
so we could say that 2 1
represents 21. so
if someone looks at the stem and leaf plot
they know what you mean
let's try another example so let's say
we have the numbers
56 actually let's see
78 85
89 92
106
107 and 119.
go ahead and make a stem-and-leaf plot
with those numbers
so the first number 78 we're going to
write a 7
in the first column and the 8 in the
second column
next we have 85 and 89
so we're going to put an 8 for the first
digit and then 5 and 9 for the second
and then it's 92 so we're going to write
9 and 2.
now for 106 we're going to put a 10
in the stem column but a 6 in the leaf
column
and for 107 we just got to add a 7 here
and there's usually no commas so let's
get rid of that
and then finally for 119 we're going to
write 11 in the stem column
and 9 in the leaf column
and so for example this will be our key
so this represents 92
and this would represent 106. because
sometimes you could have decimal values
for instance let's say if we have 1.2
1.6 1.8
2.1 2.3 2.3
and 2.5 we can construct this
stem-and-leaf plot like this
so we could start with one and to write
1.2 we could just put a 2 for the leaf
plot
now for 1.6 we just got to put a 6
in the leaf column and for 1.8 just
an 8.
now we can move on to the twos so we
have 2.1
2.3 2.3 and 2.5
and so that's how you can make a
stem-and-leaf plot using decimal values
so we can say 1 6
represents 1.6 in this particular
example
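The stem-and-leaf idea can be sketched in Python by splitting each number into its last digit (the leaf) and everything before it (the stem). The function name `stem_and_leaf` is hypothetical. For the decimal example, this sketch assumes one decimal place and multiplies by 10 first, so that 1.6 is stored as stem 1 and leaf 6.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem to its sorted leaves, keeping empty stems in between."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)    # 107 -> stem 10, leaf 7
        plot[stem].append(leaf)
    return {s: plot.get(s, []) for s in range(min(plot), max(plot) + 1)}

whole = stem_and_leaf([78, 85, 89, 92, 106, 107, 119])
for stem, leaves in whole.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")

# Decimal values: scale to whole numbers first (key: 1 | 6 represents 1.6).
decimals = [1.2, 1.6, 1.8, 2.1, 2.3, 2.3, 2.5]
tenths = stem_and_leaf([round(x * 10) for x in decimals])
print(tenths)
# prints: {1: [2, 6, 8], 2: [1, 3, 3, 5]}
```

Empty stems are kept on purpose, which matches writing a 4 with no leaves when there is nothing in the 40s.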
now the next thing we're going to talk
about is something called
a frequency table
so given a data set how can we make a
frequency table
so let's say we have the numbers
5 9 8
7 8 12
nine eight ten
eight nine seven so using those numbers
how can we make a frequency table
we're going to put two columns so the
first column will represent the number
and the second column will represent the
frequency
and let's put down what we have
so the first number is a five and how
many fives do we have
there's only one five so the frequency
is one
the next number is a seven
and notice that we have two sevens so
the frequency is two
next is eight we have a total
of four eights
so that gives us a frequency of four
after that we have nine and
i've spotted three nines
and then there's one ten and we have
one twelve
so that's a simple way in which you can
make a frequency table
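Counting by hand works for a short list. As a sketch, the same table can be built in Python with `collections.Counter`, and the variable names here are just for illustration.

```python
from collections import Counter

data = [5, 9, 8, 7, 8, 12, 9, 8, 10, 8, 9, 7]

# Count each number, then order the table by value.
frequency = dict(sorted(Counter(data).items()))

print("number  frequency")
for number, count in frequency.items():
    print(f"{number:>6}  {count:>9}")

print(frequency)
# prints: {5: 1, 7: 2, 8: 4, 9: 3, 10: 1, 12: 1}
```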
now here's another question for you how
can we
use the frequency table in order to
calculate
the sample mean how can we calculate the
average
instead of just adding all of those
numbers up and dividing by the number
of data points in a set
let's add another column and we're going
to call this the sum
now we have one five
so five times one is five we have two
sevens
if you add up seven and seven you get
fourteen
we have four eights eight times four is
thirty two
or if you add eight four times you get
thirty two
nine times three is 27 10 times 1
is 10
12 times 1 is 12. now we're going to add
up the sum column
to get the total sum so 5
plus 14 plus 32 plus 27
plus 10 plus 12 that's 100
and we're also going to add up the
frequency column that's going to give us
the total number of numbers that we have
here
so we have 1 plus 2 which is 3 plus 4
that's 7
plus 3 that's 10 plus 1 plus 1 that's 12.
so we have a total of 12 numbers
so the mean is going to be the sum
divided by
the total number of points that we have
in our set
so it's a hundred divided by twelve
so the mean in this example is eight
point three repeating
and so that's how you could use the
frequency table to calculate the mean
of a data set
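The sum-column trick can be sketched in Python as a weighted sum. The dictionary below is the frequency table from this example, and the names `total`, `n`, and `table_mean` are assumptions for the sketch.

```python
# Frequency table from the example: number -> how many times it appears.
frequency = {5: 1, 7: 2, 8: 4, 9: 3, 10: 1, 12: 1}

# The "sum" column is number times frequency; add that column up.
total = sum(number * count for number, count in frequency.items())

# Adding the frequency column gives the number of data points.
n = sum(frequency.values())

table_mean = total / n
print(total, n, round(table_mean, 2))
# prints: 100 12 8.33
```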
now the next thing that you need to know
how to do is how to create a histogram
a histogram looks like a bar graph
but unlike a bar graph a histogram has
its bars connected to each other
so here's an example of a histogram
on the left side
on the right side i'm going to draw a
bar graph
so this would be a bar graph
so as you can see they're very similar
but the bars in the histogram
they're adjacent to each other there's
no space in between
but how do we go about taking the data
set and making a histogram
so let's say we have the test scores
of students in a typical class
let's say the test scores are 65
72 93
68 76
98 let's say 84
85 79 88
90 82
83 87 and 78
now the first thing we need to do is
create a frequency distribution table
and so what we're going to do this time
rather than
talk about how frequent or rather than
describing the frequency of each number
we're going to break it up into
categories or classes
on the left side we're going to have the
grade on the right side
the frequency
now i'm going to categorize the grades
in levels of 10. so a d would be
60 to 69. a c
would be 70 to 79. a grade of a b
would be 80 to 89. and an a is going to
be 90 to 100
so those are our four categories or
four classes
now how many students received a grade
between 60 and 69
so notice that there are two students we
have the grades 65 and 68
so the frequency will be 2.
now how many students received a c
on their exam how many students received
a grade of
somewhere between 70 and 79. so we have
one two three
and four so four students got a c
on their exam what about a b
so we have 84 85 88
82 83 87 so i counted a total of six
and what's left over are those who got
an a a 93 a 98 and a 90.
so now that we have our frequency
distribution table
we can now make a histogram
so on the y-axis we're going to plot the
frequency
on the x-axis we're going to put the
grades
so the grades will vary between 60
70. 80
90 and 100 because the classes are
they're separated by intervals of 10
approximately
now the highest frequency is six so
let's go by one
one two three four
five six
so let's plot the first one between 60
and 69
which is close to 70 the frequency is a
two
so it's going to look like that and then
between 70 and 79
four students got a grade in that region
i'm going to use the same color
and between 80 and 89
six students
fall in that category and between 90 and
100
only three students
received an a so let's put the grade so
this is a d
two students got a d four students
got a c and six students
received a b and three students received
an a so that's how you can create a
histogram
from a data set like this using a
frequency distribution table
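The grouping step can be sketched in Python. The class limits are the ones chosen above (60 to 69, 70 to 79, 80 to 89, 90 to 100), and the row of `#` marks is only a rough text stand-in for the bars of a histogram.

```python
scores = [65, 72, 93, 68, 76, 98, 84, 85, 79, 88, 90, 82, 83, 87, 78]

# Classes from the example: grade -> (lowest score, highest score), inclusive.
classes = {"D": (60, 69), "C": (70, 79), "B": (80, 89), "A": (90, 100)}

# Frequency distribution table: count the scores that fall in each class.
frequency = {
    grade: sum(low <= s <= high for s in scores)
    for grade, (low, high) in classes.items()
}

# One '#' per student, so longer rows correspond to taller bars.
for grade, (low, high) in classes.items():
    print(f"{grade} ({low}-{high}): {'#' * frequency[grade]}")

print(frequency)
# prints: {'D': 2, 'C': 4, 'B': 6, 'A': 3}
```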
now there's something else that we need
to go over
and that is making a table with the
frequency
the relative frequency and also the
cumulative
relative frequency so let's say we have
the numbers 2
3 5 3
6 eight seven
eight three three
five three seven
three eight five two
seven 8 3
so we're going to have four columns
the first column is going to be the
value
the second column will be the frequency
the third column is going to be the
relative frequency
and the last one is going to be the
cumulative
relative frequency so the lowest
value that we have in our list
is two and notice that we have
two twos so the frequency for that
number is two
next we have a three and we have
one two three four
five six seven threes
so that's the frequency the next number
in the list is five
we have one two three fives
now we only have one six which is here
next is a seven and there's one
two three sevens
and i need to extend this list
and then finally we have some eights one
two
three four eights
now let's take the sum of the frequency
column
so if we add these numbers 2 plus 7 is 9
plus 3 that's 12
plus 1 is 13 plus 3 is 16 plus
4 is 20. so we have a total of 20
numbers
in our set now how do we calculate the
relative frequency
so for the first entry in that column
take the frequency
and divide it by the total number of
numbers that you have in the data set so
the relative frequency is basically
the frequency divided by n so for the
first one it's going to be 2 over 20
which is 1 out of 10 and so that's point
10.
for this one it's going to be 7 divided
by 20
which is 0.35
and for the next one it's 3 divided by
20. which is point
15 and then it's 1 divided by 20.
that's 0.05 3 over 20 is point 15 again
and then 4 out of 20 is .20
now if you add up all of these numbers
you should get one
next we have the cumulative relative
frequency
so we're going to start with point 10
and then we're going to add these two
numbers
0.10 plus 0.35
is 0.45 now 0.45
plus 0.15 that's 0.60
and then if we add 0.60 and 0.05
that's going to be 0.65 and then add in
those two numbers
that's going to give us 0.80 and then
.80 plus .20
will give us one
so that's how you can complete the
cumulative relative frequency table
given a data set
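The whole table can be rebuilt with a short Python sketch. It starts from the frequency column found above. The cumulative column is computed from running counts, so each entry is a single division and small floating-point rounding errors do not pile up.

```python
from itertools import accumulate

# Frequency column from the example: value -> frequency.
frequency = {2: 2, 3: 7, 5: 3, 6: 1, 7: 3, 8: 4}
n = sum(frequency.values())    # 20 numbers in total

# Relative frequency is the frequency divided by n.
relative = {value: count / n for value, count in frequency.items()}

# Cumulative relative frequency is the running count divided by n.
running = dict(zip(frequency, accumulate(frequency.values())))
cumulative_relative = {value: count / n for value, count in running.items()}

print("value  freq  rel   cum rel")
for value, count in frequency.items():
    print(f"{value:>5}  {count:>4}  {relative[value]:.2f}  {cumulative_relative[value]:.2f}")
```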
now let's talk about how we could use
this information
using this table what is the value
of the 60th percentile
what would you say
so what we need to do is look at the
cumulative relative frequency
so the 60th percentile will end here
now notice that it's exactly 0.60 which
corresponds to 60 percent
to find the 60th percentile you need to
average
these two values so you have to do five
plus six divided by two the average of
five and six
is five point five and so that's going
to be
the 60th percentile now what about
the 80th percentile
so notice that we have exactly 0.8 in
the cumulative
relative frequency column so what we're
going to do is we're going to average
these two numbers
the average of seven and eight is 7.5
now what if it's not listed in the
cumulative relative frequency table
for instance let's say if we have or if
we want to find
the 20th percentile what value
corresponds to that
now the 20th percentile is between 0.10
and 0.45
it's important to understand that after
0.10 you're going to exceed the value of
two
you're going to go into the threes and
after 0.45
you're gonna move from the threes to the
fives
so the 20th percentile because it's more
than two but less than three it's going
to be three
there's no numbers in between here if
you look at the data that we have
so the 20th percentile is going to fall
in this number to explain it better it's
important to understand that between 0
and 0.10 the value is 2.
between 0.10 and 0.45 the value is 3.
now if your percentile falls between
0.45 and 0.60
not including 0.45 and 0.60 but if it's
between those numbers
then it's going to be 5. it's going to be 6
if it's between 0.60 and 0.65
and it's going to be 7 if it's between
0.65 and 0.8
so let's say if we want to determine the
75th
percentile
the 75th percentile is between 0.65
and 0.8 so because it's more than 0.65
we're not gonna have the value six we're
going to get seven
so that's going to be the 75th
percentile
now let me help you to see this visually
so that it makes more sense let's begin
by
arranging the numbers in increasing
order so we have two twos
we have a total of seven threes
and we have three fives one six
three sevens and four eights.
now i'm writing these in pairs of twos
and you'll see why
we have a total of 20 numbers
and if you take 20 and divide it by
2 you're going to get 10 equal parts
so this is going to be the 10th
percentile let me put that in a
different color
so here's the 10th percentile the 20th
percentile
the 30th and so forth
so this would be the 100th percentile
there's nothing higher than that
so the first thing that we went over was
the 60th percentile
which is here so notice that we have a 5
on the left
and a 6 on the right so we need to
average five and six
and so we said the 60th percentile
was 5.5 the second one was the 80th
percentile
which is here notice that it's between
two different numbers
seven and eight so if we average seven
and eight
it will give us seven point five
now after that we talked about the 20th
percentile
and the two numbers that it's between
are identical to each other
so therefore the 20th percentile is just
three
and the last one was the 75th percentile
which is between the 70th and 80th.
and so these two numbers are just seven
and so the 75th percentile has to be
seven
so now you can visually see the values
that correspond
to the different percentiles
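The counting rule used for these percentiles can be sketched in Python. The function name `percentile` is made up for this example and takes a whole-number percent. There are several percentile conventions in use, and library functions may give different answers. This sketch follows only the rule shown here: when the cut lands exactly between two values, average them, and otherwise take the next value up.

```python
def percentile(values, percent):
    """Percentile by the counting rule above (integer percent, 0 < percent < 100)."""
    if not 0 < percent < 100:
        raise ValueError("percent must be between 0 and 100, exclusive")
    ordered = sorted(values)
    # How many values lie at or below the cut, and whether the cut is exact.
    whole, remainder = divmod(percent * len(ordered), 100)
    if remainder == 0:
        # The cut falls exactly between two values, so average them.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Otherwise the percentile is the next value up.
    return ordered[whole]

# The 20 numbers from the example, in increasing order.
data = [2] * 2 + [3] * 7 + [5] * 3 + [6] + [7] * 3 + [8] * 4

print(percentile(data, 60))    # prints: 5.5
print(percentile(data, 80))    # prints: 7.5
print(percentile(data, 20))    # prints: 3.0
print(percentile(data, 75))    # prints: 7.0
```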
if you want to you could put zero as
well i forgot to do that
but that's basically it for this video
so hopefully i gave you a good
uh introduction into statistics there's
a lot of other stuff
that you'll learn in this course but
these are just some of the basics
thanks again for watching