
Introduction to Statistics

By The Organic Chemistry Tutor

Summary

## Key takeaways

- **Calculate Mean, Median, Mode, and Range**: To find the mean, sum all numbers and divide by the count. The median is the middle number in an ordered set. The mode is the most frequent number, and the range is the difference between the highest and lowest values. [00:32], [01:45]
- **Median for Even Data Sets**: When a data set has an even number of values, the median is the average of the two middle numbers. For example, if 21 and 37 are the middle numbers, their average (29) is the median. [05:10], [05:30]
- **Bimodal Data Sets**: A data set can have more than one mode. If two numbers appear with the same highest frequency, both are considered the mode, resulting in a bimodal data set. [06:02], [06:15]
- **Identifying Outliers**: An outlier is a data point outside the interval from Q1 - 1.5*IQR to Q3 + 1.5*IQR. For instance, if that interval runs from -4 to 28, a value like 29 would be an outlier. [09:14], [13:52]
- **Skewness and Distribution Tails**: A distribution skewed to the right has a tail extending towards higher values, with the mean typically greater than the median. Conversely, a left skew has a tail towards lower values, with the mean typically less than the median. [27:28], [30:01]
- **Creating Histograms**: Histograms represent data using adjacent bars, unlike bar graphs, whose bars are separated by gaps. They are constructed from frequency distribution tables that group data into classes or categories. [42:07], [43:40]

Topics Covered

  • The Median: Finding the Middle Number
  • The Mode: Identifying the Most Frequent Number
  • Calculating Quartiles (Q1 and Q3) and the Interquartile Range (IQR)
  • Building a Frequency Distribution Table for Histograms
  • Calculating Relative and Cumulative Relative Frequency

Full Transcript

let's start with this problem

find the mean median mode

and range of the following data set

now the first thing that i would

recommend doing

is arranging the numbers in increasing

order

so the lowest number is seven and we

have two of them

after that the next number is 10 and

then

14 15

23 and 32.

now to calculate the mean what we need

to do

is take the sum of

the numbers

and divide it by the seven numbers that

are in the data set so this is going to

be 7

plus 7 plus 10 and so forth

and then we're going to divide it by the

seven numbers

so the mean is basically the average of

those numbers

so the total sum that i got is a hundred

and eight if you take 108 and divide it

by seven

that's going to give you 15.428...

rounded to two decimal places that's

approximately 15.43

so that is the mean that's how you can

find it

now what about the next thing how can we

find the median

of this data set

the median is basically the middle

number

so what i like to do is eliminate the

first

and the last number and then working

towards the middle i'm going to

eliminate the next two numbers

until i'm left with the middle number so

as we could see in this example

the median is equal to

14. it's simply the middle number of

the data set

now what about the mode what is the mode

in this problem the mode is simply the

number

that occurs most frequently

in the data set and notice that 7

appears twice in this data set so 7

is the mode

now what about the range

the range is simply the difference

between

the highest number and the lowest number

so it's going to be 32 minus 7

which is 25. and so now you know how to

find the mean

median mode and range of a data set
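As a quick cross-check, here is a minimal Python sketch that computes all four summaries for this first data set with the standard library's `statistics` module. The function name `describe` is illustrative and does not come from the video. The list holds the sorted values from the board; the order does not change any of the four results.

```python
from statistics import mean, median, multimode

def describe(data):
    """Mean, median, mode(s), and range of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),          # middle value once sorted
        "mode": multimode(data),         # every value tied for the highest count
        "range": max(data) - min(data),  # highest minus lowest
    }

summary = describe([7, 7, 10, 14, 15, 23, 32])
print(round(summary["mean"], 2), summary["median"], summary["mode"], summary["range"])
# 15.43 14 [7] 25
```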

now let's move on to number two now this

is going to be a similar problem

but it's not exactly the same as you can

see

we have eight numbers in our data set

as opposed to the seven that we had

previously

so let's begin by putting these numbers

in order

so the lowest number is 11 and then the

next number is 15 and there are two 15s

and then we have 21

and then 37 41

and two 59s so let's begin by calculating

the mean so the mean

represented by the symbol x bar is equal

to the sum

divided by the number of numbers in the

data set

so let's add up the eight numbers

this is going to take some time

and then divide the sum by those eight

numbers

so we have 11 plus 15 plus another 15

and then 21 37

41 59 and 59

so i got a sum of 258. if we divide that

by 8

you'll find that the mean is 32.25

so that's the answer for the first part

of the problem

now let's move on to the second part

let's calculate the median

as we said before the median

is simply the middle number

so let's eliminate the first and the

last numbers

and then the next two and the next two

notice that we don't have one number in

the middle but this time

we have two numbers in the middle so

what should we do

in a situation like this

what you need to do is take

the average

of those two middle numbers so you need

to add them up and divide

by two 20 plus 30 is 50

and 1 plus 7 is 8 so 21 plus 37 is

58

and if you divide it by two this gives

you 29 so 29 is the median

in this data set

now what about the mode

what is the mode in this example

now as we said before the mode is the

number

that is the most frequent

number in the data set

but this problem is different from the

last one because

we have two numbers that appear twice 15

and 59 so which of these is the mode

it turns out that they both represent

the mode

so the mode is 15 and 59

so what we have is something known as a

bimodal

data set because there are two modes

instead of one

it's not unimodal

now what about the range well there's

nothing different about the range

in this problem compared to the last

problem it's simply the highest number

minus the lowest number so you can

write it as h

minus l so the highest number is 59

as we could see here and the lowest

number is 11.

so 59 minus 11 is 48.

so now you know how to find the mean

median mode

and range of a data set
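The even-count rule for the median can also be written out by hand. In this sketch, `median_by_hand` is a hypothetical helper name. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, in first-seen order, which is how the bimodal result shows up.

```python
from statistics import multimode

def median_by_hand(data):
    """Median: the middle value, or the average of the two middle values."""
    xs = sorted(data)
    mid = len(xs) // 2
    if len(xs) % 2 == 1:
        return xs[mid]                   # odd count: one middle value
    return (xs[mid - 1] + xs[mid]) / 2   # even count: average the two middle values

data = [11, 15, 15, 21, 37, 41, 59, 59]
print(sum(data) / len(data))   # 32.25
print(median_by_hand(data))    # 29.0
print(multimode(data))         # [15, 59]
print(max(data) - min(data))   # 48
```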

now let's talk about finding the

quartiles

and the interquartile range

so what i'm going to do right now is i'm

going to

make basically a number line

with a beginning and an end and let's

say this number line represents

our data the lowest value

is known as the minimum the highest

value in our data set

is the maximum now we're going to break

the data

into four parts that each hold about a quarter of the values

the first dividing point is q1 this is

the first

quartile the second one is q2 the second

quartile

and then this is q3 the third quartile

so the minimum sits at the zero percent level

q1 at 25 percent

q2 at 50 percent q3 at 75 percent

and the maximum at 100 percent so you could

see

how the quartiles are related to each

other

now how do we go about finding q1

q2 and q3 how do we do that

q2 is basically the median

of the entire data set

q1 is the median

of the lower half of the data set

and q3 is the median of

the upper half of the data set

now the interquartile range represented

by iqr

this is the difference between

q3 and q1 so once you find q1 and q3

you can now calculate the interquartile

range
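Here is one way to code the median-of-halves method just described. This sketch assumes that when the count is odd, the overall median is left out of both halves, which matches the worked examples in this video. Library routines such as `numpy.percentile` or `statistics.quantiles` use other conventions and can return slightly different quartiles. The function names are illustrative, and the data needs at least two values.

```python
from statistics import median

def quartiles(data):
    """(Q1, Q2, Q3) by the median-of-halves method."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                 # values below the middle position
    upper = xs[half + n % 2:]         # skip the middle value when n is odd
    return median(lower), median(xs), median(upper)

def iqr(data):
    q1, _, q3 = quartiles(data)
    return q3 - q1

print(quartiles([7, 11, 14, 5, 8, 27, 16, 10, 13, 17, 16]))   # (8, 13, 16)
```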

now the next thing that i want to

mention is the ability to find

or identify if a number in a data set is

an

outlier

so here's what you need to know it's not

going to be an

outlier if it's within this range

which runs from q1 minus 1.5

times the iqr at the low end

up to q3 plus

1.5 times the iqr at the high end

so if you have a number that is within

this range

it is not an outlier but if you have a

number in the data set

that is outside of this range then

that number

is an outlier so let's work on an

example

let's say we have the numbers 7

11 14 5

8 27 16

10 13 17 and 16.

go ahead and identify q1 q2 and q3

calculate the interquartile range the

iqr

and determine if there's any outliers

in this data set feel free to pause the

video and

use what you know to try

now as always the first thing we should

do is organize

the data the lowest number is 5

and then we have 7 8

10 11 it's best if we

cross the numbers off as we go along and then the

next number

is 13 and then

14 16 there's two 16s

and then the 17 and then 27.

now what do you think is our next step

in order to find the interquartile range

and the three quartiles

what's our next step the best thing to

do

at this point is to determine the median

of the entire data set

which is going to be q2 so we could

eliminate the first and the last number and then

the next two

until we get a number in the middle

so notice that 13 is in the middle

so therefore 13 is going to be

q2 now what i like to do is

set this number aside for

now

and put a line between

the left side and the right side so i

want to separate

the lower half of the data set from the

upper half

of the data set the 13 is still part of the data

though

so just keep that in mind that's our

q2 value

now q1 is the median of

the lower half of the data set so what

is the median

of those five numbers the median

is simply going to be the middle number

of those five numbers

so q1 is 8. now what is the median

of the upper half of the data set

notice that the middle number is 16 so

that is

q3 notice that we have a total of

11 data points and so that's why

13 is not included in the lower half or

the upper half

because if it were one side would have

five numbers and the other side would have

six

and that's why i chose to write it up

here so that

the lower half is the same size as the upper

half

they both contain five numbers

now let's go ahead and calculate the

interquartile range

iqr so we said it's the difference

between the third quartile

and the first quartile so it's going to

be 16 minus 8

which is 8. and so that's how you could

find the interquartile range of a data

set

now what about the presence of any

outliers

so looking at these numbers do you think

we have

a number that really stands out that

doesn't belong

right now 27 appears to be very far

off from all the other numbers so do you

think 27

is an outlier

well let's find out

so let's write down what we know

the presence of an outlier is based upon

this range

it's q1 minus 1.5 times the iqr

to q3 plus 1.5

times the iqr

so we know that q1 is 8

and the interquartile range is also

8. q3 is 16.

so this is going to be 16 plus 1.5 times

8.

what is 1.5 times 8 1 times 8 is 8 and 0.5

times 8 is 4

8 plus 4 is 12. so this is going to be 8

minus 12

and this is 16 plus 12.

now 8 minus 12 is negative 4 and

16 plus 12 is 28.

so now looking at what we have is 27

an outlier based on this

range no because 27

is between negative 4 and 28 so 27

is not an outlier now if we had 29

that would be an outlier so now you know

how to determine

if a point is an outlier within a range
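The fence test can be sketched in a few lines. This sketch takes Q1 = 8 and Q3 = 16 straight from the worked example rather than recomputing them, and uses the usual multiplier k = 1.5. The function names `outlier_fences` and `is_outlier` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Lower and upper fences: q1 - k*IQR and q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3, k=1.5):
    low, high = outlier_fences(q1, q3, k)
    return x < low or x > high          # a value sitting exactly on a fence is not an outlier

data = [7, 11, 14, 5, 8, 27, 16, 10, 13, 17, 16]
print(outlier_fences(8, 16))                          # (-4.0, 28.0)
print([x for x in data if is_outlier(x, 8, 16)])      # []
print(is_outlier(29, 8, 16))                          # True
```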

now let's talk about how we can create a

box and whisker plot

the reason why we want to talk about

this now is because it's related

to the values of q1 q2 and q3

so typically a box and whisker plot looks

something like this

assuming there are no outliers

on the left we have the lowest value or the minimum

on the right we're going to have the

highest value or the maximum

and then this line here represents the

value of the first quartile

which is the 25th percentile

in the middle we have q2 the second

quartile which is the 50th percentile

and then this is q3 the 75th percentile

and so that's the basic shape of a box

and whisker plot

now what about if we have an outlier

let's say if we have one to the right

then this will no longer be the maximum

the outlier is shown as a separate point

outside of the whiskers

now if the outlier is on the left side the end of the left whisker will

no longer be the minimum

and the outlier will be plotted as a point out there so

let's work on an example

let's say we have the numbers 16

18 28

13 50 31

25 22

and let's say another 18

23 29

and 38

so we have 12 numbers

using this data set go ahead and find

the interquartile range q1 q2

q3 identify the presence of any outliers

and then using all of that information

construct

a box and whisker plot feel free to

pause the video if you

want to try that yourself now the first

step

as always is to write the numbers in

increasing order

so we have 13

16 18

so let's get rid of those numbers

and then there's another 18 and then

22 23

next is 25

and then 28

29

and then the last three are 31

38 and 50.

now what i like to do is break up the

data

into four quarters or four sections

now because we have an even number of

data points we have a total of 12

we can put a line right in the middle so

now this is the lower half of the data

set

and here we have the upper half

so let's determine the median for the

entire data set

because it's even we can't just

immediately

identify the median the median is going

to be an average

of these two numbers if we eliminate the

first two

and then the next two and so forth

we will eventually get to these two

numbers so what is the average

of 23 and 25

so if you add up those two numbers and

divide by two

you're just going to get the midpoint of

23 and 25

which is 24.

so 24 is the second quartile

this is the q2 value

now let's focus on the lower half of the

data

what is the median of the lower half of

the data

so notice that we have six data points

in that section and because it's even

the median is going to be an average of

these two numbers

so let's put a line there the average of

18 and 18

is 18. so that's our q1 value

now what about the median of the upper

half

of the data so because we have six

numbers here we're gonna put a line

right in the middle

so we have 3 on the left 3 on the right

and the median is simply going to be the

average of these two numbers

the average of 29 and 31 is the number

in the middle

which is 30.

and so this is why i like to use these

lines here

so now we have three numbers

in each of the four sections of our data

so now that we know q1 q2 and q3

what is the interquartile range what's

our iqr value

the iqr is the difference between the

third quartile

and the first quartile so it's going to

be 30

minus 18 which is 12 in this example

so now that we have that

our next step is to determine if we have

any outliers

in this problem

so remember this is what we need we need

to create a range

the lowest point of it will be q1 minus

1.5

times the iqr and the highest point of

the range

will be q3 plus 1.5 times the iqr

now i'm going to have to get rid of a

few things

so let's write down the information that

is important

so q1 is 18

q2 is 24

q3 is 30. our minimum

the lowest value is 13

and the maximum our highest value is 50.

so let's keep that in mind so now i can

get rid of this

you may want to write that down just in

case we need to go back to it

and i can also get rid of this too

so let's plug in what we know into this

expression

so q1 is 18 and

the iqr let me write that down again

that was 12

it was 30 minus 18 which is 12.

so this is going to be 18 minus 1.5

times 12

and then q3 is 30 plus

1.5 times 12.

now what is 1.5 times 12 so it's

basically 12

plus half of 12 which is six twelve

plus six is eighteen

eighteen minus eighteen is zero thirty

plus eighteen

is forty eight so

do we have any outliers

are there any numbers outside of this

range

and it turns out that there is one the

maximum 50

is not between 0 and 48 so this

is an outlier

now going back to our original data i'm

just going to rewrite it

so you could see everything

this is what we had

so now at this point let's go ahead and

let's make a number line

so 0 is going to be our lowest point and

we're going to go up to 50.

so let's go by tens

now let's put a mark in the middle to

represent

the fives

so let's begin by drawing a box

so we need to draw a box ranging from q1

to q3

so that's going to start at 18 and this

is basically a rough estimate

it's not going to be perfect and the end

of the box

will be at 30.

now 13 is the minimum

which is approximately around that area

so that's where the left whisker ends and

50 is the maximum but that's an

outlier so we're going to put a point at

50.

38 is the second highest

which is not an outlier so that's where

the right whisker ends

and so 38 is in this

region

so we're going to say it's over there

now q1 is 18

q3 is 30 but we also need to mark where

q2 is q2 is 24

which is around here

so this is q1 that's 18 q2

and q3

so the length of the box represents the

interquartile range

which is 12. that's 30 minus 18.

and so here is the minimum and this is

the maximum which is the outlier and

that's how you can construct a box and

whisker plot

given a data set
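The numbers needed to draw this plot can be gathered in one pass. This is a sketch under the same median-of-halves quartile convention used above, with the 1.5 times IQR fences. Each whisker is drawn to the most extreme value that is not an outlier, and outliers are returned separately to be plotted as points. `box_plot_summary` is a hypothetical helper name.

```python
from statistics import median

def box_plot_summary(data, k=1.5):
    """Values needed to draw a box and whisker plot with outliers as points."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half])                 # median of the lower half
    q2 = median(xs)
    q3 = median(xs[half + n % 2:])         # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low <= x <= high]
    return {
        "whisker_low": inside[0],          # smallest non-outlier
        "q1": q1, "q2": q2, "q3": q3,      # the box and the line inside it
        "whisker_high": inside[-1],        # largest non-outlier
        "outliers": [x for x in xs if x < low or x > high],
    }

print(box_plot_summary([16, 18, 28, 13, 50, 31, 25, 22, 18, 23, 29, 38]))
```

For this data it reports whiskers at 13 and 38, a box from 18 to 30 with the median line at 24, and 50 as the lone outlier, matching the plot drawn above.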

now the next topic we need to talk about

is skewness

so let's say we have

this representation of our data

and notice that it is symmetrical

this line represents the median

and if you have data that is perfectly

symmetrical

the sample mean

is going to be equal to the median

now the box and whisker plot will look

something like this

q2 is going to be right in the middle of

q1 and q3

so notice that the box plot is balanced

the left part of the box is the same length as the right

part

and the two whiskers are also equal in length

so that represents a symmetric

distribution

now what if it's not symmetrically

distributed

what's going to happen in that case

there are two possibilities the data can

be skewed to the right

or it can be skewed to the left

so which one would you say this

particular shape represents

would you say it's skewed to the right

or skewed to the left

notice that we have a tail

that extends towards the right so

this particular data or this graph

we say that it's skewed

to the right now what is the

relationship between

the sample mean and the median in this

case

so the median the middle portion of the

data will be somewhere in this region

and the sample mean will be to the right

of the median

since it's skewed to the right so the

mean

will typically be greater than the median in this

case

by the way whenever you have a shape

that's skewed to the right

some textbooks will refer to this as a

positive skew

and it makes sense because positive

numbers on a number line

will be on the right side

now you need to be familiar with the box

and whisker plots

for this type of distribution

so here's one example

notice that the right side of the box

is longer than the left side so this

tells us

that q3 minus q2

is greater than q2 minus q1

and so in that case looking at the box

and whisker plot

you can see that it's skewed to the

right now sometimes

the two parts of the box may be equal in length

nevertheless the right whisker might be longer

than the left whisker so even if the two parts of the box are

the same length if the right whisker is longer

it will also be skewed

to the right

now what about if it's skewed to the

left

in this case the graph is going to look

something like this

let me try that again

so notice that the tail is on

the left side so in this case we have a

negative skew

where we can say that it's skewed to the

left

now the median will be somewhere in this

region

and the mean is going to be to the left

of the median since it's skewed to the left

so the sample mean is typically less than the median

now how can we represent this using a

box plot

well here's one possibility

so in this case q2 will be closer to q3

in the box plot so as you can see the

left side

is longer than the right side

so we could say that q2 minus q1

is greater than q3 minus q2 in this case

now if the two parts of the box are equal in length

you can still tell that we have a

negative skew

if the left whisker of the box plot

is longer than the right whisker

so that's another indication that the

data is skewed to the left
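The mean-versus-median comparison can be turned into a quick check. This is only a rule of thumb, and it can mislead for some discrete or multimodal data sets. The sample lists below are made-up illustrations, not data from the video.

```python
import math
from statistics import mean, median

def skew_direction(data):
    """Rule-of-thumb skew label from comparing the mean with the median."""
    m, med = mean(data), median(data)
    if math.isclose(m, med):
        return "roughly symmetric"
    return "skewed right (positive)" if m > med else "skewed left (negative)"

print(skew_direction([1, 2, 2, 3, 3, 3, 4, 12]))       # skewed right (positive)
print(skew_direction([1, 9, 10, 10, 11, 11, 11, 12]))  # skewed left (negative)
print(skew_direction([1, 2, 3, 4, 5]))                 # roughly symmetric
```

In the first list the single large value pulls the mean (3.75) above the median (3). In the second, the single small value pulls the mean (9.375) below the median (10.5).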

now there are some other things that you

need to know if you're going to take

a statistics course and you need to be

able to create

a dot plot so let's say if you have the

numbers 5

8 3 7

1 5 3

2 3 3 eight

five with this information how can we

construct

a dot plot well we can begin by

drawing a number line so let's say this

is zero

one 2 3

4 and so forth

we could stop at 8 since 8 is the

highest number

now the first number is a 5. so all we

need to do

is draw a dot above the number five

and then let's put the dots one at a

time

so the next number is an eight so we're

going to draw a dot

at eight and then it's a three

so let's put the dot there next we have

a seven

and then it's a one

and then a five so notice that we have a

second five

all we need to do is draw another dot

above the first one

and then it's a three two

and then two more threes

and then eight and then five

so that's how you can make a dot plot
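A text version of the same dot plot can be printed with `collections.Counter`. This sketch assumes small whole-number values, since the labels are one character wide. It stacks one `*` per occurrence, and the hypothetical `modes` helper returns every value with the tallest stack.

```python
from collections import Counter

def text_dot_plot(data):
    """Stack one '*' per occurrence above each value on a number line."""
    counts = Counter(data)
    values = range(min(data), max(data) + 1)   # assumes small whole numbers
    rows = []
    for level in range(max(counts.values()), 0, -1):   # top row first
        row = " ".join("*" if counts[v] >= level else " " for v in values)
        rows.append(row.rstrip())
    rows.append(" ".join(str(v) for v in values))      # the number line labels
    return "\n".join(rows)

def modes(data):
    """Every value with the tallest stack of dots."""
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

data = [5, 8, 3, 7, 1, 5, 3, 2, 3, 3, 8, 5]
print(text_dot_plot(data))
print(modes(data))   # [3]
```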

now using this dot plot which

number is the mode what would you say

now if you recall the mode is the number

in the data set that occurs most

frequently

so in this case it's the number with the

most dots

so the mode for this data set is 3.

now

let's talk about how we can make

a stem and leaf plot

so let's say we have the numbers 4

9 13 13

17 21 36

38 let's see

another 38

and then 56. how can we make a

stem-and-leaf plot

with this data so the first thing we

need to do

is we need to write two columns

on the left it's going to be the stem

and on the right the leaf

so the first number is four so for the

stem we're going to write

zero and for the leaf we're going to put

four the next one is nine

so we're going to write or represent

nine as zero nine

so we have zero on the left and then

we're going to put nine

on the right the next number

is thirteen

so the first digit is a one the second

digit we're going to put

in the second column now we have another

13.

so all we got to do is add another three

now for 17

we need one and seven we already have

the one but we need to write the 7

to the right side now for 21

we need to put a 2 in the stem column a

1 in the leaf column

next we have 36 so we need a 3

in a stem column a 6 in the leaf column

now notice that we have two 38s

so we have to add two 8s to the leaf column

and finally 56 now we don't have

anything in the 40s

so we're going to write a 4 but we're

not going to put anything here

for 56 we're going to write a 5 in the

stem column

and a 6 in the leaf column and so

that's how you can make

a stem and leaf plot now it's always

good to have a key

so we could say that 2 1

represents 21. so

if someone looks at the stem-and-leaf plot

they know what you mean
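The stem/leaf split is just integer division by 10. This minimal sketch assumes non-negative whole numbers (the decimal variant shown later would need a different scale). It also prints empty stems such as the 40s. `stem_and_leaf` is an illustrative name.

```python
def stem_and_leaf(data):
    """Text stem-and-leaf plot for non-negative integers (stem = tens, leaf = ones)."""
    stems = {}
    for x in sorted(data):
        stems.setdefault(x // 10, []).append(x % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):      # include empty stems
        leaves = " ".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return "\n".join(lines)

print(stem_and_leaf([4, 9, 13, 13, 17, 21, 36, 38, 38, 56]))
print(stem_and_leaf([78, 85, 89, 92, 106, 107, 119]))
```

For three-digit values the same rule gives stems like 10 and 11, as in the next example.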

let's try another example so let's say

we have the numbers

78 85

89 92

106

107 and 119.

go ahead and make a stem-and-leaf plot

with those numbers

so the first number 78 we're going to

write a 7

in the first column and the 8 in the

second column

next we have 85 and 89

so we're going to put an 8 for the first

digit and then 5 and 9 for the second

and then it's 92 so we're going to write

9 and 2.

now for 106 we're going to put a 10

in the stem column but a 6 in the leaf

column

and for 107 we just got to add a 7 here

and there's usually no commas so let's

get rid of that

and then finally for 119 we're going to

write 11 in the stem column

and 9 in the leaf column

and so for example this will be our key

so this represents 92

and this would represent 106. a key matters because

the same digits could also stand for decimal values

for instance let's say if we have 1.2

1.6 1.8

2.1 2.3 2.3

and 2.5 we can construct this

stem-and-leaf plot like this

so we could start with one and to write

1.2 we could just put a 2 as the leaf

now for 1.6 we just got to put a 6

in the leaf column and for 1.8 just

an 8.

now we can move on to the twos so we

have 2.1

2.3 2.3 and 2.5

and so that's how you can make a

stem-and-leaf plot using decimal values

so we can say 1 6

represents 1.6 in this particular

example

now the next thing we're going to talk

about is something called

a frequency table

so given a data set how can we make a

frequency table

so let's say we have the numbers

5 9 8

7 8 12

nine eight ten

eight nine seven so using those numbers

how can we make a frequency table

we're going to put two columns so the

first column will represent the number

and the second column will represent the

frequency

and let's put down what we have

so the first number is a five and how

many fives do we have

there's only one five so the frequency

is one

the next number is a seven

and notice that we have two sevens so

the frequency is two

next is eight we have a total

of four eights

so that gives us a frequency of four

after that we have nine and

i've spotted three nines

and then there's one ten and we have

one twelve

so that's a simple way in which you can

make a frequency table

now here's another question for you how

can we

use the frequency table in order to

calculate

the sample mean how can we calculate the

average

instead of just adding all of those

numbers up and dividing by the number

of data points in a set

let's add another column and we're going

to call this the sum

now we have one five

so five times one is five we have two

sevens

if you add up seven and seven you get

fourteen

we have four eights eight times four is

thirty two

or if you add eight four times you get

thirty two

nine times three is 27 10 times 1

is 10

12 times 1 is 12. now we're going to add

up the sum column

to get the total sum so 5

plus 14 plus 32 plus 27

plus 10 plus 12 that's 100

and we're also going to add up the

frequency column that's going to give us

the total number of numbers that we have

here

so we have 1 plus 2 which is 3 plus 4

that's 7

plus 3 that's 10 plus 1 plus 1 that's 12.

so we have a total of 12 numbers

so the mean is going to be the sum

divided by

the total number of points that we have

in our set

so it's a hundred divided by twelve

so the mean in this example is eight

point three repeating

and so that's how you could use the

frequency table to calculate the mean

of a data set
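The "sum" column method translates directly into code. Multiply each value by its frequency, add those products, and divide by the total frequency. This sketch builds the table with `collections.Counter`. The helper names are illustrative.

```python
from collections import Counter

def frequency_table(data):
    """Value -> how many times it appears, sorted by value."""
    return dict(sorted(Counter(data).items()))

def mean_from_table(table):
    total = sum(value * freq for value, freq in table.items())   # the "sum" column added up
    n = sum(table.values())                                      # the frequency column added up
    return total / n

data = [5, 9, 8, 7, 8, 12, 9, 8, 10, 8, 9, 7]
table = frequency_table(data)
print(table)                            # {5: 1, 7: 2, 8: 4, 9: 3, 10: 1, 12: 1}
print(round(mean_from_table(table), 2)) # 8.33
```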

now the next thing that you need to know

how to do is how to create a histogram

a histogram looks like a bar graph

but unlike a bar graph a histogram has

its bars connected to each other

so here's an example of a histogram

on the left side

on the right side i'm going to draw a

bar graph

so this would be a bar graph

so as you can see they're very similar

but the bars in the histogram

they're adjacent to each other there's

no space in between

but how do we go about taking the data

set and making a histogram

so let's say we have the test scores

of students in a typical class

let's say the test scores are 65

72 93

68 76

98 let's say 84

85 79 88

90 82

83 87 and 78

now the first thing we need to do is

create a frequency distribution table

and so what we're going to do this time

rather than

describing the frequency of each number

we're going to break it up into

categories or classes

on the left side we're going to have the

grade on the right side

the frequency

now i'm going to categorize the grades

in levels of 10. so a d would be

60 to 69. a c

would be 70 to 79. a grade of a b

would be 80 to 89. and an a is going to

be 90 to 100

so those are our four categories or

four classes

now how many students received a grade

between 60 and 69

so notice that there are two students we

have the grades 65 and 68

so the frequency will be 2.

now how many students received a c

on their exam how many students received

a grade of

somewhere between 70 and 79. so we have

one two three

and four so four students got a c

on their exam what about a b

so we have 84 85 88

82 83 87 so i counted a total of six

and what's left over are those who got

an a a 93 a 98 and a 90.

so now that we have our frequency

distribution table

we can now make a histogram

so on the y-axis we're going to plot the

frequency

on the x-axis we're going to put the

grades

so the grades will vary between 60

70. 80

90 and 100 because the classes are

separated by intervals of about 10

now the highest frequency is six so

let's go by one

one two three four

five six

so let's plot the first one between 60

and 69

which runs up to just below 70 the frequency is a

two

so it's going to look like that and then

between 70 and 79

four students got a grade in that region

i'm going to use the same color

and between 80 and 89

six students

fall in that category and between 90 and

100

only three students

received an a so let's put the grade so

this is a d

two students got a d four students

got a c and six students

received a b and three students received

an a so that's how you can create a

histogram

from a data set like this using a

frequency distribution table
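The class counts behind this histogram can be computed with inclusive bounds. This sketch assumes whole-number scores, since a score like 69.5 would fall between the classes as defined here. It prints a sideways text histogram with one `#` per student. `class_frequencies` is an illustrative name.

```python
def class_frequencies(scores, classes):
    """Count how many scores fall in each (label, low, high) class, bounds inclusive."""
    return {label: sum(low <= s <= high for s in scores) for label, low, high in classes}

classes = [("D", 60, 69), ("C", 70, 79), ("B", 80, 89), ("A", 90, 100)]
scores = [65, 72, 93, 68, 76, 98, 84, 85, 79, 88, 90, 82, 83, 87, 78]

freq = class_frequencies(scores, classes)
for label, count in freq.items():
    print(f"{label} | {'#' * count}")   # sideways text histogram
```

This prints bars of 2, 4, 6, and 3 for D, C, B, and A, matching the frequency distribution table.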

now there's something else that we need

to go over

and that is making a table with the

frequency

the relative frequency and also the

cumulative

relative frequency so let's say we have

the numbers 2

3 5 3

6 eight seven

eight three three

five three seven

three eight five two

seven eight three

so we're going to have four columns

the first column is going to be the

value

the second column will be the frequency

the third column is going to be the

relative frequency

and the last one is going to be the

cumulative

relative frequency so the lowest

value that we have in our list

is two and notice that we have

two twos so the frequency for that

number is two

next we have a three and we have

one two three four

five six seven threes

so that's the frequency the next number

in the list is five

we have one two three fives

now we only have one six which is here

next is a seven and there's one

two three sevens

and i need to extend this list

and then finally we have some eights one

two

three four eights

now let's take the sum of the frequency

column

so if we add these numbers 2 plus 7 is 9

plus 3 that's 12

plus 1 is 13 plus 3 is 16 plus

4 is 20. so we have a total of 20

numbers

in our set now how do we calculate the

relative frequency

so for the first entry in that column

take the frequency

and divide it by the total number of

numbers that you have in the data set so

the relative frequency is basically

the frequency divided by n so for the

first one it's going to be 2 over 20

which is 1 out of 10 and so that's point

10.

for this one it's going to be 7 divided

by 20

which is 0.35

and for the next one it's 3 divided by

20. which is point

15 and then it's 1 divided by 20.

that's 0.05 3 over 20 is point 15 again

and then 4 out of 20 is .20

now if you add up all of these numbers

you should get one

next we have the cumulative relative

frequency

so we're going to start with point 10

and then we're going to add these two

numbers

0.10 plus 0.35

is 0.45 now 0.45

plus 0.15 that's 0.60

and then if we add 0.60 and 0.05

that's going to be 0.65 and then adding 0.15

to that

that's going to give us 0.80 and then

.80 plus .20

will give us one

so that's how you can complete the

cumulative relative frequency table

given a data set
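The three computed columns can be produced together. This sketch uses `fractions.Fraction` so the relative frequencies stay exact and the cumulative column ends at exactly 1. It uses the 20 values implied by the frequency column (2 twos, 7 threes, 3 fives, 1 six, 3 sevens, 4 eights). The function name is illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate

def frequency_columns(data):
    """Value, frequency, relative frequency, and cumulative relative frequency columns."""
    counts = sorted(Counter(data).items())
    n = len(data)
    values = [v for v, _ in counts]
    freqs = [f for _, f in counts]
    relative = [Fraction(f, n) for f in freqs]   # frequency divided by n
    cumulative = list(accumulate(relative))      # running total of the relative column
    return values, freqs, relative, cumulative

data = [2, 3, 5, 3, 6, 8, 7, 8, 3, 3, 5, 3, 7, 3, 8, 5, 2, 7, 8, 3]
for v, f, rel, cum in zip(*frequency_columns(data)):
    print(f"{v}  {f}  {float(rel):.2f}  {float(cum):.2f}")
```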

now let's talk about how we could use

this information

using this table what is the value

of the 60th percentile

what would you say

so what we need to do is look at the

cumulative relative frequency

so the 60th percentile will end here

now notice that it's exactly 0.60 which

corresponds to 60 percent and that's exactly where the fives end

when that happens to find the 60th percentile you need to

average

the last five and the next value so you have to do five

plus six divided by two the average of

five and six

is five point five and so that's going

to be

the 60th percentile now what about

the 80th percentile

so notice that we have exactly 0.8 in

the cumulative

relative frequency column so what we're

going to do is we're going to average

these two numbers

the average of seven and eight is 7.5

now what if it's not listed in the

cumulative relative frequency table

for instance let's say if we have or if

we want to find

the 20th percentile what value

corresponds to that

now the 20th percentile is between 0.10

and 0.45

it's important to understand that after

0.10 you're going to exceed the value of

two

you're going to go into the threes and

after 0.45

you're gonna move from the threes to the

fives

so the 20th percentile because 0.20 is more

than 0.10 but less than 0.45 it's going

to be three

there are no other values in that stretch if

you look at the data that we have

so the 20th percentile is going to fall

among the threes to explain it better it's

important to understand that between 0

and 0.10 the value is 2.

between 0.10 and 0.45 the value is 3.

now if your percentile falls between

0.45 and 0.60

not including 0.45 and 0.60 but if it's

between those numbers

then the value is 5. it's going to be 6

if it's between 0.60 and 0.65

and it's going to be 7 if it's between

0.65 and 0.8

so let's say if we want to determine the

75th

percentile

the 75th percentile is between 0.65

and 0.8 so because it's more than 0.65

we're not gonna have the value six we're

going to get seven

so that's going to be the 75th

percentile

now let me help you to see this visually

so that it makes more sense let's begin

by

arranging the numbers in increasing

order so we have two twos

we have a total of seven threes

and we have three fives one six

three sevens and four eights.

now i'm writing these in pairs

and you'll see why

we have a total of 20 numbers

and if you take 20 and divide it by

2 you're going to get 10 equal parts

so this is going to be the 10th

percentile let me put that in a

different color

so here's the 10th percentile the 20th

percentile

the 30th and so forth

so this would be the 100th percentile

there's nothing higher than that

so the first thing that we went over was

the 60th percentile

which is here so notice that we have a 5

on the left

and a 6 on the right so we need to

average five and six

and so we said the 60th percentile

was 5.5 the second one was the 80th

percentile

which is here notice that it's between

two different numbers

seven and eight so if we average seven

and eight

it will give us seven point five

next we talked about the 20th

percentile

and the two numbers that it's between

are identical to each other

so therefore the 20th percentile is just

three

and the last one was the 75th percentile

which is halfway between the 70th and 80th percentile marks.

and so these two numbers are just seven

and so the 75th percentile has to be

seven

so now you can visually see the values

that correspond

to the different percentiles

if you want you could mark the zero percent point as

well i forgot to do that
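The rule applied in these examples can be written as a small function. Find the position p% of n. If it lands exactly on a whole number k, average the k-th and (k+1)-th sorted values; otherwise round the position up and take that value. This is one common textbook convention, assumed here to match the video's reasoning. Tools such as `numpy.percentile` or `statistics.quantiles` interpolate differently and can return other values. It assumes a whole-number p strictly between 0 and 100.

```python
def percentile(data, p):
    """p-th percentile for whole-number p with 0 < p < 100 (textbook averaging rule)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    xs = sorted(data)
    k, remainder = divmod(p * len(xs), 100)   # position = p% of n, kept exact with integers
    if remainder == 0:
        return (xs[k - 1] + xs[k]) / 2        # exact boundary: average the k-th and (k+1)-th values
    return xs[k]                              # otherwise round the position up (1-based k+1 is index k)

data = [2, 3, 5, 3, 6, 8, 7, 8, 3, 3, 5, 3, 7, 3, 8, 5, 2, 7, 8, 3]
for p in (20, 60, 75, 80):
    print(p, percentile(data, p))
```

For the four percentiles discussed this gives 3, 5.5, 7, and 7.5, the same answers as reading the cumulative relative frequency column.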

but that's basically it for this video

so hopefully i gave you a good

introduction to statistics there's

a lot of other stuff

that you'll learn in this course but

these are just some of the basics

thanks again for watching
