STAT 4243 - Introduction to Nonparametric Estimation
By Dr. Dylan Spicker
Summary
## Key takeaways
- **Nonparametric Needs More Data**: Non-parametric techniques don't make explicit assumptions about the distribution of the data and will work for any problem, but they typically require a lot more data to be effective than parametric techniques. [01:35], [01:47]
- **Monte Carlo is Parametric**: Monte Carlo techniques are a form of parametric inference because we determine an underlying distribution, estimate its parameters like mean and variance from data, then generate pseudo-random variables from that estimated distribution to assess estimators. [02:07], [03:06]
- **Empirical CDF Definition**: The empirical CDF, denoted F_n(x), is the average of indicator functions I(X_i ≤ x) over the sample, counting the proportion of observations less than or equal to x. [18:52], [19:14]
- **ECDF Converges to True CDF**: By the law of large numbers, the empirical CDF converges pointwise to the true CDF F(x), as it is a sample average of indicators whose expectation is F(x). [20:10], [21:44]
- **Plug-in Principle for Estimators**: For any parameter θ(F) framed as a functional of the CDF, the plug-in estimator is θ(F_n), which replaces the true CDF with the empirical CDF; for integrals like the mean, this yields the sample mean. [40:55], [44:33]
- **ECDF is Discrete Uniform CDF**: The empirical CDF is the CDF of the empirical distribution, a discrete uniform distribution placing mass 1/n at each observed X_i, producing a staircase pattern jumping by 1/n at order statistics. [30:47], [31:28]
Topics Covered
- Nonparametric Needs More Data
- Monte Carlo is Parametric Inference
- Empirical CDF Estimates Any Distribution
- Plug-in Principle Generates Estimators
Full Transcript
hello everyone and uh welcome to another online stat 4243 lecture uh in today's lecture we're going to pick up where we
left off at the end of the last video lecture and so in the last lecture we sort of finished talking about Monte Carlo integration and then I started to
introduce this idea of parametric versus nonparametric uh inference or estimation and statistics and sort of the key idea that we were thinking about is that with
parametric inference or parametric estimation which is sort of what we're most used to dealing with in our statistics courses we're making very explicit assumptions about say the
distribution of the data that we've gone and collected and this works out quite well right so there's there's a lot of benefits to doing this parametric uh
estimation however we run into troubles when we are sort of in a situation where we can't be sure that the assumptions that we're making are correct and a lot of the techniques if
the assumptions you're making are incorrect sort of start to break apart right and so we introduced this idea of non-parametric estimators or
non-parametric uh inference where the idea was let's try to come up with statistical techniques that don't make explicit assumptions about the distribution of the data uh and as a
result they'll sort of work for any problem right and this was sort of this uh Grand idea now there are going to be some drawbacks to this and it's sort of
worth investigating what those are in particular uh if you have non-parametric techniques they're typically going to require a lot more data to be effective than a uh parametric technique would
right so this is going to be sort of one of the major downfalls here now before we get sort of into talking about uh parametric and non-parametric estimators a little bit
more I think I should spend some time trying to address why this is relevant for where we're at in the course so far and so the motivation that I'll sort of
put forth for this topic is that if you think about what we were doing with Monte Carlo techniques right we're essentially using the pseudo random
generated uh variates in order to either say test statistical methodology to estimate uh particular traits from A distribution to see how estimator
perform or to solve integrals in in the case of Monte Carlo integration right and in each one of these cases essentially what we did was we determined some underlying distribution
and then we generated data from that underlying distribution and we used that as the data that was sort of relevant to us right and so if you wanted to see what's the coverage uh probability for
this uh confidence interval right we generated data from some distribution and so in some sense Monte Carlo techniques are a form of uh
parametric inference right and so what you might think that you could do for instance is if you were pretty sure that a data set that you had generated or that you had observed was uh from a
normal distribution say you could go and estimate the mean and you could estimate the variance right and then essentially what you're saying is my data is coming from this specific normal distribution
right maybe a normal with you know a mean of 10 and a variance of 15 right whatever the case happens to be we could then use Monte Carlo
techniques to randomly sample pseudo random variables from that Normal(10, 15) distribution right and in doing so essentially what we're saying is that if
that distribution is correct or is approximately correct then the data that we're generating from this Monte Carlo procedure are going to look like the data that you've actually generated or
observed in your sample and so if you wanted to say test out how does an estimator perform then you could do it through this procedure so maybe I'll open up sort of the whiteboard here
and sort of write down what we're thinking about right so suppose you observe say X1,
X2 up to Xn right and let's say you are willing to assume that the Xi are IID from
some normal distribution with mean mu and variance sigma squared right so then what we could do is we could say
one, estimate mu as x-bar for instance, and estimate sigma
squared as say s^2 right the sample variance estimator and then what we're saying is that therefore the Xi are going to be
approximately Normal(x-bar, s^2) right and so then let's say that we have some estimator
right so we have Theta hat right and it's going to estimate something from our distribution and let's say we wanted to know for instance what is the
variance right if you are willing to make this assumption that we've sort of set up here one thing that you could do is you could say well we can estimate the
variance by actually repeatedly computing this estimator right so you could say to
estimate it, step one would be to generate say X11,
X12 and so forth up to X1n which would be IID from this
Normal(x-bar, s^2) right so generate using Monte Carlo techniques a sample from this distribution and then compute
theta hat 1 on that generated sample and then repeat say M times right so this is going to give us
right here theta hat 1, theta hat 2, up to theta hat M right so if we think about what we're doing, we've used our original
sample to figure out what is the distribution of this population or what's the approximate distribution of this population and then we use Monte Carlo techniques to repeatedly sample from that approximate distribution so
that we generate this whole sequence of estimators that are computed on Monte Carlo data and then what we could say is that the variance of theta hat should be
approximately equal to the sample variance of these different estimators right so that would be something like 1/(M - 1) times the sum from i = 1 to M of say
(theta hat i minus theta hat bar) squared right so the
sample variance of this and if you get the distribution correct this result is going to hold sort of through all of the arguments we were making about Monte Carlo
estimation and so it's really useful to think about Monte Carlo as though it's a parametric technique in some sense. (A small R sketch of this whole recipe is given below.)
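To make the recipe concrete, here is a minimal sketch in R; treating the observed data as normal, using the sample median as a stand-in for the estimator theta hat, and the sample size and M are all placeholder choices rather than anything fixed in the lecture.

```r
# Parametric Monte Carlo estimate of Var(theta_hat): a hypothetical illustration.
set.seed(1)
x <- rnorm(50, mean = 10, sd = sqrt(15))    # stand-in for the "observed" sample
n <- length(x)

mu_hat    <- mean(x)    # estimate mu with the sample mean
sigma_hat <- sd(x)      # estimate sigma with the sample standard deviation

theta_hat <- function(s) median(s)          # the estimator whose variance we want

M <- 5000
theta_stars <- replicate(M, {
  x_new <- rnorm(n, mean = mu_hat, sd = sigma_hat)  # generate from Normal(x-bar, s^2)
  theta_hat(x_new)                                  # recompute the estimator
})

var(theta_stars)   # approximates Var(theta_hat), provided the normal model is right
```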
In some sense we're making assumptions about the distribution of the data that we have and then once you have those assumptions we know how to generate from any particular distribution right but this
technique right here it's not going to work if you don't know what the distribution of the data are right because how are you going to calculate the variance of your estimator when applied to a real world data set if you
don't know sort of where that data is actually coming from and so the goal with our discussion of nonparametric techniques is to get us to a point where we can take that same process that we
just covered where you would be able to say estimate the variance of an estimator using Monte Carlo and be able to do this even when we're not willing to make specific assumptions about the
distribution of the underlying data right and because we're not willing to make specific assumptions about the distribution of the underlying data this is going to be non-parametric right and
so in that uh way what we're hoping to do is sort of build up a non-parametric version of the uh Monte Carlo techniques and we're going to end up calling this
the bootstrap so if you've seen that before uh the bootstrap is sort of a non-parametric version of Monte Carlo but in order to get to explaining the bootstrap I think it's important for us
to at least take a slight detour into talking a little bit more broadly about uh non-parametric techniques and non-parametric inference right so just to pull up the course notes Here for for
a moment to sort of guide us through where we're at um in the last lecture we sort of introduced parametric inference right and I sort of uh indicated that this was the type of inference that
you're most used to sort of thinking about right and the idea is that sort of to make it as applicable to the real world as we can uh typically we're not going to know what a distribution is
it's often not even going to follow some nice closed form distribution at all so we introduced this idea of
non-parametric techniques and in the lecture last time I sort of motivated that we at least have a sense of some non-parametric estimators right so uh if
you think about the sample mean the sample mean sort of at no point uh indicates that you know what the distribution is right and so because of that we can view the sample mean as a
non-parametric estimator of the expected value of a distribution so long as that distribution actually has an expected value right and so we sort of set up this integral right we talked
about uh this sort of uh change in format from what you're used to looking at with integrals right because maybe we don't even have a density but we can still sort of think about it as a functional from the CDF to the real
numbers right and so then we talked about how that's a consistent estimator we did the same thing with uh the linear regression estimators right and so you can estimate beta 0 and uh beta the
slopes as well without actually making an assumption about the error distribution here and sometimes we'll assume those errors are going to be normally distributed because that helps us with inference or with sort of prediction intervals or that kind of
thing but you don't actually need that in order for the estimation of those slope parameters to be valid right and so we actually have seen some uh non-parametric techniques
already and it turns out that you know coming up with nonparametric techniques can be sort of a challenge right it's not always easy to know how you can non-parametrically estimate something
but just like we have sort of the um maximum likelihood estimation or uh method of moments estimation to come up with parametric estimators sort of as a general rule we're going to have sort of
a general purpose way of coming up with non-parametric techniques and it's not always going to work and it's not always going to be sort of the best thing for us to do however it sort of gives us a
general purpose idea to try to come up with a non-parametric technique and so it's a really good place to start and so what we'll do in today's lecture is talk about that it's something known as the
plug-in principle or the plug-in Technique we can form plug-in estimators uh through it and in order to talk about that it's going to require us to explore the empirical CDF a little bit more and
I know we touched on this when we were introducing sort of our background mathematical statistics but we'll talk a little bit more in depth about that uh today and then sort of from there we'll
have a basis of how do we think about non-parametric inference and we can start using that to jump off and explore the bootstrap and sort of these
non-parametric extensions to Monte Carlo estimation techniques right so with that we can open up our uh whiteboard here
again and we can start talking about the empirical cumulative distribution function so sort of the setup that we're thinking
about throughout this whole procedure is that we're going to be assuming that we have some sample we'll say X1 through xn and the only thing for right now that
we're going to assume is that this is coming from some distribution with a CDF given by F right we can always say that a random variable is going to have a CDF right and so it's sort of uniquely
defined by it CDF and so this is sort of a valid uh place for us to begin and the idea is that you know if you were willing to assume some distribution
right then this CDF in this sort of parametric case right if you assume some
distribution then in that case f is going to be governed by some parameter Theta right and so what we're thinking about here is for instance if you take
uh the normal CDF right so if normal then what you need to do is you take Theta is equal to the mean and the variance right and the idea is that in
this case once you know the mean and the variance then you know everything about that distribution right similarly if you were to take it to be the exponential then Theta is just going to
equal the rate parameter which is Lambda right and so again in these cases all we have to do is estimate the parameters and then we know the distribution but if you're not going to assume some
particular distribution here then there's not going to be a single parameter that sort of indexes this function that we care about right and so instead of trying to estimate the parameters what we want to do is
estimate the function itself right so we don't want to say well to get to our CDF we need our mean and our variance instead we want to say
let's estimate a function that's going to approximate our CDF right so how might we think about going to do this well it's worth sort of thinking about
the fact that if you take say F(x) right by definition this is the defining relationship here this is going to be the probability that X is less
than or equal to this value x right this is true by definition and so how might we think about trying to estimate this and sort of I think the
natural way is if you think about having a bunch of data and let's say that we order all of our points here right so we take X1
X2 and so forth up to xn right these are what we would call the order statistics and we call them uh the order
statistics when we've placed them in order uh and and we typically write them with this sort of uh subscript bracket rather than just sort of X1 through xn
right but if you take them in order then sort of a question you could ask is where does X fall right where can we put X into here and then you count the proportion of values that are less than
here right so then the proportion here and so maybe to make this concrete right let's suppose that we observe ten
values, say 1, 1, 2, 5,
7, 8, 10 and so on, taking ten values in total to make our math easy right so let's say
that these are the order statistics of the random variables that we've actually observed here right and then if we wanted to estimate say F of
six right which is the probability that X is less than or equal to 6 right if we assume that the data that we've observed are going to be somehow representative
of the true distribution which you know depending on how we've sampled it they should be well then what we note is that six is going to fall right in here
right and so if we look at this then we've observed 1, 2, 3, 4 values that are less than six and then we would have the other six values being larger than
six and so then this should sort of be approximately equal to 4 out of 10 right because 4 out of 10 of the values are
less than or equal to six right and this sort of makes sense and you could think that well if we sort of take a larger and larger sample size then of course eventually this is going to give us the
correct uh sort of distribution here and this sort of intuitive idea is the basis for the empirical
CDF so how do we get the empirical CDF well to get the empirical CDF we first define our indicator function
right so let's say the indicator function here, I(X_i ≤ x),
is equal to 1 if X_i is less than or equal to x and
zero otherwise right so it's a function that's going to map to one if this is true and it's going to be zero otherwise okay and so once we've defined this well
then what we can do is we can say the empirical CDF which we denote as F_n with a subscript n right and the subscript n is to emphasize the fact that this is going to depend on the sample size that we've
actually observed it's going to be a function right so we take in an argument x here and what we're going to do is we're going to write this as 1 over n times
the summation from i = 1 to n of the indicators of our sample values being less than or equal to x right and so what we're doing here
is we're counting up with this summation the number of sample values less than or equal to x right and then we divide by the total number to give us the
proportion right so this is exactly what we were doing up here it's just sort of written out in this notation. (A small R sketch of this definition is given below.)
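A minimal sketch of this definition in R; the data values are made up, chosen so that four of the ten observations are at most six, matching the example above.

```r
# Empirical CDF as an average of indicators: F_n(x) = (1/n) * sum(I(X_i <= x)).
x_obs <- c(1, 1, 2, 5, 7, 8, 10, 12, 14, 20)   # hypothetical sample of size 10

F_n <- function(x) mean(x_obs <= x)   # proportion of observations <= x

F_n(6)           # returns 0.4, since 4 of the 10 values are <= 6
ecdf(x_obs)(6)   # R's built-in ecdf() gives the same answer
```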
One of the nice things about writing it out in this notation is that we can see that the empirical CDF here is going to be a sample average right so for any fixed
x, F_n(x) is a sample average right and why do we like the fact that it's a sample average well what we know is that sample
averages are going to converge based on the laws of large numbers to the expected value of that sample right so
what we can say is that therefore F_n(x) is going to converge and we can say it's going to converge almost surely or we could say it's going to converge in probability depending on
whether you're referencing the strong or the weak laws here to the expected value of our indicator function I(X_i ≤ x)
right and if you think about what this expected value is right well we could actually write this out as an integral here right so the
integral here is going to be the integral over the real numbers of this indicator and I will take a different
integration variable I'll integrate over t right because we have this x here so it's going to be the function I(t ≤ x) and then we can use sort of this notation that we've been
talking about here right because we're integrating with respect to this CDF now if you think about what's happening with this function right here right if we
graph this out well this function is going to be at a value of one up until whatever value we have as x and then it's going to be zero for the rest of
time right and so we can rewrite this as the integral from negative infinity to x and then this whole function is sort of taken
care of and so then we can just write that as dF(t) and this is by definition the CDF right so this is just
equal to F(x) make sure that that's a lowercase x there right so in this
sense we have this uh empirical CDF is going to be converging at least pointwise for every single value of x to the CDF at that same point now the thing
that I'll point out here is that we could have gotten to the empirical CDF through this sort of defining relationship right because if we look at what this is right
here right so we take this and we say take θ(F) to be equal to the
integral over the real numbers of this indicator that t is less than or equal to x,
dF(t) right then if we take θ(F_n) right well then this is going to be the integral
here over this function and we'll take it with respect to F_n and whenever we're integrating with respect to these discrete-valued CDFs right
and our empirical CDF here is a discrete-valued CDF then what actually happens is the integral becomes a summation right so we talked about this when we were introducing
discrete versus continuous is that we sort of write down everything as though they're integrals but when we have discrete-valued things then they're actually summations right and so
what this actually is here is this is going to be the summation over the values of the indicator that t is less
than or equal to x but the only values that are actually going to have positive probability here are going to be the observed values so I can
rewrite this using the notation we were using as the sum over i from 1 to n of the indicator
of X_i less than or equal to x here and the important thing that we have here is that because the mass function here is 1 over n I should have included my 1 over n here
right so this is sort of the moving from discrete to continuous or continuous back to discrete we always can just write integrals as summations or vice versa right and so the idea that I'm
sort of getting at here is that we could have viewed this as sort of a non-parametric estimator in essentially exactly the same way as we did with the
sample mean right and so if we open back up our slides here you can see that our sample mean we define as sort of this integral, or sorry, the population mean we define as this
integral here and then the sample mean becomes 1 over n the sum over these values and so we're essentially doing the same thing here um but you could come up with the value for the empirical
CDF in a little bit more intuitive of a way I guess than just doing that but that connection is going to be sort of important right and so we can take
our empirical CDF to be defined right here and it's going to be valid sort of no matter what right and so we can take
a look quickly at the empirical CDF in R right and so um we have sort of this this code here I'll zoom in a little bit to make sure that it's all legible but we have uh sort of some code the
important function is going to be this ecdf call right here right and so what we can do is we can you know just generate some data so we'll set our seed we're going to take one sample that's going to be just a normal and then we're
going to take this other uh sample here and it's going to be this like weird distribution where we take a normal we multiply it by some binomial and then we add on a plus all multiplied by some exponential right I'm sure you could
work out what the uh distribution of this is going to be sort of using um transformation techniques and the like but that's going to be kind of annoying to do right and so what if this is what our data actually ended up sort of
looking like you know the sort of important thing to note is that the ecdf does not care whether we have nice normal data or whether we have this sort of strange mixture distribution and so then we'll compute
the ecdf all you have to do is you call the ecdf function and you pass in uh the data that you want here and then we can easily sort of just plot these right uh
and so we'll plot the um uh the normal and then onto the normal plot I'm also adding the actual normal CDF right so that we can get a sense of what it looks
like and then we can plot out this strange mixture CDF here right and so we can see that we sort of can estimate all of these values here right and you can
see um these sort of stepwise function right here that's our estimated value and the actual curve is sort of what's going on here so we can see we're at least roughly following the same shape
right over here we can see that we get sort of this more aggressive looking CDF that we're estimating and I haven't included the actual CDF here because again that's going to be an annoying thing to work out. (A rough sketch of the code being described here is given below.)
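The code on screen isn't reproduced in this transcript, so here is a minimal sketch along the lines of what's described; the exact form and parameters of the "strange mixture" sample are assumptions on my part.

```r
# Sketch of the ecdf() demo described above; the mixture's parameters are guesses.
set.seed(4243)
n  <- 50
x1 <- rnorm(n)                                          # a plain normal sample
x2 <- rnorm(n) * rbinom(n, 1, 0.5) + rexp(n, rate = 1)  # a "strange mixture" sample

F1 <- ecdf(x1)   # ecdf() returns the empirical CDF as a step function
F2 <- ecdf(x2)

plot(F1, main = "ECDF of a normal sample")
curve(pnorm(x), add = TRUE, col = "red")       # overlay the true normal CDF

plot(F2, main = "ECDF of the mixture sample")  # true CDF omitted, as in the lecture
```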
So it's worth popping back over here and just talking about sort of this shape that we're
going to always expect to get out of the ecdf right so if you think about having observed values uh in order say
X_(1), X_(2) up to X_(n) right and then you think about actually treating the ecdf as a function
of x right so F_n(x) right we can think about what these values are actually going to be right so note that we're going to have the
proportion of values that are smaller than the value of x that we plug in and so if we take anything that is less than our smallest value the proportion of values that we observe down there is
always going to be zero right so we're going to take zero for all x less than X_(1)
right and then what we'll find is that we get a value of 1/n for all values of x with X_(1) less than or equal to x less than
X_(2) because if you fall between your smallest value and your second smallest value then we're going to be in this situation where one out of the n
values are right there right then we'll have a proportion of two out of n if you fall between X_(2) and
X_(3) right and you can sort of continue this pattern here right where we're ultimately going to have i out
of n if you're falling between X_(i) and X_(i+1) until ultimately, making sure to write this in
nicely here, you get to a value of one if you are greater than or equal
to the maximum value X_(n) right so in this case we can sort of actually think about writing this out sort of explicitly but if we think about this
sort of then emerging as an actual function that we can draw out here right and if I place in just some of these values
X1 X2 X3 and so forth right think put this continuing then what we're always going to have is this sort of staircase pattern right where we're going to start
out at zero right up until we get to X1 and then immediately we're going to jump up to 1 over n and we're going to stay constant there and then we're going to
jump up to 2 over n and we're going to stay constant there and so forth right and so in this case we're going to
always get this sort of staircase pattern right and with this staircase pattern this is sort of the characteristic pattern of any discrete
CDF right and so in a sense here if you're thinking about sort of a discrete random variable here then it sort of makes sense that we're getting out this sort of stepwise pattern right
and so just to sort of plot this where 1 over n here or 2 over n here and so forth you know you uh up and up until you get to the value
of one and so importantly the thing to note about this is that F_n(x) is a valid
CDF right so not only is it going to be a good estimate right in the sense that we saw with the pointwise convergence but it's also going to be a valid CDF and what's it going to be the
CDF for well it's going to be the CDF for the discrete distribution which is equally likely to take X1
X2 up to xn right so if we Define a discrete random variable which has an equal probability of taking on any of the
values that we've observed then F_n is exactly the CDF for that right so it's essentially a discrete uniform probability
distribution that assigns 1/n Mass to each of the options that we've actually looked at and because it's this valid CDF what we can sort of think about is it's defining this valid distribution
and so we can start to work with that distribution directly right and in particular we're going to be calling this the empirical distribution right so it's going to be
called the empirical distribution and the empirical CDF is the CDF of the empirical distribution right so there's sort of two ways that we can go about thinking about the ecdf on one hand we
can think about it as an estimate of the true population CDF and it's a non-parametric estimate right so it's going to work sort of no matter which uh data set we we're working with what
population distribution we're working with it's also going to be the valid CDF of the empirical distribution right so if you had a discrete distribution that
took on each of these values with equal probability this is going to be the CDF for it right and so both of these formulations are going to be important for us to really understand the properties of the ecdf. (A quick numerical check of this correspondence is sketched below.)
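A minimal check, with made-up data, that the ECDF places mass 1/n at each observed value, exactly as the CDF of that discrete uniform distribution should.

```r
# The ECDF is the CDF of the empirical (discrete uniform) distribution:
# it jumps by exactly 1/n at each order statistic. Data here are made up.
x_obs <- c(2.3, 0.7, 5.1, 3.4, 1.9)
n     <- length(x_obs)

F_hat <- ecdf(x_obs)
F_hat(sort(x_obs))   # returns 1/5, 2/5, 3/5, 4/5, 1: mass 1/n at each observation
```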
Now there are some other important features of the ecdf that are worth at
least highlighting we won't actually go about sort of uh proving these or demonstrating them in detail that they sort of require a little bit more Machinery than I think is worth uh sort of going through in this class but they
are important results nonetheless right to at least justify why we're thinking about uh this distribution and so the first is that we are going to have
convergence in distribution of the ecdf uh at any given point right so we can essentially use the central limit theorem because again we have this sample average and so if we think about sort of one way that we can formulate
the central limit theorem what we can say is that the square root of n times (F_n(x) minus F(x)) right we know from our central limit
theorem that this is going to converge in distribution to the normal distribution it's just going to converge in distribution to a normal and it's going to be a normal with zero mean
and the variance is actually going to be given by the Bernoulli variance here so that's F(x)(1 minus F(x)) right and so that's sort of an
intriguing result that we have there and this is going to be useful for instance we can use this to formulate confidence intervals around our ecdf right so that's sort of a helpful place that we can apply this technique to. (A quick sketch of such a pointwise confidence interval is given below.)
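A minimal sketch of using this normal approximation for a pointwise confidence interval; the data, the point x0, and the 95% level are illustrative choices, not something specified in the lecture.

```r
# Pointwise ~95% CI for F(x) from the CLT: F_n(x) +/- z * sqrt(F_n(x)(1 - F_n(x)) / n).
set.seed(2)
x_obs <- rnorm(100)
n     <- length(x_obs)

x0    <- 0.5                # the point at which we want F(x0)
p_hat <- ecdf(x_obs)(x0)    # F_n(x0)
se    <- sqrt(p_hat * (1 - p_hat) / n)

c(lower = p_hat - 1.96 * se, upper = p_hat + 1.96 * se)
pnorm(x0)   # true F(x0) for comparison, since here we actually know the distribution
```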
Now both the central limit theorem and the laws of large numbers when applied to the empirical CDF give us results that rely on pointwise convergence right so this is saying for each value of x this is
going to be true there are actually a lot stronger results that we can get from this and so I'll just State the next uh sort of result and this result gives rise to to
many other results and it's a lot more powerful even if at sort of uh the outset it doesn't look like it's a whole lot different
so one way that you can sort of write this is we can say that the norm here and I'll explain exactly what
this notation means in a second the infinity norm of the empirical CDF minus the actual CDF converges
almost surely to zero right and so what am I actually saying here well this right here we can take to be the
supremum over x of the difference between the empirical CDF and the
actual CDF right and so remember that the supremum is an upper bound on the difference between these things so it's sort of like saying the maximum value here and so the maximum difference
between these two points for all of the possible values is going to converge almost surely to zero now this is sort of a much more powerful statement than
uh the statement regarding pointwise convergence right and so why how can we sort of think about that well it's not an exact Apples to Apples comparison but
if you think about a function like say f_n(x) equal to x^2 / n
right now if you fix any value of x right so if you fix x then as n goes to
infinity f_n(x) goes to zero right so if you're taking just any specific value of x then this will
always tend towards zero however if you think about taking sort of the maximum
difference between f_n(x) and zero right so that's the thing we're saying it's going to ultimately end up converging to so if you take the supremum of this over
x right and then you take the limit as n goes to infinity here right so what we're thinking about is we first take the maximum difference and then we
take the limit this is going to be infinity right because the issue is that as x grows we can't keep pace with that over n there's never
going to be a time that if you take n big enough it won't matter what x is whereas in this case we are saying that even if you were to First find what's the maximum difference and then you take
n off to infinity that's still going to go to zero right so hopefully this sort of gives some intuition as to why this is a stronger result but ultimately it is a stronger result and so this sort of says that we have this
uniform convergence there's a lot more sort of additional theoretical results that you can take and if you take an advanced statistics course you'll definitely deal
with this a lot more there but the general idea here is that the empirical CDF is going to be a good estimator it's going to be a well-behaved estimator
for the true CDF and in particular what we need is we need our sample size to be going off to infinity. (A small numerical illustration of this uniform convergence is sketched below.)
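A minimal numerical sketch of this uniform convergence: for each sample size we compute the supremum of |F_n(x) - F(x)| over x for a standard normal sample and watch it shrink; the particular sample sizes are arbitrary choices.

```r
# The supremum gap between the ECDF and the true CDF shrinks as n grows.
# For a step-function ECDF, the supremum over all x is attained at (or just before)
# the observed points, so we check both one-sided gaps at the order statistics.
set.seed(3)
sup_gap <- function(n) {
  x  <- sort(rnorm(n))
  Fn <- (1:n) / n                  # ECDF evaluated at the order statistics
  max(abs(Fn - pnorm(x)), abs(Fn - 1/n - pnorm(x)))
}

sapply(c(50, 500, 5000, 50000), sup_gap)   # the gaps decrease toward zero
```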
And so just as a follow-on to that point we can pull back open the course notes here and I'm running the exact same analysis that we saw above except now I'm going to be considering 200 and then 20,000 data points right we can see we're generating exactly the same
data as before and so these first plots are exactly the same as what we saw above but then what you can see is that in this case where we're taking 200 data points right and we look at the normal
curve right here you can hardly even see a difference between the normal CDF and the empirical CDF that we have here right they're essentially one and
the same and uh you know we don't have the actual CDF plotted here but you can see it's sort of converging to this line if we go up to 20,000 points now they are literally coinciding with one another and so we can sort of trust that
this one is also going to be coinciding here as well so as the sample size increases then we have uh this this sort of position whereby the empirical CDF is
going to be essentially a perfect representation of the true CDF right and so again there's sort of these two framings for it and we've been you know jumping back and forth between them but
on one hand the empirical CDF is a true CDF of a discrete distribution based on the data that we've actually observed on the other hand the empirical CDF is
going to be an estimate of the true distribution function right and so either of those framings are going to be useful depending on exactly what it is
that we're trying uh to do here and so how does this all help us well on one hand this is just another example of a non-parametric estimator right the empirical CDF is a non-parametric
estimator of the distribution function however the nonparametric estimator of the distribution function the empirical CDF here is also going to give us a way
to generally form non-parametric estimators for any parameters that we care about right and so what we can think about here is if we want to estimate,
say, θ(F) right so again we're sort of framing the parameter that we're looking to estimate as a functional over the
distributions this was the setup we were in last time then what we can do is we can
use theta hat equal to θ(F_n) right and so what are we actually saying here well we know that θ(F)
is just sort of saying this is going to be some functional from the cdfs to the real values right so every CDF gets
mapped to some real value we've said that this is just a CDF right the empirical CDF is just some CDF and so we could apply that same functional to the
empirical CDF and in doing so that's going to produce for us a real value and what we're saying is that that real
value is going to be an estimate for the true parameter value right and so what's
the rationale here well very informally the idea is that if we take the limit as n goes to infinity of
F_n then what we're going to get is that this is going to tend towards F right and so the idea is
that well if we take the functional and we apply it to both sides,
so we take the limit as n goes to infinity of θ(F_n), then that should tend to θ(F) right
and here our hope is that if theta is sort of a well-behaved functional then maybe this will be sort of valid to do
right where we can pull through the limit now I will sort of point out that this step is not always going to be justified right we need Theta to sort of
be well behaved and there's going to be plenty of examples where uh you're sort of not justified in doing this but that's the general Theory right is that
we know this is true and so if we know that's true we can sort of hope to be able to work in this direction right and the really nice part about this is that this is going to give
us sort of a non-parametric way of estimating any parameter that we can sort of frame as a functional here right and so uh you know how can we do this
well like we were saying before if you suppose that θ(F) is going to be an
integral of say g(x) times dF(x) right so it's this integral over
the function well then this integral right here if applied to F_n right so θ(F_n) we know it's
going to be this integral over the reals of g(x) dF_n(x) and what we've said is that this integral right
here because we're moving to the discrete case the integral becomes a summation and then we sort of have to sum up the mass function multiplied by the function that we care about and so our
mass function is going to be 1/n for all of the values that we've observed so this is going to be equal to the sum
from i = 1 to n of say g(X_i) times 1/n which is 1/n times the sum from i = 1 to n of
g(X_i) right so any parameter that we can write out as the integral of some function is going to be
represented as the sample average of this based on our data and so why might this be useful well for instance if we wanted to estimate say
the variance of X right then if we want to frame this as θ(F) right this is the integral of (x
minus the expected value of X) squared dF(x) right and so the way that we can do
this then is θ(F_n) is going to be 1/n times the sum from i = 1 to n of (X_i
minus x-bar) squared right and so immediately we sort of get this plug-in estimator for the parameter of interest now note this is not the sample variance estimator but it's going
to be sort of an estimator that's going to be reasonable that's going to work well if you were to just take say the expected value of X then this is
just going to be the integral of x dF(x) right and so then
θ(F_n) is going to be 1/n times the sum from i = 1 to n of X_i and so then we get back to our sample mean here.
So this plug-in principle is going to become a really effective way of generating these non-parametric estimators. (A minimal R sketch of these plug-in calculations is given below.)
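A minimal sketch of these two plug-in calculations in R, with made-up data; note how the plug-in variance divides by n rather than n - 1, exactly as described above.

```r
# Plug-in estimators: replace integrals against F with sample averages against F_n.
set.seed(5)
x <- rnorm(30, mean = 2, sd = 3)   # hypothetical data
n <- length(x)

mean_plugin <- mean(x)                # theta(F) = integral of x dF(x)
var_plugin  <- mean((x - mean(x))^2)  # theta(F) = integral of (x - E[X])^2 dF(x)

var_plugin             # divides by n...
var(x) * (n - 1) / n   # ...so it equals the usual sample variance rescaled by (n-1)/n
```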
The idea is essentially that what we're wanting to do is we want θ(F) right but we don't
quite know how we're going to go directly to an estimate for that so what we say is that F here is going to be well
estimated by F_n and so then θ(F_n) should hopefully give us sort of this theta hat that we
need right and so it's not always going to work but it's sort of this uh intriguing little idea here and it's going to be this idea that requires a couple of leaps for for us right so it's
first to start thinking about parameters as functionals of the distributions and the second thing that it's going to require is us to recognize that the empirical CDF is going to be a good
estimator uh for this right and so just to sort of finish out the lecture today we'll just sort of um plug through this little piece of the notes Here uh again I would encourage you to actually sort
of you know read through these in a little bit more detail um but so what we're thinking about is again uh any functional that we want to specify here
right uh so for instance the expected value then the idea is that we can take our estimator to be given by uh the the
plugin estimator or we plug in our empirical CDF into the functional right so we can see here the expected value we can see here the um the variance one of
the things that we'll do sometimes by way of notation is we'll write the expected value with the distrib bution sort of indicated here right so this is the expect value with respect to F the
reason that that's useful is that then when we want to plug in um f hat of n then you know that we need to take the expected value with respect to the distribution fat of n which is just
going to be the sample mean here as a general rule we don't need to make very strong uh assumptions here we'll need Theta hat or Theta to be a
well-behaved functional in order for this to produce good estimates but it's going to work well enough for us and so that's sort of all that I'm going to talk about in terms of forming non-parametric estimators when we come
back from the break we'll start talking about how we can use this to start doing sort of these non-parametric Monte Carlo uh techniques based on uh what we'll call the bootstrap right and so again if
you're interested you can start to read ahead in the notes that this will be posted to the course website and so you can sort of um see that there uh but
just note that this is you know a very very small piece of what nonparametric statistics has to offer if you want to get sort of more involved in any of this it's going to require a lot more studying right you could take a
full course or or multiple courses on these topics and so it's okay if you sort of felt like this moved pretty quickly uh My Hope here is to mostly give you a flavor for what we're doing here so that when we start to see the
bootstrap which I think will be a lot more understandable uh you can sort of tie it back uh to what we talked about here uh but again that's everything sort of wanted to talk about today hope you
all have a great break and I will see you all afterwards please do not hesitate to reach out if you have any questions about anything and yeah I will see you all soon