Why is variance divided by n - 1?

Because it is customary, and it results in an unbiased estimate of the variance. However, it results in an estimate of the standard deviation that is biased low, as can be seen by applying Jensen's inequality to the square root, which is a concave function. So what's so great about having an unbiased estimator? It does not necessarily minimize mean squared error. Teach your students to think, rather than to regurgitate and mindlessly apply antiquated notions from a century ago.
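To make the first two claims concrete, here is a minimal simulation sketch (not part of the original answer; the normal population, sample size and seed are arbitrary choices): the n-1 variance comes out essentially unbiased, while the square root of that same estimate still undershoots the true standard deviation, just as Jensen's inequality predicts.

```python
# Sketch: unbiased variance but biased-low standard deviation (assumed setup).
import numpy as np

rng = np.random.default_rng(0)
sigma = 2.0                        # true population standard deviation
n, reps = 5, 200_000               # small samples, many repetitions

samples = rng.normal(loc=10.0, scale=sigma, size=(reps, n))
var_n1 = samples.var(axis=1, ddof=1)   # divide by n-1 (Bessel's correction)
sd_n1 = np.sqrt(var_n1)

print("true variance:", sigma ** 2)
print("mean of s^2  :", var_n1.mean())  # close to 4.0: essentially unbiased
print("true sd      :", sigma)
print("mean of s    :", sd_n1.mean())   # noticeably below 2.0: biased low
```

With samples this small, the average of s typically lands several percent below sigma even though the average of s^2 sits right on sigma^2.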

The natural estimator of the population variance is biased when applied to a sample of the population. In order to adjust for that bias, one needs to divide by n-1 instead of n. One can show mathematically that the sample variance is an unbiased estimator when we divide by n-1 instead of n; a sketch of the proof is given below. Initially it was the mathematical correctness that led to the formula, I suppose.
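Since the link to the formal proof did not survive, here is a compact sketch of the standard argument, assuming X_1, ..., X_n are i.i.d. with mean mu and variance sigma^2 and writing X-bar for the sample mean:

```latex
\begin{align*}
\sum_{i=1}^{n} (X_i - \bar{X})^2 &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2, \\
E\!\left[\sum_{i=1}^{n} X_i^2\right] &= n(\sigma^2 + \mu^2), \qquad
E\big[\bar{X}^2\big] = \frac{\sigma^2}{n} + \mu^2, \\
E\!\left[\sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right) = (n-1)\,\sigma^2 .
\end{align*}
```

Dividing the sum of squared deviations by n-1 therefore gives an estimator with expectation exactly sigma^2, while dividing by n gives (n-1)/n times sigma^2.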

However, if one wants to add intuition to the formula, the suggestions already mentioned appear reasonable. First, observations in a sample are on average closer to the sample mean than to the population mean. The variance estimator makes use of the sample mean and, as a consequence, underestimates the true variance of the population.

Dividing by n-1 instead of n corrects for that bias. Furthermore, dividing by n-1 makes the variance of a one-element sample undefined rather than zero. At the suggestion of whuber, this answer has been copied over from another similar question. Bessel's correction is adopted to correct for bias in using the sample variance as an estimator of the true variance. The bias in the uncorrected statistic occurs because the sample mean is closer to the middle of the observations than the true mean is, and so the squared deviations around the sample mean systematically underestimate the squared deviations around the true mean.

To see this phenomenon algebraically, just derive the expected value of the sample variance without Bessel's correction and see what it looks like. In regression analysis this is extended to the more general case where the estimated mean is a linear function of multiple predictors; in that case the denominator is reduced further, to account for the lower number of degrees of freedom. This also agrees with defining the variance of a random variable as the expectation of the pairwise energy, i.e. Var(X) = (1/2) E[(X - X')^2], where X' is an independent copy of X.

To go from the random-variable definition of variance to the definition of the sample variance is a matter of estimating an expectation by a mean, which can be justified by the philosophical principle of typicality: the sample is a typical representation of the distribution.
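As a side note (not in the original answer), the pairwise view makes the n-1 appear on its own: the natural estimate of (1/2) E[(X - X')^2] averages over all pairs of distinct observations, and that average is algebraically identical to the n-1 form of the sample variance.

```latex
\operatorname{Var}(X) = \tfrac{1}{2}\,E\big[(X - X')^2\big],
\qquad
\frac{1}{n(n-1)} \sum_{i \neq j} \tfrac{1}{2}\,(x_i - x_j)^2
  = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = s^2 .
```

Because no observation is ever paired with itself, the correction falls out of the algebra without being imposed.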

Note, this is related to, but not the same as, estimation by the method of moments. To answer this question, we must go back to the definition of an unbiased estimator. An unbiased estimator is one whose expectation equals the true value of the quantity it estimates.

The sample mean is an unbiased estimator. To see why: its expectation is the average of the n individual expectations, each equal to the population mean, so it equals the population mean. Suppose that you have a random phenomenon. Oddly, if one divides by n, the estimated variance would be zero with only one observation. This makes no sense. The illusion of a zero squared error can only be counterbalanced by dividing by the number of points minus the number of degrees of freedom. This issue is particularly sensitive when dealing with very small experimental datasets.
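To make the degrees-of-freedom point concrete (a short note added here, not part of the original answer): the deviations from the sample mean always sum to zero, so once any n-1 of them are known the last one is fixed, leaving only n-1 independent pieces of information for estimating the spread.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0 .
```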

Generally, using n in the denominator gives smaller values than the population variance, which is what we want to estimate. This happens especially when small samples are taken. If you are looking for an intuitive explanation, you should let your students see the reason for themselves by actually taking samples! There is one constraint, namely that the sum of the deviations from the sample mean is zero. I think it's worth pointing out the connection to Bayesian estimation.

You want to draw conclusions about the population. The Bayesian approach would be to evaluate the posterior predictive distribution over the sample, which is a generalized Student's t distribution (the origin of the t-test). The generalized Student's t distribution has three parameters and makes use of all three of your statistics.

If you decide to throw out some information, you can further approximate your data using a two-parameter normal distribution, as described in your question. From a Bayesian standpoint, you can imagine that uncertainty in the hyperparameters of the model (the distributions over the mean and variance) causes the variance of the posterior predictive to be greater than the population variance.
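Here is a sketch of that Bayesian point, assuming a normal model with the standard noninformative prior p(mu, sigma^2) proportional to 1/sigma^2 (the prior is my assumption; the original answer does not pin one down). Under that prior the posterior predictive for a new observation is a Student-t with n-1 degrees of freedom, centred at the sample mean, with scale s * sqrt(1 + 1/n), so its spread is wider than the raw sample spread.

```python
# Sketch: posterior predictive under an assumed noninformative prior.
import numpy as np
from scipy import stats

x = np.array([7.0, 6.0, 3.0, 5.0, 5.0])   # the small sample used later in the thread
n = len(x)
xbar = x.mean()
s = x.std(ddof=1)                          # n-1 sample standard deviation

# Student-t posterior predictive: n-1 dof, location xbar, scale s*sqrt(1 + 1/n)
predictive = stats.t(df=n - 1, loc=xbar, scale=s * np.sqrt(1 + 1.0 / n))

print("predictive mean:", predictive.mean())
print("predictive sd  :", predictive.std())          # larger than s
print("95% interval   :", predictive.interval(0.95))
```

The inflated scale and heavier tails are exactly the extra spread attributed above to uncertainty in the mean and variance.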

My goodness, it's getting complicated! I thought the simple answer was: you just don't have enough data to ensure you randomly capture all the data points you need. The n-1 helps expand the estimate toward the "real" standard deviation.

In essence, the correction is n-1 rather than n-2 etc because the n-1 correction gives results that are very close to what we need.

More exact corrections are shown here: en.

The estimator that divides by n is biased in that it produces an underestimation of the true variance. We simulate a population of data points from a uniform distribution with a range starting at 1. Below I show the histogram that represents our population.

The variance is 8. To start, we can draw a single sample of size 5. Say we do that and get the following values: 7, 6, 3, 5, 5. We can estimate the variance from this sample by dividing the sum of squared deviations from the sample mean either by n or by n-1; in the former case the result is smaller than in the latter.
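For reference, here is the arithmetic for that sample done both ways (a quick check added here, not part of the original answer):

```python
# Both variance estimates for the five values quoted above.
vals = [7, 6, 3, 5, 5]
n = len(vals)
mean = sum(vals) / n                        # 5.2
ss = sum((v - mean) ** 2 for v in vals)     # 8.8
print("divide by n  :", ss / n)             # 1.76
print("divide by n-1:", ss / (n - 1))       # 2.2
```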

Below I show the results of repeated draws from our population. I simulated drawing samples of size 2 to 10, each many times. We see that the biased measure of variance is indeed biased: the average variance is lower than the true variance (indicated by the dashed line) for each sample size. We also see that the unbiased variance is indeed unbiased: on average, the sample variance matches the population variance. The results of using the biased measure of variance reveal several clues for understanding the solution to the bias. We see that the amount of bias is larger when the sample size is smaller.
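Here is a sketch of a simulation along these lines. The exact population and the number of repetitions used in the original answer were lost in extraction, so the uniform population on the integers 1 to 10 and the 10,000 repetitions per sample size below are assumptions made purely for illustration.

```python
# Sketch: biased (divide by n) vs unbiased (divide by n-1) variance estimates.
import numpy as np

rng = np.random.default_rng(42)
population = rng.integers(1, 11, size=100_000)   # assumed uniform population on 1..10
true_var = population.var()                      # population variance (divide by N)

for n in range(2, 11):
    samples = rng.choice(population, size=(10_000, n), replace=True)
    biased = samples.var(axis=1, ddof=0).mean()    # divide by n
    unbiased = samples.var(axis=1, ddof=1).mean()  # divide by n-1
    print(f"n={n:2d}  biased={biased:5.2f}  unbiased={unbiased:5.2f}  true={true_var:5.2f}")
```

The pattern described above should reproduce: the ddof=0 column sits below the true variance, by more for small n, while the ddof=1 column hovers around it.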

So the solution should be a function of sample size, such that the required correction becomes smaller as the sample size increases. Ideally, we would estimate the variance of the sample by subtracting each value from the population mean. However, we usually do not know the population mean, so we use the sample mean instead. This is where the bias comes in.

In fact, the mean of a sample minimizes the sum of squared deviations from it. This means that the sum of squared deviations from the sample mean is always smaller than the sum of squared deviations from the population mean. The only exception is when the sample mean happens to equal the population mean.
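One line of algebra makes this precise (added here; the original answer states it without proof). For any candidate centre c,

```latex
\sum_{i=1}^{n} (x_i - c)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n\,(\bar{x} - c)^2
  \;\ge\; \sum_{i=1}^{n} (x_i - \bar{x})^2 ,
```

with equality only when c equals the sample mean; taking c to be the population mean gives exactly the claim above.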

Below are two graphs. In each graph I show 10 data points that represent our population. I also highlight two data points from this population, which represent our sample. In the left graph I show the deviations from the sample mean, and in the right graph the deviations from the population mean.

We see that in the left graph the sum of squared deviations is much smaller than in the right graph: the sum is smaller when using the sample mean than when using the population mean. This is true for any sample you draw from the population (again, except when the sample mean happens to be the same as the population mean). The difference is small now, but using the sample mean still results in a smaller sum than using the population mean.

In short, the bias comes from using the sample mean instead of the population mean. The sample mean is always guaranteed to lie in the middle of the observed data, thereby reducing the estimated variance and creating an underestimation.

Looking at the previous graphs, we see that if the sample mean is far from the population mean, the sample variance is smaller and the bias is large. If the sample mean is close to the population mean, the sample variance is larger and the bias is small. So, the more the sample mean moves around the population mean, the greater the bias.

In other words, besides the variance of the data points around the sample mean, there is also the variance of the sample mean around the population mean.
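Written out (a short addition, not part of the original answer), that statement is an exact identity for i.i.d. data: the cross term E[(X_i - X-bar)(X-bar - mu)] vanishes, so the spread around the true mean splits into the spread around the sample mean plus the spread of the sample mean itself.

```latex
\begin{align*}
\underbrace{E\big[(X_i - \mu)^2\big]}_{\sigma^2}
  &= E\big[(X_i - \bar{X})^2\big] + \underbrace{E\big[(\bar{X} - \mu)^2\big]}_{\sigma^2 / n}, \\
E\!\left[\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2\right]
  &= \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n}\,\sigma^2 ,
\end{align*}
```

which is exactly the shortfall that dividing by n-1 instead of n repairs.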


