What is the difference between prediction and confidence intervals?

I once asked Clive Granger why he confused the two concepts, and he dismissed my objection as fussing about trivialities. I disagreed with him then, and I still do.

I have seen someone compute a confidence interval for the mean and use it as if it were a prediction interval for a future observation. The trouble is that confidence intervals for the mean are much narrower than prediction intervals, so this gave him an exaggerated and false sense of the accuracy of his forecasts. So I ask statisticians to please preserve this distinction, and I ask econometricians to stop being so sloppy about terminology.
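
To see how large the gap can be, here is a minimal Python sketch for the simplest case: an i.i.d. normal sample, a confidence interval for its mean, and a prediction interval for one new observation. The sample values are made up for illustration, and the critical value `t_crit` is passed in by the caller (2.093 is roughly $t_{0.025,19}$ for 20 observations) so that only the standard library is needed.

```python
import math
import statistics


def mean_ci_and_pi(sample, t_crit):
    """Return (CI for the mean, PI for one new observation).

    Assumes i.i.d. normal observations; t_crit is the two-sided critical
    value t_{alpha/2, n-1}, supplied by the caller.
    """
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    ci_half = t_crit * s * math.sqrt(1 / n)      # uncertainty about the mean only
    pi_half = t_crit * s * math.sqrt(1 + 1 / n)  # adds the scatter of a single new value
    return (xbar - ci_half, xbar + ci_half), (xbar - pi_half, xbar + pi_half)


# Hypothetical data: 20 made-up values, for illustration only.
sample = [2.1, 1.8, 2.5, 1.9, 2.2, 2.0, 2.4, 1.7, 2.3, 2.1,
          1.6, 2.6, 2.0, 1.9, 2.2, 2.1, 1.8, 2.4, 2.0, 2.3]
ci, pi = mean_ci_and_pi(sample, t_crit=2.093)  # t_{0.025, 19} is about 2.093
print("CI for the mean:", ci)
print("PI for one new value:", pi)
```

Because both half-widths share the factor $t \cdot s$, the prediction interval here is $\sqrt{n+1} = \sqrt{21} \approx 4.6$ times wider than the confidence interval, whatever the data.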

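I had rather hoped Granger would come to accept my point of view.

To see the difference concretely, suppose we fit a simple linear regression model that uses the number of bedrooms to predict the selling price of a house:

$$\text{Price} = \beta_0 + \beta_1 \cdot \text{Bedrooms}$$

We use the following formula to calculate a confidence interval for the mean selling price of houses with $x_0$ bedrooms:

$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx} \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}}$$

where $\hat{y}_0$ is the fitted price at $x_0$, $t_{\alpha/2,\,n-2}$ is the critical value of the t distribution with $n-2$ degrees of freedom, $S_{yx}$ is the residual standard error, $n$ is the number of houses, $\bar{x}$ is the mean number of bedrooms, and $SS_x = \sum_i (x_i - \bar{x})^2$.

We use the following formula to calculate a prediction interval for the selling price of a single house with $x_0$ bedrooms:

$$\hat{y}_0 \pm t_{\alpha/2,\,n-2} \cdot S_{yx} \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}}$$

Notice that the formula for a prediction interval contains an extra 1 under the square root, which means its standard error will always be larger than that of the confidence interval.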
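
The two regression intervals can be sketched in Python as follows, using the standard textbook formulas: both are centred on the fitted value $\hat{y}_0$, and the prediction interval adds 1 under the square root. The bedroom and price numbers below are hypothetical toy values (prices in thousands), not the 20-house dataset discussed in the text, and `t_crit` is supplied by the caller (2.365 is roughly $t_{0.025,7}$ for 9 houses).

```python
import math


def regression_intervals(x, y, x0, t_crit):
    """Fit y = b0 + b1*x by least squares; return (y0_hat, ci, pi) at x0.

    t_crit is the two-sided critical value t_{alpha/2, n-2}, supplied by the
    caller (for example from scipy.stats.t.ppf(1 - alpha / 2, n - 2)).
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ss_x = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ss_x
    b0 = ybar - b1 * xbar
    # Residual standard error S_yx, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s_yx = math.sqrt(sse / (n - 2))
    y0_hat = b0 + b1 * x0
    h0 = 1 / n + (x0 - xbar) ** 2 / ss_x
    ci_half = t_crit * s_yx * math.sqrt(h0)      # mean price of houses with x0 bedrooms
    pi_half = t_crit * s_yx * math.sqrt(1 + h0)  # price of one new house with x0 bedrooms
    return y0_hat, (y0_hat - ci_half, y0_hat + ci_half), (y0_hat - pi_half, y0_hat + pi_half)


# Hypothetical toy data, for illustration only.
bedrooms = [1, 2, 2, 3, 3, 3, 4, 4, 5]
price = [120, 150, 160, 185, 190, 200, 230, 240, 270]
y0, ci, pi = regression_intervals(bedrooms, price, x0=3, t_crit=2.365)
print("fitted:", y0, "CI:", ci, "PI:", pi)
```

In R, calling `predict()` on an `lm` fit with `interval = "confidence"` or `interval = "prediction"` returns these same two kinds of interval.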

Thus, at the same confidence level and the same number of bedrooms, a prediction interval will always be wider than a confidence interval.

Suppose we have the following dataset that shows the number of bedrooms and the selling price for 20 houses in a particular neighborhood:
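
The claim that the prediction interval is always wider can be made precise with a short calculation, assuming both intervals use the same critical value $t$ and the same predictor value $x_0$. Here $h_0$ is shorthand introduced only for this derivation: $h_0 = \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{SS_x}$, with $n$ the sample size, $\bar{x}$ the mean predictor value, $SS_x = \sum_i (x_i - \bar{x})^2$, and $S_{yx}$ the residual standard error.

$$
\frac{\text{width of PI}}{\text{width of CI}}
= \frac{2\,t\,S_{yx}\sqrt{1 + h_0}}{2\,t\,S_{yx}\sqrt{h_0}}
= \sqrt{1 + \frac{1}{h_0}} > 1,
\qquad
h_0 \ge \frac{1}{n} \;\Rightarrow\; \sqrt{1 + \frac{1}{h_0}} \le \sqrt{n + 1}.
$$

The ratio is largest at $x_0 = \bar{x}$. As $n$ grows, the confidence interval shrinks toward zero width, while the prediction interval's half-width approaches $t$ times the residual standard deviation, because no amount of data removes the scatter of a single new observation.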

Now suppose we fit a simple linear regression model to this dataset in R:

Sometimes, confidence intervals are not the best option. Let's look at the characteristics of some different types of intervals and consider when and where each should be used. Specifically, we'll look at confidence intervals, prediction intervals, and tolerance intervals.

A confidence interval is a range of values that is likely to contain the value of an unknown population parameter, such as the mean, based on data sampled from that population. Two random samples from the same population are unlikely to produce identical confidence intervals. But if the population is sampled again and again, a certain percentage of those confidence intervals will contain the unknown population parameter.
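
The "sampled again and again" interpretation can be checked by simulation. This Python sketch draws many samples from a hypothetical normal population (mean 10, standard deviation 2, chosen arbitrarily), builds a 95% interval for the mean from each one, and reports the fraction that contain the true mean. To stay within the standard library it treats $\sigma$ as known and uses a z critical value; with $\sigma$ estimated, a t critical value would take its place.

```python
import random
import statistics


def ci_coverage(mu=10.0, sigma=2.0, n=25, trials=10_000, level=0.95, seed=1):
    """Fraction of repeated-sample confidence intervals containing the true mean.

    Assumes a normal population with known sigma (a z-interval), so the
    nominal coverage is exactly `level`.
    """
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


coverage = ci_coverage()
print(coverage)  # should land close to the nominal 0.95
```

Each individual interval either contains the true mean or it does not; the 95% describes the long-run share of intervals that do.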

The percentage of these confidence intervals that contain the parameter is the confidence level of the interval. Confidence intervals are most frequently used to estimate the population mean or standard deviation, but they can also be calculated for proportions, regression coefficients, occurrence rates (Poisson), and the differences between populations in hypothesis tests.
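
As one example of a confidence interval for something other than a mean, here is a minimal Python sketch of a confidence interval for a proportion. It uses the normal-approximation (Wald) interval, $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, chosen here only because it is the simplest; alternatives such as the Wilson or exact (Clopper-Pearson) intervals behave better when $n$ is small or $\hat{p}$ is near 0 or 1.

```python
import math
from statistics import NormalDist


def wald_proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


# Hypothetical example: 50 "successes" out of 100 trials.
lo, hi = wald_proportion_ci(50, 100)
print(lo, hi)
```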

With respect to the parameter of interest, confidence intervals assess only sampling error, the inherent error in estimating a population characteristic from a sample. Larger sample sizes decrease the sampling error and result in narrower confidence intervals. If you could sample the entire population, the confidence interval would have a width of 0: there would be no sampling error, since you would have obtained the actual parameter for the entire population!
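
How fast the interval narrows can be seen from the width formula for a mean with known $\sigma$, $2z\sigma/\sqrt{n}$. This Python sketch, with an arbitrary $\sigma = 2$, prints the width for a few sample sizes; each fourfold increase in $n$ halves the width.

```python
from statistics import NormalDist


def z_interval_width(sigma, n, level=0.95):
    """Full width of a z confidence interval for a mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return 2 * z * sigma / n ** 0.5


for n in (25, 100, 400, 1600):
    print(n, round(z_interval_width(sigma=2.0, n=n), 3))
```

The $1/\sqrt{n}$ rate means precision gets expensive: halving the width again always costs four times as many observations.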

In addition, confidence intervals provide information only about the mean, standard deviation, or whatever your parameter of interest happens to be. They tell you nothing about how the individual values are distributed. What does that mean in practical terms? It means that the confidence interval has some serious limitations: a narrow interval for the mean does not tell you where most individual values will fall. To draw a conclusion like that requires a different type of interval.

A prediction interval is a range for predictions derived from linear and nonlinear regression models.
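
A quick simulation shows how little a confidence interval for the mean says about individual values. This Python sketch draws 1,000 values from a hypothetical normal population (mean 100, standard deviation 15, chosen arbitrarily), builds a 95% interval for the mean, and counts how many of the individual observations fall inside it. With a sample this large, a z critical value is used in place of t.

```python
import random
import statistics


def share_inside_mean_ci(n=1000, seed=7):
    """Fraction of individual observations that fall inside a 95% CI for their mean."""
    rng = random.Random(seed)
    data = [rng.gauss(100.0, 15.0) for _ in range(n)]
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)
    z = statistics.NormalDist().inv_cdf(0.975)  # large n, so z stands in for t
    half = z * s / n ** 0.5
    lo, hi = xbar - half, xbar + half
    return sum(lo <= v <= hi for v in data) / n


share = share_inside_mean_ci()
print(share)  # only a small fraction, roughly 5% in expectation for these settings
```

The interval is narrow because it describes the mean, not the data; most individual observations lie outside it.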

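There are two types of prediction intervals. Given specified settings of the predictors in a model, the confidence interval of the prediction is a range likely to contain the mean response. Like regular confidence intervals, the confidence interval of the prediction represents a range for the mean, not the distribution of individual data points. The prediction interval, by contrast, is a range likely to contain a single future response at those same settings, and it is wider than the corresponding confidence interval of the prediction.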

For example, in a study of light bulbs, we could test how different manufacturing techniques (Slow or Quick) and filaments (A or B) affect bulb life.

After fitting a model, we can use statistical software to forecast the life of bulbs made using filament A under the Quick method.
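
The text does not say which model the software fits, so this Python sketch assumes a full two-factor model with interaction (a cell-means model). Under that assumption, the forecast for filament A under the Quick method is simply that cell's sample mean, and the error variance is pooled across all four cells. The lifetimes are hypothetical numbers invented for illustration, and `t_crit` is supplied by the caller (2.179 is roughly $t_{0.025,12}$ for 16 bulbs in 4 cells).

```python
import math
import statistics

# Hypothetical bulb lifetimes in hours for each (filament, method) cell.
bulb_life = {
    ("A", "Slow"): [1210, 1185, 1240, 1198],
    ("A", "Quick"): [1105, 1130, 1092, 1121],
    ("B", "Slow"): [1302, 1288, 1321, 1295],
    ("B", "Quick"): [1180, 1204, 1172, 1199],
}


def cell_forecast(data, cell, t_crit):
    """Point forecast, CI for the mean, and PI for one new bulb in one cell.

    Assumes a cell-means (full interaction) model: each cell's fitted value
    is its sample mean, and the residual variance is pooled over all cells.
    """
    n_total = sum(len(v) for v in data.values())
    sse = sum((y - statistics.fmean(v)) ** 2 for v in data.values() for y in v)
    s = math.sqrt(sse / (n_total - len(data)))  # pooled residual standard deviation
    values = data[cell]
    m = statistics.fmean(values)
    ci_half = t_crit * s * math.sqrt(1 / len(values))      # mean life of such bulbs
    pi_half = t_crit * s * math.sqrt(1 + 1 / len(values))  # life of one new bulb
    return m, (m - ci_half, m + ci_half), (m - pi_half, m + pi_half)


# 16 bulbs in 4 cells leaves 12 error degrees of freedom.
mean_life, ci, pi = cell_forecast(bulb_life, ("A", "Quick"), t_crit=2.179)
print(mean_life, ci, pi)
```

The confidence interval answers "how long do A/Quick bulbs last on average?", while the prediction interval answers "how long will the next A/Quick bulb last?"; only the second is appropriate for a guarantee on an individual bulb.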
