Predict in R: Model Predictions and Confidence Intervals


2024-06-29 20:35 | Source: compiled from the web

The main goal of linear regression is to predict an outcome value on the basis of one or multiple predictor variables.

In this chapter, we'll describe how to predict the outcome for new observations using R. You will also learn how to display the confidence intervals and the prediction intervals.

Contents:

Build a linear regression
Prediction for new data set
Confidence interval
Prediction interval
Prediction interval or confidence interval?
Discussion
References

The Book:

Machine Learning Essentials: Practical Guide in R

Build a linear regression

We start by building a simple linear regression model that predicts the stopping distances of cars on the basis of the speed.

# Load the data
data("cars", package = "datasets")
# Build the model
model <- lm(dist ~ speed, data = cars)
model
##
## Call:
## lm(formula = dist ~ speed, data = cars)
##
## Coefficients:
## (Intercept)        speed
##      -17.58         3.93

The linear model equation can be written as follows: dist = -17.579 + 3.932*speed.

Note that the units of the variables speed and dist are mph and ft, respectively.
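As a quick check of the equation, the fitted coefficients can be extracted with coef() and applied by hand. This is only a minimal sketch: the speed of 10 mph is an arbitrary illustrative value, and the names b, speed.value, by.hand and by.predict are hypothetical.

```r
# Refit the model so that this block is self-contained
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)

# Named vector holding the intercept and the slope
b <- coef(model)

# Hand-computed prediction for an illustrative speed of 10 mph (assumed value)
speed.value <- 10
by.hand <- b[["(Intercept)"]] + b[["speed"]] * speed.value

# The same prediction obtained through predict()
by.predict <- predict(model, newdata = data.frame(speed = speed.value))

# Compare the two results (names dropped so only the numbers are compared)
all.equal(by.hand, unname(by.predict))
```

Working from coef(model) rather than the rounded values in the printed output avoids small rounding differences.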

Prediction for new data set

Using the above model, we can predict the stopping distance for a new speed value.

Start by creating a new data frame containing, for example, three new speed values:

new.speeds <- data.frame(
  speed = c(12, 19, 24)
)

You can predict the corresponding stopping distances using the R function predict() as follows:

predict(model, newdata = new.speeds)
##    1    2    3
## 29.6 57.1 76.8

Confidence interval

The confidence interval reflects the uncertainty around the mean predictions. To display the 95% confidence intervals around the mean predictions, specify the option interval = "confidence":

predict(model, newdata = new.speeds, interval = "confidence")
##    fit  lwr  upr
## 1 29.6 24.4 34.8
## 2 57.1 51.8 62.4
## 3 76.8 68.4 85.2

The output contains the following columns:

fit: the predicted stopping distances for the three new speed values
lwr and upr: the lower and the upper confidence limits for the expected values, respectively. By default, the function produces the 95% confidence limits.

For example, the 95% confidence interval associated with a speed of 19 mph is (51.83, 62.44). This means that, according to our model, the average stopping distance of cars travelling at 19 mph is estimated to lie between 51.83 and 62.44 ft.
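To make the origin of these limits concrete, the sketch below rebuilds the confidence interval by hand as fit ± t × standard error, using the se.fit = TRUE option of predict(). It assumes the three speeds 12, 19 and 24 mph, which reproduce the fitted values shown above; the names p, t.crit, ci.by.hand and ci.builtin are hypothetical.

```r
# Self-contained setup
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
new.speeds <- data.frame(speed = c(12, 19, 24))  # assumed speed values

# se.fit = TRUE also returns the standard error of each fitted mean
# and the residual degrees of freedom
p <- predict(model, newdata = new.speeds, se.fit = TRUE)

# Two-sided 95% critical value of the t distribution
t.crit <- qt(0.975, df = p$df)

# Confidence limits: fitted mean plus or minus t times its standard error
ci.by.hand <- cbind(
  fit = p$fit,
  lwr = p$fit - t.crit * p$se.fit,
  upr = p$fit + t.crit * p$se.fit
)

# Built-in result for comparison
ci.builtin <- predict(model, newdata = new.speeds, interval = "confidence")
all.equal(ci.by.hand, ci.builtin, check.attributes = FALSE)
```

A confidence level other than 95% can be requested with the level argument of predict(), for example level = 0.90.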

Prediction interval

The prediction interval gives the uncertainty around a single value. In the same way as the confidence intervals, the prediction intervals can be computed as follows:

predict(model, newdata = new.speeds, interval = "prediction")
##    fit   lwr   upr
## 1 29.6 -1.75  61.0
## 2 57.1 25.76  88.5
## 3 76.8 44.75 108.8

The 95% prediction interval associated with a speed of 19 mph is (25.76, 88.51). This means that, according to our model, 95% of the cars with a speed of 19 mph have a stopping distance between 25.76 and 88.51 ft.
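The extra width of the prediction interval comes from adding the residual variance to the uncertainty of the fitted mean. The following sketch reconstructs the interval by hand under that formulation. It assumes the speeds 12, 19 and 24 mph, and the names p, t.crit, se.pred, pi.by.hand and pi.builtin are hypothetical.

```r
# Self-contained setup
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
new.speeds <- data.frame(speed = c(12, 19, 24))  # assumed speed values

p <- predict(model, newdata = new.speeds, se.fit = TRUE)
t.crit <- qt(0.975, df = p$df)

# Standard error for a single new observation:
# uncertainty of the fitted mean plus the residual variance
se.pred <- sqrt(p$se.fit^2 + p$residual.scale^2)

pi.by.hand <- cbind(
  fit = p$fit,
  lwr = p$fit - t.crit * se.pred,
  upr = p$fit + t.crit * se.pred
)

# Built-in result for comparison
pi.builtin <- predict(model, newdata = new.speeds, interval = "prediction")
all.equal(pi.by.hand, pi.builtin, check.attributes = FALSE)
```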

Note that the prediction interval relies strongly on the assumption that the residual errors are normally distributed with a constant variance. So, you should only use such intervals if you believe that the assumption is approximately met for the data at hand.
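One possible way to examine this assumption is sketched below with base R tools only: a normal Q-Q plot of the residuals, a plot of residuals against fitted values, and a Shapiro-Wilk test. The choice of diagnostics is an assumption made for illustration, not a procedure prescribed in this chapter, and the names res and sw are hypothetical.

```r
# Self-contained setup
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
res <- residuals(model)

# Normal Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Residuals against fitted values: a roughly even vertical spread
# is consistent with constant variance
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Shapiro-Wilk test as a formal complement to the Q-Q plot
sw <- shapiro.test(res)
sw$p.value
```

A small p-value from the Shapiro-Wilk test would indicate a departure from normality, in which case the prediction intervals should be interpreted with caution.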

Prediction interval or confidence interval?

A prediction interval reflects the uncertainty around a single value, while a confidence interval reflects the uncertainty around the mean prediction values. Because a prediction interval must also account for the scatter of individual observations around the mean, it will generally be much wider than a confidence interval for the same value.
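The difference in width can be checked directly by subtracting the lower limit from the upper limit for both interval types. This is a minimal sketch that assumes the speeds 12, 19 and 24 mph; the names conf.int, pred.int and widths are hypothetical.

```r
# Self-contained setup
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
new.speeds <- data.frame(speed = c(12, 19, 24))  # assumed speed values

conf.int <- predict(model, newdata = new.speeds, interval = "confidence")
pred.int <- predict(model, newdata = new.speeds, interval = "prediction")

# Width of each interval, side by side
widths <- data.frame(
  speed    = new.speeds$speed,
  ci.width = conf.int[, "upr"] - conf.int[, "lwr"],
  pi.width = pred.int[, "upr"] - pred.int[, "lwr"]
)
widths
```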

Which one should we use? The answer to this question depends on the context and the purpose of the analysis. Generally, we are interested in specific individual predictions, so a prediction interval would be more appropriate. Using a confidence interval when you should be using a prediction interval will greatly underestimate the uncertainty in a given predicted value (P. Bruce and Bruce 2017).

The R code below creates a scatter plot with:

The regression line in blue
The confidence band in gray
The prediction band in red

# 0. Build linear model
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)
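The listing above is cut off after the model-building step. As a stand-in, here is a minimal sketch of such a plot using base R graphics only. This is an assumption: the original code may well have relied on a different plotting package. The grid resolution and the names grid, conf.band and pred.band are hypothetical.

```r
# 0. Build linear model
data("cars", package = "datasets")
model <- lm(dist ~ speed, data = cars)

# 1. Compute both bands on a grid of speeds covering the observed range
#    (100 grid points is an arbitrary choice)
grid <- data.frame(speed = seq(min(cars$speed), max(cars$speed), length.out = 100))
conf.band <- predict(model, newdata = grid, interval = "confidence")
pred.band <- predict(model, newdata = grid, interval = "prediction")

# 2. Empty plot sized to contain the data and the prediction band
plot(cars$speed, cars$dist, type = "n",
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)",
     ylim = range(pred.band, cars$dist))

# 3. Confidence band as a gray polygon
polygon(c(grid$speed, rev(grid$speed)),
        c(conf.band[, "lwr"], rev(conf.band[, "upr"])),
        col = "gray80", border = NA)

# 4. Observations and regression line in blue
points(cars$speed, cars$dist, pch = 16)
lines(grid$speed, conf.band[, "fit"], col = "blue", lwd = 2)

# 5. Prediction band as dashed red lines
lines(grid$speed, pred.band[, "lwr"], col = "red", lty = 2)
lines(grid$speed, pred.band[, "upr"], col = "red", lty = 2)
```

Drawing the polygon before the points keeps the observations visible on top of the gray band.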

Discussion

In this chapter, we have described how to use the R function predict() to predict the outcome for new data.

References

Bruce, Peter, and Andrew Bruce. 2017. Practical Statistics for Data Scientists. O'Reilly Media.
