
The intuition behind R-squared

When we first learned about the correlation coefficient, r, we focused on what it meant rather than how to calculate it, since the computations are lengthy and computers usually take care of them for us.
We'll do the same with r² and concentrate on how to interpret what it means.
In a way, r² measures how much prediction error is eliminated when we use least-squares regression.

Predicting without regression

We use linear regression to predict y given some value of x. But suppose that we had to predict a y value without a corresponding x value.
Without using regression on the x variable, our most reasonable estimate would be to simply predict the average of the y values.
Here's an example, where the prediction line is simply the mean of the y data:
Notice that this line doesn't seem to fit the data very well. One way to measure the fit of the line is to calculate the sum of the squared residuals—this gives us an overall sense of how much prediction error a given model has.
So without least-squares regression, our sum of squared residuals is 41.1879.
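If it helps to see the arithmetic, here is a minimal sketch in Python (with NumPy) of this "predict the mean" baseline. The data points are a small made-up set chosen only to illustrate the calculation, so the sum of squares it prints is not the article's 41.1879.

```python
import numpy as np

# Small made-up dataset, used only to illustrate the calculation
# (these are not the article's actual points).
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 3.0, 4.5, 5.0])

# Without regression, the most reasonable prediction is the mean of y.
y_pred = np.full_like(y, y.mean())

# Sum of the squared residuals for this "mean only" model.
ss_mean = np.sum((y - y_pred) ** 2)
print(f"Sum of squares around the mean: {ss_mean:.4f}")
```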
Would using least-squares regression reduce the amount of prediction error? If so, by how much? Let's see!

Predicting with regression

Here's the same data with the corresponding least-squares regression line and summary statistics:
Equation          r       r²
ŷ = 0.5x + 1.5    0.816   0.6659
This line seems to fit the data pretty well, but to measure how much better it fits, we can look again at the sum of the squared residuals:
Using least-squares regression reduced the sum of the squared residuals from 41.1879 to 13.7627.
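Continuing the same made-up example from the earlier sketch, fitting the least-squares line and recomputing the sum of the squared residuals might look like this; again, the numbers it prints are illustrative, not the 41.1879 and 13.7627 above.

```python
import numpy as np

# Same made-up dataset as in the previous sketch.
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 3.0, 4.5, 5.0])

# Fit the least-squares line y_hat = slope * x + intercept.
slope, intercept = np.polyfit(x, y, 1)

# Sum of the squared residuals around the fitted line.
y_hat = slope * x + intercept
ss_line = np.sum((y - y_hat) ** 2)
print(f"Fitted line: y_hat = {slope:.2f}x + {intercept:.2f}")
print(f"Sum of squares around the line: {ss_line:.4f}")
```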
So using least-squares regression eliminated a considerable amount of prediction error. How much though?

R-squared measures how much prediction error we eliminated

Without using regression, our model had an overall sum of squares of 41.1879. Using least-squares regression reduced that down to 13.7627.
So the total reduction is 41.1879 − 13.7627 = 27.4252.
We can represent this reduction as a percentage of the original amount of prediction error:
(41.1879 − 13.7627) / 41.1879 = 27.4252 / 41.1879 ≈ 66.59%
If you look back up above, you'll see that r² = 0.6659.
R-squared tells us what percent of the prediction error in the y variable is eliminated when we use least-squares regression on the x variable.
As a result, r² is also called the coefficient of determination.
Many formal definitions say that r² tells us what percent of the variability in the y variable is accounted for by the regression on the x variable.
It seems pretty remarkable that simply squaring r gives us this measurement. Proving this relationship between r and r² is fairly involved and beyond the scope of an introductory statistics course.
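As a rough sanity check on this relationship, the sketch below (using the same made-up data as before) computes the fraction of prediction error eliminated and compares it with the square of the correlation coefficient computed directly.

```python
import numpy as np

# Same made-up dataset as in the earlier sketches.
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 2.5, 4.0, 3.0, 4.5, 5.0])

# Prediction error without regression (squared residuals around the mean).
ss_mean = np.sum((y - y.mean()) ** 2)

# Prediction error with the least-squares line.
slope, intercept = np.polyfit(x, y, 1)
ss_line = np.sum((y - (slope * x + intercept)) ** 2)

# Fraction of prediction error eliminated by the regression ...
r_squared = (ss_mean - ss_line) / ss_mean

# ... which matches the square of the correlation coefficient r.
r = np.corrcoef(x, y)[0, 1]
print(f"(ss_mean - ss_line) / ss_mean = {r_squared:.4f}")
print(f"r squared                     = {r ** 2:.4f}")
```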
