Product Design, Manufacturing & Innovation Resources

Coefficient of Determination (R²)

1900
  • Karl Pearson
A statistic indicating the goodness of fit of a model, representing the proportion of the variance in the dependent variable that is predictable from the independent variable(s). An R² of 1 indicates a perfect fit, while an R² of 0 indicates that the model explains none of the variance, i.e. it predicts no better than the mean of the observed values. It is calculated as \(R^2 \equiv 1 - \frac{SS_{res}}{SS_{tot}}\), where \(SS_{res}\) is the residual sum of squares and \(SS_{tot}\) is the total sum of squares.

The coefficient of determination, R-squared, is a key metric for evaluating regression models. It provides an intuitive measure of how much of the variability in the outcome is captured by the model. It is derived from two components. The first is the Total Sum of Squares (\(SS_{tot} = \sum_i (y_i - \bar{y})^2\)), which measures the total variation of the dependent variable \(y\) around its mean \(\bar{y}\). The second is the Residual Sum of Squares (\(SS_{res} = \sum_i (y_i - \hat{y}_i)^2\)), which measures the variation left unexplained by the model, where \(\hat{y}_i\) is the predicted value for observation \(i\).
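The two sums of squares above translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function name `r_squared` and the sample values are illustrative assumptions, not part of the original text.

```python
def r_squared(y, y_hat):
    """Return R^2 = 1 - SS_res / SS_tot for observed y and predicted y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    y_bar = sum(y) / len(y)
    # Total sum of squares: variation of y around its mean
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    # Residual sum of squares: variation left unexplained by the predictions
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    if ss_tot == 0:
        raise ValueError("SS_tot is zero: y is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Made-up illustrative data (assumption, not from the source)
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)                 # SS_tot = 10, SS_res = 0.10
r2_mean_only = r_squared(y, [3.0] * 5)   # always predicting the mean
r2_perfect = r_squared(y, y)             # predictions equal observations
```

With these values, predicting the mean everywhere gives an R² of 0 and exact predictions give an R² of 1, which matches the two reference points described above.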

The formula \(R^2 = 1 - SS_{res}/SS_{tot}\) can be interpreted as the proportion of total variance that is ‘explained’ by the regression model. For instance, an R² of 0.75 means that 75% of the variability in the outcome can be accounted for by the predictors in the model. In simple linear regression fitted by least squares with an intercept, R² equals the square of Pearson’s correlation coefficient (r) between the observed and predicted values.
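The link between R² and Pearson’s r can be checked numerically. The sketch below fits a least-squares line with an intercept in plain Python and compares R² with the squared correlation between observed and predicted values; the helper names (`fit_line`, `pearson_r`, `r_squared`) and the data are illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def pearson_r(a, b):
    """Pearson product-moment correlation between two sequences."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y) / len(y)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1.0 - ss_res / ss_tot


# Made-up illustrative data (assumption, not from the source)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

slope, intercept = fit_line(x, y)
y_hat = [intercept + slope * xi for xi in x]

r2 = r_squared(y, y_hat)
r_obs_pred = pearson_r(y, y_hat)   # correlation of observed vs. predicted
r_x_y = pearson_r(x, y)            # correlation of predictor vs. outcome
```

Here `r2`, `r_obs_pred ** 2` and `r_x_y ** 2` agree up to floating-point rounding. The identity relies on the least-squares fit with an intercept; for predictions from an arbitrary model it need not hold.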

However, R² has a significant limitation: for a least-squares model evaluated on the data used to fit it, R² never decreases when a new predictor variable is added, even if the new variable is irrelevant. This can be misleading and encourage overfitting. To counteract this, the Adjusted R-squared is often used. It penalizes R² for the number of predictors relative to the sample size, so an irrelevant predictor can lower it, which makes it a fairer measure of goodness of fit when comparing multiple regression models with different numbers of predictors.
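The text does not give the adjustment itself. A commonly used form, with \(n\) observations and \(p\) predictors (excluding the intercept), is \(\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). The sketch below assumes that form; the function name `adjusted_r_squared` and the numbers are hypothetical and chosen only to illustrate the penalty.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2: ordinary R^2 of the fitted model
    n:  number of observations
    p:  number of predictors, not counting the intercept
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R^2 to be defined")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical scenario: adding a fourth, nearly irrelevant predictor
# nudges R^2 from 0.750 to 0.751 on n = 20 observations.
adj_3 = adjusted_r_squared(0.750, n=20, p=3)   # 1 - 0.25 * 19 / 16
adj_4 = adjusted_r_squared(0.751, n=20, p=4)   # 1 - 0.249 * 19 / 15
```

In this scenario the ordinary R² rises slightly while the adjusted value falls (`adj_4 < adj_3`), which is the behavior that makes the adjusted statistic useful for comparing models of different sizes.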

UNESCO Nomenclature: 1209 - Statistics

Type: Abstract System

Disruption: Substantial

Usage: Widespread Use

Precursors

  • Concept of variance and standard deviation
  • Method of least squares
  • Pearson’s product-moment correlation coefficient
  • Analysis of variance (ANOVA) principles

Applications

  • evaluating the performance of predictive models in science and engineering
  • model selection in econometrics and social sciences
  • quantifying the proportion of variance explained by a set of predictors
  • validating financial models for risk assessment

Patents:

NA

Related to: r-squared, coefficient of determination, goodness of fit, model evaluation, explained variance, sum of squares, regression diagnostics, statistical significance, adjusted r-squared, correlation.
