Product Design, Manufacturing & Innovation Resources
Home » Logistic Regression

Logistic Regression

1960
  • David Cox
Statistician analyzing logistic regression data for medical and financial applications.

(generated image for illustration only)

A regression model for a categorical, typically binary, dependent variable. Instead of modeling the outcome directly, it models the probability of the outcome using the logistic (sigmoid) function. The model predicts the log-odds of the event as a linear combination of the independent variables: \(\ln(\frac{p}{1-p}) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p\), where p is the probability of the event.

Logistic regression is a fundamental algorithm for binary classification problems. It is a type of Generalized Linear Model (GLM) that extends the ideas of linear regression to cases where the outcome variable is not continuous. Applying linear regression directly to a binary (0/1) outcome is problematic because it can produce predicted probabilities outside the logical [0, 1] range and violates the OLS assumption of constant error variance.

Logistic regression solves this by using a link function to transform the outcome. It models the logarithm of the odds, or ‘logit’, as a linear function of the predictors. The odds are the ratio of the probability of success (\(p\)) to the probability of failure (\(1-p\)). This transformation, \(\text{logit}(p) = \ln(p/(1-p))\), maps the probability from the range [0, 1] to the entire real number line \((-\infty, +\infty)\), making it suitable for a linear model.

To get back to a probability, one applies the inverse of the logit function, which is the logistic or sigmoid function: \(p = \frac{e^{\beta_0 + \beta_1 x_1 + \dots}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots}}\). Unlike linear regression, the parameters (\(\beta\)) are not estimated using least squares. Instead, they are typically found using Maximum Likelihood Estimation (MLE), an iterative process that finds the parameter values that maximize the likelihood of observing the actual data. The model can be extended to handle multi-class problems through multinomial logistic regression.

UNESCO Nomenclature: 1209
– Statistics

Type

Software/Algorithm

Disruption

Substantial

Usage

Widespread Use

Precursors

  • Linear regression
  • Probability theory (Bernoulli distribution)
  • Maximum likelihood estimation (developed by R.A. Fisher)
  • Probit model (an earlier model for binary outcomes)
  • The concept of generalized linear models

Applications

  • medical diagnosis (e.g., predicting disease presence based on symptoms)
  • credit scoring and financial risk assessment
  • spam detection in email clients
  • customer churn prediction in telecommunications and subscription services
  • election outcome prediction

Patents:

NA

Potential Innovations Ideas

Due to scrapping bot traffic, currently more than 40k per day, this content is reserved to community members.
> Login < or > Register < (100% free) to access this, so as all other restricted content and tools.

Related to: logistic regression, classification, binary outcome, sigmoid function, log-odds, maximum likelihood estimation, machine learning, predictive modeling, generalized linear model, categorical data.

Historical Context

Logistic Regression

1950
1952
1956
1960
1967
1967
1970
1950
1950
1953
1960
1960
1967
1970
1970

(if date is unknown or not relevant, e.g. "fluid mechanics", a rounded estimation of its notable emergence is provided)

Related Invention, Innovation & Technical Principles

Full size images and downloads are only available, 100% free, for registered members.

> Login <