
Logistic Regression

1960

David Cox

A regression model for a categorical, typically binary, dependent variable. Instead of modeling the outcome directly, it models the probability of the outcome using the logistic (sigmoid) function. The model expresses the log-odds of the event as a linear combination of the independent variables: [latex]\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k[/latex], where [latex]p[/latex] is the probability of the event and [latex]k[/latex] is the number of predictors.
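As a small worked illustration of the log-odds form, the sketch below evaluates the linear combination for made-up coefficients and solves the log-odds equation for the probability. The coefficient values and the function name `event_probability` are hypothetical, chosen only for illustration.

```python
import math

def event_probability(beta, x):
    """Probability p of the event for intercept beta[0], slopes beta[1:], predictors x.

    The linear combination beta_0 + beta_1*x_1 + ... is the log-odds (eta);
    solving ln(p / (1 - p)) = eta for p gives p = 1 / (1 + exp(-eta)).
    """
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs exactly one more entry than x (the intercept)")
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))  # log-odds
    # Two algebraically equal forms, chosen so math.exp never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)

# Hypothetical coefficients beta_0 = -1.5, beta_1 = 0.8 and one predictor x_1 = 2:
# log-odds = -1.5 + 0.8 * 2 = 0.1, so p = 1 / (1 + e^(-0.1)).
p = event_probability([-1.5, 0.8], [2.0])
print(round(p, 3))  # 0.525
```

A log-odds of 0.1 is only slightly above 0, so the resulting probability is only slightly above one half.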

Logistic regression is a fundamental algorithm for binary classification problems. It is a type of Generalized Linear Model (GLM) that extends the ideas of linear regression to cases where the outcome variable is not continuous. Applying linear regression directly to a binary (0/1) outcome is problematic for two reasons: it can produce predicted probabilities outside the valid [0, 1] range, and it violates the OLS assumption of constant error variance, because the variance of a 0/1 outcome, [latex]p(1-p)[/latex], changes as [latex]p[/latex] changes.
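The out-of-range problem is easy to reproduce. The sketch below fits an ordinary least squares line to a small, hypothetical 0/1 dataset using NumPy and inspects the fitted values; the data and variable names are invented for illustration.

```python
import numpy as np

# Hypothetical toy data: the 0/1 outcome switches on as x increases.
x = np.arange(10, dtype=float)      # 0, 1, ..., 9
y = (x >= 5).astype(float)          # five 0s followed by five 1s

# Ordinary least squares fit of y = a + b*x (a "linear probability model").
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef

# Even inside the observed range, the fitted "probabilities" leave [0, 1].
print(fitted.min(), fitted.max())   # about -0.182 and 1.182
```

Here the fitted line is [latex]\hat{y} = -\tfrac{2}{11} + \tfrac{5}{33}x[/latex], so the smallest and largest observed [latex]x[/latex] already produce values below 0 and above 1.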

Logistic regression solves this by using a link function to transform the outcome. It models the logarithm of the odds, or ‘logit’, as a linear function of the predictors. The odds are the ratio of the probability of success ([latex]p[/latex]) to the probability of failure ([latex]1-p[/latex]). This transformation, [latex]\text{logit}(p) = \ln(p/(1-p))[/latex], maps a probability in the open interval (0, 1) onto the entire real number line [latex](-\infty, +\infty)[/latex], making it suitable for a linear model.
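The odds and logit transformations can be tabulated directly. This minimal sketch uses only the standard library; the function names `odds` and `logit` are illustrative, not taken from any particular package.

```python
import math

def odds(p):
    """Odds of success p / (1 - p), defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return p / (1.0 - p)

def logit(p):
    """Log-odds ln(p / (1 - p)); maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

# p = 0.5 gives odds 1 and logit 0; the logit grows without bound
# in either direction as p approaches 0 or 1.
for p in (0.01, 0.2, 0.5, 0.8, 0.99):
    print(f"p={p:<5} odds={odds(p):8.4f} logit={logit(p):+.4f}")
```

The table also shows the symmetry [latex]\text{logit}(1-p) = -\text{logit}(p)[/latex]: for example, 0.2 and 0.8 give logits of about -1.3863 and +1.3863 (that is, [latex]\mp\ln 4[/latex]).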

To get back to a probability, one applies the inverse of the logit function, which is the logistic or sigmoid function: [latex]p = \frac{e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}{1 + e^{\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k}}[/latex]. Unlike in linear regression, the parameters ([latex]\beta[/latex]) are not estimated by least squares. Instead, they are typically found by Maximum Likelihood Estimation (MLE), which selects the parameter values that maximize the likelihood of the observed data. Because the likelihood equations have no closed-form solution, the estimates are computed iteratively, for example with Newton-Raphson or iteratively reweighted least squares. The model can be extended to multi-class problems through multinomial logistic regression.
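The iterative estimation step can be sketched with Newton-Raphson, one common choice for this likelihood (for logistic regression it coincides with iteratively reweighted least squares). This is a minimal illustration assuming NumPy, a design matrix whose first column is all ones, and data that are not perfectly separable (with separable data the MLE does not exist). The function names and the toy data are hypothetical, and a practical implementation would add safeguards such as step-halving and convergence warnings.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, the inverse of the logit."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(beta, X, y):
    """Bernoulli log-likelihood of 0/1 outcomes y under coefficients beta."""
    p = sigmoid(X @ beta)
    return np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic_newton(X, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimate of beta via Newton-Raphson.

    X: (n, k+1) design matrix with a leading column of ones (intercept).
    y: (n,) array of 0/1 outcomes.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = sigmoid(X @ beta)                  # current fitted probabilities
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        w = p * (1.0 - p)                      # Bernoulli variances (weights)
        info = X.T @ (X * w[:, None])          # negative Hessian (Fisher information)
        step = np.linalg.solve(info, grad)     # Newton step
        beta = beta + step
        if np.max(np.abs(step)) < tol:         # stop once updates are negligible
            break
    return beta

# Hypothetical toy data with overlapping classes, so the MLE exists.
x_toy = np.arange(10, dtype=float)
y_toy = np.array([0, 0, 0, 1, 0, 1, 0, 1, 1, 1], dtype=float)
X_toy = np.column_stack([np.ones_like(x_toy), x_toy])

beta_hat = fit_logistic_newton(X_toy, y_toy)
print("beta_0, beta_1 =", beta_hat)
print("log-likelihood =", log_likelihood(beta_hat, X_toy, y_toy))
```

At convergence the score [latex]X^\top(y - p)[/latex] is numerically zero, which is the first-order condition for the maximum. Because the log-likelihood is concave, any small change to the estimated coefficients lowers it.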

Machine learning, quality assurance, quality control, quality management, statistical analysis, statistical process control (SPC)

UNESCO Nomenclature: 1209

- Statistics

Type

Software/Algorithm

Disruption

Substantial

Usage

Widespread use

Precursors

Linear regression
Probability theory (Bernoulli distribution)
Maximum likelihood estimation (developed by R.A. Fisher)
Probit model (an earlier model for binary outcomes)
The concept of generalized linear models

Applications

medical diagnosis (e.g., predicting disease presence based on symptoms)
credit scoring and financial risk assessment
spam detection in email clients
customer churn prediction in telecommunications and subscription services
election outcome prediction


Related to: logistic regression, classification, binary outcome, sigmoid function, log-odds, maximum likelihood estimation, machine learning, predictive modeling, generalized linear model, categorical data.