Algorithmic Confounding

2020
  • Sharad Goel
  • Ravi Shroff
  • Jennifer Skeem
  • Christopher Slobogin

Algorithmic confounding occurs when a proxy variable used by an algorithm is correlated with a protected attribute (like race or gender) and also with the outcome of interest. The algorithm may inadvertently learn to discriminate based on the protected attribute by using the proxy, even if the protected attribute itself is explicitly excluded from the model’s input data.

Algorithmic confounding is a subtle but powerful source of bias. It arises because machine learning models are exceptionally good at finding statistical correlations, even spurious ones. While a developer might remove a sensitive feature like ‘race’ to prevent discrimination, the model can latch onto other features that act as proxies. A classic example is the use of ZIP codes in loan applications. Due to historical residential segregation, ZIP codes can be highly correlated with race. An algorithm trained on past lending decisions might score applicants from certain ZIP codes as higher risk, not because location itself affects repayment, but because the location stands in for a racial group that has historically been denied loans. If those scores then drive new denials, and the denials become future training data, the result is a feedback loop of discrimination.
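The ZIP code example can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a model of any real lending system: the probabilities (0.9/0.1 for ZIP membership, 0.6/0.2 for historical denial) and the function names are assumptions chosen purely for illustration. The "model" is just the observed denial rate per ZIP flag, and it never sees the protected attribute.

```python
import random


def simulate(n=20000, seed=0):
    """Generate (group, zip_flag, denied) rows from assumed probabilities."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        group = rng.random() < 0.5                 # protected attribute
        # Assumed segregation: ZIP flag strongly tracks group membership.
        zip_flag = rng.random() < (0.9 if group else 0.1)
        # Assumed biased history: denial depends on group, not on ZIP.
        denied = rng.random() < (0.6 if group else 0.2)
        rows.append((int(group), int(zip_flag), int(denied)))
    return rows


def fit_zip_only_model(rows):
    """'Train' on ZIP alone: predicted risk = observed denial rate per ZIP flag."""
    totals = {0: 0, 1: 0}
    denials = {0: 0, 1: 0}
    for _, zip_flag, denied in rows:
        totals[zip_flag] += 1
        denials[zip_flag] += denied
    return {z: denials[z] / totals[z] for z in totals}


def mean_score_by_group(rows, model):
    """Average predicted risk for each protected group."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for group, zip_flag, _ in rows:
        sums[group] += model[zip_flag]
        counts[group] += 1
    return {g: sums[g] / counts[g] for g in sums}


rows = simulate()
model = fit_zip_only_model(rows)
scores = mean_score_by_group(rows, model)
gap = scores[1] - scores[0]

print("predicted risk by ZIP flag:", model)
print("mean predicted risk by group:", scores)
print("gap between groups:", round(gap, 3))
```

Under these assumptions the ZIP-only model assigns noticeably higher average risk to one group, even though the protected attribute was excluded from its inputs and location has no effect of its own on the outcome.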

This is distinct from traditional statistical confounding because the algorithm isn’t just being misled; it’s actively learning a discriminatory policy from the data. Identifying and mitigating this requires more than just feature removal. It often involves causal inference techniques to understand the true relationships between variables, or the use of fairness-aware algorithms that can be constrained to ignore the influence of known proxies. The challenge lies in the fact that almost any variable can be a proxy to some extent, making complete elimination difficult.
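One simple way to look for candidate proxies is to ask how much better the protected attribute can be guessed once a feature is known. The sketch below is an assumed audit heuristic, not a method prescribed by the source: `proxy_strength` is a hypothetical helper, the data is synthetic, and the score is computed in-sample, so it is slightly optimistic.

```python
import random
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Accuracy gain from guessing the protected attribute via the feature.

    Baseline: always guess the overall most common protected value.
    With the feature: guess the most common protected value within
    each feature value. The result is always >= 0.
    """
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    by_value = defaultdict(Counter)
    for f, a in zip(feature_values, protected_values):
        by_value[f][a] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / n - baseline


rng = random.Random(1)
protected = [int(rng.random() < 0.5) for _ in range(10000)]
# Assumed strong proxy: agrees with the protected attribute about 90% of the time.
zip_flag = [int(rng.random() < (0.9 if a else 0.1)) for a in protected]
# Assumed unrelated feature: drawn independently of the protected attribute.
noise = [rng.randint(0, 4) for _ in protected]

strengths = {
    "zip_flag": proxy_strength(zip_flag, protected),
    "noise": proxy_strength(noise, protected),
}
for name, value in strengths.items():
    print(f"{name}: accuracy gain {value:.3f}")
```

Even the unrelated feature tends to show a small positive gain from sampling noise alone, which echoes the point that almost any variable can act as a proxy to some extent. A screen like this only ranks candidates; deciding what to do about them still calls for the causal or fairness-aware methods mentioned above.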

UNESCO Nomenclature: 1203 – Computer science

Type: Abstract System

Disruption: Incremental

Usage: Widespread Use

Precursors

  • concept of confounding variables in statistics and epidemiology
  • legal doctrine of disparate impact
  • research on redlining and housing discrimination
  • development of machine learning classification algorithms

Applications

  • auditing of pre-trial risk assessment tools like COMPAS
  • development of proxy-aware bias detection methods
  • design of fair credit scoring models that avoid redlining proxies
  • improving fairness in automated hiring systems by identifying and mitigating confounding variables

Patents:

NA

Related to: algorithmic confounding, proxy variable, disparate impact, algorithmic bias, machine learning, fairness, redlining, protected attributes, indirect discrimination, causal inference.
