Statistical tests are one of the main tools in quality and manufacturing for providing objective evidence for decision-making. They help identify variations in processes and distinguish random fluctuations from actual problems. In engineering, statistics help identify patterns, outliers and sources of failure in system performance, supporting data-driven decisions. By rigorously analyzing experimental results, engineers can validate product designs and manufacturing processes and detect potential problems before implementation. This systematic approach reduces the risk of unexpected failures and improves overall safety by supporting reliability and compliance with international safety standards.
This post reviews the main statistical tests used in manufacturing and Total Quality Management (TQM).
Note: because they also apply more broadly to engineering, research and science, the following two analyses are not covered here but in a dedicated article on the 10 main algorithms for engineering:
- Correlation analysis: measures the strength and direction of the relationship between two variables (e.g., the Pearson correlation coefficient).
- Regression analysis: examines the relationship between variables (e.g., input factors and process output), from simple linear to multiple regression.
Normality Tests

In the world of statistical tests, many common methods (t-tests, ANOVA, linear regression, etc.) assume that the data are normally (Gaussian) distributed, or that the residuals/errors are. Violating this assumption can make the results unreliable: p-values can be misleading, confidence intervals may be wrong, and the risk of Type I/II errors increases. Note that some tests, such as one-way ANOVA, tolerate moderate departures from normality reasonably well.
Note: if your data are not normal (see the real-life cases below), you may need to use non-parametric tests such as the Mann-Whitney U test or the Kruskal-Wallis test, which do not assume normality, or to transform your data; both options are out of the scope of this post.
Several statistical tests exist for this; here we detail the Shapiro-Wilk test, which is particularly popular for small sample sizes (typically n < 50) but can be used up to n = 2000.
FYI, other common normality tests include:
- Kolmogorov-Smirnov (K-S) test (with Lilliefors correction): works better with larger sample sizes, but is less sensitive than Shapiro-Wilk, especially for small datasets.
- Anderson-Darling test: works with all sample sizes and gives more weight to the tails (extremes) of the distribution, which makes it more powerful for detecting departures from normality in the extremes.
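Both alternative statistics can be computed with a few lines of standard-library Python. The sketch below assumes the Lilliefors setting (mean and standard deviation estimated from the sample) and returns only the statistics D and A²; turning them into p-values requires the Lilliefors or Stephens tables, or a library routine such as `statsmodels.stats.diagnostic.lilliefors` or `scipy.stats.anderson`. The function name `ks_and_ad_statistics` is our own, hypothetical helper.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_and_ad_statistics(data):
    """Return (D, A2): the Kolmogorov-Smirnov distance and the Anderson-Darling
    statistic against a normal distribution whose mean and standard deviation
    are estimated from the data (the Lilliefors setting)."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(fmean(x), stdev(x))
    # Fitted CDF at each order statistic, clamped to avoid log(0) on extreme outliers
    f = [min(max(fitted.cdf(v), 1e-15), 1 - 1e-15) for v in x]

    # K-S: largest gap between the empirical step function and the fitted CDF
    d = max(max((i + 1) / n - fi, fi - i / n) for i, fi in enumerate(f))

    # Anderson-Darling: squared gaps weighted by 1 / (F(1 - F)), so the tails count more
    s = sum((2 * i + 1) * (math.log(f[i]) + math.log1p(-f[n - 1 - i]))
            for i in range(n))
    a2 = -n - s / n
    return d, a2
```

Larger values of D or A² mean a worse fit to the normal distribution.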
How to perform the Shapiro-Wilk normality test
1. Calculate the Shapiro-Wilk test statistic W:
\(W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}\)
where:
- \(x_{(i)}\) = the i-th order statistic (i.e., the i-th smallest value in the sample)
- \(x_i\) = the i-th observed value
- \(\bar{x}\) = the sample mean
- \(a_i\) = constants (weights) calculated from the means, variances and covariances of the order statistics of a sample from a standard normal distribution \(N(0,1)\); they depend only on n
- n = the sample size
The numerator is the square of the weighted sum of the ordered sample values. The denominator is the sum of the squared deviations from the sample mean (i.e., the sample variance multiplied by (n-1)).
Note: calculating the \(a_i\) coefficients is nontrivial and generally requires a table or an algorithm, which is why the Shapiro-Wilk test is nearly always computed by software such as R, Python’s SciPy, MS Excel add-ons or other dedicated software. For a manual calculation, this page provides all the \(a_i\) coefficients and p-values for samples up to 50.
The value of W ranges between 0 and 1: W = 1 indicates perfect normality, and the further W is below 1, the less normal your data are.
2. W alone is not enough: it must be interpreted together with its p-value. In the Shapiro-Wilk table, go to the row for your sample size n, find the value closest to your calculated W, and read the corresponding p-value at the top of its column.
3. Result: if the p-value is greater than the chosen alpha level (e.g., 0.05), you cannot reject the null hypothesis that the data are normally distributed: the data are consistent with normality, although this does not prove it. If the p-value is below alpha, reject normality.
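To make the procedure concrete, here is a minimal standard-library Python sketch of the Shapiro-Wilk test. Instead of reading the \(a_i\) from a table, it uses Royston's polynomial approximations (algorithm AS R94, the method behind R's `shapiro.test` and SciPy's `scipy.stats.shapiro`) for both the weights and the p-value, so its results should be close to those tools but are not guaranteed to match them. The function `shapiro_wilk` is a hypothetical helper, not a library API, and is restricted to 4 ≤ n ≤ 5000.

```python
import math
from statistics import NormalDist, fmean


def _poly(coefs, x):
    """Evaluate coefs[0] + coefs[1]*x + coefs[2]*x**2 + ..."""
    return sum(c * x ** k for k, c in enumerate(coefs))


def shapiro_wilk(data):
    """Return (W, p_value) using Royston's approximation (AS R94).

    Sketch limited to 4 <= n <= 5000; not a validated implementation."""
    x = sorted(data)
    n = len(x)
    if not 4 <= n <= 5000:
        raise ValueError("this sketch supports 4 <= n <= 5000")
    mean = fmean(x)
    ss = sum((v - mean) ** 2 for v in x)  # denominator of W
    if ss == 0:
        raise ValueError("all values are identical")

    # m_i: approximate expected values of standard normal order statistics
    std_normal = NormalDist()
    m = [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mtm = sum(v * v for v in m)
    u = 1 / math.sqrt(n)

    # Weights a_i: the outermost one (two for n > 5) get a polynomial correction,
    # the others are m_i rescaled so that sum(a_i**2) == 1
    a_n = m[-1] / math.sqrt(mtm) + _poly(
        [0, 0.221157, -0.147981, -2.071190, 4.434685, -2.706056], u)
    if n > 5:
        a_n1 = m[-2] / math.sqrt(mtm) + _poly(
            [0, 0.042981, -0.293762, -1.752461, 5.682633, -3.582633], u)
        phi = (mtm - 2 * m[-1] ** 2 - 2 * m[-2] ** 2) / (1 - 2 * a_n ** 2 - 2 * a_n1 ** 2)
        a = [v / math.sqrt(phi) for v in m]
        a[0], a[1], a[-2], a[-1] = -a_n, -a_n1, a_n1, a_n
    else:
        phi = (mtm - 2 * m[-1] ** 2) / (1 - 2 * a_n ** 2)
        a = [v / math.sqrt(phi) for v in m]
        a[0], a[-1] = -a_n, a_n

    w = sum(ai * xi for ai, xi in zip(a, x)) ** 2 / ss
    if w >= 1.0:  # floating-point rounding on perfectly "normal" data
        return 1.0, 1.0

    # p-value: a transform of ln(1 - W) is approximately normally distributed
    y = math.log(1 - w)
    if n <= 11:
        gamma = -2.273 + 0.459 * n
        if y >= gamma:  # outside the approximation's range: p is effectively 0
            return w, 0.0
        y = -math.log(gamma - y)
        mu = _poly([0.544, -0.39978, 0.025054, -0.0006714], n)
        sigma = math.exp(_poly([1.3822, -0.77857, 0.062767, -0.0020322], n))
    else:
        ln_n = math.log(n)
        mu = _poly([-1.5861, -0.31082, -0.083751, 0.0038915], ln_n)
        sigma = math.exp(_poly([-0.4803, -0.082676, 0.0030302], ln_n))
    p_value = 1 - NormalDist(mu, sigma).cdf(y)  # upper tail
    return w, p_value
```

On a sample whose values sit exactly on normal quantiles, W is close to 1 and p is large, whereas on a strongly right-skewed sample such as the powers of 2 from 1 to 2^19, W drops well below 1 and p falls far under 0.05.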
For normality testing, it is frequently advised to combine a numerical test with a graphical method such as Henry’s line (normal probability plot), Q-Q plots or histograms.
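A Henry's line pairs each sorted observation with the standard normal quantile of its rank. The standard-library Python sketch below computes those points and a least-squares line through them; the Hazen plotting positions \((i - 0.5)/n\) are one common convention among several, and the helper name `henry_line` is hypothetical. Drawing the plot itself (e.g., with matplotlib) is left out.

```python
from statistics import NormalDist, fmean


def henry_line(data):
    """Pair each sorted value with its theoretical normal quantile and fit
    a least-squares line. Returns (points, slope, intercept, r)."""
    x = sorted(data)
    n = len(x)
    # Hazen plotting positions (i - 0.5) / n; Blom's (i - 3/8) / (n + 1/4) is another choice
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    q_mean, x_mean = fmean(q), fmean(x)
    sqq = sum((a - q_mean) ** 2 for a in q)
    sxx = sum((b - x_mean) ** 2 for b in x)
    sqx = sum((a - q_mean) * (b - x_mean) for a, b in zip(q, x))
    slope = sqx / sqq                     # rough estimate of the standard deviation
    intercept = x_mean - slope * q_mean   # rough estimate of the mean
    r = sqx / (sqq * sxx) ** 0.5          # closer to 1 = straighter line
    return list(zip(q, x)), slope, intercept, r
```

If the points lie close to a straight line (correlation r near 1), the data are consistent with normality; curvature at the ends points to skewness or heavy tails.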
Mind Non-normal Distributions!
While the normal (Gaussian) distribution is very common, it should not be assumed automatically. Everyday counter-examples include:
- Wealth and income among individuals: a Pareto (power-law) distribution, at least in the upper range, skewed with a “long tail” of very wealthy individuals.
- City population sizes in a country follow Zipf’s Law (power law), with a few very large cities and many small towns.
- Earthquakes: the frequency-magnitude relationship follows the Gutenberg-Richter law (a power law): small earthquakes are common, large ones are rare.
- Daily price changes or returns in financial markets: fat-tailed/heavy-tailed distributions, not Gaussian; large deviations occur more frequently than predicted by a normal distribution.
- Word frequencies in language: like city populations above, they follow Zipf’s Law (power law): a few words are used very often, most words are rare.
- Internet traffic/website popularity: power law (long tail): a few sites have millions of hits, most have very few.
- File sizes on computer systems: log-normal or power law, with a few very large files and many small ones.
- Human lifespans/longevity: left-skewed, not normal; most deaths occur at older ages, with a tail toward younger ages (can be modeled with Gompertz or Weibull distributions).
- Social network connections follow a power law: a few users have many connections; most have few.
Most of these share a “few large, many small” pattern, the signature of power-law, heavy-tailed, exponential or log-normal distributions, rather than the symmetrical bell shape of the Gaussian.
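The “few large, many small” signature is easy to see numerically. The sketch below simulates (it does not use real data) 10,000 values from a Gaussian and from a Pareto distribution with shape parameter 2.5, an arbitrary illustrative choice, and compares their mean, median and skewness using only Python's standard library.

```python
import random
from statistics import fmean, median, pstdev


def skewness(values):
    """Moment skewness: about 0 for a symmetric distribution, > 0 for a long right tail."""
    m, s = fmean(values), pstdev(values)
    return fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(42)  # fixed seed so the simulation is reproducible
gaussian = [rng.gauss(100, 15) for _ in range(10_000)]
pareto = [rng.paretovariate(2.5) for _ in range(10_000)]

for name, sample in (("Gaussian", gaussian), ("Pareto", pareto)):
    print(f"{name:8s} mean={fmean(sample):8.3f} median={median(sample):8.3f} "
          f"skewness={skewness(sample):6.2f}")
```

For the Gaussian sample, the mean and median almost coincide and the skewness is near 0; for the Pareto sample, the mean sits above the median and the skewness is strongly positive, pulled up by a handful of very large values.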
The t-Test (Student’s t-Test)
The t-Test (also known as Student’s t-test), developed by William Sealy Gosset and published under the pseudonym “Student” in 1908, is a statistical test used to compare means when sample sizes are small and the population variance is unknown. Mostly used to compare the means of two populations, it is one of the most widely used tests in manufacturing.

Purpose: the t-Test helps engineers and quality professionals determine if there is a statistically significant difference between the means of two groups or between a sample mean and a known standard. It’s commonly used in hypothesis testing to evaluate whether process changes or product modifications have led to real improvements or differences, beyond what could be expected by chance.
Practical examples in industry:
- In automotive manufacturing, a t-Test might be used to compare the tensile strength of steel from two different suppliers to ensure consistent quality.
- In pharmaceuticals, the t-Test is used to analyze whether a new production process yields tablets with a mean weight significantly different from the standard.
- In electronics, engineers may use the t-Test to verify if a design change in a circuit board results in a measurable improvement in electrical resistance.
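For the pharmaceutical case above, comparing a sample mean with a known standard is a one-sample t-test, with \(t = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}\) and \(n - 1\) degrees of freedom, where \(\mu_0\) is the reference value and S the sample standard deviation. The minimal Python sketch below computes it; the tablet weights and the 250 mg target are hypothetical numbers chosen for illustration, and `one_sample_t` is our own helper. SciPy offers the same test as `scipy.stats.ttest_1samp`.

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, reference):
    """t statistic and degrees of freedom for H0: population mean == reference."""
    n = len(sample)
    t = (fmean(sample) - reference) / (stdev(sample) / math.sqrt(n))
    return t, n - 1


# Hypothetical tablet weights in mg (illustrative numbers, not real measurements)
weights = [251.0, 249.5, 252.3, 250.8, 248.9, 251.7, 250.2, 252.0]
t, df = one_sample_t(weights, reference=250.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

With these numbers, the sample mean is 250.8 mg and t ≈ 1.87 with 7 degrees of freedom, below the two-tailed 5% critical value of about 2.365, so the difference from 250 mg would not be considered statistically significant.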
How to perform the Student’s t-Test
There are many variants of the t-test; the example here focuses on the so-called “two-sample t-test” in its “unpaired” version, comparing samples from 2 different production batches.
- State your null and alternative hypotheses; in this example, “there is no difference between the means” vs. “the means are different”.
- Collect your data from the 2 production batches being compared and calculate:
  - the 2 sample sizes \(n_1\) and \(n_2\)
  - the 2 sample means: \(\bar{X} = \frac{1}{n_1} \sum_{i=1}^{n_1} X_i\) and \(\bar{Y} = \frac{1}{n_2} \sum_{j=1}^{n_2} Y_j\)
  - the 2 sample variances: \(S_X^2 = \frac{1}{n_1-1} \sum_{i=1}^{n_1} (X_i - \bar{X})^2\) and \(S_Y^2 = \frac{1}{n_2-1} \sum_{j=1}^{n_2} (Y_j - \bar{Y})^2\)
- Calculate the test statistic. The method assumes that both samples are independent and drawn from normally distributed populations; two cases then remain:
  - if equal variances are assumed (“pooled” t-test):
    Pooled variance: \(S_p^2 = \frac{(n_1-1)S_X^2 + (n_2-1)S_Y^2}{n_1 + n_2 - 2}\)
    Test statistic: \(t = \frac{\bar{X} - \bar{Y}}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}\)
  - if unequal variances (Welch’s t-test):
    Test statistic: \(t = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}}}\)
    Degrees of freedom (approximate, Welch-Satterthwaite): \(df = \frac{\left(\frac{S_X^2}{n_1} + \frac{S_Y^2}{n_2}\right)^2}{\frac{(S_X^2/n_1)^2}{n_1 - 1} + \frac{(S_Y^2/n_2)^2}{n_2 - 1}}\)
- Use the calculated \(t\) and the degrees of freedom (\(n_1+n_2-2\) for equal variances, or the Welch-Satterthwaite formula otherwise) to look up or compute the p-value from the t-distribution, depending on whether the test is one-tailed or two-tailed.
- Result: compare the calculated t-value with the critical t-value from statistical tables for your chosen significance level and degrees of freedom, or use software to obtain the p-value. If the absolute value of the t-statistic exceeds the critical value (two-tailed test), or equivalently if the p-value is below your significance level (typically 0.05), reject the null hypothesis; otherwise, you cannot conclude that the means differ.
Link to the t-Test critical values table
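As an alternative to the critical-value table, the whole unpaired procedure (pooled or Welch) can be sketched in standard-library Python. The two-tailed p-value uses the identity \(p = I_{df/(df+t^2)}(df/2, 1/2)\), where \(I_x(a, b)\) is the regularized incomplete beta function, evaluated here with the classic continued-fraction method (modified Lentz). All function names and the batch values are hypothetical; in practice `scipy.stats.ttest_ind(a, b, equal_var=...)` performs the same test.

```python
import math
from statistics import fmean, variance


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Each iteration applies the even and then the odd term of the fraction
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last correction factor ~1: converged
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b  # symmetry for faster convergence


def t_two_tailed_p(t, df):
    """Two-tailed p-value of Student's t distribution: P(|T| >= |t|)."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def two_sample_t(x, y, equal_var=True):
    """Return (t, df, two-tailed p) for the pooled or Welch two-sample t-test."""
    n1, n2 = len(x), len(y)
    m1, m2 = fmean(x), fmean(y)
    v1, v2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    if equal_var:
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        g1, g2 = v1 / n1, v2 / n2
        se = math.sqrt(g1 + g2)
        df = (g1 + g2) ** 2 / (g1 ** 2 / (n1 - 1) + g2 ** 2 / (n2 - 1))
    t = (m1 - m2) / se
    return t, df, t_two_tailed_p(t, df)


# Hypothetical measurements from two production batches (illustrative only)
batch_1 = [512.0, 508.0, 515.0, 510.0, 509.0, 513.0, 511.0, 507.0]
batch_2 = [505.0, 509.0, 503.0, 506.0, 504.0, 508.0, 502.0, 507.0]
for equal_var in (True, False):
    t, df, p = two_sample_t(batch_1, batch_2, equal_var)
    label = "pooled" if equal_var else "Welch"
    print(f"{label:6s}: t = {t:.3f}, df = {df:.2f}, two-tailed p = {p:.4f}")
```

Setting `equal_var=False` switches to Welch's statistic and its non-integer Welch-Satterthwaite degrees of freedom; when both batches have the same size and the same variance, the two versions give identical results. For a one-tailed test in the hypothesized direction, the p-value is half the two-tailed one.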
The rest of this article is reserved for members