3.3.6 Normal Distribution and Standardization

Understanding the properties of the normal distribution and using z-scores for standardization and comparison.

Definition

The normal distribution is a continuous probability distribution characterized by a bell-shaped curve that is symmetric about its mean. It is defined by two parameters: the mean \(\mu\) (the center of the distribution) and the standard deviation \(\sigma > 0\) (a measure of spread). Its probability density function is \(f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\). Standardization (the z-score transformation) maps a normally distributed variable \(X\) to \(Z = \frac{X - \mu}{\sigma}\), which follows the standard normal distribution with mean 0 and standard deviation 1. This puts values from different normal distributions on a common scale, making comparison and probability calculations easier. The z-score tells us how many standard deviations a data point lies above the mean (positive \(z\)) or below it (negative \(z\)), so probabilities and percentiles can be read from a standard normal table or computed with a calculator.
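As a minimal sketch of the definition above, the density formula and the z-score can be evaluated directly in Python. The helper names `normal_pdf` and `z_score` and the values \(\mu = 100\), \(\sigma = 15\), \(x = 130\) are illustrative assumptions, not part of the original material. The standard library's `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Density f(x) of a normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coeff * math.exp(exponent)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


# Hypothetical example values (assumed for illustration only).
mu, sigma = 100.0, 15.0
x = 130.0

density = normal_pdf(x, mu, sigma)
reference = NormalDist(mu, sigma).pdf(x)  # library value for comparison
z = z_score(x, mu, sigma)                 # (130 - 100) / 15 = 2.0

print(f"f({x}) = {density:.6f} (library: {reference:.6f})")
print(f"z = {z}")
```

Here a value of 130 sits two standard deviations above the mean, which is the kind of statement the z-score is meant to make comparable across different normal distributions.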

Core Formulas

\(z = \frac{x - \mu}{\sigma}\)
\(P(a < X < b) = P\left(\frac{a-\mu}{\sigma} < Z < \frac{b-\mu}{\sigma}\right)\)
\(X = \mu + z\sigma\)
\(\text{Empirical Rule: } P(\mu - \sigma < X < \mu + \sigma) \approx 0.68, P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 0.95, P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 0.997\)
\(\text{Percentile: } x = \mu + z_p \cdot \sigma\)
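The formulas above can be checked numerically. The following is a rough sketch using Python's standard-library `statistics.NormalDist`. The function names `prob_between` and `percentile_value`, and the example distribution with \(\mu = 70\), \(\sigma = 10\), are assumptions chosen for illustration.

```python
from statistics import NormalDist

# Standard normal distribution Z with mean 0 and standard deviation 1.
Z = NormalDist(0.0, 1.0)


def prob_between(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma), computed by standardizing both bounds."""
    z_a = (a - mu) / sigma
    z_b = (b - mu) / sigma
    return Z.cdf(z_b) - Z.cdf(z_a)


def percentile_value(p, mu, sigma):
    """Value x with P(X < x) = p, using x = mu + z_p * sigma."""
    z_p = Z.inv_cdf(p)
    return mu + z_p * sigma


# Hypothetical example distribution (assumed for illustration only).
mu, sigma = 70.0, 10.0

# Empirical Rule: probability of falling within k standard deviations of the mean.
empirical = [
    prob_between(mu - k * sigma, mu + k * sigma, mu, sigma) for k in (1, 2, 3)
]

p_60_85 = prob_between(60.0, 85.0, mu, sigma)  # z runs from -1.0 to 1.5
x_90 = percentile_value(0.90, mu, sigma)       # 90th percentile

print([round(p, 4) for p in empirical])
print(round(p_60_85, 4), round(x_90, 2))
```

The three Empirical Rule probabilities come out close to 0.68, 0.95, and 0.997, which shows that the rule is a rounded summary of the standard normal table rather than a separate result.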

Common Pitfalls

⚠️ Confusing the direction of standardization: students often forget that \(z = \frac{x - \mu}{\sigma}\) (not \(\frac{\mu - x}{\sigma}\)), leading to incorrect signs when calculating z-scores.

⚠️ Misinterpreting z-scores as probabilities: a z-score is a standardized value, not a probability. Students must use the standard normal table or a calculator to convert z-scores to actual probabilities.

⚠️ Incorrectly applying the Empirical Rule: students may forget that the 68-95-99.7 rule applies only to normal distributions, and they may miscalculate intervals by using the wrong multiple of the standard deviation (e.g., using \(1.5\sigma\) instead of \(2\sigma\) for the 95% interval).

⚠️ Failing to check normality assumptions: students sometimes apply z-score methods to non-normal distributions without verifying that the data are approximately normally distributed, leading to invalid conclusions.
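The first three pitfalls can be made concrete with a short numerical sketch. The values \(\mu = 500\), \(\sigma = 100\), \(x = 400\) are hypothetical and chosen only for illustration. The code relies on `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Hypothetical example values (assumed for illustration only).
mu, sigma = 500.0, 100.0
x = 400.0

# Pitfall 1: direction of standardization.
z_correct = (x - mu) / sigma  # -1.0, because x lies below the mean
z_flipped = (mu - x) / sigma  # +1.0, wrong sign

# Pitfall 2: a z-score is not a probability; it must be converted.
p_below = Z.cdf(z_correct)          # P(X < 400), about 0.1587
p_from_flipped = Z.cdf(z_flipped)   # about 0.8413, the complement

# Pitfall 3: 1.5 standard deviations do not cover 95% of the distribution.
within_1_5 = Z.cdf(1.5) - Z.cdf(-1.5)  # about 0.866
within_2 = Z.cdf(2.0) - Z.cdf(-2.0)    # about 0.954

print(z_correct, z_flipped)
print(round(p_below, 4), round(p_from_flipped, 4))
print(round(within_1_5, 4), round(within_2, 4))
```

Flipping the sign of the z-score turns a lower-tail probability of about 16% into about 84%, so the sign error changes the conclusion rather than just the notation. None of this addresses the fourth pitfall: these calculations are only meaningful if the underlying data are approximately normal.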