
3.3.2 Measures of Spread

Computing range, interquartile range (IQR), variance, and standard deviation to quantify data variability.

Definitions

Measures of spread are statistics that describe how widely data values are scattered around the center of a distribution, quantifying the variability within a dataset. The main measures of spread are:

Range: The difference between the maximum and minimum values in a dataset, calculated as \(\text{Range} = \max(x) - \min(x)\). It provides a simple measure of spread but is sensitive to outliers.
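Here is a minimal Python sketch of the range formula, applied to a small made-up list of scores. The helper name `data_range` is hypothetical; it was chosen so that it does not shadow Python's built-in `range`. Appending one extreme value shows how strongly the range reacts to an outlier.

```python
def data_range(values):
    """Return max(values) - min(values); the range is undefined for an empty list."""
    if not values:
        raise ValueError("range is undefined for an empty dataset")
    return max(values) - min(values)

scores = [12, 15, 15, 18, 20, 22, 25]   # illustrative data
print(data_range(scores))               # 25 - 12 = 13

# A single extreme value changes the range dramatically.
with_outlier = scores + [90]
print(data_range(with_outlier))         # 90 - 12 = 78
```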

Interquartile Range (IQR): The difference between the third quartile (Q3) and the first quartile (Q1), calculated as \(\text{IQR} = Q_3 - Q_1\). It represents the range of the middle 50% of the data and is resistant to outliers.
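One way to compute the IQR in Python is with the standard library's `statistics.quantiles` (Python 3.8+). The sketch below assumes the `method="inclusive"` quartile convention, which is only one of several conventions in use. The helper `iqr` and the data are illustrative. Appending one extreme value shows that the IQR changes only slightly.

```python
import statistics

def iqr(values):
    """IQR = Q3 - Q1, with quartiles taken from statistics.quantiles (inclusive method)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

scores = [12, 15, 15, 18, 20, 22, 25]   # illustrative data
print(iqr(scores))                      # 21.0 - 15.0 = 6.0

# The extreme value barely moves the IQR, because it lies outside the middle 50%.
print(iqr(scores + [90]))               # 22.75 - 15.0 = 7.75
```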

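Variance: The average of the squared deviations from the mean. For a population of \(N\) values, it is the mean squared distance of each value from the population mean \(\mu\). For a sample of \(n\) values, the sum of squared deviations from \(\bar{x}\) is divided by \(n-1\) instead of \(n\), which makes \(s^2\) an unbiased estimate of the population variance.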
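This from-scratch sketch makes the \(N\) versus \(n-1\) distinction concrete. The function `manual_variance` and its `sample` flag are hypothetical names. The sketch is cross-checked against `statistics.pvariance` and `statistics.variance` from Python's standard library.

```python
import statistics

def manual_variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)

data = [2, 4, 4, 4, 5, 5, 7, 9]              # mean = 5, sum of squared deviations = 32
print(manual_variance(data, sample=False))   # 32 / 8 = 4.0
print(manual_variance(data, sample=True))    # 32 / 7 ≈ 4.571

# Cross-check against the standard library implementations.
assert manual_variance(data, sample=False) == statistics.pvariance(data)
assert abs(manual_variance(data, sample=True) - statistics.variance(data)) < 1e-12
```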

Standard Deviation: The square root of the variance. It measures the typical distance of data points from the mean and is expressed in the same units as the original data, making it more interpretable than variance.
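The standard deviation is just the square root of the variance, so it can be obtained with `math.sqrt`. This sketch uses illustrative waiting times in minutes. It shows that the variance is in squared units (minutes²), while the standard deviation is back in minutes. `statistics.pstdev` and `statistics.stdev` compute the same quantities directly.

```python
import math
import statistics

wait_minutes = [12, 14, 14, 14, 15, 15, 17, 19]   # illustrative data, in minutes

pop_var = statistics.pvariance(wait_minutes)      # equals 4, in minutes squared
pop_sd = math.sqrt(pop_var)                       # 2.0, back in minutes
sample_sd = math.sqrt(statistics.variance(wait_minutes))  # sqrt(32/7) ≈ 2.138 minutes

# The standard library's direct functions agree with taking the square root.
assert math.isclose(pop_sd, statistics.pstdev(wait_minutes))
assert math.isclose(sample_sd, statistics.stdev(wait_minutes))
print(f"population SD = {pop_sd:.3f} min, sample SD = {sample_sd:.3f} min")
```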

Key Formulas

  • \(\text{Range} = \max(x) - \min(x)\)
  • \(\text{IQR} = Q_3 - Q_1\)
  • \(\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\) (population variance)
  • \(s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2\) (sample variance)
  • \(\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}\) (population standard deviation) and \(s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}\) (sample standard deviation)
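The following worked illustration applies these formulas to the eight values 2, 4, 4, 4, 5, 5, 7, 9. The values are chosen for easy arithmetic and are not from the source. The IQR line assumes the median-of-halves quartile convention.

\[
\begin{aligned}
\mu = \bar{x} &= \frac{2+4+4+4+5+5+7+9}{8} = \frac{40}{8} = 5 \\
\sum_{i=1}^{8}(x_i - \bar{x})^2 &= 9+1+1+1+0+0+4+16 = 32 \\
\sigma^2 &= \frac{32}{8} = 4, \qquad \sigma = \sqrt{4} = 2 \quad \text{(treating the values as a population)} \\
s^2 &= \frac{32}{7} \approx 4.571, \qquad s \approx 2.138 \quad \text{(treating them as a sample)} \\
\text{Range} &= 9 - 2 = 7 \\
\text{IQR} &= Q_3 - Q_1 = \frac{5+7}{2} - \frac{4+4}{2} = 6 - 4 = 2
\end{aligned}
\]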

Common Pitfalls

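  • ⚠️ Confusing population variance/standard deviation (dividing by \(N\)) with sample variance/standard deviation (dividing by \(n-1\)). Students often forget that the sample formulas use \(n-1\) (Bessel's correction). This correction makes \(s^2\) an unbiased estimate of \(\sigma^2\), although \(s\) itself remains a slightly biased estimate of \(\sigma\).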
  • ⚠️ Using the range as the primary measure of spread without recognizing that it is highly sensitive to outliers. The range depends only on the two extreme values and ignores how the remaining data points are distributed.
  • ⚠️ Computing the standard deviation from an incorrectly calculated mean, or making arithmetic errors when squaring the deviations. Students also sometimes report the variance as the standard deviation by forgetting to take the square root.
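  • ⚠️ Misinterpreting the IQR as the spread of all the data rather than of the middle 50%, or locating Q1 and Q3 incorrectly when the dataset has an odd or an even number of values. Textbooks and software also use different quartile conventions, so Q1, Q3, and the IQR can differ slightly for the same data.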
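The quartile pitfall is easiest to see by computing Q1 and Q3 under different conventions. This Python sketch compares a median-of-halves rule with the two methods offered by `statistics.quantiles`. The median-of-halves rule is a common textbook convention, implemented here by the hypothetical helper `quartiles_median_of_halves`. The comparison uses one odd-length and one even-length illustrative dataset.

```python
import statistics

def quartiles_median_of_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the sorted data.

    When n is odd, the overall median is left out of both halves
    (one common textbook convention; some textbooks include it instead).
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return statistics.median(lower), statistics.median(upper)

def quartiles_by_method(values, method):
    """Q1 and Q3 from statistics.quantiles ('inclusive' or 'exclusive')."""
    q1, _, q3 = statistics.quantiles(values, n=4, method=method)
    return q1, q3

odd_data = [1, 3, 5, 7, 9, 11, 13]      # n = 7
even_data = [2, 4, 4, 4, 5, 5, 7, 9]    # n = 8

for name, data in [("odd n", odd_data), ("even n", even_data)]:
    results = {
        "median of halves": quartiles_median_of_halves(data),
        "inclusive": quartiles_by_method(data, "inclusive"),
        "exclusive": quartiles_by_method(data, "exclusive"),
    }
    for method, (q1, q3) in results.items():
        print(f"{name:6} {method:16} Q1={q1} Q3={q3} IQR={q3 - q1}")
```

For the even-length data, the three conventions give IQRs of 2.0, 1.5, and 2.5. It is therefore worth stating which convention a reported IQR uses.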