
3.3.1 Measures of Center

Understanding and calculating mean, median, and mode to describe the central tendency of a dataset.

Definition

Measures of center are statistical values that represent the central tendency or typical value of a dataset. The three primary measures of center are:

Mean (Average): The sum of all data values divided by the number of values. It is the most commonly used measure of center and is sensitive to all data points, including outliers.

Median: The middle value when data is arranged in order from least to greatest. If there is an even number of values, the median is the average of the two middle values. The median is resistant to outliers and is useful for skewed distributions.

Mode: The value that appears most frequently in a dataset. A dataset can have no mode, one mode (unimodal), or multiple modes (bimodal or multimodal). The mode is the only measure of center that can be used with nominal (unordered) categorical data.

Each measure provides different insights into the data distribution and is appropriate for different types of data and situations.
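The three definitions above can be checked on a small example. The following is a minimal sketch using Python's standard `statistics` module; the lists `scores` and `colors` are made-up illustration data, not values from this section.

```python
from statistics import mean, median, multimode

# Hypothetical numeric data for illustration.
scores = [4, 8, 6, 8, 10, 3]

# Mean: sum of all values divided by the count, 39 / 6.
center_mean = mean(scores)

# Median: sorted data is 3, 4, 6, 8, 8, 10; with an even count,
# average the two middle values, (6 + 8) / 2.
center_median = median(scores)

# Mode: multimode returns a list of every most-frequent value.
center_modes = multimode(scores)

# The mode also works on categorical labels, where mean and median do not.
colors = ["red", "blue", "red", "green"]
color_modes = multimode(colors)

print(center_mean, center_median, center_modes, color_modes)
```

Here `multimode` is used instead of `mode` because it returns a list, which makes datasets with several modes visible.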

Key Formulas

  • \(\text{Mean} = \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}\)
  • \(\text{Median} = \begin{cases} x_{\frac{n+1}{2}} & \text{if } n \text{ is odd} \\ \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} & \text{if } n \text{ is even} \end{cases}\), where the values \(x_1 \le x_2 \le \cdots \le x_n\) are sorted in ascending order
  • \(\text{Mode} = \text{the value with the highest frequency in the dataset}\)
  • \(\text{Weighted Mean} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}\)
  • Relationship: for symmetric, unimodal distributions, \(\text{Mean} = \text{Median} = \text{Mode}\)
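The median and weighted mean formulas above can be written out directly. This is a minimal sketch, not a library implementation: the function names `median_by_position` and `weighted_mean` and the sample grades and weights are hypothetical, chosen only to mirror the formulas.

```python
def median_by_position(values):
    """Median from the positional formula, after sorting the data."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty dataset is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the 1-based position (n + 1) / 2 is index n // 2 in 0-based terms.
        return ordered[mid]
    # Even n: average the values at 1-based positions n / 2 and n / 2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical grades weighted 2 : 3 : 5.
# (2*80 + 3*90 + 5*70) / (2 + 3 + 5) = 780 / 10
course_grade = weighted_mean([80, 90, 70], [2, 3, 5])

odd_median = median_by_position([7, 1, 5])       # sorted: 1, 5, 7
even_median = median_by_position([7, 1, 5, 3])   # sorted: 1, 3, 5, 7
```

Sorting inside `median_by_position` is a deliberate choice: the positional formula only holds for ordered data, so the function does not rely on the caller to sort first.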

Common Mistakes

  • ⚠️ Confusing median with mean: Students often forget that the median is the middle value (not the average of all values) and may incorrectly calculate it by averaging all data points instead of finding the middle position.
  • ⚠️ Forgetting to order data before finding the median: The median requires data to be arranged in ascending or descending order first. Many students attempt to find the median without sorting, leading to incorrect answers.
  • ⚠️ Misidentifying the mode or claiming there is no mode: Students may overlook that a dataset can have multiple modes (bimodal or multimodal). When all values appear with equal frequency, conventions differ: some texts say the dataset has no mode, while others treat every value as a mode, so check which convention your course uses.
  • ⚠️ Not recognizing how outliers affect different measures: Students fail to understand that the mean is heavily influenced by outliers while the median is resistant to them, leading to incorrect conclusions about which measure best represents the data.
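Several of the mistakes above are easy to see numerically. The sketch below uses Python's standard `statistics` module on made-up data (all list names and values are hypothetical) to show how one outlier moves the mean but barely moves the median, why unsorted data gives the wrong "middle" value, and how `multimode` treats values that all tie in frequency.

```python
from statistics import mean, median, multimode

# Hypothetical data: five typical values, then the same values plus one outlier.
typical = [30, 32, 35, 38, 40]
with_outlier = typical + [400]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)

# Taking the middle element without sorting does not give the median.
unsorted_values = [9, 1, 7, 3, 5]
naive_middle = unsorted_values[len(unsorted_values) // 2]  # picks 7
true_median = median(unsorted_values)  # sorts first: 1, 3, 5, 7, 9

# When every value occurs once, multimode returns all of them.
tied_modes = multimode([1, 2, 3])
```

In this example the mean jumps from 35 to about 95.83 while the median only shifts from 35 to 36.5, which is why the median is usually the better summary for skewed data. Note that `multimode` follows the "every tied value is a mode" convention; a course may instead describe such a dataset as having no mode.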