3.3.4 Data Visualization and Displays
Creating and interpreting histograms, box plots, dot plots, and stem-and-leaf plots for single-variable data.
Definition
Data Visualization and Displays refer to graphical representations of single-variable (univariate) data that help identify patterns, distributions, and key statistical features. The main types include:
Histograms: Graphical displays showing the frequency distribution of continuous or grouped data. The x-axis represents class intervals (bins), and the y-axis represents frequency or relative frequency. Bars are adjacent with no gaps, indicating continuous data.
Box Plots (Box-and-Whisker Plots): Visual summaries displaying the five-number summary of a dataset: minimum value, first quartile (\(Q_1\)), median (\(Q_2\)), third quartile (\(Q_3\)), and maximum value. The box spans from \(Q_1\) to \(Q_3\), with a line at the median. In a basic box plot the whiskers extend to the minimum and maximum values. In a modified box plot, which marks outliers separately, each whisker stops at the most extreme data value that lies within \(1.5 \times \text{IQR}\) of the box, and any points beyond that are plotted individually.
Dot Plots: Simple displays where each data point is represented by a dot along a number line. Multiple dots at the same value are stacked vertically, making it easy to see the distribution and identify modes.
Stem-and-Leaf Plots: Displays that organize data by separating each value into a "stem" (leading digit(s)) and a "leaf" (trailing digit). The stem is listed vertically, and leaves are arranged horizontally, preserving the original data values while showing distribution shape.
These displays are essential tools for exploratory data analysis, allowing students to visualize center, spread, shape, and outliers in single-variable datasets.
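As an illustration of how a stem-and-leaf plot keeps every original value readable, the following is a minimal Python sketch. The function name `stem_and_leaf` and the `scores` list are hypothetical, made up for this example, and the sketch assumes non-negative whole numbers with the tens as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot as strings.

    Assumes non-negative integers: stem = tens (and above), leaf = units digit.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 72 -> stem 7, leaf 2
        rows[stem].append(leaf)

    lines = []
    # Include stems with no leaves so that gaps in the data stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


# Made-up test scores, for illustration only.
scores = [72, 85, 91, 68, 77, 85, 93, 70, 88, 79]
for line in stem_and_leaf(scores):
    print(line)
# 6 | 8
# 7 | 0 2 7 9
# 8 | 5 5 8
# 9 | 1 3
```

A key such as "7 | 2 means 72" should accompany the plot so the place value of the stem is unambiguous. The longest row (stem 7) shows where the data cluster, much as the tallest bar of a histogram would.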
Key Formulas
- \(\text{Relative Frequency} = \frac{\text{Frequency of Class}}{\text{Total Number of Observations}}\)
- \(\text{Interquartile Range (IQR)} = Q_3 - Q_1\)
- Outlier boundaries (fences): \(\text{Lower} = Q_1 - 1.5 \times \text{IQR}, \quad \text{Upper} = Q_3 + 1.5 \times \text{IQR}\)
- \(\text{Class Width} \approx \frac{\text{Maximum Value} - \text{Minimum Value}}{\text{Number of Classes}}\), usually rounded up to a convenient value so that the classes cover every observation
- \(\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}\)
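The formulas above can be traced on a small example. The sketch below uses a made-up dataset and a hypothetical helper `five_number_summary`. It assumes the "median of each half" convention for quartiles, which is one of several in use; other conventions, including those built into software, can give slightly different \(Q_1\) and \(Q_3\). The choice of 5 classes is likewise arbitrary.

```python
import math
import statistics


def five_number_summary(values):
    """Min, Q1, median, Q3, max using the median-of-halves convention.

    Assumption: for an odd number of values the overall median is
    excluded from both halves.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half + 1:] if n % 2 else s[half:]
    return (s[0], statistics.median(lower), statistics.median(s),
            statistics.median(upper), s[-1])


data = [12, 15, 17, 18, 20, 21, 22, 24, 25, 48]  # made-up sample

minimum, q1, median, q3, maximum = five_number_summary(data)
iqr = q3 - q1                    # IQR = Q3 - Q1
lower_fence = q1 - 1.5 * iqr     # lower outlier boundary
upper_fence = q3 + 1.5 * iqr     # upper outlier boundary
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Class width: range / number of classes, rounded up.
num_classes = 5
width = math.ceil((maximum - minimum) / num_classes)

# Count how many values fall into each class [start, start + width).
freq = [0] * num_classes
for x in data:
    index = min((x - minimum) // width, num_classes - 1)
    freq[index] += 1

relative_freq = [f / len(data) for f in freq]   # frequency / total
freq_density = [f / width for f in freq]        # frequency / class width

print(q1, median, q3, iqr)        # 17 20.5 24 7
print(lower_fence, upper_fence)   # 6.5 34.5
print(outliers)                   # [48]
print(width, freq)                # 8 [4, 5, 0, 0, 1]
print(relative_freq)              # [0.4, 0.5, 0.0, 0.0, 0.1]
```

Here the raw class width is \(36 / 5 = 7.2\), which is rounded up to 8 so that the five classes starting at 12 reach past the maximum of 48. The relative frequencies sum to 1, which is a quick check on the counts.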
Common Mistakes
- ⚠️ Confusing frequency with relative frequency in histograms: Students often forget to convert frequencies to relative frequencies (proportions) when comparing distributions with different sample sizes, leading to incorrect interpretations of histogram heights.
- ⚠️ Misidentifying outliers in box plots: Students may incorrectly identify outliers by using the maximum/minimum values instead of applying the \(1.5 \times \text{IQR}\) rule, or they may fail to recognize that whiskers extend only to the farthest non-outlier data point.
- ⚠️ Misinterpreting stem-and-leaf plots: Students sometimes reverse the stem and leaf, read the plot incorrectly, or fail to account for the place value of the stem, resulting in incorrect data values.
- ⚠️ Choosing inappropriate class widths for histograms: Students may create histograms with too many or too few classes, obscuring the true shape of the distribution. The number of classes should typically be between 5 and 20, depending on the dataset size.
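The box-plot mistake in the list above can be made concrete with a short Python sketch. The function `whiskers_and_outliers` and the dataset are hypothetical, and the quartiles are passed in as given values (here \(Q_1 = 17\) and \(Q_3 = 24\), as obtained from the median-of-halves convention) rather than computed, because quartile conventions vary.

```python
def whiskers_and_outliers(values, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers).

    Whiskers stop at the most extreme data values inside the
    1.5 * IQR fences, not at the fences themselves.
    """
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return min(inside), max(inside), outliers


data = [12, 15, 17, 18, 20, 21, 22, 24, 25, 48]  # made-up sample
low, high, outliers = whiskers_and_outliers(data, q1=17, q3=24)
print(low, high, outliers)  # 12 25 [48]
```

The upper fence is \(24 + 1.5 \times 7 = 34.5\), yet the upper whisker ends at 25, the largest value that is not an outlier. The maximum, 48, is drawn as a separate point rather than as the whisker end.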