
3.4.1 Scatter Plots and Bivariate Data

Creating and interpreting scatter plots to visualize relationships between two quantitative variables.

Definition

A scatter plot is a graph of the relationship between two quantitative variables, in which each observation is plotted as an ordered pair \((x, y)\) on a coordinate plane. The independent (explanatory) variable is typically plotted on the horizontal axis (x-axis), and the dependent (response) variable on the vertical axis (y-axis). Scatter plots are used to visualize patterns, trends, and associations in bivariate data. The direction of the relationship is read from the overall trend of the points, and its strength from how closely the points cluster around that trend. A positive association means that as one variable increases, the other tends to increase; a negative association means that as one variable increases, the other tends to decrease. The relationship may be linear (points follow a straight-line pattern) or nonlinear (points follow a curved or more complex pattern).

Key Formulas

  • \(\text{Correlation coefficient: } r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}\)
  • \(\text{Slope of least-squares regression line: } b = r \cdot \frac{s_y}{s_x}\), where \(s_x\) and \(s_y\) are the sample standard deviations of \(x\) and \(y\)
  • \(\text{Least-squares regression line: } \hat{y} = a + bx\), where \(a = \bar{y} - b\bar{x}\)
  • \(\text{Residual: } e_i = y_i - \hat{y}_i\)
  • \(\text{Coefficient of determination (simple linear regression): } R^2 = r^2\)
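
The formulas above can be checked numerically with a short sketch. The following is a minimal Python illustration using only the standard library; the function name `summarize_bivariate` and the `hours`/`scores` data are hypothetical and chosen only for this example.

```python
from math import sqrt


def summarize_bivariate(xs, ys):
    """Compute r, the least-squares line, residuals, and R^2 for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sums of products and squares of deviations from the means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when either variable is constant")

    # Correlation coefficient r.
    r = sxy / (sqrt(sxx) * sqrt(syy))

    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sxx / (n - 1))
    s_y = sqrt(syy / (n - 1))

    # Slope b = r * s_y / s_x and intercept a = y_bar - b * x_bar.
    b = r * s_y / s_x
    a = y_bar - b * x_bar

    # Residuals e_i = y_i - y_hat_i, with y_hat_i = a + b * x_i.
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

    return {
        "r": r,
        "slope": b,
        "intercept": a,
        "residuals": residuals,
        "r_squared": r ** 2,
    }


# Hypothetical data, made up for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]
result = summarize_bivariate(hours, scores)
print(result)
```

For this made-up data set, the fitted line is \(\hat{y} = 2.2 + 0.6x\) with \(r \approx 0.775\) and \(R^2 = 0.6\), and the residuals sum to zero (up to floating-point rounding), as they do for any least-squares line with an intercept.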

Common Mistakes

  • ⚠️ Confusing correlation with causation: A strong correlation between two variables does not imply that one variable causes changes in the other; there may be confounding variables or the relationship may be coincidental.
  • ⚠️ Misinterpreting the direction of association: Students may incorrectly identify whether a relationship is positive or negative by not carefully observing whether points trend upward (positive) or downward (negative) from left to right.
  • ⚠️ Ignoring outliers and their impact: Outliers can substantially change the correlation coefficient and the regression line, so overlooking them can distort the apparent pattern of the relationship.
  • ⚠️ Incorrectly reading or plotting points: Misreading coordinates from a scatter plot, or misplotting new data points, leads to incorrect interpretations of the relationship between the variables.
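
The outlier pitfall can be made concrete with a small experiment. The sketch below is a minimal illustration using invented data: it computes \(r\) for five points that lie exactly on a line, then again after appending a single outlier. The helper name `pearson_r` and all data values are assumptions for this example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: five points lying exactly on y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
r_clean = pearson_r(xs, ys)

# The same data with one outlier appended at (10, 0).
xs_out = xs + [10]
ys_out = ys + [0]
r_outlier = pearson_r(xs_out, ys_out)

print(f"r without outlier: {r_clean:.3f}")
print(f"r with outlier:    {r_outlier:.3f}")
```

In this example, one added point moves \(r\) from a perfect positive correlation to a weak negative one (about \(-0.25\)), which is why it helps to inspect the scatter plot for outliers before relying on \(r\) or the regression line.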