3.6.2 Bias and Validity in Data Collection

Identifying sources of bias in surveys and studies, and assessing whether data collection methods produce valid and reliable results.

Definition

Bias and validity in data collection concerns the systematic evaluation of data collection methods to identify sources of error and assess the quality of results. Bias is a systematic deviation from the true value that consistently skews results in one direction, while validity is the extent to which a study measures what it claims to measure. Key concepts include:

Bias Types:
- Selection Bias: Occurs when the way individuals enter the sample makes it unrepresentative of the population, leading to systematic differences between the sample and the population.
- Response Bias: Results from how respondents answer questions, for example social desirability bias, leading or poorly worded questions, or inaccurate recall.
- Sampling Bias: A form of selection bias that arises from the sampling method itself, such as convenience sampling or voluntary response sampling.
- Undercoverage: Occurs when certain groups in the population are missing from, or underrepresented in, the sampling frame.
- Nonresponse Bias: Occurs when individuals selected for the sample refuse to participate or cannot be reached, and those nonrespondents differ systematically from the people who do respond.

Validity and Reliability:
- Internal Validity: The extent to which the study design allows for causal conclusions about the relationship between variables.
- External Validity: The extent to which results can be generalized to other populations, settings, or times.
- Reliability: The consistency and reproducibility of measurements; a measure can be reliable without being valid.
- Validity requires reliability, but reliability does not guarantee validity.
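The distinction between reliability and validity can be illustrated with a small simulation. This is a minimal sketch, not part of the original notes: the scale, the true weight of 70 kg, the 5 kg offset, and the function name `simulate_readings` are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate_readings(true_weight, offset, noise_sd, n, seed=0):
    """Simulate n readings from a hypothetical scale.

    offset models a systematic calibration error (hurts validity);
    noise_sd models random scatter between readings (hurts reliability).
    """
    rng = random.Random(seed)
    return [true_weight + offset + rng.gauss(0, noise_sd) for _ in range(n)]


TRUE_WEIGHT = 70.0  # kg, assumed for illustration

# Reliable but not valid: very consistent readings, all about 5 kg too high.
miscalibrated = simulate_readings(TRUE_WEIGHT, offset=5.0, noise_sd=0.1, n=200)
# Reliable and valid: very consistent readings with no systematic offset.
calibrated = simulate_readings(TRUE_WEIGHT, offset=0.0, noise_sd=0.1, n=200)

for name, readings in [("miscalibrated", miscalibrated), ("calibrated", calibrated)]:
    spread = statistics.stdev(readings)                # small spread = consistent
    mean_error = statistics.mean(readings) - TRUE_WEIGHT  # near zero = on target
    print(f"{name}: spread={spread:.2f} kg, mean error={mean_error:+.2f} kg")
```

Both simulated scales show the same small spread, so both are reliable, but only the calibrated one centers on the true weight.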

Valid Data Collection Methods:
- Random sampling reduces selection bias and increases representativeness.
- Stratified sampling ensures adequate representation of subgroups.
- Cluster sampling is efficient for geographically dispersed populations.
- Double-blind studies minimize response bias and experimenter bias.
- Careful survey design with clear, unbiased questions improves data quality.
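The sampling methods listed above can be compared on a simulated population. The following is a rough sketch under invented assumptions: the population of 8,000 urban and 2,000 rural residents, the commute-time values, and all function names are hypothetical and exist only to show how a convenience sample drifts from the truth while random and stratified samples do not.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: (stratum, commute time in minutes).
population = (
    [("urban", rng.gauss(30, 5)) for _ in range(8000)]
    + [("rural", rng.gauss(50, 5)) for _ in range(2000)]
)
true_mean = statistics.mean(value for _, value in population)


def sample_mean(sample):
    return statistics.mean(value for _, value in sample)


def simple_random_sample(pop, n):
    """Every unit has the same chance of selection."""
    return rng.sample(pop, n)


def stratified_sample(pop, n):
    """Sample each stratum separately, in proportion to its population share."""
    strata = {}
    for unit in pop:
        strata.setdefault(unit[0], []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(rng.sample(members, k))
    return sample


def convenience_sample(pop, n):
    """Deliberately biased: only reaches urban residents (undercoverage)."""
    return [unit for unit in pop if unit[0] == "urban"][:n]


srs_error = sample_mean(simple_random_sample(population, 500)) - true_mean
strat_error = sample_mean(stratified_sample(population, 500)) - true_mean
conv_error = sample_mean(convenience_sample(population, 500)) - true_mean

print(f"simple random error: {srs_error:+.2f} min")
print(f"stratified error:    {strat_error:+.2f} min")
print(f"convenience error:   {conv_error:+.2f} min")
```

In this setup the convenience sample misses the rural stratum entirely, so its estimate sits several minutes below the population mean no matter how it is drawn.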

Key Formulas

\(\text{Sampling Error} = \text{Sample Statistic} - \text{Population Parameter}\)
\(\text{Bias} = E[\hat{\theta}] - \theta\)
\(\text{Mean Squared Error (MSE)} = \text{Variance} + \text{Bias}^2\)
\(\text{Margin of Error} = z^* \cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
\(\text{Confidence Interval} = \text{Point Estimate} \pm (\text{Critical Value}) \times (\text{Standard Error})\)
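
As a worked illustration of the formulas above, the sketch below computes a margin of error and confidence interval for a sample proportion, then checks the MSE decomposition numerically. The poll numbers (52% of 1,000 respondents) and the choice of the divide-by-n variance estimator as the biased estimator are assumptions made for the example, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def margin_of_error(p_hat, n, confidence=0.95):
    """z* times the standard error of a sample proportion."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    return z_star * standard_error


def confidence_interval(p_hat, n, confidence=0.95):
    """Point estimate plus or minus the margin of error."""
    me = margin_of_error(p_hat, n, confidence)
    return p_hat - me, p_hat + me


def biased_variance(sample):
    """Variance estimator that divides by n instead of n - 1 (biased low)."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)


# Hypothetical poll: 52% support among 1,000 respondents.
me = margin_of_error(0.52, 1000)
low, high = confidence_interval(0.52, 1000)
print(f"margin of error: {me:.3f}, 95% CI: ({low:.3f}, {high:.3f})")

# MSE = Variance + Bias^2, checked over many simulated samples of size 5
# drawn from a normal population whose true variance is 1.
rng = random.Random(1)
TRUE_VARIANCE = 1.0
estimates = [
    biased_variance([rng.gauss(0, 1) for _ in range(5)]) for _ in range(20000)
]
mean_estimate = sum(estimates) / len(estimates)
bias = mean_estimate - TRUE_VARIANCE                       # E[theta_hat] - theta
variance = sum((e - mean_estimate) ** 2 for e in estimates) / len(estimates)
mse = sum((e - TRUE_VARIANCE) ** 2 for e in estimates) / len(estimates)
print(f"bias={bias:.3f}, variance={variance:.3f}, mse={mse:.3f}")
print(f"variance + bias^2 = {variance + bias ** 2:.3f}")
```

For samples of size 5, the divide-by-n estimator has expected value 0.8 times the true variance, so the simulated bias should land close to -0.2.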

Common Pitfalls

⚠️ Confusing bias with random error: Students often fail to recognize that bias is systematic and directional, while random error is unpredictable and tends to average out to zero across repeated samples. A biased sample can have low variability but still be systematically wrong.
⚠️ Assuming random sampling automatically eliminates all bias: While random sampling reduces selection bias, it does not address response bias, nonresponse bias, or measurement error. Students must consider all potential sources of bias.
⚠️ Treating reliability and validity as equivalent: Students may incorrectly assume that a reliable measurement is automatically valid. A survey can consistently measure something (reliable) but measure the wrong thing or measure it incorrectly (invalid).
⚠️ Overlooking the impact of sample size on bias: Students sometimes believe that increasing sample size reduces bias, when in fact bias caused by a flawed collection method does not shrink as the sample grows. Large biased samples are still biased; only random error decreases with larger samples.
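
The last pitfall can be demonstrated with a short simulation. This is a sketch under assumed numbers: true support is set to 50%, supporters are assumed to respond 80% of the time and opponents 40% of the time, and `survey_estimate` is a hypothetical helper. Under those assumptions the respondents' support rate converges to about 0.67, not 0.50, however many people are invited.

```python
import random


def survey_estimate(n_invited, true_support=0.5,
                    p_respond_yes=0.8, p_respond_no=0.4, seed=0):
    """Estimate support from respondents only, with unequal response rates.

    Supporters are more likely to respond than opponents, which
    introduces nonresponse bias into the estimate.
    """
    rng = random.Random(seed)
    yes = no = 0
    for _ in range(n_invited):
        supports = rng.random() < true_support
        response_rate = p_respond_yes if supports else p_respond_no
        if rng.random() < response_rate:
            if supports:
                yes += 1
            else:
                no += 1
    return yes / (yes + no)


# Larger samples become more stable, but they settle on the wrong value.
estimates = {n: survey_estimate(n) for n in (100, 10_000, 100_000)}
for n, estimate in estimates.items():
    print(f"invited={n:>7}: estimated support={estimate:.3f} (true value 0.500)")
```

Increasing the number of invitations narrows the scatter around the biased value; it does not move the estimate toward the true 50%.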