3.6.1 Sampling Methods and Study Design
Understanding different sampling techniques (random, stratified, cluster) and evaluating the validity of study designs (observational vs. experimental).
Definition
Sampling Methods and Study Design encompasses the techniques used to collect data from a population and the frameworks for conducting statistical studies.
Sampling Methods are techniques for selecting a subset of individuals from a population:
- Random Sampling: Every member of the population has an equal chance of being selected. It is the standard approach for reducing selection bias and is usually carried out as simple random sampling, in which every possible sample of size \(n\) has the same probability of selection.
- Stratified Sampling: The population is divided into homogeneous subgroups (strata) based on a characteristic, and a random sample is taken from each stratum. This ensures representation across important subgroups and reduces sampling variability when strata are internally homogeneous.
- Cluster Sampling: The population is divided into clusters (usually geographic or natural groupings), clusters are randomly selected, and every individual in the selected clusters is sampled. This is cost-effective for geographically dispersed populations but increases variability when clusters differ substantially from one another, that is, when each cluster is internally homogeneous rather than a small-scale mirror of the population.
- Systematic Sampling: Every \(k\)-th individual is selected from an ordered population list, starting from a randomly chosen position among the first \(k\), where \(k = \frac{N}{n}\) (population size divided by desired sample size).
- Convenience Sampling: Individuals are selected because they are easily accessible. This is non-random and often introduces bias.
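The four random methods described here can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `students` population, its `grade` and `room` fields, and all function names are hypothetical, chosen only so that grades can act as strata and classrooms as clusters.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every size-n subset is equally likely."""
    return rng.sample(population, n)

def stratified_sample(population, stratum_of, n_per_stratum, rng):
    """Group units by stratum, then draw a simple random sample inside each stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample

def cluster_sample(population, cluster_of, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit in the chosen clusters."""
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]

def systematic_sample(population, n, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

# Hypothetical population: 120 students, 4 grades (strata), 12 rooms of 10 (clusters).
students = [{"id": i, "grade": 9 + i % 4, "room": i // 10} for i in range(120)]
rng = random.Random(0)  # fixed seed so the sketch is reproducible

srs = simple_random_sample(students, 12, rng)
strat = stratified_sample(students, lambda s: s["grade"], 3, rng)
clus = cluster_sample(students, lambda s: s["room"], 2, rng)
syst = systematic_sample(students, 12, rng)
```

Note that `strat` always contains students from all four grades, whereas `clus` contains only the two selected rooms, so any room-level characteristic outside those two rooms is never observed.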
Study Design refers to the overall structure and methodology of a statistical study:
- Observational Studies: Researchers observe and record data without manipulating variables. Subjects are studied in their natural state. These studies can identify associations but cannot establish causation due to potential confounding variables.
- Experimental Studies: Researchers actively manipulate an independent variable (treatment) and randomly assign subjects to treatment and control groups. Random assignment helps ensure groups are comparable, allowing researchers to establish causal relationships.
- Randomized Controlled Trials (RCTs): A type of experimental design where subjects are randomly assigned to treatment or control groups, considered the gold standard for establishing causation.
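Random assignment in an experiment can be as simple as shuffling the subject list and splitting it. The sketch below assumes a completely randomized design with two equal-sized groups; the function name `randomly_assign` and the subject IDs are hypothetical.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle the subjects and split them into two equal-sized groups."""
    shuffled = list(subjects)      # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]   # (treatment, control)

subject_ids = list(range(1, 21))   # 20 hypothetical subjects
treatment, control = randomly_assign(subject_ids, random.Random(42))
```

Because chance alone decides who lands in each group, any characteristic of the subjects, measured or not, tends to be balanced between the groups on average.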
Key Concepts:
- Bias: Systematic error that causes results to deviate from the true population parameter. Sources include selection bias (non-random sampling), response bias (subjects' answers are influenced), and measurement bias.
- Confounding Variable: A variable that affects the outcome and is associated with the treatment but is not the treatment itself, making it difficult to determine the true effect of the treatment.
- Placebo Effect: The phenomenon where subjects show improvement simply because they believe they are receiving treatment, not because of the treatment itself.
- Blinding: A technique where subjects (single-blind) or both subjects and researchers (double-blind) are unaware of treatment assignments, reducing bias.
Core Formulas
- \(P(\text{individual selected}) = \frac{n}{N}\) under simple random sampling, where \(n\) is the sample size and \(N\) is the population size
- \(\text{Sampling Error} = \text{Sample Statistic} - \text{Population Parameter}\)
- \(\text{Standard Error of Sample Mean} = \frac{\sigma}{\sqrt{n}}\) where \(\sigma\) is population standard deviation
- \(\text{Margin of Error} = z^* \cdot \frac{\sigma}{\sqrt{n}}\) where \(z^*\) is the critical value for the chosen confidence level (for example, \(z^* \approx 1.96\) for 95% confidence)
- \(\text{Relative Risk (RR)} = \frac{P(\text{outcome}|\text{treatment})}{P(\text{outcome}|\text{control})}\) used to compare treatment effectiveness
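As a quick numeric check of the standard error, margin of error, and relative risk formulas, here is a small sketch using only the Python standard library. The input values (\(\sigma = 15\), \(n = 100\), and the event counts) are made up for illustration, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def margin_of_error(sigma, n, confidence=0.95):
    """z* times the standard error, with z* taken from the standard normal."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_star * standard_error(sigma, n)

def relative_risk(events_treatment, n_treatment, events_control, n_control):
    """P(outcome | treatment) divided by P(outcome | control)."""
    return (events_treatment / n_treatment) / (events_control / n_control)

se = standard_error(sigma=15, n=100)    # 15 / 10 = 1.5
me = margin_of_error(sigma=15, n=100)   # roughly 1.96 * 1.5, about 2.94
rr = relative_risk(12, 200, 30, 200)    # 0.06 / 0.15, about 0.4
```

A relative risk below 1, as in this made-up example, means the outcome occurred less often in the treatment group than in the control group.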
Common Pitfalls
- ⚠️ Confusing observational studies with experimental studies: Students often forget that observational studies cannot establish causation, even if a strong association is found. Only random assignment in experiments can help control for confounding variables and support causal claims.
- ⚠️ Misunderstanding stratified vs. cluster sampling: Stratified sampling divides the population into internally homogeneous groups and takes a random sample from EACH stratum, so every group is represented. Cluster sampling divides the population into clusters that are ideally each internally heterogeneous, then includes everyone in a random selection of entire CLUSTERS, so some clusters are not observed at all.
- ⚠️ Ignoring bias in non-random sampling methods: Students may not recognize that convenience sampling, voluntary response sampling, and other non-random methods introduce selection bias. A larger sample does not remove this bias, and it generally cannot be corrected through statistical analysis afterward, so the results cannot be reliably generalized to the entire population.
- ⚠️ Overlooking the importance of random assignment: In experiments, students sometimes focus only on the treatment effect without recognizing that random assignment is what allows causal conclusions. Without it, confounding variables could explain the observed differences.
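The first and last pitfalls can be made concrete with a small simulation. The sketch below is built on invented assumptions: a hidden "healthy" trait drives recovery, the treatment has no true effect at all, and the probabilities (0.8, 0.2, 0.7, 0.3) are arbitrary illustrative values. In the observational version healthy subjects are more likely to choose the treatment; in the experimental version a coin flip decides.

```python
import random

def difference_in_recovery(n, randomized, rng):
    """Return P(recover | treated) - P(recover | untreated) for a treatment with NO true effect."""
    treated = [0, 0]      # [number recovered, group size]
    untreated = [0, 0]
    for _ in range(n):
        healthy = rng.random() < 0.5                  # hidden confounder
        if randomized:
            takes_treatment = rng.random() < 0.5      # random assignment, unrelated to health
        else:
            # Self-selection: healthy subjects are more likely to opt in.
            takes_treatment = rng.random() < (0.8 if healthy else 0.2)
        recovers = rng.random() < (0.7 if healthy else 0.3)   # depends only on health
        group = treated if takes_treatment else untreated
        group[0] += recovers
        group[1] += 1
    return treated[0] / treated[1] - untreated[0] / untreated[1]

rng = random.Random(1)
observational_diff = difference_in_recovery(20_000, randomized=False, rng=rng)
experimental_diff = difference_in_recovery(20_000, randomized=True, rng=rng)
```

Under these assumptions the observational comparison shows a sizable gap in recovery rates (about 0.24 in expectation) even though the treatment does nothing, while the randomized comparison stays close to zero. The gap in the observational data comes entirely from the confounder.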