[Stata] Understanding and Implementing Survey Weights in Stata

Survey weighting is a statistical technique for adjusting survey data so that they better represent the target population. Each response is assigned a weight that corrects for biases arising from the sampling design, non-response, or other discrepancies between the sample and the population. The goal is for survey estimates to reflect the characteristics and opinions of the entire population.

Why Survey Weights Matter

Most large-scale surveys employ complex sampling designs rather than simple random sampling for practical and economic reasons. These designs, and the fieldwork that follows them, typically involve:

  1. Stratification – Dividing the population into subgroups before sampling
  2. Clustering – Sampling groups of elements rather than individuals
  3. Unequal selection probabilities – Oversampling certain population subgroups
  4. Non-response – Some selected individuals do not participate, which calls for adjustment after data collection

You can also find an example in the description of non-response errors from the General Social Survey.

Without proper weighting, analyses can produce biased estimates that fail to represent the target population accurately. Survey weights help to:

  • Correct for unequal probabilities of selection
  • Adjust for non-response bias
  • Account for post-stratification to known population totals
  • Enable valid inferences about the entire population
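
As a rough illustration of how much weighting can matter, the sketch below compares an unweighted mean with a design-based one. It assumes Stata's bundled NHANES II example data (webuse nhanes2) and its variables psu, strata, finalwgt, and bpsystol; the choice of bpsystol is arbitrary and only for illustration.

Stata
* Load the NHANES II example data (webuse needs internet access)
webuse nhanes2, clear

* Declare the design: PSUs, sampling weights, and strata
svyset psu [pweight=finalwgt], strata(strata)

* Unweighted mean: treats the data as a simple random sample
mean bpsystol

* Design-based mean: uses the weights, strata, and PSUs declared above
svy: mean bpsystol

The two point estimates and their standard errors will generally differ. How much they differ depends on how strongly the outcome is related to the selection probabilities.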

Types of Survey Weights

  • Base Weights: Base weights are the inverse of the probability of selection. They compensate for the unequal chances of being included in the sample.
  • Non-response Adjustment Weights: These weights adjust for individuals who were selected but did not participate, helping to reduce potential non-response bias.
  • Post-stratification Weights: Post-stratification weights align sample demographics with known population totals from sources like the census.
  • Final Weights: Final weights typically combine all the above adjustments into a single weight variable that should be used in the analysis (e.g., finalwgt in NHANES data).
  • Subsample Weights: Many surveys collect certain measures on only a subset of participants. Special subsample weights (like leadwt in NHANES data) account for this additional stage of sampling.
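
The way these pieces combine can be sketched with a toy dataset. Everything here is hypothetical: the variable names (prob_sel, resp_rate, basewt, nr_adj, wt) and the values are made up for illustration, and real surveys usually ship with weights already constructed by the survey organization.

Stata
* Toy data: prob_sel and resp_rate are made-up values for illustration
clear
input id prob_sel resp_rate
1 0.01 0.80
2 0.01 0.80
3 0.05 0.50
4 0.05 0.50
end

* Base weight: inverse of the probability of selection
generate double basewt = 1 / prob_sel

* Non-response factor: inverse of the response rate in the unit's adjustment cell
generate double nr_adj = 1 / resp_rate

* Combined weight (before any post-stratification step)
generate double wt = basewt * nr_adj

list id basewt nr_adj wt

Post-stratification is not shown in this sketch. In Stata it can be requested through the poststrata() and postweight() options of svyset.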

Implementing Survey Weights in Stata

Setting Up the Survey Design

Stata
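* Basic survey setup (psu, finalwt, and stratavar are placeholders for your own variables)
svyset psu [pweight=finalwt], strata(stratavar)

* For surveys with multiple stages, list later-stage sampling units after ||
svyset psu [pweight=finalwt], strata(stratavar) || secondarysamplingunit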
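
After declaring a design, it can help to confirm what Stata has recorded. A minimal sketch, assuming the bundled NHANES II example data and its variables psu, strata, and finalwgt:

Stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Redisplay the current survey settings
svyset

* Summarize strata, PSUs per stratum, and observations per PSU
svydescribe

Strata that contain a single PSU show up in the svydescribe output and are worth checking, since they can prevent standard errors from being computed.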

Common Analytical Techniques

Descriptive Statistics

Stata
* Weighted means
svy: mean varlist

* Weighted proportions for one categorical variable
svy: proportion categorical_var

* Weighted two-way table
svy: tabulate var1 var2
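
The generic commands above can be tried on real data. This sketch assumes the bundled NHANES II example data; the analysis variables (age, bpsystol, race, diabetes) are picked only for illustration.

Stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Weighted means of two continuous variables
svy: mean age bpsystol

* Weighted proportions for a categorical variable
svy: proportion race

* Weighted two-way table with row percentages
svy: tabulate race diabetes, row percent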

Regression Analyses

Stata
* Weighted linear regression
svy: regress y x1 x2 x3

* Weighted logistic regression
svy: logistic outcome predictors
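
A concrete version of both models, again assuming the bundled NHANES II example data. The outcome and predictor choices (bpsystol, highbp, age, weight, height) are illustrative rather than a recommended specification.

Stata
webuse nhanes2, clear
svyset psu [pweight=finalwgt], strata(strata)

* Linear regression of systolic blood pressure on age, weight, and height
svy: regress bpsystol age weight height

* Logistic regression for high blood pressure, reported as odds ratios
svy: logistic highbp age weight height

With the svy prefix, the weights enter the point estimates and the standard errors reflect the declared strata and PSUs. By default these are linearized (Taylor series) variance estimates.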

Example: Using Weights with NHANES

NHANES employs a complex design with oversampling of certain demographic groups. It provides multiple weights:

  • finalwgt – This is the primary weight variable used for general analyses, such as estimating means and proportions for the full NHANES sample.
  • leadwt – This weight is specific to analyses involving lead exposure data, as only a subset of participants was tested for lead levels.

To apply weights properly in Stata, declare the design with the svyset command so that subsequent svy: commands use the chosen sampling weight. NHANES also provides primary sampling units (PSUs) and strata, which should be declared as well so that standard errors reflect the design:

Stata
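* Load the NHANES II example dataset distributed with Stata
webuse nhanes2, clear

* For analyses of all examined participants
svyset psu [pweight=finalwgt], strata(strata)

* For blood lead analyses (subsample); this replaces the settings above
svyset psu [pweight=leadwt], strata(strata)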
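
Because each svyset call replaces the previous settings, the weight has to match the analysis that follows it. A minimal sketch of a subsample analysis, assuming the variable lead (blood lead level) in the bundled NHANES II example data:

Stata
webuse nhanes2, clear

* Declare the design with the lead subsample weight
svyset psu [pweight=leadwt], strata(strata)

* Mean blood lead level among participants in the lead subsample
svy: mean lead

To return to full-sample analyses afterward, run svyset again with finalwgt.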

Reference

For a more advanced treatment, see this video:

SBE CCC: Using weights when analyzing survey data: Descriptive Statistics vs. Regression Modeling
  • March 19, 2025