[Stata] Mediation Analysis with medsem in Stata
Mediation analysis helps us understand the “how” or the “why” behind a relationship between an independent variable (X) and a dependent variable (Y). Instead of X directly causing Y, we investigate if X influences Y through an intermediate variable, known as a mediator (M). The classic pathway is X → M → Y.
Let’s think about an example. Structural and interpersonal discrimination can create significant barriers to help-seeking, especially in mental health and social service settings. This model proposes that discriminatory experiences (X) may lead to internalized skepticism, fear, or a sense of exclusion that reduces an individual’s trust or sense of fit within service systems (M). This, in turn, can shape whether and how individuals engage with available resources (Y). Mediation analysis allows for the examination of how perceived exclusion mediates the relationship between discrimination and service utilization, highlighting the psychological mechanisms through which inequality becomes embodied in behavioral outcomes.
Understanding Mediation Approaches
One of the most widely taught and cited approaches to testing mediation comes from Baron and Kenny’s 1986 paper in the Journal of Personality and Social Psychology. Their procedure shaped how a generation of researchers conducted mediation analysis. While foundational, it also introduced assumptions that later proved problematic.
Baron and Kenny proposed that to establish mediation, you must go through four steps using regression:
- Total Effect (X → Y): Show that the independent variable (X) significantly predicts the outcome (Y). If there’s no total effect, they argued, there’s no reason to test for mediation.
- X Predicts Mediator (X → M): Show that X significantly predicts the mediator (M).
- Mediator Predicts Y (M → Y controlling for X): Show that M significantly predicts Y, even when X is included in the model.
- Reduction in X’s Effect (X → Y controlling for M): Show that the direct effect of X on Y is smaller (or non-significant) when M is included—suggesting that M carries part of the effect.
Mediation was said to occur if the effect of X on Y was reduced when M was accounted for. If the direct path became non-significant, this was often called “full mediation”; if it remained significant but smaller, “partial mediation.” Baron and Kenny’s framework was influential because it was intuitive and easy to implement with standard regression. However, it has key limitations, most notably the lack of a formal test of the indirect effect.
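The four steps can be sketched with ordinary regressions. This is only an illustration of the logic, assuming the nhanes2 example data and the variable roles used in the walkthrough later in this post (age as X, weight as M, bpsystol as Y); webuse needs an internet connection, and the scalar names are made up for this example.

```stata
* Illustrative sketch of the Baron & Kenny regressions (variable roles are assumptions)
webuse nhanes2, clear

* Step 1: total effect c (X -> Y)
regress bpsystol age
scalar c_total = _b[age]

* Step 2: path a (X -> M)
regress weight age

* Steps 3 and 4: path b (M -> Y) and direct effect c' (X -> Y) come from one model
regress bpsystol weight age
scalar c_direct = _b[age]

* Compare the coefficient on age before and after adding the mediator
display "total effect c   = " scalar(c_total)
display "direct effect c' = " scalar(c_direct)
```

Note that nothing here tests the indirect effect itself; the comparison of c and c' is descriptive only.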
In 2010, Zhao, Lynch, and Chen published a widely cited critique arguing that mediation can exist even when the total effect is not statistically significant, for example when the direct and indirect effects have opposite signs and offset each other. Their work shifted the field’s focus from the total effect to the indirect effect, which is the core of what mediation is trying to test.
Zhao and colleagues argued that the key question in mediation is whether X affects Y through a mediator M—not whether X affects Y overall. Based on this logic, they proposed five types of relationships:
- Complementary mediation: Both direct and indirect effects are significant and in the same direction.
- Competitive mediation: Both are significant but point in opposite directions.
- Indirect-only mediation: Only the indirect effect is significant.
- Direct-only non-mediation: Only the direct effect is significant.
- No-effect non-mediation: Neither is significant.
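The five categories amount to a small decision rule on the significance and sign of the two effects. The sketch below encodes that rule in Stata; all input values and scalar names are hypothetical placeholders, not results from any dataset, and should be replaced with estimates from your own model.

```stata
* Hypothetical inputs (placeholders, not real estimates)
* indirect effect a*b and its p-value
scalar ab_est    = 0.12
scalar p_ab      = 0.003
* direct effect c' and its p-value
scalar cp_est    = 0.45
scalar p_cp      = 0.020
scalar alpha_lvl = 0.05

local ind_sig = (scalar(p_ab) < scalar(alpha_lvl))
local dir_sig = (scalar(p_cp) < scalar(alpha_lvl))

if `ind_sig' & `dir_sig' {
    * Both significant: the sign of the product decides the type
    if scalar(ab_est) * scalar(cp_est) > 0 {
        local type "Complementary mediation"
    }
    else {
        local type "Competitive mediation"
    }
}
else if `ind_sig' {
    local type "Indirect-only mediation"
}
else if `dir_sig' {
    local type "Direct-only non-mediation"
}
else {
    local type "No-effect non-mediation"
}

display "`type'"
```

With the placeholder values above, both effects are significant and share a sign, so the rule returns "Complementary mediation".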
More contemporary approaches leverage Structural Equation Modeling (SEM). SEM offers a more flexible and robust framework, allowing for simultaneous estimation of all paths, providing overall model fit statistics, and more accurately estimating standard errors, especially for indirect effects (often via bootstrapping or Monte Carlo methods).
A crucial point in mediation modeling is the assumption of temporal precedence: X must occur before M, and M must occur before Y, for a causal interpretation to be strong. This is why mediation analysis is ideally suited for longitudinal data, where measurements are taken over time, allowing researchers to establish this sequence (Cole & Maxwell, 2003). However, cross-sectional data (where all variables are measured at a single point in time) are frequently used. This can be acceptable if the theoretical basis for the proposed causal order is very strong and the possibility of reverse causality (e.g., Y influencing M, or M influencing X) is conceptually minimal or highly implausible (Hayes, 2018). Always interpret cross-sectional mediation with this caution in mind.
Stata’s user-written command medsem builds upon the sem command. It doesn’t re-run regressions in the old style; instead, it takes the results from an already-estimated SEM model and applies interpretative frameworks, including logic similar to an adjusted Baron & Kenny approach or the more modern Zhao, Lynch, and Chen (2010) criteria, to analyze the mediation effects.
Stata medsem package
1. Installing medsem (If you haven’t already)
ssc install medsem, replace
2. Loading Data
We’ll use the nhanes2 example dataset, which webuse downloads from the Stata Press website (an internet connection is required). The data come from the Second National Health and Nutrition Examination Survey (NHANES II).
webuse nhanes2, clear
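Before modeling, it can help to look at the three variables used in the rest of this walkthrough. A minimal check, assuming the variable names age, weight, and bpsystol as they appear in nhanes2:

```stata
webuse nhanes2, clear

* Storage types and variable labels
describe age weight bpsystol

* Ranges and means, to spot implausible values
summarize age weight bpsystol

* Number of observations missing any of the three variables
count if missing(age, weight, bpsystol)
```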
3. Defining and Running the SEM Model
Let’s hypothesize that age (X) influences systolic blood pressure (bpsystol, Y), and this relationship is partly mediated by body weight (weight, M).
- X (Independent): age
- M (Mediator): weight
- Y (Dependent): bpsystol
The SEM model for this simple mediation is:
- Path a: age → weight
- Path b: weight → bpsystol (controlling for age)
- Path c’: age → bpsystol (the direct effect, controlling for weight)
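Written as equations, the three paths correspond to two linear regressions. The intercepts $i_M$, $i_Y$ and error terms $e_M$, $e_Y$ are notation introduced here for illustration:

```latex
\begin{aligned}
M &= i_M + aX + e_M \\
Y &= i_Y + c'X + bM + e_Y \\
\text{indirect effect} &= ab \\
\text{total effect } c &= c' + ab
\end{aligned}
```

The last line holds for linear models estimated on the same sample, which is the setting used throughout this post.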
In Stata’s sem syntax:
sem (weight <- age) (bpsystol <- weight age)
- (weight <- age): weight is regressed on age.
- (bpsystol <- weight age): bpsystol is regressed on weight and age.
Run this command. Stata will output the SEM results.
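Stata’s own post-estimation tools already give a first look at the indirect effect before medsem is involved. The sketch below uses estat teffects and nlcom after the same sem model; the coefficient references follow the _b[depvar:indepvar] pattern that sem, coeflegend displays.

```stata
webuse nhanes2, clear
quietly sem (weight <- age) (bpsystol <- weight age)

* Decompose the effects of age into direct, indirect, and total
estat teffects

* Indirect effect a*b with a delta-method standard error
nlcom _b[weight:age] * _b[bpsystol:weight]
```

The nlcom result corresponds to a delta-method test of the indirect effect.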
4. Using medsem for Mediation Analysis
Now that the sem model has been estimated, medsem uses those results.
medsem, indep(age) med(weight) dep(bpsystol)
- indep(age): Specifies age as the independent variable.
- med(weight): Specifies weight as the mediator.
- dep(bpsystol): Specifies bpsystol as the dependent variable.
5. Interpreting the medsem Output (Default: Adjusted Baron & Kenny Logic)
By default, medsem applies logic based on an adjusted Baron & Kenny approach (as per Iacobucci et al., 2007, cited in medsem help). It examines the significance of:
- Path X → M (coefficient for age predicting weight in the sem output)
- Path M → Y (coefficient for weight predicting bpsystol in the sem output)
- Path X → Y (direct effect; coefficient for age predicting bpsystol while controlling for weight in the sem output)
Based on the significance of these paths, together with a Sobel z-test of the indirect effect a*b, it classifies the result as “no mediation,” “partial mediation,” or “complete mediation.”
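To see what a Sobel test does, it can be computed by hand from the sem estimates. This is a sketch of the standard first-order Sobel formula, not a reproduction of medsem’s internal code; the scalar names are made up for this example.

```stata
webuse nhanes2, clear
quietly sem (weight <- age) (bpsystol <- weight age)

* Path estimates and their standard errors
scalar coef_a = _b[weight:age]
scalar se_a   = _se[weight:age]
scalar coef_b = _b[bpsystol:weight]
scalar se_b   = _se[bpsystol:weight]

* Sobel z = a*b / sqrt(b^2*se_a^2 + a^2*se_b^2)
scalar sobel_z = (scalar(coef_a) * scalar(coef_b)) / ///
    sqrt(scalar(coef_b)^2 * scalar(se_a)^2 + scalar(coef_a)^2 * scalar(se_b)^2)

* Two-sided p-value from the standard normal distribution
scalar sobel_p = 2 * (1 - normal(abs(scalar(sobel_z))))

display "Sobel z = " %8.3f scalar(sobel_z) "   p = " %6.4f scalar(sobel_p)
```

The scalars are referenced through scalar() so that Stata does not confuse them with abbreviated variable names in the dataset.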
6. Alternative: Zhao, Lynch & Chen (ZLC) Approach with Monte Carlo
A more contemporary and often preferred framework is by Zhao, Lynch, and Chen (2010). This approach offers a more nuanced classification of mediation types and typically relies on a more robust test of the indirect effect, like Monte Carlo simulation (which medsem uses for this option).
First, re-run your sem command (prefix it with quietly if you only want the medsem output), then call medsem with the zlc option:
quietly sem (weight <- age) (bpsystol <- weight age)
medsem, indep(age) med(weight) dep(bpsystol) zlc mcreps(1000)
- zlc: Tells medsem to use the Zhao et al. (2010) framework.
- mcreps(1000): Specifies 1000 Monte Carlo replications to generate a confidence interval and p-value for the indirect effect. For publication, 5000 or 10000 replications are often recommended.
The ZLC output provides:
- The Indirect effect (a*b): The product of the path age → weight and weight → bpsystol.
- A p-value for this indirect effect (from the Monte Carlo test).
- The Direct effect (c’): The path age → bpsystol controlling for weight.
- A p-value for the direct effect.
- A classification based on ZLC, such as:
  - “Indirect-only mediation” (full mediation: indirect effect is significant, direct effect is not).
  - “Complementary mediation” (partial mediation: indirect and direct effects are significant and in the same direction).
  - “Competitive mediation” (partial mediation: indirect and direct effects are significant but in opposite directions).
  - Or one of the two non-mediation types (direct-only or no-effect).
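The idea behind a Monte Carlo test of the indirect effect can be sketched by hand: draw many values of a and b from their estimated sampling distributions, multiply them, and read a confidence interval off the percentiles of the products. The code below illustrates that general approach under the assumption that a and b are independent and normally distributed; medsem’s own implementation may differ in its details, and the seed, number of draws, and scalar names are arbitrary choices for this example.

```stata
webuse nhanes2, clear
quietly sem (weight <- age) (bpsystol <- weight age)

* Store path estimates and standard errors before replacing the data
scalar coef_a = _b[weight:age]
scalar se_a   = _se[weight:age]
scalar coef_b = _b[bpsystol:weight]
scalar se_b   = _se[bpsystol:weight]

preserve
clear
set seed 12345
set obs 5000

* Simulated draws of each path (independence of a and b is assumed here)
generate double a_draw  = rnormal(scalar(coef_a), scalar(se_a))
generate double b_draw  = rnormal(scalar(coef_b), scalar(se_b))
generate double ab_draw = a_draw * b_draw

* Percentile-based 95% interval for the simulated indirect effect
_pctile ab_draw, percentiles(2.5 97.5)
display "Monte Carlo 95% CI: [" r(r1) ", " r(r2) "]"
restore
```

An interval that excludes zero is read as evidence of a non-zero indirect effect. Unlike the Sobel test, this approach does not assume that the product a*b is itself normally distributed.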
7. Other Useful Options (with ZLC or default)
- stand: Displays results based on standardized coefficients from the sem model.
- rit: Reports the ratio of the indirect effect to the total effect (Indirect / (Indirect + Direct)).
- rid: Reports the ratio of the indirect effect to the direct effect (Indirect / Direct).
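Both ratios follow directly from the sem coefficients, so they can be checked by hand. The sketch below applies the two formulas exactly as stated above to the unstandardized estimates; medsem’s reported values may differ, for example in how signs are handled or if stand is specified. The scalar names are made up for this example.

```stata
webuse nhanes2, clear
quietly sem (weight <- age) (bpsystol <- weight age)

* Indirect effect a*b and direct effect c'
scalar indirect_eff = _b[weight:age] * _b[bpsystol:weight]
scalar direct_eff   = _b[bpsystol:age]

* RIT: share of the total effect that runs through the mediator
display "RIT = " scalar(indirect_eff) / (scalar(indirect_eff) + scalar(direct_eff))

* RID: size of the indirect effect relative to the direct effect
display "RID = " scalar(indirect_eff) / scalar(direct_eff)
```

RIT is easiest to interpret when the direct and indirect effects share a sign; with opposing signs the total effect can be close to zero and the ratio becomes unstable.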
8. Adding Covariates
If you need to control for other variables, include them in your initial sem model. medsem will then analyze the specified X, M, Y relationship within that broader model. For example, to control for female (gender, where 1=female, 0=male in nhanes2):
sem (weight <- age female) (bpsystol <- weight age female)
medsem, indep(age) med(weight) dep(bpsystol) zlc mcreps(1000)
The medsem command itself doesn’t list female, as it’s a covariate for the overall model, not part of the specific X-M-Y chain you’re asking medsem to focus on.
References
- Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.
- Cole, D. A., & Maxwell, S. E. (2003). Testing mediational models with longitudinal data: Questions and tips in the use of structural equation modeling. Journal of Abnormal Psychology, 112(4), 558–577.
- Hayes, A. F. (2018). Introduction to mediation, moderation, and conditional process analysis: A regression-based approach (2nd ed.). The Guilford Press.
- Iacobucci, D., Saldanha, N., & Deng, X. (2007). A meditation on mediation: Evidence that structural equations models perform better than regressions. Journal of Consumer Psychology, 17(2), 140–154. (Cited in medsem help)
- Zhao, X., Lynch, J. G., Jr., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37(2), 197–206. (Cited in medsem help)