[Stata] Propensity Score Matching: psmatch2, teffects

Ref: https://www.summitllc.us/propensity-score-matching

Propensity score matching (PSM) is a statistical technique that allows us to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM is widely used in observational studies where random assignment to treatments is not feasible.

In a well-designed Randomized Controlled Trial (RCT), treatment is assigned randomly. In observational studies, by contrast, people often select into treatment or are selected based on certain characteristics. This creates selection bias.

Imagine you’re evaluating a job training program. You compare the wages of people who chose to attend the program with those who didn’t. You find the attendees have higher wages! Was it the program? Maybe. But maybe the people who signed up were already more motivated, had better existing skills, or more supportive networks – factors that would lead to higher wages regardless of the training. You’re comparing motivated individuals (apples) with potentially less motivated ones (oranges). Your simple comparison mixes the true treatment effect with these pre-existing differences. These underlying factors that influence both participation and the outcome are called confounders.

Why We Need Propensity Score Matching (PSM)

This is where Propensity Score Matching (PSM) comes in. It’s a statistical technique designed to reduce selection bias in observational studies by mimicking some of the characteristics of an RCT.

The core idea is to create a comparison group (controls) that is as similar as possible to the treatment group on observed pre-treatment characteristics. If we can achieve this “balance” on observable confounders, we can be more confident that any remaining difference in outcomes between the groups is due to the treatment itself.

How does PSM achieve this?

Instead of trying to match individuals on every single observable characteristic (which becomes impossible with many variables – the “curse of dimensionality”), PSM summarizes all relevant observed confounders into a single number: the Propensity Score.

The Propensity Score is the predicted probability that an individual receives the treatment, given their set of observed baseline characteristics.

P(Treatment = 1 | Covariates)
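Written a little more formally, the same quantity might look as follows. The symbols e, T, X, and β are notation introduced here for illustration, and the second equality holds only under the assumption that treatment follows a logit model (one common choice, not the only one):

```latex
e(X) \;=\; P(T = 1 \mid X) \;=\; \frac{1}{1 + \exp\{-(\beta_0 + \beta^{\top} X)\}}
```

Here T is the treatment indicator, X is the vector of observed covariates, and the coefficients are estimated from the data.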

Think of it like this: If two individuals – one who received the treatment and one who didn’t – have the same propensity score, it means that based on everything we could observe about them before the treatment, they had a similar likelihood of receiving it. They are comparable in terms of their observable characteristics.

PSM then uses these scores to match each treated individual with one or more control individuals who have very similar propensity scores.

General steps for propensity score matching

Strictly speaking, if it is not an RCT, we do not use the term “control group.” Instead, we use “comparison group.”

Step 1: Identify the treatment and comparison groups

Determine which group received the treatment or intervention (treatment group) and which group did not (comparison group).

Step 2: Select confounding variables

Identify the variables that may influence both the treatment assignment and the outcome of interest. These are called confounding variables.

Step 3: Estimate propensity scores

Using logistic regression or other suitable methods, estimate the probability that each individual will receive the treatment based on their confounding variables. This probability is referred to as the propensity score.

Step 4: Match individuals based on propensity scores

Match each individual in the treatment group with one or more individuals in the comparison group who have similar propensity scores. This can be achieved using various methods, such as:
a. One-to-one matching: Each treated individual is matched with the comparison individual having the closest propensity score.
b. Many-to-one matching: Each treated individual is matched with multiple comparison individuals with similar propensity scores.
c. Caliper matching: Each treated individual is matched only with comparison individuals whose propensity scores lie within a specified distance (the caliper) of their own.

Step 5: Assess balance

Check if the matched treatment and comparison groups are balanced in terms of the confounding variables. This can be done by comparing the distributions of these variables between the two groups using statistical tests or graphical methods.

Step 6: Estimate the treatment effect

After obtaining balanced groups, estimate the treatment effect by comparing the outcomes between the matched treatment and comparison groups. This can be done using appropriate statistical methods, such as t-tests, regression analysis, or survival analysis, depending on the nature of the outcome variable.

Understanding with an example

Suppose we want to study the effect of a new job training program on annual income.

Step 1: Identify the treatment and comparison groups

Treatment group: Individuals who participated in the job training program
Comparison group: Individuals who did not participate in the job training program

Step 2: Select confounding variables

Confounding variables: Age, gender, education level, and previous work experience

Step 3: Estimate propensity scores

Using logistic regression, we estimate the propensity scores for each individual based on their age, gender, education level, and previous work experience.
For example, a 35-year-old female with a bachelor’s degree and 5 years of work experience might have a propensity score of 0.6, indicating a 60% probability of participating in the job training program.
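A minimal sketch of this step in Stata, using simulated data because the job training example has no real dataset behind it. All variable names (trained, age, female, educ, exper) and the coefficients used to generate the data are hypothetical and chosen only for illustration:

```stata
* Simulate a small dataset (hypothetical variables and coefficients)
clear
set seed 12345
set obs 2000
generate age    = 20 + floor(30 * runiform())
generate female = runiform() < 0.5
generate educ   = 10 + floor(8 * runiform())
generate exper  = floor(15 * runiform())

* Participation depends on the covariates, as in an observational study
generate trained = runiform() < invlogit(-4 + 0.02*age + 0.3*female + 0.2*educ + 0.05*exper)

* Step 3: estimate the propensity score with a logit model
logit trained age female educ exper
predict pscore, pr

* Compare the score distributions of participants and non-participants
tabstat pscore, by(trained) statistics(n mean min max)
```

The variable pscore then holds each individual's predicted probability of participating, which is the quantity used for matching in the next step.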

Step 4: Match individuals based on propensity scores

One-to-one matching:
- John (treatment) with a propensity score of 0.75 is matched with Sarah (comparison) who has a propensity score of 0.74.
Many-to-one matching:
- Emily (treatment) with a propensity score of 0.8 is matched with Michael (comparison, propensity score 0.81), Jessica (comparison, propensity score 0.79), and David (comparison, propensity score 0.82).
Caliper matching:
- Using a caliper of 0.05, Rachel (treatment) with a propensity score of 0.9 is matched with Amanda (comparison, propensity score 0.89) and William (comparison, propensity score 0.92), as their propensity scores are within the specified range.
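The caliper rule is simple arithmetic, which the following sketch makes explicit. It uses Rachel's score of 0.90 and the two comparison individuals from the example, plus one extra comparison individual ("Other", score 0.80) who is hypothetical and added only to show a case that falls outside the caliper:

```stata
* Comparison individuals and their propensity scores
clear
input str8 name double pscore
"Amanda"  0.89
"William" 0.92
"Other"   0.80
end

* Distance from the treated individual's score (Rachel, 0.90)
generate double dist = abs(pscore - 0.90)

* Flag candidates within the caliper of 0.05
generate byte in_caliper = dist <= 0.05
list name pscore dist in_caliper, noobs
```

Amanda and William are flagged as eligible matches, while the hypothetical third individual is not.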

Step 5: Assess balance

Compare the distributions of age, gender, education level, and previous work experience between the matched treatment and comparison groups.
Use statistical tests (e.g., t-tests for continuous variables, chi-square tests for categorical variables) or graphical methods (e.g., histograms, box plots) to assess balance.

Step 6: Estimate the treatment effect

Compare the annual income between the matched treatment and comparison groups.
Use a t-test to determine if there is a significant difference in mean annual income between the two groups.
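For example, if the mean annual income for the treatment group is $45,000 and the mean annual income for the comparison group is $40,000, with a p-value of 0.01, the difference is statistically significant at the 5% level. Provided the matched groups are balanced on the relevant confounders, we can interpret this as evidence that the job training program has a positive effect on annual income.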

Stata Commands for matching

Differences between teffects, psmatch2, and kmatch:

teffects is a built-in Stata command, while psmatch2 and kmatch are user-written commands.
teffects supports various methods for estimating treatment effects, including propensity score matching, inverse-probability weighting, and regression adjustment. psmatch2 and kmatch focus on matching estimators: both support propensity score matching as well as matching on covariate distance (for example, Mahalanobis distance).
All three commands can estimate the average treatment effect (ATE) and the average treatment effect on the treated (ATT, which teffects calls ATET). teffects reports the ATE by default, while psmatch2 reports the ATT by default.
psmatch2 (through its companion commands pstest and psgraph) and kmatch provide additional tools for assessing balance and overlap, such as common support graphs and covariate balance tables.

In this blog post, we'll walk through the steps of conducting PSM in Stata using the nlswork example dataset. First, we need to load the National Longitudinal Survey of Young Working Women (nlswork) dataset into Stata. This can be done using the webuse command:

Stata

webuse nlswork, clear

Step 1: Specify the treatment, outcome, and confounding variables

Define your outcome variable, treatment variable, and confounders.

Treatment variable: union
Outcome variable: ln_wage
Confounding variables: age, race, msp, collgrad, not_smsa, c_city, south, occ_code, ttl_exp, tenure, hours
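Before matching, it can help to store the covariate list once and look at how much data is missing, since estimation commands generally drop observations with missing values. A minimal sketch; the macro name xvars is an arbitrary choice made here:

```stata
webuse nlswork, clear

* Store the confounders in a global macro so the list is typed only once
global xvars age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours

* Treatment status, including missing values
tabulate union, missing

* Missing-value counts for the treatment, outcome, and confounders
misstable summarize union ln_wage $xvars

* Raw (unmatched) difference in log wages by union status
ttest ln_wage, by(union)
```

The raw difference from the last command is the unadjusted comparison that matching is meant to improve on.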

Step 2: Perform propensity score matching using the `psmatch2` command

The psmatch2 command in Stata is used to estimate propensity scores and conduct the matching. Suppose we have a binary treatment variable treat and a set of covariates x1, x2, …, xn. The basic syntax is as follows:

Stata

// basic syntax 
ssc install psmatch2
psmatch2 treat x1 x2 x3 xn, out(outcome) common

In our example, we can perform the matching using the code:

Stata

// PSM code - Outcome: Wage / Treatment: Union 
psmatch2 union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours, out(ln_wage) common logit

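This command matches each treated observation (union member) with the non-treated observation (non-union member) that has the closest propensity score, which is estimated from the specified confounders with a logit model. By default, psmatch2 uses single nearest-neighbor matching with replacement, so the same non-union observation can serve as the match for several union members. The common option imposes common support by dropping treated observations whose propensity score lies outside the range observed in the comparison group.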
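After it runs, psmatch2 leaves several new variables in the dataset, such as _pscore, _treated, _support, and _weight. The sketch below shows one way to inspect them; it assumes psmatch2 has already been installed from SSC:

```stata
webuse nlswork, clear
psmatch2 union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours, out(ln_wage) common logit

* Estimated propensity score by treatment status
tabstat _pscore, by(_treated) statistics(n mean min max)

* Observations on and off common support
tabulate _treated _support

* Comparison observations actually used as matches have a nonmissing weight
count if _treated == 0 & !missing(_weight)
```

Looking at these variables is a quick check that the matching used the sample you expected.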

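The estimated ATT is approximately 0.197. Because the outcome is the natural logarithm of wages, this is a difference of about 0.197 log points: after propensity score matching, union members earn roughly 22% more on average than their matched non-union counterparts (exp(0.197) - 1 ≈ 0.22).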
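Since ln_wage is measured in logs, an estimate such as 0.197 is a difference in log points rather than a percentage. A quick way to convert it, shown here as a small sketch using the estimate quoted above:

```stata
* Convert a log-point difference into an approximate percentage difference
display "Percent difference in wages: " %5.1f 100*(exp(0.197) - 1)
```

For small estimates the two numbers are close, but the gap grows as the estimate gets larger.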

You can also interpret the significance of this difference between the treated and comparison groups by using t-statistics. The psmatch2 output table provides the T-statistic (“T-stat”) for the estimated Average Treatment Effect on the Treated (ATT), but not the exact p-value. However, you can assess statistical significance using common rules of thumb, especially when the sample size is reasonably large (allowing the t-distribution to approximate the standard normal distribution).

General Rules of Thumb (for two-tailed tests):

Significance at p < 0.10 (10% level): If the absolute value of the T-stat is greater than approximately 1.645 (i.e., |T-stat| > 1.645).
Significance at p < 0.05 (5% level): If the absolute value of the T-stat is greater than approximately 1.96 (i.e., |T-stat| > 1.96).
Significance at p < 0.01 (1% level): If the absolute value of the T-stat is greater than approximately 2.58 (i.e., |T-stat| > 2.58).
Significance at p < 0.001 (0.1% level): If the absolute value of the T-stat is greater than approximately 3.291 (i.e., |T-stat| > 3.291).

In this example, the t-statistics for both the unmatched difference (t = 28.67) and the ATT (t = 16.49) exceed 3.291, so both differences are significant at the 0.1% level (p < .001).
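If you prefer exact numbers to rules of thumb, the critical values and an approximate p-value can be computed directly from the standard normal distribution. This is a sketch that relies on the large-sample normal approximation mentioned above and uses the ATT t-statistic quoted in the text:

```stata
* Two-tailed critical values of the standard normal distribution
display invnormal(1 - 0.10/2)
display invnormal(1 - 0.05/2)
display invnormal(1 - 0.01/2)
display invnormal(1 - 0.001/2)

* Approximate two-sided p-value for the ATT t-statistic of 16.49
display 2 * normal(-abs(16.49))
```

The first four lines reproduce the thresholds listed above, and the last line returns a p-value that is effectively zero.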

Advanced: Average Treatment Effect on the Treated (ATET)

Average Treatment Effect (ATE): This measures the expected effect of the treatment across the entire population, regardless of whether they received the treatment or not. It answers the question, “What would be the average effect of the treatment if we were to apply it to the whole population?”
Average Treatment Effect on the Treated (ATET): This measures the effect of the treatment only on those who actually received the treatment. It answers the question, “What is the average effect of the treatment on those individuals who were actually treated?”
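In potential-outcomes notation, the two estimands can be written as follows. The symbols are introduced here for illustration: Y(1) and Y(0) denote an individual's outcome with and without treatment, and T is the treatment indicator:

```latex
\mathrm{ATE} = E\left[\,Y(1) - Y(0)\,\right], \qquad
\mathrm{ATET} = E\left[\,Y(1) - Y(0) \mid T = 1\,\right]
```

The only difference is the conditioning on T = 1: the ATET averages the individual effects over the treated group only.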

By default, the teffects psmatch command performs the analysis based on the average treatment effect (ATE). The teffects psmatch command with the atet option provides the Average Treatment Effect on the Treated (ATET):

Stata

// PSM code - Outcome: Wage / Treatment: Union 
teffects psmatch (ln_wage) (union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours), atet

ATET for Union Membership:

The coefficient for union is 0.198, with a standard error of 0.01.
This suggests that being in a union increases the natural logarithm of wages by about 0.198 for union members, which corresponds to wages roughly 22% higher (exp(0.198) - 1 ≈ 0.22) than they would have been had these workers not been in a union.
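For comparison, dropping the atet option gives the default ATE estimate. A minimal sketch with the same variables (no output is shown here because it was not part of the original analysis):

```stata
webuse nlswork, clear

* Default estimand: average treatment effect (ATE) in the whole population
teffects psmatch (ln_wage) (union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours)
```

The ATE and ATET can differ when the effect of union membership varies across workers and union members are not a random subset of the population.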

Step 3: Assess the balance of confounding variables after matching

In PSM, each treated unit is matched with one or more non-treated units that have a similar propensity score—the predicted probability of receiving the treatment, conditional on observed covariates.

A key goal of PSM is covariate balance: after matching, the distribution of covariates should be similar between the treated and comparison groups, approximating what would be achieved through randomization. Covariate balance is typically assessed by comparing standardized mean differences, variance ratios, or conducting statistical tests before and after matching.
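If you stay within the built-in teffects workflow, Stata's tebalance and teffects overlap postestimation commands offer comparable diagnostics. A sketch, assuming the same model as in the ATET example:

```stata
webuse nlswork, clear
quietly teffects psmatch (ln_wage) (union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours), atet

* Standardized differences and variance ratios, raw vs. matched
tebalance summarize

* Estimated densities of the propensity score by treatment status
teffects overlap
```

Standardized differences near 0 and variance ratios near 1 in the matched column suggest good balance.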

To assess the quality of matching, you can use the psgraph and pstest commands, which check overlap and covariate balance after matching. Both are companion commands to psmatch2 and rely on the variables it creates, so rerun the psmatch2 model from Step 2 before calling them.

The psgraph command plots a histogram of the propensity score by treatment status.

The graph displays histograms representing the distribution of estimated propensity scores for two groups:

The blue bars (below the horizontal line) represent the distribution of propensity scores for the Untreated group (the comparison group).
The red bars (above the horizontal line) represent the distribution of propensity scores for the Treated group.
The x-axis shows the range of the calculated propensity scores, and the height of the bars indicates the number (or frequency) of individuals within each propensity score bin for each group.

Stata

psgraph

The graph shows a substantial region of overlap between the propensity score distributions of the treated and untreated groups.

While the untreated group (blue bars) is more concentrated at lower propensity scores (roughly < 0.2) and the treated group (red bars) shows more observations at higher propensity scores (roughly > 0.6), there is considerable overlap across most of the propensity score range shown, particularly between approximately 0.2 and 0.4. In these middle ranges, there are sizable numbers of both treated and untreated individuals. This visual overlap suggests that for many individuals in the treated group, there are individuals in the untreated group with similar propensity scores, and vice versa, which is essential for matching.

The overlap looks good, but a histogram does not tell us whether the individual covariates are balanced. For that, we use the pstest command, which reports balance diagnostics after propensity score matching. It checks whether the covariates in the treated and comparison groups have similar distributions, which is crucial for unbiased estimation of treatment effects. The main columns of the output are interpreted as follows:

Stata

pstest, graph

%bias: This column shows the percentage bias for each covariate between the treated and comparison groups. After matching, the biases should be lower, indicating better balance.
t-test: This tests whether the means of each covariate are statistically different between the treated and comparison groups. A high p-value (p>|t|) suggests no significant difference.
V(T)/V(C): The variance ratio compares the variances of each covariate in the treated and comparison groups. A ratio close to 1 indicates a similar variance.
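To make the %bias column more concrete, the sketch below computes a standardized percentage difference by hand for one covariate (tenure) in the raw, unmatched sample. It uses the common formula that divides the difference in means by the square root of the average of the two group variances. The scalar names are arbitrary, and the result may differ slightly from pstest because the estimation samples are not identical:

```stata
webuse nlswork, clear

* Mean and variance of tenure among union members
quietly summarize tenure if union == 1
scalar m1 = r(mean)
scalar v1 = r(Var)

* Mean and variance of tenure among non-members
quietly summarize tenure if union == 0
scalar m0 = r(mean)
scalar v0 = r(Var)

* Standardized percentage difference before matching
display "Raw standardized bias for tenure (%): " 100 * (m1 - m0) / sqrt((v1 + v0) / 2)
```

After matching, the same calculation on the matched sample should give a value much closer to zero if balance has improved.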

From the output, it appears that the matching has improved the balance between the treated and comparison groups, as indicated by the reduced %bias across covariates. The variance ratios are also close to 1 for most covariates, except for a few marked with an asterisk (*), which indicates that the variance ratio is outside the acceptable range of [0.94; 1.06].

We focus on the t-tests between the treated and comparison groups. The south, ttl_exp, tenure, and race variables remain statistically significantly different between groups after matching (p < .05). We may need to consider alternative matching specifications (e.g., caliper or kernel matching) or adjustments to the covariates to improve balance.
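As a starting point for such refinements, psmatch2 offers options like caliper() and kernel. The sketch below reruns the model with each and rechecks balance. The caliper width of 0.01 is an arbitrary illustrative value, not a recommendation, and the results are not shown because they were not part of the original analysis:

```stata
webuse nlswork, clear

* Nearest-neighbor matching, restricted to matches within a caliper of 0.01
psmatch2 union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours, out(ln_wage) common logit caliper(0.01)
pstest, both

* Kernel matching on the propensity score
psmatch2 union age race msp collgrad not_smsa c_city south occ_code ttl_exp tenure hours, out(ln_wage) common logit kernel
pstest, both
```

Comparing the pstest tables across specifications shows whether a tighter or smoother matching rule reduces the remaining imbalance, usually at the cost of discarding or down-weighting some observations.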