[Stata] Regressions with interaction effects (continuous x continuous) and plotting the interaction

In this blog post, I will show you how to run a continuous-by-continuous interaction in Stata and how to plot it with marginsplot. A continuous-by-continuous interaction is used in regression models to test whether the effect of one continuous predictor on the outcome depends on the value of another continuous predictor. For example, you may want to know whether the effect of household size (houssiz) on self-rated health (hlthstat) varies by age.

The conceptual model could be drawn like this:

Statistically, the model can be written as follows, with the main effects and the interaction effect together:
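Written to match the regression command used later in this post, one way to express the model is shown below. The notation is my own assumption rather than the original figure: $\mathbf{z}_i$ collects the indicator variables for sex, race, and region, and $\beta_3$ is the interaction coefficient.

```latex
\text{hlthstat}_i = \beta_0 + \beta_1\,\text{houssiz}_i + \beta_2\,\text{age}_i
  + \beta_3\,(\text{houssiz}_i \times \text{age}_i)
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_i + \varepsilon_i
```

Without the $\beta_3$ term, the slope of household size would be the same at every age; $\beta_3$ is what lets it change with age.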

To illustrate this, I will use the nhanes2 dataset, which contains data from the second National Health and Nutrition Examination Survey conducted in the United States from 1976 to 1980. You can load this dataset in Stata by typing:

Stata
webuse nhanes2
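Before fitting the model, it can be worth checking how these variables are stored and coded, because the sign of every coefficient below depends on which end of the hlthstat scale means better health. A minimal sketch using standard Stata commands:

```stata
* Load the data (clear drops any dataset already in memory)
webuse nhanes2, clear

* Storage types and value labels of the model variables
describe hlthstat houssiz age sex race region

* How self-rated health is coded: check which end of the scale is "better"
tabulate hlthstat
```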

To run a continuous-by-continuous interaction in Stata, put the c. prefix before each variable in the interaction. This tells Stata to treat the variables as continuous; without it, variables joined by # or ## are treated as factor (categorical) variables. For example, suppose you want to fit the following linear regression model:

  • Dependent variable: self-rated health (hlthstat)
  • Independent variables: household size (houssiz), age (age)
  • Control variables: sex (sex), race (race), and region (region)

The control variables take the i. prefix because they are nominal and should enter the model as sets of indicator variables.

Stata
regress hlthstat c.houssiz##c.age i.sex i.race i.region

This command estimates the main effects of houssiz and age and their two-way interaction, while controlling for sex, race, and region. The output will look like this:
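To see what ## does, you can fit the same model with a hand-made product term and compare the interaction coefficients. A sketch; hh_x_age and b_inter are hypothetical names used only for this check:

```stata
webuse nhanes2, clear

* Factor-variable notation: ## adds both main effects and their product
regress hlthstat c.houssiz##c.age i.sex i.race i.region
scalar b_inter = _b[c.houssiz#c.age]

* Same model with a manually created product term (hypothetical name)
generate double hh_x_age = houssiz*age
regress hlthstat houssiz age hh_x_age i.sex i.race i.region

* The two interaction coefficients should agree
display "## version: " scalar(b_inter) "   manual version: " _b[hh_x_age]
```

The coefficients match, but margins cannot tell that hh_x_age depends on houssiz and age, so it would hold the product fixed when you change age in at(). That is why the ## notation is the one to use before margins and marginsplot.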

To interpret the coefficients, you need to consider the interaction terms as well as the main effects.

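In terms of the main effects, the coefficient for household size is -.0596. Because the model includes the interaction, this is not an average effect: it is the expected change in self-reported health status for a one-unit increase in household size when age equals 0, holding the other covariates constant. Age 0 is far outside the range of ages in the data, so this coefficient is mainly a building block for computing the effect at realistic ages. Likewise, the age coefficient, -0.29, is the effect of a one-year increase in age when household size is 0, a value that cannot occur. Only in a model without the interaction term would each of these coefficients be a single effect that applies at every value of the other variable.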
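A common way to make the household-size coefficient refer to a meaningful age is to center age before fitting. This is not part of the original analysis; it is a sketch, and age_c, age_mean, b_hh_raw, and b_inter_raw are hypothetical names:

```stata
webuse nhanes2, clear

* Original model: _b[houssiz] is the household-size slope at age 0
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region
scalar b_hh_raw    = _b[houssiz]
scalar b_inter_raw = _b[c.houssiz#c.age]

* Center age at its mean in the estimation sample
quietly summarize age if e(sample)
scalar age_mean = r(mean)
generate double age_c = age - scalar(age_mean)

* Refit: _b[houssiz] is now the household-size slope at the mean age
regress hlthstat c.houssiz##c.age_c i.sex i.race i.region
```

Centering changes only the main-effect coefficients: the new houssiz coefficient equals the old one plus the interaction coefficient times the mean age, and the interaction coefficient itself is unchanged.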

However, this effect is not constant across different values of age, as indicated by the interaction term.
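Because the effect of household size depends on age, it can help to ask Stata for that effect directly at several ages, with standard errors and confidence intervals. A sketch using the dydx() option of margins at the same ages used for the plot later in this post:

```stata
webuse nhanes2, clear
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region

* Slope of household size at ages 20, 30, ..., 80
margins, dydx(houssiz) at(age=(20(10)80))

* Keep the 7 slopes: each equals _b[houssiz] + age*_b[c.houssiz#c.age]
matrix S = r(b)
```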

The coefficient for c.houssiz#c.age is .0010: for each one-year increase in age, the slope of household size on self-reported health status increases by about .0010, holding the other covariates constant. Because that slope starts out negative (-.0596 at age 0), household size has a more negative association with health status for younger people than for older people.
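Differentiating the fitted model with respect to houssiz makes this explicit, where $\hat\beta_1$ is the houssiz coefficient and $\hat\beta_3$ the interaction coefficient. The numbers use the rounded estimates from the output, so the results are approximate:

```latex
\begin{aligned}
\frac{\partial\,\widehat{\text{hlthstat}}}{\partial\,\text{houssiz}}
  &= \hat\beta_1 + \hat\beta_3\,\text{age} \approx -0.0596 + 0.0010\,\text{age} \\
\text{age} = 20 &: \quad -0.0596 + 0.0200 = -0.0396 \\
\text{age} = 40 &: \quad -0.0596 + 0.0400 = -0.0196 \\
\text{age} = 60 &: \quad -0.0596 + 0.0600 = 0.0004 \\
\text{slope} = 0 &\ \text{at}\ \text{age} \approx 0.0596 / 0.0010 = 59.6
\end{aligned}
```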

The coefficients for sex, race, and region are interpreted as dummy variables that compare each category with its reference category (male for sex, white for race, and NE for region). For example, the coefficient for female is -.0612, which means that, holding the other covariates constant, females score on average .0612 units lower on self-reported health status than males. Interactions among sex, race, and region are not estimated because they are not specified in the regression command.
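Two standard Stata tools help with these categorical terms: the baselevels display option lists the reference category of each factor in the coefficient table, and testparm tests a multi-category variable jointly rather than one dummy at a time. A short sketch; the joint test is an addition, not part of the original analysis:

```stata
webuse nhanes2, clear

* Show the omitted reference category of each factor in the output
regress hlthstat c.houssiz##c.age i.sex i.race i.region, baselevels

* Joint Wald test that all region indicators are zero
testparm i.region
```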

Visualizing interactions

To visualize the interaction, run margins and then marginsplot. For example, to plot the predicted self-reported health status at different values of household size and age, follow these two steps.

1st step: margins command

First, you need to identify the minimum and maximum (range) of the variables in the interaction effect.

Stata
codebook houssiz
codebook age 
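If you only need the range (and the mean and standard deviation, which are useful for choosing line values), summarize is a more compact alternative, and it leaves the results in r() for later use:

```stata
webuse nhanes2, clear

* N, mean, SD, min, and max in one table
summarize houssiz age

* r() holds the results of the last summarize, here for age
quietly summarize age
display "age ranges from " r(min) " to " r(max)
```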

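Then, specify values for both variables in the interaction inside the at() option. margins computes a prediction at every combination of those values: age=(20(10)80) gives 20, 30, ..., 80 (7 values) and houssiz=(1(5)15) gives 1, 6, and 11 (3 values), so 21 predictions in total. The vsquish option only makes the output more compact.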

Stata
margins, at(age=(20(10)80) houssiz=(1(5)15)) vsquish 
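By default, margins computes each prediction as an average: it sets age and houssiz to the requested values for every observation, keeps sex, race, and region as observed, predicts, and averages over the estimation sample. A sketch that reproduces one cell of the grid by hand; yhat_cell, M, m_margins, and m_manual are hypothetical names:

```stata
webuse nhanes2, clear
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region

* margins prediction for one cell: age 20, household size 1
margins, at(age=20 houssiz=1)
matrix M = r(b)
scalar m_margins = M[1,1]

* Manual version: set everyone to age 20 and household size 1,
* predict the linear prediction, then average over the estimation sample
preserve
quietly replace age = 20
quietly replace houssiz = 1
predict double yhat_cell if e(sample), xb
quietly summarize yhat_cell
scalar m_manual = r(mean)
restore

display "margins: " scalar(m_margins) "   manual: " scalar(m_manual)
```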

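When choosing the values of the moderator (that is, how many lines to draw), a common approach is to draw three lines: one at the mean minus one standard deviation (M-1SD), one at the mean (M), and one at the mean plus one standard deviation (M+1SD). You can check out the following post: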
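One way to get those three values in Stata is to store the mean and standard deviation from summarize in local macros and pass them to at(). A minimal sketch for household size; the macro names hh_lo, hh_mid, and hh_hi are hypothetical:

```stata
webuse nhanes2, clear
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region

* Mean and SD of household size in the estimation sample
quietly summarize houssiz if e(sample)
local hh_lo  = r(mean) - r(sd)
local hh_mid = r(mean)
local hh_hi  = r(mean) + r(sd)

* One line per household-size value: M-1SD, M, M+1SD
margins, at(age=(20(10)80) houssiz=(`hh_lo' `hh_mid' `hh_hi')) vsquish
marginsplot, x(age) noci
```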

2nd step: marginsplot command

The following produces the default plot.

Stata
marginsplot
marginsplot, x(varname) // you can switch the x-axis variable
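For example, to put household size on the x-axis and draw one line per age value, pass houssiz to x(). A sketch that also saves the graph; the file name is only an example:

```stata
webuse nhanes2, clear
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region
quietly margins, at(age=(20(10)80) houssiz=(1(5)15))

* Household size on the x-axis, one line per age value
marginsplot, x(houssiz)

* Save the current graph to a file (example file name)
graph export "marginsplot_houssiz.png", replace
```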

The following version suppresses the confidence intervals (noci), sets the xlabel and ylabel ranges and the titles, and adjusts the marker symbol and line pattern of each line. Since noci removes the intervals, CI styling options such as recastci() and ciopts() have no effect in this version. In a do-file, every line except the last must end with /// so that Stata reads the command as one statement.

Stata
marginsplot, noci x(age) plotdimension(houssiz) ///
   xlabel(20(10)80, labsize(vsmall)) ylabel(2.5(0.5)4.5, labsize(vsmall)) ///
   title("Age and Household size on Self-rated Health", size(small) color(black)) ///
   xtitle("Age", size(small)) ytitle("Self-rated Health", size(small)) ///
   plot1opts(msymbol(T)) plot2opts(lpattern(dash) msymbol(D)) plot3opts(lpattern(longdash_dot) msymbol(O)) ///
   legend(order(1 "Household size = 1" 2 "Household size = 6" 3 "Household size = 11") pos(3) size(vsmall))
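If you want to keep the confidence intervals, drop noci and draw them as transparent shaded bands with recastci(rarea) and ciopts(). A sketch; the graph name ci_plot is hypothetical, and the %20 opacity syntax needs Stata 15 or later:

```stata
webuse nhanes2, clear
quietly regress hlthstat c.houssiz##c.age i.sex i.race i.region
quietly margins, at(age=(20(10)80) houssiz=(1(5)15))

* Same layout, with 95% confidence intervals drawn as shaded bands
marginsplot, x(age) plotdimension(houssiz) ///
    recastci(rarea) ciopts(color(black%20)) ///
    xlabel(20(10)80, labsize(vsmall)) ///
    xtitle("Age", size(small)) ytitle("Self-rated Health", size(small)) ///
    name(ci_plot, replace)
```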

With the plot options, you can change the marker symbol (e.g., square, triangle, or circle), the line pattern (e.g., dash, solid, dash_dot), and the line color. The names of the available marker symbols and line patterns are listed here:

https://stats.oarc.ucla.edu/stata/faq/how-can-i-view-different-marker-symbol-options/
https://www.stata.com/manuals13/g-4linepatternstyle.pdf
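You can also list the same style names from within Stata, which shows the options available in your installed version:

```stata
* Marker symbols accepted by msymbol()
graph query symbolstyle

* Line patterns accepted by lpattern()
graph query linepatternstyle
```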

Now you can see the interaction between age and household size: the effect of household size becomes more positive with age and is positive for adults older than about 60, unlike at younger ages.

Resources

  • October 8, 2023