[Stata] How to conduct hierarchical regression by using nestreg command

Hierarchical regression (also known as sequential regression or nested regression) is a method for building regression models by entering predictors in blocks and testing whether each block explains additional variance in the dependent variable. It helps you assess whether adding the extra predictors is worth it in order to improve model fit. For example, Model 1 might contain only demographic controls, while Model 2 adds the predictors of interest.

Parsimonious models are preferred in statistics: simple models with strong explanatory and predictive power that explain the data with a minimum number of parameters or predictor variables. To determine whether the simpler model is adequate (whether your larger model is worth its extra terms), there are several tests you can try. The likelihood ratio test is one of the most common.
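As a quick reference, the likelihood ratio statistic compares the log likelihoods of the reduced (nested) and full models:

```latex
LR = 2\left(\ln L_{\text{full}} - \ln L_{\text{reduced}}\right) \;\sim\; \chi^2_{k}
```

where $k$ is the number of extra parameters in the full model. A significant statistic means the added predictors improve the fit beyond what chance would predict.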

Hierarchical regression assesses whether a model’s fit improves as a bundle of variables is added at each step. In this post, I will explain how to run a hierarchical regression in Stata. There are two ways: the first uses the estimates store and lrtest commands, and the second uses the nestreg command.

estimates and lrtest commands

* Model 1: with predictor1 and predictor2
regress dependent_variable predictor1 predictor2
estimates store model1

* Model 2: with predictor1, predictor2, predictor3, and predictor4
regress dependent_variable predictor1 predictor2 predictor3 predictor4
estimates store model2

* Model 3: adding predictor5, predictor6
regress dependent_variable predictor1 predictor2 predictor3 predictor4 predictor5 predictor6
estimates store model3

** For models fit by OLS, install the ftest package for an F-test between models
ssc install ftest

* Compare Model 1 vs. Model 2 using the F test
ftest model1 model2

* Compare Model 2 vs. Model 3 using the F test
ftest model2 model3

* Alternatively, compare the stored models with a likelihood ratio test
lrtest model1 model2
lrtest model2 model3

The likelihood ratio test (lrtest) assesses the goodness of fit of two competing statistical models based on the ratio of their likelihoods. If the p-value is less than your chosen significance level (e.g., 0.05), you can conclude that the model with the additional predictors fits significantly better than the simpler model. In other words, the added predictors significantly improve the model fit.

Note that if the estimation sample differs between the two models (for example, because the added predictors have missing values), the LR test cannot be performed. In that case, the samples should be matched using listwise/pairwise deletion or another missing-data method. Also note that the nestreg command does not work with multiple imputation in Stata (the mi estimate command).
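One simple way to force both models onto the same estimation sample is to restrict every model to the complete cases of the largest model. This is only a sketch; the variable names are placeholders from the earlier examples:

```stata
* Count missing values across all variables used in the largest model
egen nmiss = rowmiss(dependent_variable predictor1 predictor2 predictor3 predictor4)

* Fit both models on the same complete-case sample
regress dependent_variable predictor1 predictor2 if nmiss == 0
estimates store model1

regress dependent_variable predictor1 predictor2 predictor3 predictor4 if nmiss == 0
estimates store model2

* The LR test now compares models fit to identical samples
lrtest model1 model2
```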

nestreg command

The nestreg command simplifies the above process: you simply group the predictors for each block in parentheses. It provides similar functionality to the once-popular user-written package hireg, but makes up for its shortcomings (just ignore this if you have never heard of hireg).

nestreg, waldtable: reg dependent_variable (predictor1 predictor2) (predictor3 predictor4) (predictor5 predictor6) // report Wald test results (the default)
nestreg, lrtable: reg dependent_variable (predictor1 predictor2) (predictor3 predictor4) (predictor5 predictor6) // report likelihood ratio test results

The nestreg command outputs the Wald test or LR test results for each block. A significant p-value means that the block’s model fits significantly better than the previous model.

As shown, the command lets you choose between the Wald test and the LR test. This is a very convenient command! 🙂

nestreg, lrtable: logit dependent_variable (predictor1 predictor2) (predictor3 predictor4) (predictor5 predictor6)

After nestreg: you can use many different estimation commands: betareg, clogit, cloglog, glm, intreg, logistic, logit, nbreg, ologit, oprobit, poisson, probit, qreg, regress, scobit, stcox, stcrreg, stintreg, streg, and tobit.

bys subgroup: nestreg, lrtable: logit dependent_variable (predictor1 predictor2) (predictor3 predictor4) (predictor5 predictor6)

By adding the bys subgroup: prefix (e.g., gender, age group), the same hierarchical regression can be run separately for each subgroup.

nestreg, store(model): logit dependent_variable (predictor1 predictor2) (predictor3 predictor4) (predictor5 predictor6)

estimates restore model1
outreg2 using results.xls, alpha(0.001, 0.01, 0.05) replace
estimates restore model2
outreg2 using results.xls, alpha(0.001, 0.01, 0.05) append
estimates restore model3
outreg2 using results.xls, alpha(0.001, 0.01, 0.05) append

If you add store(model) to the nestreg command, the estimates are saved as model1, model2, model3, and so on, up to the number of blocks in your nestreg command. Combining this with the outreg2 command, you can easily save the nested regression results into one Excel file. Here is an example from my dataset. You can clean the output by changing the variable names, labels, and/or model names.

ftest model1 model2
ftest model2 model3
ftest model3 model4
lrtest model1 model2
lrtest model2 model3
lrtest model3 model4

As in the example above, we can use the stored estimates to run an F-test or likelihood ratio test to compare model fit. Below are the results of the F-test, which can be interpreted as Model 2 having a significantly better fit than Model 1.

Sample papers using hierarchical regression

  1. Zeidler, M. R., Martin, J. L., Kleerup, E. C., Schneider, H., Mitchell, M. N., Hansel, N. N., Sundar, K., Schotland, H., Basner, R. C., Wells, J. M., Krishnan, J. A., Criner, G. J., Cristenson, S., Krachman, S., Badr, M. S., & SPIROMICS Research Group. (2018). Sleep disruption as a predictor of quality of life among patients in the subpopulations and intermediate outcome measures in COPD study (SPIROMICS). Sleep, 41(5). https://doi.org/10.1093/sleep/zsy044
  2. Crandall, A., Cheung, A., Young, A., & Hooper, A. P. (2019). Theory-based predictors of mindfulness meditation mobile app usage: A survey and cohort study. JMIR mHealth and uHealth, 7(3), e10794. https://doi.org/10.2196/10794
  3. Ehwi, R. J., Maslova, S., & Asante, L. A. (2021). Flipping the page: exploring the connection between Ghanaian migrants’ remittances and their living conditions in the UK. Journal of Ethnic and Migration Studies, 47(19), 4362–4385. https://doi.org/10.1080/1369183X.2021.1945915

Criticism against hierarchical regression

However, one drawback of hierarchical regression is that some estimation procedures for the regression coefficients and their associated standard errors may be inappropriate when a theory-based hierarchical analysis is performed. Specifically, the hierarchical regression equations, the incremental (hierarchical) tests, and the parameter estimates produced by the procedure may not correspond to one another [1].

Additionally, hierarchical regression assumes that the predictor variables are independent of each other, which may not always be the case in real-world data sets [2]. This assumption can lead to multicollinearity, which can cause problems with the interpretation of the regression coefficients and the accuracy of the model predictions [2].
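A quick way to screen for multicollinearity among the predictors is to inspect variance inflation factors after fitting the full model. This is a sketch using Stata's postestimation command; the variable names are placeholders from the earlier examples:

```stata
* Fit the full (final-block) model, then inspect variance inflation factors
regress dependent_variable predictor1 predictor2 predictor3 predictor4 predictor5 predictor6
estat vif   // VIF values above ~10 are a common rough warning sign of multicollinearity
```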

Therefore, it is important to carefully consider the assumptions and limitations of hierarchical regression before using it in data analysis.

[1] Tisak, J. (1994). Determination of the regression coefficients and their associated standard errors in hierarchical regression analysis. Multivariate Behavioral Research, 29(2), 185-201.

[2] Corbin, M., Richiardi, L., Vermeulen, R., Kromhout, H., Merletti, F., Peters, S., … & Maule, M. (2012). Hierarchical regression for multiple comparisons in a case-control study of occupational risks for lung cancer. PLoS ONE, 7(6), e38944.


Nested Regression OR Hierarchical Regression in Stata | The Data Hall

How to perform hierarchical multiple regression in Stata using ‘nestreg’ command (April 2021) – YouTube

  • March 30, 2023