[Stata] Comparing model fit statistics across regression models (estimates, lrtest, fitstat)

In this blog post, I will show you how to compare the fit of different regression models in Stata using two approaches: comparing the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and performing a likelihood-ratio test (LR test).

Step 1: Estimate the Models and Store the Estimates

First, we need to estimate two regression models and store their estimates for later comparison. For this example, I will use the auto dataset that comes with Stata. The dependent variable is the price of the car, and the independent variables are the weight, the length, the displacement, the gear ratio, and the foreign origin of the car. The first model includes only the weight, the length, and the displacement as predictors, while the second model adds the gear ratio and the foreign origin as well.

Stata
sysuse auto, clear
reg price weight length displacement
est store m1 // store estimates as "m1"
reg price weight length displacement gear_ratio foreign
est store m2 // store estimates as "m2"
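
Comparisons based on AIC, BIC, or an LR test are only meaningful when both models are fitted to the same observations. The following is a minimal sketch of a sanity check, assuming the same auto dataset and the same two models as above; it simply prints the estimation sample size stored in e(N) after each regression and then lists the stored results.

```stata
sysuse auto, clear

// Model 1: fit quietly, show the estimation sample size, then store
quietly regress price weight length displacement
display "Model 1: N = " e(N)
estimates store m1

// Model 2: same steps with the two additional predictors
quietly regress price weight length displacement gear_ratio foreign
display "Model 2: N = " e(N)
estimates store m2

// List the estimation results currently held in memory
estimates dir
```

If the two sample sizes differ (for example because an added predictor has missing values), the models would need to be re-estimated on a common sample before comparing them.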

Step 2: Compare Model Fit Statistics

Approach 1: Comparing AIC and BIC

One way to compare the fit of different models is to use information criteria such as AIC and BIC. These criteria capture the trade-off between model fit and model complexity: a higher log-likelihood lowers the criterion, while each additional parameter raises it. Lower values of AIC and BIC therefore indicate a better model, and BIC penalizes additional parameters more heavily than AIC in all but very small samples.
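
To make the penalty concrete, here is a small sketch that computes both criteria by hand for model 1, using the definitions Stata documents for estat ic: AIC = -2*lnL + 2*k and BIC = -2*lnL + k*ln(N), where k is the number of estimated parameters. It assumes that e(rank) is the right count for k after regress, which matches the df column that estat ic reports. The scalar names are arbitrary.

```stata
sysuse auto, clear
quietly regress price weight length displacement

// Ingredients stored by regress
scalar ll = e(ll)      // log-likelihood
scalar k  = e(rank)    // number of estimated parameters (incl. constant)
scalar n  = e(N)       // number of observations

// Information criteria computed by hand
scalar aic_manual = -2*ll + 2*k
scalar bic_manual = -2*ll + k*ln(n)
display "AIC = " aic_manual "   BIC = " bic_manual

// Built-in report for comparison
estat ic
```

The hand-computed values should agree with the AIC and BIC columns printed by estat ic.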

To compare AIC and BIC across models, we can use the est table command with the stats() option. This command produces a table that displays the coefficients of each model side by side (add the se option if you also want standard errors), as well as any additional statistics we specify. In this case, we want to see the R-squared, AIC, and BIC of each model.

The command to generate the table is:

Stata
est table m1 m2, stats(r2 aic bic) 
// returns table with stats r2, aic, and bic 

From this table, we can see that both AIC and BIC are lower for model 2 than for model 1, indicating that model 2 fits better than model 1 even after accounting for its additional parameters. (For AIC and BIC, a lower number means a better model.)
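
If you only need the fit statistics and not the coefficients, the built-in estimates stats command is an alternative. The sketch below assumes the same two models as above and re-estimates them so that it runs on its own; the output is a compact table with one row per stored model showing the number of observations, the log-likelihoods, the degrees of freedom, AIC, and BIC.

```stata
sysuse auto, clear

quietly regress price weight length displacement
estimates store m1

quietly regress price weight length displacement gear_ratio foreign
estimates store m2

// One row per stored model: N, ll(null), ll(model), df, AIC, BIC
estimates stats m1 m2
```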

Tip. You can also use the user-written command fitstat to display a range of fit statistics after estimating a model. It is not part of official Stata, so it has to be installed first.

Stata
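ssc install fitstat // install once
reg y x1 x2 x3 // y, x1, x2, x3 are placeholder variable names
fitstat // display fit statistics for the model just estimated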

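You can also use fitstat to compare two models: save the statistics of the first model with the saving() option, then compare them against the second model with the using() option.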

Stata
sysuse auto, clear
reg price weight length displacement
fitstat, saving(m1) bic
reg price weight length displacement gear_ratio foreign
fitstat, using(m1) bic

Approach 2: Likelihood-ratio test (LR test)

Another way to compare the fit of different models is to use a likelihood-ratio test (LR test). This test compares the log-likelihoods of two nested models and tests whether the more complex model provides a significantly better fit than the simpler model.

To perform an LR test in Stata, we can use the lrtest command with the names of the stored estimates of the models we want to compare.

The command to perform an LR test for model 1 and model 2 is:

Stata
lrtest m1 m2 

In the output of this command, the LR test statistic is 13.80 with 2 degrees of freedom (one for each predictor added in model 2), and the p-value is 0.0010. The null hypothesis is that the coefficients on gear_ratio and foreign are jointly zero, that is, that model 1 fits as well as model 2. Because the p-value is below 0.05, we reject this null hypothesis at the 5% level and conclude that model 2 provides a significantly better fit than model 1.
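
To see where these numbers come from, the test can be reproduced by hand. The following is a sketch under the same assumptions as above (auto dataset, the same two nested models): the LR statistic is twice the difference in log-likelihoods, the degrees of freedom equal the difference in the number of estimated parameters, and the p-value comes from the upper tail of the chi-squared distribution. The scalar names are arbitrary.

```stata
sysuse auto, clear

// Restricted model (model 1)
quietly regress price weight length displacement
scalar ll1 = e(ll)
scalar k1  = e(rank)

// Unrestricted model (model 2)
quietly regress price weight length displacement gear_ratio foreign
scalar ll2 = e(ll)
scalar k2  = e(rank)

// LR statistic, degrees of freedom, and p-value
scalar lr = 2*(ll2 - ll1)
scalar df = k2 - k1
display "LR chi2(" df ") = " %6.2f lr "   Prob > chi2 = " %6.4f chi2tail(df, lr)
```

The displayed values should match what lrtest m1 m2 reports; lrtest also leaves them in r(chi2), r(df), and r(p) for later use.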

  • October 31, 2023