[Stata] Obtaining beta regression coefficients: centering and standardizing variables (beta, listcoef, center)
Beta coefficients and unstandardized coefficients are two different ways of presenting the results of regression analyses.
Why do we need to center variables?
For this question, Dr. Christian Geiser explains the reasons for centering variables before running statistical analyses such as regression or multilevel models. Here is the summary:
- Centering: Centering means subtracting a constant, usually the sample mean, from every score before conducting a statistical analysis. After mean centering, the scores are distributed around zero, and a score of zero corresponds to the sample mean.
- Benefits of Centering: Centering makes scores more interpretable, because each centered score shows how far it lies above or below the sample mean. It also makes regression results more interpretable when running regression analysis with interaction terms, quadratic terms, and in multi-level analysis.
- The Intercept in Regression Analysis: The intercept is the expected value of the dependent variable when all predictor variables are zero. The intercept is meaningless when one or more predictor variables have no meaningful zero points, such as personality, intelligence, emotions, and subjective well-being. Centering predictor variables that have no meaningful zero points can make the intercept more interpretable.
- Moderated Regression Analysis: Centering matters when running moderated regression analysis with interaction terms, because each lower-order slope coefficient is the effect of that predictor when the other predictor in the interaction equals zero. After mean centering, zero is the sample mean, so the lower-order slopes become more interpretable.
- Non-Linear Terms in Regression Analysis: Centering is also helpful in regression analysis with quadratic or cubic terms, because it reduces the collinearity between the linear term and the higher-order terms.
- Multi-Level Analysis: Centering is essential in multi-level analysis for the interpretation of results. There are two choices for centering: grand mean centering and group mean centering, which can produce different results. It is important to study and understand this issue in detail.
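The difference between the two centering choices can be sketched in a few lines. This is only an illustration, assuming Stata's built-in auto dataset and treating foreign as the grouping variable; the names mpg_grand, mpg_gmean, and mpg_group are made up for this example.

```stata
sysuse auto, clear

* Grand mean centering: subtract the overall sample mean
quietly summarize mpg
generate mpg_grand = mpg - r(mean)

* Group mean centering: subtract the mean of each car's own group
bysort foreign: egen mpg_gmean = mean(mpg)
generate mpg_group = mpg - mpg_gmean

* Compare the group means of the two centered versions
tabstat mpg_grand mpg_group, by(foreign) statistics(mean)
```

The group-centered variable should average approximately zero within every group, while the grand-centered variable still carries the differences between groups. This is why the two choices can lead to different results.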
In conclusion, centering is important for interpretability, particularly with interaction terms, quadratic terms, or multi-level analysis.
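To see what centering does in a model with an interaction term, here is a minimal sketch. It assumes Stata's built-in auto dataset, and the variable names weight_c and mpg_c are chosen only for this illustration.

```stata
sysuse auto, clear

* Raw predictors: the intercept is the expected price when weight = 0
* and mpg = 0, a combination that no car in the data has
regress price c.weight##c.mpg

* Center both predictors at their sample means
foreach v of varlist weight mpg {
    quietly summarize `v'
    generate `v'_c = `v' - r(mean)
}

* Centered predictors: the intercept is the expected price at average
* weight and mpg, and each lower-order slope is the effect of that
* predictor at the mean of the other predictor
regress price c.weight_c##c.mpg_c
```

Comparing the two outputs, the interaction coefficient should be identical, while the intercept and the lower-order slopes change.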
Differences between beta coefficients and unstandardized coefficients
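If you standardize the variables (center them and then divide by their standard deviations), the resulting regression coefficients can be interpreted as beta coefficients (or standardized coefficients). Centering alone is not enough, because it does not change the scale of the variables.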
Unstandardized coefficients:
Unstandardized coefficients are calculated directly from the raw data without standardizing the variables. These coefficients represent the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant. They are expressed in units of the dependent variable per unit of the independent variable.
- Pros: Unstandardized coefficients show the actual change in the dependent variable associated with a one-unit change in the independent variable. They translate more easily into real-world terms, which is helpful for decision-making and policy recommendations.
- Cons: Unstandardized coefficients are not directly comparable across predictors in the same model when those predictors are measured in different units.
Beta coefficients (standardized coefficients):
Beta coefficients are the result of standardizing both the independent and dependent variables before running the regression analysis. They provide a measure of the strength and direction of the relationship between the independent variable(s) and the dependent variable, with all variables measured on the same scale.
- Pros: Beta coefficients are useful for comparing the relative importance of predictor variables in a model. They help in understanding which independent variable has the most influence on the dependent variable. Since the coefficients are standardized, they provide a more straightforward comparison across variables with different scales or units of measurement.
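The two kinds of coefficients are linked by a simple rescaling in an ordinary linear regression. As a sketch, write $b_j$ for the unstandardized coefficient of predictor $x_j$, $\beta_j$ for its beta coefficient, and $s_{x_j}$ and $s_y$ for the sample standard deviations of the predictor and the dependent variable. These symbols are introduced here for illustration and do not come from the original post.

```latex
\beta_j = b_j \cdot \frac{s_{x_j}}{s_y}
```

So a beta coefficient is the unstandardized coefficient re-expressed in standard deviation units, with the standard deviations computed on the observations used in the regression.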
How to get beta coefficients in Stata
Approach 1: Adding the beta option to the reg command
In Stata, you can obtain beta coefficients (standardized coefficients) when running a linear regression analysis by using the beta option with the regress command.
reg dv iv1 iv2 iv3 iv4 iv5, beta
One limitation of the beta option is that it reports the standardized coefficients without confidence intervals. To get confidence intervals, standardize the variables yourself and rerun the regression. Provided both regressions use the same observations, the coefficients will match those from the beta option, and the output will also include the corresponding confidence intervals.
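A minimal sketch of this workaround, assuming Stata's built-in auto dataset in place of your own dv and iv variables. The z_ prefix is just a naming choice for this example.

```stata
sysuse auto, clear

* Step 1: beta option (standardized coefficients, no confidence intervals)
regress price mpg weight length, beta

* Step 2: standardize every variable on the estimation sample of step 1
foreach v of varlist price mpg weight length {
    quietly summarize `v' if e(sample)
    generate z_`v' = (`v' - r(mean)) / r(sd) if e(sample)
}

* Step 3: rerun on the standardized variables; the slopes should match
* the Beta column from step 1, now with confidence intervals
regress z_price z_mpg z_weight z_length
```

Restricting the standardization to e(sample) keeps the means and standard deviations based on the same observations as the regression, which matters when some variables have missing values.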
Approach 2: listcoef, std after the reg command
net install sg152 // install listcoef; if this fails, run: search listcoef
reg dv iv1 iv2 iv3 iv4 iv5, beta
listcoef, std
You can see that the beta coefficients obtained from the listcoef, std command are the same as those from the regression with the beta option.
bStdX: a one standard deviation change in x leads to a change of bStdX units in y
bStdY: a one-unit change in x leads to a change of bStdY standard deviations in y
bStdXY: a one standard deviation change in x leads to a change of bStdXY standard deviations in y
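The three quantities can also be reproduced by hand, which helps to check your reading of the listcoef output. This is a sketch only, assuming Stata's built-in auto dataset; the scalar names sd_y and sd_x are made up for this example.

```stata
sysuse auto, clear
quietly regress price mpg weight

* Standard deviations of y and x on the estimation sample
quietly summarize price if e(sample)
scalar sd_y = r(sd)
quietly summarize mpg if e(sample)
scalar sd_x = r(sd)

* Rescale the unstandardized coefficient of mpg three ways
display "bStdX  = " _b[mpg] * sd_x
display "bStdY  = " _b[mpg] / sd_y
display "bStdXY = " _b[mpg] * sd_x / sd_y
```

The last value should agree with the Beta column that regress price mpg weight, beta reports for mpg.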
Approach 3: Centering / Standardizing Variables and Running Regressions
A. Center command
For your information, the centering or standardization only applies to the continuous variables. You DO NOT need to center/standardize categorical or binary variables. Below are two methods to standardize variables in Stata:
The first method is to use the user-written package center. Here is the simple command 🙂 Because of the inplace option, this will replace your original variables with the standardized variables.
ssc install center // install command
center var1 var2 var3 var4, inplace standardize
The standardize option creates a variable containing the standardized values (zero sample mean and unit sample variance). The default is to create a variable containing the centered values (zero sample mean).
B. Use the Loop Function
You can also loop over the variables you want to standardize. The following command replaces each original variable with its standardized version.
foreach var of varlist var1 var2 var3 var4 {
    * summarize stores the mean in r(mean) and the standard deviation in r(sd)
    qui sum `var'
    replace `var' = (`var' - r(mean)) / r(sd)
}
You can create new standardized (or centered) variables without removing the unstandardized ones. You can do this by using gen `var'_std as shown below.
foreach var of varlist var1 var2 var3 var4 {
    qui sum `var'
    gen `var'_std = (`var' - r(mean)) / r(sd)
}
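If you want to confirm that the loop did what you expect, one option is to compare it with Stata's built-in egen function std(). This is a sketch that assumes the built-in auto dataset; the _egen suffix is only a naming choice for this example.

```stata
sysuse auto, clear

foreach var of varlist mpg weight {
    * Manual standardization, as in the loop above
    quietly summarize `var'
    generate `var'_std = (`var' - r(mean)) / r(sd)
    * Built-in alternative for comparison
    egen `var'_egen = std(`var')
}

* Both versions should show a mean of about 0 and a standard deviation of 1
summarize mpg_std mpg_egen weight_std weight_egen
```

Note that both versions use all non-missing observations of each variable, which may differ from the observations that end up in your regression.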
Note
For your information, centering and standardizing are similar but not the same: centering only subtracts the mean, while standardizing also divides by the standard deviation. For more information, please check out this post.