[Stata] One-way ANOVA (oneway and robnova)

ANOVA, or analysis of variance, is a statistical method that tests the differences among the means of two or more groups. It is commonly used to compare the effects of different treatments or factors on a continuous outcome variable.

In this blog post, I will show you how to conduct ANOVA in Stata using the example dataset nhanes2 (loaded with webuse nhanes2), which contains information on the health and nutrition of a sample of US adults.

Stata
webuse nhanes2
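Before running the ANOVA, an optional check (not part of the original walkthrough) is to describe the variables used in this post and tabulate race, because the number of observations per group matters later when choosing a post-hoc test. This is a minimal sketch that assumes the standard nhanes2 example data.

```stata
* Load the example data (clear drops any data already in memory)
webuse nhanes2, clear

* Inspect the outcome and grouping variables used in this post
describe bpsystol race hlthstat

* Group sizes: relevant later when deciding between Tukey and Tukey-Kramer
tabulate race
```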

oneway command: One-way ANOVA

To conduct a one-way ANOVA in Stata, we can use either the oneway or the anova command. Both give the same F test for a single factor, but oneway can also display descriptive statistics for each group through its tab (tabulate) option, similar to tabstat output, and it reports Bartlett’s test for equal variances. The syntax is:

Stata
oneway depvar indepvar, tab 

where depvar is the continuous dependent variable and indepvar is the categorical grouping variable. The tab option adds a table with the mean, standard deviation, and frequency of depvar in each group. For this example, we want to test whether mean systolic blood pressure (bpsystol) differs by race, so the command is oneway bpsystol race, tab.
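As a runnable block, the example might look like the sketch below. The anova line is an addition, included only to illustrate that both commands produce the same F test; anova does not print the per-group summary table or Bartlett’s test.

```stata
webuse nhanes2, clear

* One-way ANOVA of systolic blood pressure by race, with group summaries
* (tabulate is the full name of the tab option)
oneway bpsystol race, tabulate

* Same F test via anova; race is treated as a categorical factor
anova bpsystol race
```

The F statistic and Prob > F should match between the two outputs; oneway is simply more convenient for a quick descriptive look.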

In this instance, the p-value (Prob > F) is displayed as 0.0000, meaning p < 0.0001, so we reject the null hypothesis that all group means are equal: at least one pair of race groups differs in mean systolic blood pressure.

The result of Bartlett’s test for equal variances is given below the ANOVA table. Because its p-value (Prob>chi2) is statistically significant, the variance of systolic blood pressure likely differs across race groups, which violates the equal-variance assumption of the classic ANOVA F test. The next section describes a more robust alternative.
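Bartlett’s test is known to be sensitive to departures from normality, so a common complement is Levene’s test, available through Stata’s built-in robvar command. The sketch below (an addition, not from the original post) first pulls Bartlett’s statistic from the results oneway stores (r(chi2bart) and r(df_bart)), then runs robvar, which reports Levene’s W0 and the Brown and Forsythe variants W50 (median) and W10 (10% trimmed mean). Note that these are tests of equal variances, distinct from the Brown-Forsythe test of equal means reported by robnova.

```stata
webuse nhanes2, clear

* Bartlett's test, recomputed from the results stored by oneway
quietly oneway bpsystol race
scalar bart_chi2 = r(chi2bart)
scalar bart_p    = chi2tail(r(df_bart), r(chi2bart))
display "Bartlett chi2(" r(df_bart) ") = " %9.2f scalar(bart_chi2) ", p = " %6.4f scalar(bart_p)

* Levene (W0) and Brown-Forsythe (W50, W10) tests for equal variances
robvar bpsystol, by(race)
```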

Advanced: robnova command when variances are unequal (p<0.05 for Bartlett’s test)

When variances are unequal, you can use a test that does not assume equal variances. The user-written command robnova (available from SSC) reports the classic Fisher F test alongside Welch’s test and the Brown-Forsythe test, the latter two of which are designed to remain valid when group variances differ.

Stata
ssc install robnova
robnova depvar indepvar
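Applied to this example, the generic syntax becomes the sketch below. The replace option is added so that re-running the install line does not stop with an "already exists" error; the robnova syntax itself follows the depvar/indepvar form shown above.

```stata
* robnova is user-written: install (or update) it from SSC
ssc install robnova, replace

webuse nhanes2, clear

* Fisher, Welch, and Brown-Forsythe tests of equal means, bpsystol by race
robnova bpsystol race
```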

Here, according to the results, the difference in mean bpsystol by race is significant (p<0.001).

You can learn more about the benefits of Welch’s test here: https://statisticsbyjim.com/anova/welchs-anova-compared-to-classic-one-way-anova/

pwmean command: Post-hoc analysis in ANOVA

The next step is to choose a post-hoc test to compare the pairwise differences between the group means, because a significant F test only tells you that at least one mean differs, not which ones. There are many post-hoc tests available, and they differ in their assumptions, methods, and levels of conservatism. Some of the most common post-hoc tests are:

  • sidak: This test adjusts the significance level for each pairwise comparison to 1 − (1 − α)^(1/m), where m is the number of comparisons; the adjustment is exact when the comparisons are independent. It is slightly less conservative than Bonferroni but more conservative than Tukey.
  • bonferroni: This test adjusts the significance level for each pairwise comparison by dividing it by the number of comparisons (α/m). It is simple and easy to apply, but it can be too conservative when there are many comparisons or when the comparisons are strongly correlated.
  • scheffe: This test compares each difference against a critical value derived from the F distribution with the ANOVA’s degrees of freedom. It controls the error rate for all possible contrasts, not just pairwise ones, so it can be used for any number of comparisons, but it is very conservative when only pairwise comparisons, or only a few comparisons, are of interest.
  • tukey: This test (Tukey’s HSD) adjusts the significance level for all pairwise comparisons using the Studentized range distribution and the error degrees of freedom. It is less conservative than Bonferroni and Scheffe for pairwise comparisons, but in its original form it assumes equal or nearly equal group sizes.
  • tukey-kramer: This test is a modification of Tukey’s test that allows for unequal group sizes. For each pair, it uses the two groups’ own sizes in the standard error (equivalently, their harmonic mean) together with the Studentized range distribution and the degrees of freedom.
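  • fisher-hayter: This test is a modification of Fisher’s least significant difference (LSD) test that also allows for unequal group sizes. It is applied only after a significant overall F test and compares each pair against the Studentized range critical value for k − 1 groups (instead of k), which makes it more powerful than Tukey’s test.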
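To make the adjustment rules above concrete, the sketch below computes, for the bpsystol-by-race example with α = 0.05 assumed, the per-comparison significance levels for Bonferroni and Sidak and the critical |t| that each of the three F- or t-based methods requires for a pairwise difference. The scalar names are illustrative. Tukey, Tukey-Kramer, and Fisher-Hayter are left out because they use the Studentized range distribution instead.

```stata
webuse nhanes2, clear
quietly oneway bpsystol race

scalar ngroups = r(df_m) + 1                               // number of groups (k)
scalar ncomp   = scalar(ngroups)*(scalar(ngroups) - 1)/2   // pairwise comparisons (m)
scalar dferr   = r(df_r)                                   // within-group (error) df

* Per-comparison significance levels (alpha = 0.05 assumed)
scalar a_bonf  = 0.05/scalar(ncomp)                   // Bonferroni: alpha/m
scalar a_sidak = 1 - (1 - 0.05)^(1/scalar(ncomp))     // Sidak: 1 - (1 - alpha)^(1/m)

* Two-sided critical |t| for a pairwise difference under each method
scalar t_bonf    = invttail(scalar(dferr), scalar(a_bonf)/2)
scalar t_sidak   = invttail(scalar(dferr), scalar(a_sidak)/2)
scalar t_scheffe = sqrt((scalar(ngroups) - 1)*invFtail(scalar(ngroups) - 1, scalar(dferr), 0.05))

display "Pairwise comparisons: " scalar(ncomp)
display "Bonferroni: alpha = " %7.5f scalar(a_bonf)  "  critical |t| = " %5.3f scalar(t_bonf)
display "Sidak:      alpha = " %7.5f scalar(a_sidak) "  critical |t| = " %5.3f scalar(t_sidak)
display "Scheffe:                     critical |t| = " %5.3f scalar(t_scheffe)
```

With three groups (three comparisons), the Bonferroni and Sidak levels are very close, and the Scheffe critical value is the largest of the three, which matches its description as the most conservative option for pairwise comparisons.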

The choice of the post-hoc test depends on several factors, such as:

  • The number of groups and comparisons
  • The size and balance of the groups
  • Whether the group variances are equal
  • The desired level of significance and power

There is no definitive rule for choosing a post-hoc test, but some general guidelines are:

  • If you have few groups (fewer than six) and equal or nearly equal group sizes, you can use Tukey’s test or Sidak’s test.
  • If you have many groups (six or more) or unequal group sizes, you can use the Tukey-Kramer test or the Fisher-Hayter test.
  • If you want to be more conservative and avoid false positives, you can use Bonferroni’s test or Scheffe’s test.
  • If you want to be more liberal and avoid false negatives, you can use Tukey’s test or Sidak’s test.
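One practical way to see how much the choice matters for your data is to run the same comparison under several methods and compare the adjusted p-values. The loop below is a sketch, not part of the original post; it uses pwmean (discussed next) with bpsystol as the outcome and only the mcompare() methods named in the list above that pwmean supports.

```stata
webuse nhanes2, clear

* Repeat the pairwise comparisons of mean bpsystol across race
* under four adjustment methods
foreach m in bonferroni sidak scheffe tukey {
    display _newline as text "Method: `m'"
    pwmean bpsystol, over(race) mcompare(`m') effects
}
```

If a pair is significant under Scheffe, it will usually also be significant under the less conservative methods; disagreements tend to appear only for borderline differences.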

In Stata, the built-in pwmean command performs these pairwise comparisons of means. You choose the adjustment method in its mcompare() option (for example bonferroni, sidak, scheffe, or tukey) as follows.

Stata
pwmean depvar, over(indepvar) mcompare(tukey) effects

In this example, the results show a significant difference between Black and White and between Other and Black in the dependent variable, which here is hlthstat rather than bpsystol.
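As a cross-check, oneway itself has bonferroni, scheffe, and sidak options that print a table of pairwise mean differences with adjusted p-values. It has no Tukey option, which is one reason to use pwmean for Tukey comparisons. A minimal sketch with the bpsystol example:

```stata
webuse nhanes2, clear

* Sidak-adjusted pairwise comparisons printed directly by oneway
* (bonferroni or scheffe can be used in place of sidak)
oneway bpsystol race, sidak
```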

Advanced: tukeyhsd, tkcomp and fhcomp commands

You can also install user-written commands for more advanced post-hoc methods, such as the Tukey-Kramer and Fisher-Hayter tests described earlier. Please see the following guide from UCLA IDRE for more information 🙂

FAQ: How can I do post-hoc pairwise comparisons using Stata? (ucla.edu)

  • September 8, 2023