[Stata] Univariate Statistics: Frequency, Central Tendency, and Variability (tab, tabstat, sum, graph bar, hist, graph box)

Summary

Statistical Method	Stata Code
Frequency analysis (%)	`tab variable` OR `fre variable`
Measures of Central Tendency (Mean and Median)	`sum variable, detail` OR `tabstat variable, stats(mean median)`
Measures of Central Tendency (Mode)	`tab variable, sort` OR `fre variable, descending`
Distribution (Skewness and Kurtosis)	`sum variable, detail` OR `tabstat variable, stats(skewness kurtosis)`
Measures of dispersion (Standard Deviation, Variance, Range)	`sum variable, detail` OR `tabstat variable, stats(variance sd range)`

Discrete variables

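Compared with the tab command, the fre command returns a frequency table that shows both the numeric values and their value labels. Note that fre is a user-written command, so it has to be installed once with ssc install fre.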

Stata

tab varname 
fre varname
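To see the difference on real data, here is a small sketch that assumes Stata's bundled auto dataset (loaded with sysuse auto); the variables foreign and rep78 come from that dataset, not from the example above. The // comments work in a do-file rather than in the interactive Command window.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

tab foreign              // rows show the value labels (Domestic, Foreign)
tab foreign, nolabel     // rows show the underlying numeric codes (0, 1)
tab rep78, missing       // missing values get their own row

* fre is user-written, so install it if it is not found
capture which fre
if _rc ssc install fre
fre foreign              // rows show both the numeric code and its label
```

The nolabel and missing options are built into tab, so they are a handy fallback when fre is not installed.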

Stata

tab varname, sort
fre varname, ascending
fre varname, descending

With the sort option, tab varname lists the categories in descending order of frequency. With fre varname, you can choose either order by adding ascending or descending.

The mode is the first value listed in the descending-order frequency table. So in this example, the mode is 5 (60-69).
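If you would rather have Stata return the mode than read it off the table, egen's mode() function is one option. The sketch below assumes the bundled auto dataset, and rep78_mode is a hypothetical variable name chosen for illustration.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

tab rep78, sort                    // most frequent category is listed first
egen rep78_mode = mode(rep78)      // constant variable holding the mode
display "Mode of rep78: " rep78_mode[1]

* If several values tie for the mode, egen mode() returns missing unless
* you add minmode, maxmode, or nummode(#) to pick one of them.
```

Both approaches should agree: the value stored in rep78_mode is the category shown in the first row of the sorted table.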

Continuous variables

For continuous variables, measures of central tendency and variability are more informative descriptive statistics than a frequency table. The sum varname, detail command reports the mean, the median (shown as the 50% percentile), the standard deviation, the variance, skewness, and kurtosis.

Stata

sum varname, detail 
tabstat varname, stat(mean median sd variance range skewness kurtosis)
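The statistics printed by sum, detail are also left behind in r(), which is useful when you need a single number such as the range (sum, detail shows the smallest and largest values but not their difference). A minimal sketch, assuming the bundled auto dataset and its price variable:

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

quietly summarize price, detail
display "Mean:     " r(mean)
display "Median:   " r(p50)
display "SD:       " r(sd)
display "Variance: " r(Var)
display "Range:    " r(max) - r(min)
display "Skewness: " r(skewness)
display "Kurtosis: " r(kurtosis)
```

Run return list after summarize to see everything that is stored.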

Further, tabstat accepts several variables at once and reports only the statistics you list in the stats() option.
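For instance, a sketch with the bundled auto dataset (the variables price, mpg, weight, and foreign are from that dataset and only serve as an illustration):

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* Several variables at once, one row per variable
tabstat price mpg weight, stats(n mean median sd min max) ///
    columns(statistics) format(%9.2f)

* The same statistics split by a grouping variable
tabstat price mpg, stats(mean sd) by(foreign)
```

columns(statistics) puts the statistics in the columns, which tends to read better when there are more statistics than variables.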

Plots

Pie Chart: `graph pie`

Stata

graph pie, over(varname) plabel(_all name) 
graph pie, over(varname) by(groupname) plabel(_all name) 
graph pie, over(varname) by(groupname) plabel(_all name) scheme(white_tableau)

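With the by(groupname) option, you can also plot a separate pie chart for each subgroup.

With scheme(schemename), you can set the color scheme of the chart. Note that white_tableau is not one of Stata's built-in schemes, so it has to be installed before use (it ships with the user-written schemepack package on SSC). You can find the list of schemes and how to use them in this post.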

[Stata] Graph: How to customize graph styles in STATA
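As a concrete sketch of the pie chart options, the commands below assume the bundled auto dataset; the graph names pie_rep78 and pie_s1 are hypothetical, and s1color is a built-in scheme used here so that nothing needs to be installed.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* One pie per group, slices labeled with percentages to one decimal place
graph pie, over(rep78) plabel(_all percent, format(%4.1f)) ///
    by(foreign) name(pie_rep78, replace)

* Whole sample, slices labeled with category names, built-in scheme
graph pie, over(rep78) plabel(_all name) scheme(s1color) ///
    name(pie_s1, replace)
```

Giving each graph a name() keeps both in memory, so the second command does not overwrite the first.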

Bar Graph: `graph bar`

Bar Graph	Histogram
A bar graph represents categorical data.	A histogram represents numerical data (discrete or continuous).
Bars are separated by equal gaps.	Bars touch each other, with no gaps between adjacent bins.
Bars can be arranged in any order.	Bins are arranged in increasing order of value.
The x-axis can represent any set of categories.	The x-axis represents a numeric scale.

You can draw the graphs for the entire sample or separately for each group of a categorical variable with the by(groupname) option. To plot the entire sample only, omit by(groupname).

Stata

graph bar (count), over(varname) by(groupname) ytitle(frequency)
graph hbar (count), over(varname) by(groupname) ytitle(frequency) // hbar for horizontal bar graph

Stata

graph bar (percent), over(varname) by(groupname) ytitle(percent)
graph hbar (percent), over(varname) by(groupname) ytitle(percent) // hbar for horizontal bar graph

With (percent) in place of (count), the bars show percentages rather than frequencies. Percentages make it easier to compare distributions across groups of different sizes.
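Here is a sketch of both versions on the bundled auto dataset; the graph names bar_count and bar_pct are hypothetical, and blabel() is added only to print the bar heights.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* Frequencies for the whole sample
graph bar (count), over(rep78) ytitle("Frequency") ///
    blabel(bar) name(bar_count, replace)

* Percentages, one panel per group
graph bar (percent), over(rep78) by(foreign) ytitle("Percent") ///
    blabel(bar, format(%4.1f)) name(bar_pct, replace)
```

Matching the ytitle() to the statistic avoids a percentage axis that is labeled as a frequency.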

Histogram: `hist`

Stata

hist varname, by(groupname)
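hist is short for histogram, and a few of its options are worth knowing. The sketch below assumes the bundled auto dataset; the bin count of 10 and the graph names are arbitrary choices for illustration.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* Continuous variable: percent on the y-axis, 10 bins, normal curve overlaid
histogram mpg, percent bin(10) normal name(hist_mpg, replace)

* Discrete variable: one bar per distinct value, frequencies, by group
histogram rep78, discrete frequency by(foreign) name(hist_rep78, replace)
```

Without percent or frequency, the y-axis shows density, which is the default.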

Box Plot: `graph box`

A box plot visualizes a five-number summary of a variable. In Stata's graph box, the whiskers end at the adjacent values rather than at the minimum and maximum:

Lower adjacent value: smallest observed value that is ≥ P25 – 1.5*(P75 – P25), the lower fence.
The first quartile (P25)
The median (P50)
The third quartile (P75)
Upper adjacent value: largest observed value that is ≤ P75 + 1.5*(P75 – P25), the upper fence.

Stata

graph box varname
graph box varname, over(groupname)
graph box varname1 varname2, over(groupname) nooutside // nooutside: hides outside values in the plot
graph box varname1 varname2, over(groupname) horizontal nooutside

With graph box varname, dots sometimes appear beyond the ends of the whiskers. These are outside values, that is, observations beyond the upper or lower fence. Adding the nooutside option hides them in the graph; it does not remove them from the data.
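To make the fence rule concrete, the sketch below computes the fences and adjacent values by hand from the quartiles reported by sum, detail. It assumes the bundled auto dataset; the scalar names are hypothetical, and graph box may compute its quartiles slightly differently, so treat the numbers as an approximation of what the plot shows.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

quietly summarize price, detail
scalar q1       = r(p25)
scalar q3       = r(p75)
scalar iqr      = q3 - q1
scalar lo_fence = q1 - 1.5*iqr
scalar hi_fence = q3 + 1.5*iqr

* Adjacent values: most extreme observations still inside the fences
quietly summarize price if price >= scalar(lo_fence) & price <= scalar(hi_fence)
display "Lower adjacent value: " r(min)
display "Upper adjacent value: " r(max)

* Outside values: the dots that nooutside would hide
count if price < scalar(lo_fence) | price > scalar(hi_fence)

graph box price, name(box_price, replace)
```

The count at the end tells you how many dots to expect beyond the whiskers.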

Tip. `catplot`

The catplot command is a “wrapper” for graph hbar that makes it easy to compare the distribution of a categorical variable across groups. Its percent() option lets you specify the group within which percentages are calculated.

Stata

ssc install catplot
catplot varname, by(groupname) percent(groupname)

Listing two variables in the command crosses them, so the distribution of varname is shown for each category of groupname.

Stata

catplot varname groupname, by(groupname)
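A sketch of both forms on the bundled auto dataset follows. The graph names cat_by and cat_cross are hypothetical, and the install step assumes an internet connection to reach SSC.

```stata
* Sketch only: uses the auto dataset that ships with Stata
sysuse auto, clear

* catplot is user-written, so install it if it is not found
capture which catplot
if _rc ssc install catplot

* One panel per group, percentages calculated within each group
catplot rep78, by(foreign) percent(foreign) name(cat_by, replace)

* Both variables in a single panel, with rep78 shown in the legend
catplot rep78 foreign, percent(foreign) asyvars name(cat_cross, replace)
```

With percent(foreign), the bars for each category of foreign should add up to 100.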

Using the following command, you can also change the color of the bars and assign the legend separately 🫡

Stata

catplot varname groupname, percent(groupname) ///
    legend(label(1 "White") label(2 "Black") label(3 "Other")) ///
    ysize(3) blabel(bar, format(%9.1f)) /// 1 decimal place
    asyvars bar(1, color(purple)) bar(2, color(yellow)) name(g3, replace)

You can learn more about catplot in the following post: https://sscc.wisc.edu/sscc/pubs/stata_bar_graphs.htm

August 10, 2023
