# [Stata] Univariate Statistics: Frequency, Central Tendency, and Variability (tab, tabstat, sum, graph bar, hist, graph box)

### Summary

Statistical Method | Stata Code |
---|---|
Frequency analysis (%) | `tab variable` OR `fre variable` |
Measures of Central Tendency (Mean and Median) | `sum variable, detail` OR `tabstat variable, stats(mean median)` |
Measures of Central Tendency (Mode) | `tab variable, sort` OR `fre variable, descending` |
Distribution (Skewness and Kurtosis) | `sum variable, detail` OR `tabstat variable, stats(skewness kurtosis)` |
Measures of Dispersion (Standard Deviation, Variance, Range) | `sum variable, detail` OR `tabstat variable, stats(variance sd range)` |

### Discrete variables

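Compared to `tab`, the `fre` command returns a frequency table that shows the numeric values together with their value labels. `fre` is a user-written command; if it is not installed yet, `ssc install fre` adds it.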

```
tab varname
fre varname
```

```
tab varname, sort
fre varname, ascending
fre varname, descending
```

With the `sort` option (`tab varname, sort`), the table is sorted in descending order of frequency. With `fre`, you can choose the order explicitly using `fre varname, ascending` or `fre varname, descending`.

The **mode** is the first value in the frequency table when it is sorted in descending order. So here, the mode is 5 (60-69).
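If you want the mode stored as a value instead of reading it off the table, one option is the `mode()` function of `egen`. The sketch below uses Stata's built-in `auto` dataset and the variable `rep78` purely for illustration; neither comes from the original example.

```stata
* Illustrative only: auto.dta and rep78 are stand-ins for your own data
sysuse auto, clear

* Frequency table sorted by descending frequency; the mode is the first row
tab rep78, sort

* Store the mode of rep78 in a new variable (same value in every row)
egen mode_rep78 = mode(rep78)
tab mode_rep78
```

When several values tie for the highest frequency, `egen ... = mode()` returns missing unless you add `minmode`, `maxmode`, or `nummode()`.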

### Continuous variables

For continuous variables, central tendency and variability measures are better descriptive statistics than a frequency table. The `sum varname, detail` command reports the mean, median (the 50th percentile), standard deviation, variance, skewness, and kurtosis.

```
sum varname, detail
tabstat varname, stat(mean median sd variance range skewness kurtosis)
```
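`sum, detail` prints the smallest and largest values but not the range itself. Its results are also stored in `r()`, so the range can be computed from them. A minimal sketch, again using the built-in `auto` dataset and `price` as stand-ins:

```stata
* Illustrative only: auto.dta and price are stand-ins for your own data
sysuse auto, clear
quietly summarize price, detail

* Range is not printed by summarize, so build it from r(max) and r(min)
scalar price_range = r(max) - r(min)

display "mean = " r(mean) "   median = " r(p50)
display "sd = " r(sd) "   variance = " r(Var)
display "range = " price_range
display "skewness = " r(skewness) "   kurtosis = " r(kurtosis)
```

Running `return list` right after `summarize` shows every stored result.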

Further, `tabstat` accepts multiple variables at once and reports only the statistics you list in the `stat()` option.
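As a sketch of the multi-variable form, the commands below summarize three variables from the built-in `auto` dataset (an assumption made for illustration). The `columns(statistics)` and `by()` options are standard `tabstat` options that the text above does not cover.

```stata
* Illustrative only: auto.dta and its variables are stand-ins for your own data
sysuse auto, clear

* Several variables at once; variables appear as columns by default
tabstat price mpg weight, stat(mean median sd range)

* Same table transposed: one row per variable, one column per statistic
tabstat price mpg weight, stat(mean median sd range) columns(statistics)

* Statistics computed separately for each group of foreign
tabstat price mpg, stat(mean sd) by(foreign)
```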

## Plots

### Pie Chart: `graph pie`

```
graph pie, over(varname) plabel(_all name)
graph pie, over(varname) by(groupname) plabel(_all name)
graph pie, over(varname) by(groupname) plabel(_all name) scheme(white_tableau)
```

With the `by(groupname)` option, you can also plot pie charts by subgroup.

With `scheme(schemename)`, you can also specify the color scheme of the chart. Note that `white_tableau` is not a built-in scheme; it comes from the user-written `schemepack` package. You can find the list of schemes and how to use them in this post.
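To see which schemes are available on your machine, and to label the slices with percentages instead of category names, something like the following should work. It assumes the built-in `auto` dataset and the built-in `s2mono` scheme; the graph name `pie_rep78` is made up for this example.

```stata
* Illustrative only: auto.dta, rep78, and the graph name are stand-ins
sysuse auto, clear

* List the schemes currently installed
graph query, schemes

* Label every slice with its percentage, shown to one decimal place
graph pie, over(rep78) plabel(_all percent, format(%4.1f)) ///
    scheme(s2mono) name(pie_rep78, replace)
```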

### Bar Graph: `graph bar`

Bar Graph | Histogram |
---|---|
Bar graph represents categorical data. | Histogram represents numerical data (discrete or continuous data). |
Equal space between every two consecutive bars. | No space between two consecutive bars. They should be attached to each other. |
Data can be arranged in any order. | Data is arranged in the order of range. |
The x-axis can represent anything. | The x-axis should represent only continuous data that is in terms of numbers. |

You can draw the graphs for the entire sample or separately for each group of a categorical variable using the `by(groupname)` option. To draw the graph for the entire sample only, omit `by(groupname)`.

```
graph bar (count), over(varname) by(groupname) ytitle(frequency)
graph hbar (count), over(varname) by(groupname) ytitle(frequency) // hbar for horizontal bar graph
```

```
graph bar (percent), over(varname) by(groupname) ytitle(percent)
graph hbar (percent), over(varname) by(groupname) ytitle(percent) // hbar for horizontal bar graph
```

With the `(percent)` statistic, the bar heights are percentages rather than frequencies. This makes it easier to compare distributions across groups, especially when the groups differ in size.

### Histogram: `hist`

`hist varname, by(groupname)`
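A few commonly used `histogram` options are sketched below using the built-in `auto` dataset (an assumption made for illustration; the graph names are made up). By default the y-axis shows density; `frequency` and `percent` change the scale.

```stata
* Illustrative only: auto.dta, its variables, and the graph names are stand-ins
sysuse auto, clear

* Ten bins, y-axis in percent
histogram mpg, bin(10) percent name(h_bins, replace)

* Frequencies on the y-axis with a normal density curve overlaid
histogram mpg, frequency normal name(h_normal, replace)

* For a discrete variable, one bar per distinct value
histogram rep78, discrete percent name(h_discrete, replace)
```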

### Box Plot: `graph box`

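A **box plot** is a type of plot that we can use to visualize a five-number summary of a dataset, which includes:

- Lower fence (end of the lower whisker, which Stata calls the lower adjacent value): the smallest observed data value that is ≥ P25 − 1.5 × (P75 − P25).
- The first quartile (P25)
- The median
- The third quartile (P75)
- Upper fence (end of the upper whisker, which Stata calls the upper adjacent value): the largest observed data value that is ≤ P75 + 1.5 × (P75 − P25).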

```
graph box varname
graph box varname, over(groupname)
graph box varname1 varname2, over(groupname) nooutside // nooutside: excludes outliers
graph box varname1 varname2, over(groupname) horizontal nooutside
```

With the `graph box varname` command, dots sometimes appear beyond the upper/lower fences. These are extreme values (outside values). Adding the `nooutside` option hides them in the graph without removing them from the data.
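The fence arithmetic can be checked by hand from the quartiles that `summarize, detail` stores in `r(p25)` and `r(p75)`. This is a sketch that assumes the built-in `auto` dataset and the variable `mpg`; the scalar names are made up for illustration.

```stata
* Illustrative only: auto.dta, mpg, and the scalar names are stand-ins
sysuse auto, clear
quietly summarize mpg, detail

* Interquartile range and the 1.5 x IQR limits
scalar iqr = r(p75) - r(p25)
scalar lower_limit = r(p25) - 1.5*iqr
scalar upper_limit = r(p75) + 1.5*iqr
display "IQR = " iqr "   limits = [" lower_limit ", " upper_limit "]"

* Observations beyond the limits are the dots that nooutside hides
count if !missing(mpg) & (mpg < scalar(lower_limit) | mpg > scalar(upper_limit))
scalar n_outside = r(N)
```

The whiskers themselves end at the most extreme observed values that still lie within these limits.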

### Tip. `catplot`

The `catplot` command is a “wrapper” for `graph hbar`, which allows us to compare the distribution of a variable by group intuitively. The `percent()` option allows you to specify the groups within which percentages are calculated.

```
ssc install catplot
catplot varname, by(groupname) percent(groupname)
```

You can have it in more than one graph by putting two variables together in the command.

`catplot varname groupname, by(groupname)`

Using the following command, you can also change the color of the bars and label the legend entries separately:

```
catplot varname groupname, percent(groupname) ///
    legend(label(1 "White") label(2 "Black") label(3 "Other")) ///
    ysize(3) blabel(bar, format(%9.1f)) /// 1 decimal place
    asyvars bar(1, color(purple)) bar(2, color(yellow)) name(g3, replace)
```

You can learn more about `catplot` in the following post: https://sscc.wisc.edu/sscc/pubs/stata_bar_graphs.htm