# [Stata] Collapse command for aggregating data per variable (mean, median, sd, and count)

In this blog post, I will show you how to use the collapse command in Stata to create a new dataset that contains summary statistics of your original data. This can be useful when you want to aggregate your data by groups or categories, such as regions, years, or types.

## What is the collapse command?

The collapse command in Stata is used to aggregate a dataset by collapsing it based on some summary statistics of a variable like mean, sum, median, percentile, standard error, etc.

The collapse command has the following syntax:

`collapse (stat) newvar=oldvar (stat) newvar=oldvar ... , by(varlist) options`

where `stat`

is the statistic you want to calculate, such as mean, sum, median, etc.; `newvar`

is the name of the new variable that will store the result; `oldvar`

is the name of the existing variable that you want to summarize; `varlist`

is the list of variables that define the groups or categories over which you want to aggregate; and `options`

are additional options that modify the behavior of the command.

### Example: `lifeexp`

dataset

To understand how a variable can be aggregated, let’s start by loading Stata’s web dataset, `lifeexp`

.

`webuse lifeexp, clear `

For instance, if you want to calculate the mean life expectancy and GNP per capita with their standard deviation by region, you can use the following command. This will create a new dataset with 7 variables, with group variable (`region`

) and 2 for mean, 2 for sd, and 2 for count as required in the syntax.

You should write the variable name to be collapsed on the left side (`lexp_mean`

), and then the variable name to be used for collapse on the right side (`lexp`

). In general, I collapse the mean with standard deviation and count them together to take the number of observations and their variation into account.

`collapse (mean) lexp_mean=lexp gnppc_mean=gnppc (sd) lexp_sd=lexp gnppc_sd=gnppc (count) lexp_count=lexp gnppc_count=gnppc, by(region)`

### Tips

**Always save the dataset with**The collapse command will replace your dataset loaded.`save "filename.dta"`

before running`collapse`

command.**You can put multiple groups in the**For example, you can put`by()`

options.`by(year region)`

to see the change by`year`

by`region`

.