[Stata] Data cleaning 2: Labeling variables and values (label define, label values, numlabel, and uselabel)

Why do we need to label variables and values?

Let’s look at an example from the Stata Data Browser, where the data contain only numeric values:

  • What do the variable names mean? Can you guess what “g1” and “q1a” mean?
  • What do the numbers for each variable mean? Can you guess what the numbers in q1a mean?

This is why we need LABELS! Without labels, we have no clue what a variable name means or what its values mean.

So, we can label two things in Stata:

  • Variable label: meaning of the variable (e.g., q1a = life satisfaction)
  • Value label: meaning of the values (e.g., 1 in q1a means “not satisfied” and 4 means “very satisfied”)

You can browse the data with or without value labels in the Stata Data Browser by right-clicking and then going to Data -> Value labels -> Hide/Show value labels.

Check the labels with codebook and fre

With the codebook command, you can see both the variable label and the value labels.

Stata
codebook varname 

Here we can see the terms that we are going to use. That may be a lot of concepts to take in at first sight, but you will get them with practice! 🙌

Stata
ssc install fre // install the fre command (first-time users only)
fre varname

Compared to the tab command, the fre command returns the frequency table WITH both the values and their labels.
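To try this without your own data, here is a minimal sketch that assumes Stata's built-in auto dataset (the one with the Domestic/Foreign variable) as a stand-in. Note that sysuse with the clear option replaces whatever data you have in memory.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

codebook foreign            // variable label, value label name, and labeled values
tabulate foreign            // frequency table showing the labels only
tabulate foreign, nolabel   // same table showing the numeric values only

* fre shows values and labels together (requires: ssc install fre)
* fre foreign
```

The fre line is commented out so the sketch runs even if fre is not installed yet.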

A. Labeling variables

The label variable command defines the label of a variable. Running it again on the same variable overwrites the existing label.

Stata
lab var varname "Variable Label"

// Example 
lab var phq9 "PHQ-9 Depression sum score"
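The same steps can be sketched on the built-in auto dataset (an assumption, used here only so the example runs as is). The new label text "Purchase price" is made up for illustration.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

describe price                          // shows the current variable label
label variable price "Purchase price"   // hypothetical new label; overwrites the old one
describe price                          // the new label now appears

* The label can also be read back with an extended macro function
local lbl : variable label price
display "`lbl'"
```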

B. Labeling values

The most common way to label values uses two commands: 1) label define and 2) label values. The label define command creates the value label, and the label values command attaches that label to a variable.

Example: Binary variable

Stata
la def labelname 0 "no" 1 "yes" // create the value label (use straight quotes)
la val varname labelname // attach the value label to the variable
fre varname // check that the labels were assigned to the values correctly
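Here is a minimal worked sketch of the two steps. It assumes the built-in auto dataset; the variable high_mpg, the cutoff of 25, and the label name yesno are all hypothetical and exist only for this illustration.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

* Hypothetical 0/1 indicator: 1 if the car gets at least 25 mpg
generate byte high_mpg = mpg >= 25 if !missing(mpg)

label define yesno 0 "no" 1 "yes"   // step 1: create the value label
label values high_mpg yesno         // step 2: attach it to the variable

tabulate high_mpg                   // table shows no / yes
tabulate high_mpg, nolabel          // table shows 0 / 1
```

Because the value label is stored separately from the variable, the same label (here yesno) can be attached to any other 0/1 variable with another label values command.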

Error message: label __ is already defined

When you get an error message “label __ is already defined,” you can drop and redefine the label or modify the label as follows.

Stata
* label drop: drop the defined label 
label drop label_name

* label define, modify: modify the defined label
label define label_name 2 "don't know", modify
// this changes (or adds) the label for the value 2 in the value label. You can modify it any way you need.
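The following sketch reproduces the error on purpose and then shows both fixes. The label name yesno is hypothetical, and capture is used so that the failing line does not stop a do-file.

```stata
* Assumption: yesno is a hypothetical label name used only for illustration
capture label drop yesno                    // start clean; no error if yesno does not exist
label define yesno 0 "no" 1 "yes"

* Defining the same name again fails; capture stores the return code in _rc
capture label define yesno 0 "no" 1 "yes"
display _rc                                 // nonzero, because yesno is already defined

* Fix 1: add or change single values with the modify option
label define yesno 2 "don't know", modify
label list yesno

* Fix 2: drop the label and define it again from scratch
label drop yesno
label define yesno 0 "no" 1 "yes"
label list yesno
```

Starting a do-file section with capture label drop is a common way to make it safe to run more than once.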

Tip. numlabel, add

One drawback of the tab command is that it does not show the values behind the labels. For example, in the following results, we have no clue whether Domestic and Foreign are coded 0 and 1, 1 and 2, or something else.

Some people use the fre command to see the values and labels together, but you can also add the numbers (values) to the labels themselves with a single command, as follows.

Stata
numlabel, add // add numbers before labels 

The labels now carry their values (Domestic -> 0. Domestic), which gives us a better idea of the values when we run the tab command.

If you want to remove the numbers again, find the label name (codebook shows it) and then remove them with the following command.

Stata
codebook varname
numlabel label_name, remove // remove numbered labels
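A round trip of adding and removing the numbers can be sketched as follows, assuming the built-in auto dataset, where the variable foreign uses the value label origin.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

tabulate foreign          // rows read Domestic and Foreign
numlabel origin, add      // prefix each label in origin with its value
tabulate foreign          // rows now read 0. Domestic and 1. Foreign
numlabel origin, remove   // strip the prefixes again
tabulate foreign          // back to Domestic and Foreign
```

Naming the label (origin) limits the change to that one value label, while numlabel, add without a name applies it to every value label in the dataset.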

Other commands to manage labels

Stata
* label list: list the name and contents of a value label
label list label_name

* labelbook: a codebook describing value labels
labelbook label_name
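If you do not know which label name belongs to a variable, one way to look it up is the value label extended macro function. This is a small sketch on the built-in auto dataset (an assumption); the local macro name lbl is arbitrary.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

* Store the name of the value label attached to foreign in a local macro
local lbl : value label foreign
display "`lbl'"

label list `lbl'    // values and labels of that value label
labelbook `lbl'     // longer report on the same value label
```

Run the lines together (for example from a do-file), since a local macro only exists within the run that defines it.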

Further, the uselabel command replaces the dataset in memory with a dataset of value-label information: the label name (lname), each value, and its label text. This is useful when you create a codebook. Because it replaces your dataset, please make sure to save your data before running it.

Stata
uselabel
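One hedged way to look at the uselabel output without losing your data is to wrap it in preserve and restore. The sketch below assumes the built-in auto dataset and should be run as one block.

```stata
* Assumption: the built-in auto dataset stands in for your own data
sysuse auto, clear

preserve                  // keep a temporary copy of the current data
uselabel, clear           // memory now holds only the value-label information
list lname value label    // one row per labeled value
restore                   // bring the original data back
```

If you want to keep the label table for a codebook, you could save or export it between uselabel and restore.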

If you are interested in recoding or reverse coding with the labels, please check out this post on the labrec command.

  • June 2, 2023