[Stata] Data cleaning 2: Labeling variables and values (label define, label values, numlabel, and uselabel)
Why do we need to label variables and values?
Let’s look at an example from the Stata data browser: the data contain only numeric values.
- What do the variable names mean? Can you guess what “g1” and “q1a” mean?
- What do the numbers for each variable mean? Can you guess what the numbers in q1a mean?
This is why we need LABELS! Without labels, we have no clue what a variable name means or what its values mean.
So, we can label two things in Stata:
- Variable label: meaning of the variable (e.g., q1a = life satisfaction)
- Value label: meaning of the values (e.g., 1 in q1a means “not satisfied” and 4 means “very satisfied”)
You can browse with or without value labels in the Stata data browser by right-clicking and then going to Data -> Value labels -> Hide/Show value labels.
Check the labels with codebook and fre
Using the codebook command, you can see both the variable label and the value labels.
codebook varname
Here we can see the terms that we are going to use. That may be a lot of concepts to take in at first sight, but you will get them with practice! 🙌
ssc install fre // install the fre command (first-time users only)
fre varname
Compared to the tab command, the fre command returns a frequency table that shows the values together with their labels.
A. Labeling variables
The label variable command defines the label of a variable. Running it again on the same variable overwrites the existing label.
lab var varname "Variable Label"
// Example
lab var phq9 "PHQ-9 Depression sum score"
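As a quick sketch, here is the same idea on Stata's built-in auto dataset. The dataset is my choice for illustration (it is not the survey data shown above), and the new label text is made up for the example.

```stata
sysuse auto, clear                             // load the example dataset shipped with Stata
describe mpg                                   // shows the current variable label
lab var mpg "Fuel economy (miles per gallon)"  // overwrite it (label text is illustrative)
describe mpg                                   // the new label now appears
```

Running describe before and after is an easy way to confirm that the label was overwritten.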
B. Labeling values
The most common method for labeling values is to use two commands: 1) label define and 2) label values. The label define command creates the value label, and the label values command associates that label with a variable.
Example: Binary variable
la def labelname 0 "no" 1 "yes"
la val varname labelname
fre varname // check that the labels were assigned to the values correctly
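The two steps above can be sketched end to end on the built-in auto dataset. The variable highmpg and the label name yesno are hypothetical names created only for this example, and the cutoff of 25 is arbitrary. Note that the quotes must be straight quotes ("), not curly ones.

```stata
sysuse auto, clear
* highmpg is a hypothetical 0/1 variable created only for this example
gen byte highmpg = (mpg > 25) if !missing(mpg)
la def yesno 0 "no" 1 "yes"   // step 1: create the value label "yesno"
la val highmpg yesno          // step 2: attach it to the variable
tab highmpg                   // the table now shows no/yes instead of 0/1
```

Because the label is defined separately from the variable, the same label can be attached to several variables at once, for example la val var1 var2 yesno.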
Error message: label __ already defined
When you get the error message “label __ already defined,” you can either drop the label and define it again, or modify the existing label, as follows.
* label drop: drop the defined label
label drop label_name
* label define, modify: modify the defined label
label define label_name 2 "don't know", modify
// this modifies the label for the value 2 in the value label. You can modify it any way you need.
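Here is a small sketch of the options side by side, meant to be run from a do-file. The label name yesno is a hypothetical name used only for illustration. Besides modify, label define also accepts add and replace.

```stata
capture label drop yesno                    // drop yesno if it exists; no error if it does not
label define yesno 0 "no" 1 "yes"
label define yesno 2 "don't know", add      // add: label a value that has no label yet
label define yesno 1 "YES", modify          // modify: change the text for an existing value
label define yesno 0 "no" 1 "yes", replace  // replace: redefine the whole label from scratch
label list yesno                            // inspect the result
```

Putting capture in front of label drop keeps the do-file from stopping when the label has not been defined yet, which makes the file safe to rerun.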
Tip. numlabel, add
One drawback of the tab command is that it shows only the labels, not the values they are assigned to. For example, in the following results, we cannot tell whether Domestic and Foreign are coded 0 and 1, 1 and 2, or something else.
Some people use the fre command to see the values and labels together, but you can also add the numbers (values) to the labels with a single command, as follows.
numlabel, add // add numbers before labels
You can see how the labels change to include their values (Domestic -> 0. Domestic), which gives us a better idea of the values when we run the tab command.
If you want to remove the numbers again, find the label_name (for example with codebook) and then remove them with the following command.
codebook varname
numlabel label_name, remove // remove numbered labels
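A minimal sketch of the full round trip on the built-in auto dataset, meant to be run as one do-file (the local macro is lost otherwise). The macro name lbl is hypothetical. The extended macro function value label is one way to look up the label name without reading it off the codebook output.

```stata
sysuse auto, clear
local lbl : value label foreign   // name of the label attached to foreign
display "`lbl'"                   // show that name
tab foreign                       // labels only
numlabel `lbl', add               // prefix each label with its value
tab foreign                       // labels now include the value
numlabel `lbl', remove            // back to the original labels
```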
Other commands to manage labels
* label list: list the names and contents of value labels
label list label_name
* labelbook: a codebook describing value labels
labelbook label_name
Further, if you run the uselabel command, it returns a dataset of the value label names, their values, and their labels, which is useful when you create a codebook. It replaces the dataset in memory, so please make sure to save your data before running it.
uselabel
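If you only want to look at the result without losing your data, one option is to wrap the command in preserve and restore. This is a sketch on the built-in auto dataset, meant to be run as one do-file. The clear option tells uselabel that it may replace the data in memory.

```stata
sysuse auto, clear
preserve                  // keep a copy of the data in memory
uselabel, clear           // replace the data with one row per labeled value
list lname value label    // label name, numeric value, label text
restore                   // bring the original data back
```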
If you are interested in recoding or reverse coding with the labels, please check out this post on the labrec command.