[Stata] Data cleaning 8: List specific types of variables and drop all-missing variables (findname command)
📒Cox, N. J. (2010). Speaking Stata: Finding variables. The Stata Journal, 10(2), 281-296.
The findname command in Stata is a user-written program that lists the variables in your dataset whose names match certain patterns or that have certain other properties. For example, you can use findname to find variables whose names start with a certain letter or contain a certain word, whose variable or value labels contain certain text, or that have a certain storage type or certain values. findname also leaves the names of the selected variables in r(varlist), and its local() option stores them in a local macro for further use. It is really useful for data cleaning and can save you a lot of time!
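To make the pattern-matching examples above concrete, here is a minimal sketch on a small made-up dataset. The variable names (age, income, inc_tax, region) are hypothetical, and the sketch assumes findname is already installed from SSC.

```stata
* Toy data, made up for illustration only
clear
set obs 5
generate age = _n + 20
generate income = 1000 * _n
generate inc_tax = 0.1 * income
label variable income "Household income"
generate str5 region = "north"

* Assumes findname is installed (ssc install findname)
findname a*                                   // names that start with "a"
findname *inc*                                // names that contain "inc"
findname, varlabeltext(*income*) insensitive  // variable labels that mention "income"
findname, type(string) local(strvars)         // store the matching names in a local macro
display "`strvars'"
```

The local macro (here strvars) can then be passed to any command that takes a varlist.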
1. Find a list of specific types of variables
The findname command is available from SSC. Install it by typing the following in Stata:
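ssc install findname              // install once from SSC
findname, type(string) detail     // string variables
findname, type(numeric) detail    // all numeric variables (byte, int, long, float, double)
findname, type(float) detail      // float variables only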
Tip: Convert string variables to numeric variables (for example, with destring).
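As a minimal sketch of this tip, the lines below combine findname with Stata's built-in destring on a made-up dataset (the variables score and name are hypothetical). destring converts only variables whose contents are entirely numeric; for the others it prints a note and leaves them as strings, so check its output before relying on the conversion.

```stata
* Toy data, made up for illustration only: one numeric-looking string
* variable (score) and one genuinely textual one (name)
clear
input str3 score str5 name
"10" "Ann"
"25" "Bob"
"7"  "Cara"
end

* Assumes findname is installed (ssc install findname)
findname, type(string) local(strvars)

* destring converts score to numeric; name contains nonnumeric
* characters, so destring reports that and leaves it as a string
destring `strvars', replace
```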
2. Find a list of variables with no nonmissing values (all values missing)
findname, all(missing(@))
drop `r(varlist)' // Then drop them
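One caveat: if no variable is entirely missing, r(varlist) is empty and drop stops with an error because it needs a varlist. The sketch below, on made-up data with hypothetical variable names, guards against that. It also shows that missing(@) treats empty strings as missing, so all-empty string variables are dropped too. It assumes findname is installed.

```stata
* Toy data, made up for illustration only
clear
set obs 4
generate id = _n
generate empty_num = .          // numeric, all missing
generate str1 empty_str = ""    // string, all empty ("" counts as missing)

* Assumes findname is installed (ssc install findname)
findname, all(missing(@)) local(allmiss)

* drop with an empty varlist is an error, so drop only if something was found
if "`allmiss'" != "" drop `allmiss'
describe, short
```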
3. Recode negative codes as missing values and drop all-missing variables
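findname, type(numeric) any(@ < 0) // find numeric variables with any negative values
recode `r(varlist)' (-2=.)(-1=.)(-9=.) // recode the negative missing-value codes to system missing (codes differ by dataset; check the codebook)
* Then repeat step 2 to find and drop the variables that are now all missing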
findname, all(missing(@))
drop `r(varlist)' // Then drop them
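Here is a sketch of step 3 on made-up survey-style data, assuming that -1, -2, and -9 are the dataset's missing-value codes. It restricts the search to numeric variables with type(numeric), and it swaps the recode line for Stata's built-in mvdecode, which turns the listed values into system missing in one call. The variable names (id, q1, q2) are hypothetical.

```stata
* Toy data, made up for illustration only; -1, -2, -9 are assumed
* to be missing-value codes
clear
input id q1 q2
1  3 -9
2 -1 -9
3  5 -2
end

* Assumes findname is installed (ssc install findname)
findname, type(numeric) any(@ < 0) local(negvars)

* Built-in alternative to recode: set the listed codes to system missing
mvdecode `negvars', mv(-9 -2 -1)

* Repeat step 2: q2 is now entirely missing and gets dropped
findname, all(missing(@)) local(allmiss)
if "`allmiss'" != "" drop `allmiss'
```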
4. Drop suppressed variables
I have also found the findname command really useful for dropping variables that are included in the dataset but suppressed or masked, meaning that they carry no information other than a label such as “DATA SUPPRESSED” or “MASKED”.
To delete all the variables whose data are suppressed for confidentiality, I can use findname as follows. In this example, I search for the variables whose value labels contain the text “confidentiality” (the insensitive option makes the match case-insensitive).
findname, vallabeltext(*CONFIDENTIALITY*) insensitive local(VALUES)
drop `r(varlist)' // Then drop them
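One risk with matching on label text alone: a variable that holds real values and only a few suppressed cells would also match and be dropped. A minimal sketch of a stricter check, on made-up data with hypothetical variable names and a hypothetical code of -3 for suppressed cells, keeps only the candidates whose nonmissing values are all the same before dropping them. It assumes findname is installed and that suppression is stored as a labeled numeric code.

```stata
* Toy data, made up for illustration only; -3 marks a suppressed cell
clear
input inc_sup inc_ok
-3 10
-3 -3
-3 20
end
label define supp -3 "DATA SUPPRESSED FOR CONFIDENTIALITY"
label values inc_sup supp
label values inc_ok supp

* Assumes findname is installed (ssc install findname)
findname, vallabeltext(*CONFIDENTIALITY*) insensitive local(candidates)

* Keep only candidates with a single distinct value (min == max),
* so inc_ok, which also holds real values, is not dropped
local todrop
foreach v of local candidates {
    quietly summarize `v'
    if r(min) == r(max) local todrop `todrop' `v'
}
if "`todrop'" != "" drop `todrop'
```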
5. Drop variables with a specific number of missing values
Finally, you can use the nmissing command to drop variables with at least a given number of missing values. Say I want to drop the variables with 95% or more of their values missing in a dataset of 30,000 observations. Because 0.95 × 30,000 = 28,500, I can run nmissing, min(28500) to list the variables with at least 28,500 missing values and then drop them.
ssc install nmissing // install for first time use
nmissing, min(28500) // at least 28,500 missing (95% of 30,000); adjust to your data
drop `r(varlist)'
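If you prefer not to install another package, the 95% rule can also be sketched with built-in commands: count the missing values of each variable and compare the count with a cutoff of ceil(95 × N / 100). This swaps nmissing for count, and the toy dataset below is made up (20 observations, so the cutoff is 19 missing values; for 30,000 observations the same formula gives 28,500).

```stata
* Toy data, made up for illustration only
clear
set obs 20
generate id = _n
generate sparse = cond(_n == 1, 5, .)   // 19 of 20 missing (95%)
generate dense  = cond(_n <= 10, 1, .)  // 10 of 20 missing (50%)

* Integer arithmetic avoids floating-point surprises in the cutoff
local cutoff = ceil(95 * _N / 100)

* Collect variables with at least `cutoff' missing values, then drop them
local todrop
foreach v of varlist _all {
    quietly count if missing(`v')
    if r(N) >= `cutoff' local todrop `todrop' `v'
}
if "`todrop'" != "" drop `todrop'
```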
Ref: https://stackoverflow.com/questions/53524643/drop-variables-with-all-missing-values