[Stata] Data Cleaning 9: Removing and Adding Prefix/Suffix of Variable Names

In data analysis projects, especially when working with large datasets from various sources, variable names can often be lengthy, inconsistent, or include prefixes/suffixes that make them difficult to work with. Cleaning up variable names by removing or adding prefixes/suffixes can greatly improve the readability and organization of your dataset.

Stata’s rename command accepts wildcard patterns, so you can change many variable names in a single line. In this blog post, we’ll explore how to use rename to add and remove prefixes and suffixes. Cleaning up variable names is an important step in data preparation: it makes your data easier to work with, facilitates collaboration with others, and keeps names consistent across different parts of your analysis. Whether your dataset is small or massive, these techniques will streamline your workflow and save you time.

Adding specific prefixes/suffixes in all variables

Let’s say acs is the prefix I want to add to all variable names in my dataset.

rename * acs* 

This adds the prefix acs to ALL variable names.

To add the prefix acs_ only to the variables whose names start with “region”, use a command like this:

rename region* acs_region*

Suffixes follow the same logic: put the * (asterisk) in front of the suffix instead of after the prefix.

rename * *acs // adding suffix acs to all variables 
rename *result *result_t1 // adding suffix _t1 for the variables ending with result
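
To try the prefix and suffix patterns above without touching your own data, here is a minimal sketch that uses the auto dataset shipped with Stata. The dataset and the acs_ and _t1 labels are only illustrative assumptions, not part of the original example.

```stata
sysuse auto, clear          // example dataset bundled with Stata

rename * acs_*              // prefix every variable: price -> acs_price, mpg -> acs_mpg, ...
rename acs_m* acs_m*_t1     // suffix only matching names: acs_make -> acs_make_t1, acs_mpg -> acs_mpg_t1

describe, simple            // list the variable names to check the result
```

Running describe, simple after each rename is a quick way to confirm that the pattern matched the variables you intended.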

Removing specific prefixes and suffixes in all variables

Removing prefix

rename acs* *
// Let's say acs is the prefix (e.g., acsv1 acsv2 acsv3)
// this command removes acs from the start of every matching variable name

Removing suffix

rename *result * //removing suffix result
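
Removing a prefix or suffix can fail when the shortened name would be invalid (for instance, a name starting with a digit) or would collide with an existing variable. One way to check first is rename's dryrun option, which displays the planned old/new pairs without renaming anything. The sketch below assumes the bundled auto dataset and a hypothetical acs_ prefix.

```stata
sysuse auto, clear          // example dataset bundled with Stata
rename * acs_*              // set up: give every variable the acs_ prefix

rename acs_* *, dryrun      // preview only: shows old -> new pairs, changes nothing
rename acs_* *              // now actually strip the prefix

describe, simple            // names are back to price, mpg, ...
```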

Using loop: foreach command

With the foreach command, you can specify exactly which variables should get (or lose) a prefix or suffix. Let’s say I want to add the prefix t1_ to this list of variables in my dataset: hgb hct tibc iron hlthstat heartatk diabetes sizplace finalwgt leadwt corpuscl trnsfern albumin vitaminc zinc copper.

foreach v of varlist hgb hct tibc iron hlthstat heartatk diabetes sizplace finalwgt leadwt corpuscl trnsfern albumin vitaminc zinc copper { 
    rename `v' t1_`v'   // e.g., hgb becomes t1_hgb
}

You can adapt this loop to your own case, for example to add or remove prefixes or suffixes for any list of variables.
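
The loop can also run in the other direction. As a rough sketch, assuming the bundled auto dataset and a three-character prefix t1_, the code below adds the prefix to a few variables and then removes it again by building each new name with substr(). The local macro newname is just an illustrative name.

```stata
sysuse auto, clear                      // example dataset bundled with Stata

* add the prefix t1_ to a chosen list of variables
foreach v of varlist price mpg weight {
    rename `v' t1_`v'
}

* remove it again: keep everything from the 4th character onward
foreach v of varlist t1_* {
    local newname = substr("`v'", 4, .) // "t1_price" -> "price"
    rename `v' `newname'
}

describe, simple
```

The start position 4 depends on the prefix length, so adjust it if your prefix is longer or shorter than three characters.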



  • June 14, 2023