[Python] How to Match State, County, and MSA Codes from Census Tract GEOIDs in Python

Often we need to aggregate or merge tract-level data up to county or MSA (Metropolitan Statistical Area) levels. Fortunately, tract GEOIDs contain enough information to derive both state and county codes—and from there, we can link to MSA codes using an official delineation file.

Understanding U.S. Geographic Units: From Tracts to MSAs

If you’re working with U.S. demographic or spatial data, whether from the Census, the American Community Survey (ACS), or health datasets, one of the first challenges is understanding how different geographic units relate to one another. Terms like tract, county, MSA, and ZIP Code are often mixed up, but they belong to distinct systems built on different logics.

Some follow a strict hierarchy. Others don’t. And when your data is at the tract level—which it often is—you’ll need to know how to move up the hierarchy to counties, states, or metro areas for aggregation, analysis, or mapping.

Here’s how it’s structured:

Perfect Nesting Hierarchy

Some geographic units in the U.S. are cleanly nested, meaning each smaller unit fits fully inside the next larger one, with no overlaps.

  • State → contains multiple counties
  • County → contains multiple census tracts
  • Census tract → a small, relatively permanent statistical subdivision of a county, widely used for small-area statistics (generally 1,200–8,000 people)

This means:

  • Every tract belongs to exactly one county
  • Every county belongs to exactly one state
  • The GEOID of a tract (11-digit code) contains both state and county FIPS codes, so you can reliably extract them using simple string slicing.
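Because every tract nests inside exactly one county, and every county inside exactly one state, rolling tract values up is just a group-by on a GEOID prefix. A minimal sketch, assuming a DataFrame with a string `GEOID` column and a hypothetical count column `population` (the values below are invented for illustration):

```python
import pandas as pd

# Illustrative tract-level data. GEOIDs are strings so leading zeros survive;
# the 'population' values are made up for this example.
tracts = pd.DataFrame({
    "GEOID": ["06075010100", "06075010200", "06001400100", "36061000100"],
    "population": [3000, 4000, 2500, 5000],
})

# First 5 characters = state + county FIPS; first 2 = state FIPS.
county_totals = tracts.groupby(tracts["GEOID"].str[:5])["population"].sum()
state_totals = tracts.groupby(tracts["GEOID"].str[:2])["population"].sum()

print(county_totals)
print(state_totals)
```

This only works because the hierarchy is perfectly nested: no tract is split across counties, so summing by prefix never double-counts.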

Each tract is identified by an 11-digit GEOID with the structure ssccctttttt, where:

  • ss = 2-digit state FIPS code
  • ccc = 3-digit county FIPS code
  • tttttt = 6-digit tract code

For example:

06075010100 → CA (06), San Francisco County (075), Tract 010100 (displayed as Census Tract 101)

This means we don’t need to look up state or county codes separately—they can be extracted from the GEOID string.
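The slicing can be wrapped in a small validating parser so malformed IDs (for example, a GEOID that lost its leading zero) fail loudly instead of producing shifted codes. A minimal sketch; `TractGeoid` and `parse_tract_geoid` are hypothetical names, not part of any library:

```python
import re
from typing import NamedTuple


class TractGeoid(NamedTuple):
    """Components of an 11-digit census tract GEOID."""
    state_fips: str   # ss: 2-digit state FIPS code
    county_fips: str  # ccc: 3-digit county FIPS code within the state
    tract_code: str   # tttttt: 6-digit tract code


def parse_tract_geoid(geoid: str) -> TractGeoid:
    """Split a tract GEOID of the form ssccctttttt into its parts."""
    geoid = geoid.strip()
    # Reject anything that is not exactly 11 ASCII digits
    if not re.fullmatch(r"[0-9]{11}", geoid):
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return TractGeoid(geoid[:2], geoid[2:5], geoid[5:])


parts = parse_tract_geoid("06075010100")
print(parts)  # TractGeoid(state_fips='06', county_fips='075', tract_code='010100')

# The nationally unique 5-digit county key is state + county.
print(parts.state_fips + parts.county_fips)  # 06075
```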

Cross-Boundary and Functional Units

Other units don’t follow administrative boundaries. They exist to reflect how people live, work, and move—not how political lines are drawn.

MSAs (Metropolitan Statistical Areas)

  • Defined by the Office of Management and Budget (OMB)
  • Group entire counties into functional metro regions based on commuting and labor market ties
  • Must have at least one urban area with 50,000+ people

Example: The New York-Newark-Jersey City MSA has included counties from New York, New Jersey, and Pennsylvania (in the 2013 through 2020 delineations). Even though these counties span state lines, they function together as one metro region. Similarly, the Los Angeles-Long Beach-Anaheim MSA combines two whole counties, Los Angeles and Orange, into a single metro area.

MSAs are widely used in policy, health services research, housing, and employment analysis. Many federal datasets are reported at the MSA level, not by city or county.
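Because an MSA is a set of whole counties, a county-to-CBSA lookup is all you need, and the state code embedded in each county FIPS shows when a metro area crosses state lines. A minimal sketch using a small illustrative excerpt of such a lookup (the rows are for demonstration; verify codes against the official delineation file before relying on them):

```python
import pandas as pd

# Illustrative excerpt of a county-to-CBSA crosswalk:
#   06037 Los Angeles County, CA and 06059 Orange County, CA -> 31080 (Los Angeles-Long Beach-Anaheim, CA)
#   29095 Jackson County, MO and 20091 Johnson County, KS    -> 28140 (Kansas City, MO-KS)
crosswalk = pd.DataFrame({
    "county_fips": ["06037", "06059", "29095", "20091"],
    "cbsa_code": ["31080", "31080", "28140", "28140"],
})

# Each county belongs to at most one CBSA, so the county key must be unique.
assert crosswalk["county_fips"].is_unique

# The first two digits of a county FIPS code are the state FIPS code.
crosswalk["state_fips"] = crosswalk["county_fips"].str[:2]
states_per_cbsa = crosswalk.groupby("cbsa_code")["state_fips"].nunique()
print(states_per_cbsa)
```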

Congressional Districts (CDs)

Congressional districts are the basis for U.S. House of Representatives elections. Each state is divided into a number of districts based on population, and boundaries are redrawn every 10 years through redistricting.

  • Unlike counties or tracts, CDs often cut across county and tract lines
  • A single tract can be split between two congressional districts, depending on how lines are drawn
  • Used heavily in political science, voter equity research, and policy representation analysis
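Because a tract can fall in more than one district, tract-to-district links are many-to-many, and counts have to be split rather than copied to each district. A minimal sketch, assuming a hypothetical relationship table with a `pop_share` column giving the fraction of each tract's population in each district. The column names and weights are made up; in practice the shares might be built by summing block populations within each tract-district piece:

```python
import pandas as pd

# Hypothetical tract-level counts.
tracts = pd.DataFrame({
    "GEOID": ["01001020100", "01001020200"],
    "population": [1000, 2000],
})

# Hypothetical tract-to-district table: the second tract is split 60/40.
tract_cd = pd.DataFrame({
    "GEOID": ["01001020100", "01001020200", "01001020200"],
    "cd": ["0101", "0101", "0102"],
    "pop_share": [1.0, 0.6, 0.4],
})

# Shares within each tract should sum to 1 so nothing is lost or double-counted.
assert tract_cd.groupby("GEOID")["pop_share"].sum().round(6).eq(1.0).all()

# One tract row can match several district rows; weight the count, then sum by district.
allocated = tracts.merge(tract_cd, on="GEOID", how="inner", validate="one_to_many")
allocated["pop_alloc"] = allocated["population"] * allocated["pop_share"]
cd_totals = allocated.groupby("cd")["pop_alloc"].sum()
print(cd_totals)
```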

Cities and Places

Cities (incorporated places) can span multiple counties and grow over time through annexation. For example, Kansas City, Missouri, extends across four counties (Jackson, Clay, Platte, and Cass), while Kansas City, Kansas, is a separate city across the state line. Because city boundaries change and don’t align with stable units like tracts, cities are used less consistently as a unit for statistical data.

ZIP Codes

ZIP Codes are created by the U.S. Postal Service for mail delivery—not for statistics or geography.

  • They often do not align with city, county, or tract boundaries.
  • Some ZIP Codes do not describe a contiguous area at all (e.g., P.O. Box-only ZIPs or military APO/FPO codes).
  • Most ZIP Codes stay within one state, but a few (such as Fort Campbell, ZIP 42223) cross state lines, usually for logistical reasons.

This makes ZIP Codes unreliable for spatial joins unless you use an approximate crosswalk (e.g., ZIP-to-ZCTA or ZIP-to-tract).
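With a ZIP-to-tract crosswalk, one common simplification is to assign each ZIP Code to the single tract holding the largest share of its addresses. A minimal sketch, assuming a hypothetical crosswalk with `zip`, `tract`, and `res_ratio` (share of the ZIP's residential addresses in that tract) columns; files such as HUD's USPS ZIP Code Crosswalk report ratios of this kind, but check the actual column names in the file you download:

```python
import pandas as pd

# Hypothetical crosswalk with placeholder ZIP codes; ZIP 99999 is split across two tracts.
zip_tract = pd.DataFrame({
    "zip": ["99999", "99999", "99998"],
    "tract": ["06075010100", "06075010200", "06075010300"],
    "res_ratio": [0.3, 0.7, 1.0],
})

# For each ZIP, keep the row with the largest residential share.
dominant = zip_tract.loc[zip_tract.groupby("zip")["res_ratio"].idxmax()]
zip_to_tract = dominant.set_index("zip")["tract"]
print(zip_to_tract)
```

This discards the minority share (30% of ZIP 99999 here); if that matters, weight values by the ratio instead of picking one tract.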


Step-by-Step Guide

In the sections below, we walk through how to use Python to match a Census Tract GEOID to its corresponding state, county, and Metropolitan Statistical Area (MSA). This is useful when your dataset contains only tract-level identifiers (e.g., 06075010100) and you need to attach higher-level geographic context for analysis or aggregation.

All you need is the GEOID column (the 11-digit census tract codes). From there, we’ll extract state and county codes directly, then merge with an official county-to-MSA crosswalk file to assign MSA codes and names.

1. Load your tract-level dataset

Python
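import pandas as pd

# Replace with your own tract-level CSV file; dtype=str keeps the leading zeros in GEOIDs
df = pd.read_csv('your_tract_data.csv', dtype=str)

# Extract codes from the 11-digit GEOID:
#   state_fips  = first 2 digits (state FIPS code)
#   county_fips = first 5 digits (state + county FIPS), a nationally unique county key
df['state_fips'] = df['GEOID'].str[:2]
df['county_fips'] = df['GEOID'].str[:5]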

Now we have state and county codes as separate columns, which can be used to merge with higher-level geographic crosswalks.
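Reading with `dtype=str` matters: if pandas parses GEOIDs as numbers, the leading zero of states 01–09 is dropped and every slice shifts by one character. A small demonstration on an in-memory CSV (the data is made up), plus `str.zfill(11)` to repair GEOIDs that were already stored without their leading zero:

```python
import io

import pandas as pd

csv_text = "GEOID,value\n06075010100,1\n36061000100,2\n"

# Default parsing turns GEOID into an integer column and drops the leading zero.
as_numbers = pd.read_csv(io.StringIO(csv_text))
print(as_numbers["GEOID"].tolist())  # [6075010100, 36061000100]

# Reading as strings keeps the GEOID intact.
as_strings = pd.read_csv(io.StringIO(csv_text), dtype=str)
print(as_strings["GEOID"].tolist())  # ['06075010100', '36061000100']

# If the zero was already lost upstream (e.g., after saving from a spreadsheet),
# convert to string and left-pad back to 11 digits before slicing.
repaired = as_numbers["GEOID"].astype(str).str.zfill(11)
print(repaired.str[:5].tolist())  # ['06075', '36061']
```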

2. Download and prepare the county-to-MSA crosswalk

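MSAs are delineated by OMB, and the U.S. Census Bureau publishes the delineation files. Each MSA contains one or more entire counties. The files list all Core Based Statistical Areas (CBSAs), which include both metropolitan and micropolitan statistical areas. The latest delineation file is available here: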

👉 2023 CBSA Delineation File (Excel)

Once downloaded, load and clean it:

Python
# Load the Excel file (the header is on the third row, so header=2)
cbsa_df = pd.read_excel('list1_2023.xlsx', sheet_name='List 1', header=2, dtype=str)
cbsa_df.columns = cbsa_df.columns.str.strip()

# Construct the 5-digit county FIPS (state + county) to match df['county_fips']
cbsa_df['county_fips'] = cbsa_df['FIPS State Code'].str.zfill(2) + cbsa_df['FIPS County Code'].str.zfill(3)

# Keep the merge columns, dropping note rows at the bottom of the sheet (they have no FIPS codes)
cbsa = cbsa_df.loc[cbsa_df['county_fips'].notna(), ['county_fips', 'CBSA Code', 'CBSA Title']]
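Before merging, it is worth checking that the crosswalk has exactly one row per county and that every key is a 5-digit string; a stray note row read in from the spreadsheet would fail these checks. A minimal sketch of such a check on a small made-up frame that reuses the column names above; `check_county_crosswalk` is a hypothetical helper, not part of pandas:

```python
import pandas as pd


def check_county_crosswalk(cbsa: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows without a valid 5-digit county_fips
    and raise if any county appears more than once."""
    valid = cbsa["county_fips"].str.fullmatch(r"[0-9]{5}", na=False)
    cleaned = cbsa[valid]
    dupes = cleaned.loc[cleaned["county_fips"].duplicated(), "county_fips"]
    if not dupes.empty:
        raise ValueError(f"counties listed more than once: {sorted(dupes)}")
    return cleaned


# Made-up example: the last row mimics a note row with no FIPS codes.
example = pd.DataFrame({
    "county_fips": ["06037", "06059", None],
    "CBSA Code": ["31080", "31080", None],
    "CBSA Title": ["Los Angeles-Long Beach-Anaheim, CA", "Los Angeles-Long Beach-Anaheim, CA", None],
})
cleaned = check_county_crosswalk(example)
print(len(cleaned))  # 2
```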

3. Merge tract data with MSA codes

Now merge the original tract-level data with the CBSA file using county_fips:

Python
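# Left join keeps every tract; validate='many_to_one' raises if a county appears twice in the crosswalk
df_merged = df.merge(cbsa, on='county_fips', how='left', validate='many_to_one')
df_merged.to_csv('tract_data_with_msa.csv', index=False)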

The merged file will include:

  • The original tract-level data
  • CBSA Code: the 5-digit code of the metropolitan or micropolitan statistical area (CBSA)
  • CBSA Title: the name of the CBSA
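With the CBSA code attached, moving from tracts up to metro areas is a group-by. A minimal sketch on a made-up merged frame with a hypothetical `population` column; `dropna=False` keeps tracts outside any CBSA as their own (NaN) group instead of silently dropping them:

```python
import pandas as pd

# Made-up merged data: two tracts in CBSA 31080 and one tract with no CBSA match
# (06999 is a fictitious county code used only for this example).
df_merged = pd.DataFrame({
    "GEOID": ["06037101110", "06059001101", "06999000100"],
    "CBSA Code": ["31080", "31080", None],
    "population": [4000, 3500, 1200],
})

cbsa_summary = (
    df_merged.groupby("CBSA Code", dropna=False)
    .agg(tracts=("GEOID", "count"), population=("population", "sum"))
)
print(cbsa_summary)
```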

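If a tract’s county is not part of any CBSA (that is, it lies outside every metropolitan and micropolitan area), CBSA Code and CBSA Title will be NaN. Tracts in micropolitan counties do receive a CBSA Code, so a non-missing code does not by itself mean the tract is metropolitan.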
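A NaN can mean two things: the county genuinely lies outside every CBSA, or its code is simply missing from the delineation file you loaded because the two sources use different geographic vintages. Connecticut is a well-known case: its county-equivalents changed from counties to planning regions in 2022, so tract data using the older county codes will not match newer delineations. Listing unmatched county codes with `merge(..., indicator=True)` makes them easy to inspect. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up inputs: one tract matches the crosswalk, one does not.
# 09001 (Fairfield County, CT) stands in for a legacy Connecticut county code.
df = pd.DataFrame({"GEOID": ["06037101110", "09001010101"]})
df["county_fips"] = df["GEOID"].str[:5]
cbsa = pd.DataFrame({"county_fips": ["06037"], "CBSA Code": ["31080"]})

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join.
checked = df.merge(cbsa, on="county_fips", how="left", indicator=True)
unmatched = sorted(checked.loc[checked["_merge"] == "left_only", "county_fips"].unique())
print(unmatched)  # ['09001']
```

Any code in this list is worth looking up by hand before treating its tracts as nonmetropolitan.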

Optional: Label or Filter Nonmetropolitan Tracts

We can mark whether a tract falls inside or outside a metropolitan statistical area:

Python
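# CBSA codes cover both metropolitan and micropolitan areas, so look up which
# codes are metropolitan in the delineation file before flagging tracts
metro_codes = cbsa_df.loc[
    cbsa_df['Metropolitan/Micropolitan Statistical Area'] == 'Metropolitan Statistical Area',
    'CBSA Code',
].unique()
df_merged['is_metro'] = df_merged['CBSA Code'].isin(metro_codes)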
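If you carry the area-type column from the delineation file into the merged data (in the 2023 file it is headed "Metropolitan/Micropolitan Statistical Area"; confirm the exact header in your download), a three-way label is straightforward. A minimal sketch on made-up rows, where `cbsa_type` is a hypothetical stand-in for that column and 10000 is a placeholder CBSA code:

```python
import numpy as np
import pandas as pd

# Made-up merged rows; 'cbsa_type' stands in for the delineation file's area-type column.
df_merged = pd.DataFrame({
    "GEOID": ["tract_a", "tract_b", "tract_c"],
    "CBSA Code": ["31080", "10000", None],
    "cbsa_type": ["Metropolitan Statistical Area", "Micropolitan Statistical Area", None],
})

# Conditions are checked in order; tracts matching neither fall into the default.
df_merged["metro_status"] = np.select(
    [
        df_merged["cbsa_type"].eq("Metropolitan Statistical Area"),
        df_merged["cbsa_type"].eq("Micropolitan Statistical Area"),
    ],
    ["metro", "micro"],
    default="outside CBSA",
)
print(df_merged[["GEOID", "metro_status"]])
```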

Useful References

U.S. Census Bureau: Understanding Geographic Identifiers (GEOIDs)

Public Health Disparities Geocoding Project 2.0 Training Manual, Chapter 4: Getting your data

  • June 1, 2025