[Python] GIS (2): Working with the Census API in Python / American Community Survey datasets

For most cases of geographic mapping in social science research, we need demographic data: who lives where and what their socioeconomic circumstances are. The U.S. Census Bureau collects this information via American Community Survey and makes it freely available through an Application Programming Interface (API).

This ability to connect service locations with population characteristics reflects a core principle in social work: understanding context. The ecological systems framework (Bronfenbrenner, 1979) positions neighborhood and community characteristics as critical determinants of individual outcomes. More recently, scholars have documented how spatial mismatch between service supply and population need contributes to health disparities (See my paper here). Census data provides the empirical foundation for this kind of spatial equity analysis.

Social work research has increasingly recognized the role of place in shaping outcomes. Sampson’s work on neighborhood effects (2012) demonstrated that community-level factors like concentrated poverty and social cohesion independently predict health and wellbeing beyond individual characteristics. Scholars studying mental health disparities have found that geographic access to care, measured in travel time or distance, significantly predicts service utilization, particularly for low-income populations and communities of color (Cummings et al., 2013).

This post covers how to access Census data programmatically (easily) using Python. By the end, you’ll be able to request specific variables for specific geographies, merge that data with shapefiles, and create community profiles that inform needs assessments and program planning.

Introduction to Census Data

The Census Bureau conducts two main data collection efforts relevant to this research.

  1. The Decennial Census, conducted every ten years, attempts to count every person living in the United States. It provides the most accurate population counts but asks only a handful of questions.
  2. American Community Survey (ACS), conducted continuously and released annually, samples a portion of the population but asks detailed questions about income, education, employment, language use, disability status, housing, and more.

For social work research, the ACS is typically more useful because it contains the variables we care about: poverty rates, health insurance coverage, English proficiency, educational attainment, and similar indicators. The tradeoff is that ACS data comes with margins of error because it’s based on a sample rather than a complete count. For small geographies like census tracts, these margins of error can be substantial (Spielman et al., 2014), something to keep in mind when interpreting results.

The ACS releases data in three forms: 1-year, 3-year, and 5-year estimates. Each represents a tradeoff between currency and reliability.

Estimate TypeData Collection PeriodAvailable ForBest Used When
1-yearSingle yearAreas with 65,000+ populationYou need the most current data and are analyzing large geographies (states, large counties, big cities)
3-yearThree years combinedAreas with 20,000+ populationDiscontinued after 2013
5-yearFive years combinedAll areas, including census tractsYou need small-area data (tracts, block groups) or are analyzing smaller communities

For neighborhood-level analysis using census tracts, the 5-year estimates are your only option. The tradeoff is that 5-year estimates blend data across a longer time period, which can obscure recent changes. If you’re studying a neighborhood that gentrified rapidly, the 5-year estimate smooths across pre- and post-gentrification conditions.

The margins of error also differ. Spielman and colleagues (2014) found that ACS margins of error at the tract level can be substantial, sometimes exceeding 20-30% of the estimate itself for small populations. When comparing tracts, always check whether differences exceed the margins of error before drawing conclusions.

Census data is organized hierarchically by geography. The nation contains states, states contain counties, counties contain census tracts, and tracts contain block groups. Each level nests perfectly within the level above it. Census tracts are designed to contain roughly 4,000 people and are the most common unit for neighborhood-level analysis. They’re small enough to capture local variation but large enough to have reasonably reliable ACS estimates.

Setting Up Census API Access

To use the Census API, you need a free API key. Visit https://api.census.gov/data/key_signup.html and request one. You’ll receive the key by email within a few minutes.

In Python, the census package provides a clean interface to the API. Install it along with us, a helper package for state codes:

Python
!pip install census us

Then set up your connection:

Python
from census import Census
import us

# Replace with your actual API key
api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)

Understanding Census Variable Codes

Census data is organized into thousands of variables, each with a unique code. Finding the right code is often the hardest part of working with Census data. The codes follow a pattern: a prefix indicating the table, followed by numbers indicating the specific variable.

For ACS data, common prefixes include B01001 (age and sex), B19013 (median household income), B17001 (poverty status), B16004 (language spoken at home), and C27001 (health insurance coverage). The full variable code includes an underscore and a suffix, like B19013_001E, where E indicates an estimate (as opposed to M for margin of error).

The Census Bureau provides a complete list of variables at https://api.census.gov/data/2023/acs/acs5/variables.html (the year can be changed as you see fit). This page is searchable and includes descriptions of what each variable measures. If it loads too slow, you can download and use this excel sheet.

Here are some variables commonly used in social science research:

  • Total population
  • Median household income
  • Population below poverty level
  • No health insurance coverage
  • Renter-occupied housing units

Requesting Data for Specific Geographies

Let’s request median household income for all census tracts in Washtenaw County, Michigan. Each geography requires specific codes: Michigan’s state FIPS code is 26, and Washtenaw County’s code is 161.

Python
from census import Census

api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)

# Request median household income for Washtenaw County census tracts
# Using 2022 ACS 5-year estimates
washtenaw_income = c.acs5.state_county_tract(
    fields=['NAME', 'B19013_001E'],
    state_fips='26',
    county_fips='161',
    tract='*'  # asterisk means "all tracts"
)

# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(washtenaw_income)
df.columns = ['name', 'median_income', 'state', 'county', 'tract']
print(df.head())

The state_county_tract method requests data at the tract level. The fields parameter specifies which variables to retrieve. The asterisk in tract='*' means “give me all tracts in this county.”

The result is a list of dictionaries that we convert to a pandas DataFrame. Each row represents one census tract, with columns for the tract name, the requested variable, and geographic identifiers.

Building a GEOID for Merging with Shapefiles

Census shapefiles identify each geographic unit with a GEOID, a concatenation of state, county, and tract codes. To merge our Census data with a shapefile, we need to construct matching GEOIDs:

Python
# Create GEOID by concatenating state + county + tract
df['GEOID'] = df['state'] + df['county'] + df['tract']
print(df[['name', 'median_income', 'GEOID']].head())

The GEOID for a tract in Washtenaw County looks like “26161400100”, where 26 is Michigan, 161 is Washtenaw County, and 400100 is the tract number.

Merging Census Data with Shapefiles

Now we can combine our Census data with geographic boundaries:

Python
import geopandas as gpd

# Download tract boundaries for Michigan
tracts_url = "https://www2.census.gov/geo/tiger/TIGER2023/TRACT/tl_2023_26_tract.zip"
michigan_tracts = gpd.read_file(tracts_url)

# Filter to Washtenaw County
washtenaw_tracts = michigan_tracts[michigan_tracts['COUNTYFP'] == '161'].copy()

# Merge Census data with shapefile
# The shapefile uses 'GEOID' as the column name
washtenaw_merged = washtenaw_tracts.merge(df, on='GEOID', how='left')

print(washtenaw_merged[['GEOID', 'median_income']].head())

The merge function joins the two datasets on the GEOID column. We use how='left' to keep all tracts from the shapefile even if some don’t have matching Census data.

Creating a Community Data Profile

Let’s build a more complete community profile by requesting multiple variables:

Python
from census import Census
import pandas as pd
import geopandas as gpd

api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)

# Request multiple variables for Detroit metro counties
# Wayne=163, Oakland=125, Macomb=099, Washtenaw=161
metro_data = []

for county_fips in ['163', '125', '099', '161']:
    data = c.acs5.state_county_tract(
        fields=[
            'NAME',
            'B01001_001E',  # Total population
            'B19013_001E',  # Median household income
            'B17001_002E',  # Population below poverty
            'B06007_002E',  # Speak only English
            'B06007_003E',  # Speak Spanish
        ],
        state_fips='26',
        county_fips=county_fips,
        tract='*'
    )
    metro_data.extend(data)

# Convert to DataFrame
df = pd.DataFrame(metro_data)
df.columns = ['name', 'total_pop', 'median_income', 'poverty_pop', 
              'english_only', 'spanish_speakers', 'state', 'county', 'tract']

# Create GEOID
df['GEOID'] = df['state'] + df['county'] + df['tract']

# Convert to numeric
for col in ['total_pop', 'median_income', 'poverty_pop', 'english_only', 'spanish_speakers']:
    df[col] = pd.to_numeric(df[col], errors='coerce')

# Calculate derived measures
df['poverty_rate'] = (df['poverty_pop'] / df['total_pop'] * 100).round(1)
df['non_english'] = df['total_pop'] - df['english_only']
df['non_english_rate'] = (df['non_english'] / df['total_pop'] * 100).round(1)
df['spanish_rate'] = (df['spanish_speakers'] / df['total_pop'] * 100).round(1)

print(df[['name', 'total_pop', 'median_income', 'poverty_rate', 'non_english_rate', 'spanish_rate']].head(10))

This code loops through four counties in the Detroit metro area, requesting population, income, poverty, and language data for each tract. We then calculate derived measures like poverty rate and the percentage of residents with limited English proficiency (LEP).

Visualizing Census Data on Maps

Now we can create choropleth maps showing how these variables are distributed:

Python
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx

# Load and filter tract boundaries
tracts_url = "https://www2.census.gov/geo/tiger/TIGER2023/TRACT/tl_2023_26_tract.zip"
michigan_tracts = gpd.read_file(tracts_url)

metro_counties = ['163', '125', '099', '161']
metro_tracts = michigan_tracts[michigan_tracts['COUNTYFP'].isin(metro_counties)].copy()

# Merge with Census data
metro_merged = metro_tracts.merge(df, on='GEOID', how='left')

# Handle missing and invalid values
# Census uses negative values (like -666666666) to indicate missing data
metro_merged['median_income'] = pd.to_numeric(metro_merged['median_income'], errors='coerce')
metro_merged.loc[metro_merged['median_income'] < 0, 'median_income'] = None

# Convert to Web Mercator for basemap
metro_merged = metro_merged.to_crs(epsg=3857)

# Create map of median household income
fig, ax = plt.subplots(figsize=(12, 12))

metro_merged.plot(
    column='median_income',
    ax=ax,
    legend=True,
    legend_kwds={'label': 'Median Household Income ($)', 'shrink': 0.6},
    cmap='Blues',
    edgecolor='white',
    linewidth=0.2,
    missing_kwds={'color': 'lightgray', 'label': 'No data'}
)

cx.add_basemap(ax, source=cx.providers.CartoDB.PositronNoLabels, alpha=0.5)

ax.set_title('Median Household Income by Census Tract\nDetroit Metro Area (2022 ACS 5-Year Estimates)', fontsize=14)
ax.set_axis_off()
plt.tight_layout()
plt.show()

A few notes on the code above. The Census Bureau uses special codes like -666666666 to indicate that data is unavailable for a tract, often because the sample size is too small for a reliable estimate. We convert these to None so they display as gray (“No data”) rather than throwing off the color scale. The missing_kwds parameter in GeoPandas controls how these missing values appear on the map.

The column parameter tells GeoPandas which variable to use for coloring the map. The cmap parameter specifies the color scheme. The missing_kwds parameter handles tracts where data is missing.

Overlaying Facility Locations on Demographic Data

The real power of combining Census data with facility locations is the ability to identify mismatches between where services are and where they’re needed. This approach operationalizes what researchers call “spatial accessibility” (Guagliardo, 2004). Studies of mental health service distribution have consistently found that facilities tend to cluster in higher-income areas, even when need is greater in lower-income communities (Cook et al., 2017).

Python
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx

# Load facility data
facilities = pd.read_csv('MI_MH_facilities_detroit_metro.csv')
facilities_gdf = gpd.GeoDataFrame(
    facilities,
    geometry=gpd.points_from_xy(facilities.longitude, facilities.latitude),
    crs="EPSG:4326"
).to_crs(epsg=3857)

# Filter to facilities with Spanish services
spanish_facilities = facilities_gdf[facilities_gdf['spanish'] == 1]

# Create map showing LEP population with Spanish-service facilities
fig, ax = plt.subplots(figsize=(12, 12))

# Layer 1: Tracts colored by Spanish LEP population
metro_merged.plot(
    column='spanish_lep',
    ax=ax,
    legend=True,
    legend_kwds={'label': 'Spanish-Speaking LEP Population', 'shrink': 0.6},
    cmap='Oranges',
    edgecolor='white',
    linewidth=0.2,
    missing_kwds={'color': 'lightgray'}
)

# Layer 2: Facilities with Spanish services
spanish_facilities.plot(
    ax=ax,
    color='#2563eb',
    markersize=100,
    edgecolor='white',
    linewidth=1.5,
    zorder=5,
    label='Facilities with Spanish services'
)

cx.add_basemap(ax, source=cx.providers.CartoDB.PositronNoLabels, alpha=0.4)

ax.set_title('Spanish-Speaking LEP Population and Mental Health Facilities\nwith Spanish Language Services', fontsize=14)
ax.legend(loc='lower right', frameon=True, facecolor='white')
ax.set_axis_off()
plt.tight_layout()
plt.show()

This map overlays facilities offering Spanish-language services on top of census tracts shaded by their Spanish-speaking population. If blue points cluster in lightly-shaded areas while heavily-shaded areas have few points, we have evidence of a service mismatch.

Handling Common Issues

  1. Missing data. Some census tracts have missing values, often because the population is too small for reliable estimates or the tract contains only group quarters (prisons, nursing homes). Use the missing_kwds parameter in GeoPandas to handle these gracefully.
  2. Margins of error. ACS estimates come with uncertainty. For any variable ending in E (estimate), there’s a corresponding variable ending in M (margin of error). When differences between areas are small, check whether they exceed the margins of error before drawing conclusions.
  3. Variable availability. Not all variables are available at all geographic levels. Some detailed tables are only available at the county or state level. The API will return an error if you request an unavailable variable.
  4. Rate limits. The Census API has rate limits. If you’re making many requests, add delays between calls or request multiple variables in a single call rather than making separate calls for each variable.

Resources

The Census Bureau’s API documentation (https://www.census.gov/data/developers/data-sets.html) lists all available datasets and their variables.

The census package documentation (https://github.com/datamade/census) provides examples for different types of geographic queries.

For finding variable codes, the Census Reporter website (https://censusreporter.org/) offers a more user-friendly interface than the official variable list.

Social Explorer (https://www.socialexplorer.com/) provides an alternative interface for exploring Census data, with some features available for free.

  • January 9, 2026