[Python] GIS (2): Working with the Census API in Python / American Community Survey datasets
For most cases of geographic mapping in social science research, we need demographic data: who lives where and what their socioeconomic circumstances are. The U.S. Census Bureau collects this information via American Community Survey and makes it freely available through an Application Programming Interface (API).
This ability to connect service locations with population characteristics reflects a core principle in social work: understanding context. The ecological systems framework (Bronfenbrenner, 1979) positions neighborhood and community characteristics as critical determinants of individual outcomes. More recently, scholars have documented how spatial mismatch between service supply and population need contributes to health disparities (See my paper here). Census data provides the empirical foundation for this kind of spatial equity analysis.
Social work research has increasingly recognized the role of place in shaping outcomes. Sampson’s work on neighborhood effects (2012) demonstrated that community-level factors like concentrated poverty and social cohesion independently predict health and wellbeing beyond individual characteristics. Scholars studying mental health disparities have found that geographic access to care, measured in travel time or distance, significantly predicts service utilization, particularly for low-income populations and communities of color (Cummings et al., 2013).
This post covers how to access Census data programmatically (easily) using Python. By the end, you’ll be able to request specific variables for specific geographies, merge that data with shapefiles, and create community profiles that inform needs assessments and program planning.
Introduction to Census Data
The Census Bureau conducts two main data collection efforts relevant to this research.
- The Decennial Census, conducted every ten years, attempts to count every person living in the United States. It provides the most accurate population counts but asks only a handful of questions.
- American Community Survey (ACS), conducted continuously and released annually, samples a portion of the population but asks detailed questions about income, education, employment, language use, disability status, housing, and more.
For social work research, the ACS is typically more useful because it contains the variables we care about: poverty rates, health insurance coverage, English proficiency, educational attainment, and similar indicators. The tradeoff is that ACS data comes with margins of error because it’s based on a sample rather than a complete count. For small geographies like census tracts, these margins of error can be substantial (Spielman et al., 2014), something to keep in mind when interpreting results.
The ACS releases data in three forms: 1-year, 3-year, and 5-year estimates. Each represents a tradeoff between currency and reliability.
| Estimate Type | Data Collection Period | Available For | Best Used When |
|---|---|---|---|
| 1-year | Single year | Areas with 65,000+ population | You need the most current data and are analyzing large geographies (states, large counties, big cities) |
| 5-year | Five years combined | All areas, including census tracts | You need small-area data (tracts, block groups) or are analyzing smaller communities |
For neighborhood-level analysis using census tracts, the 5-year estimates are your only option. The tradeoff is that 5-year estimates blend data across a longer time period, which can obscure recent changes. If you’re studying a neighborhood that gentrified rapidly, the 5-year estimate smooths across pre- and post-gentrification conditions.
The margins of error also differ. Spielman and colleagues (2014) found that ACS margins of error at the tract level can be substantial, sometimes exceeding 20-30% of the estimate itself for small populations. When comparing tracts, always check whether differences exceed the margins of error before drawing conclusions.
Census data is organized hierarchically by geography. The nation contains states, states contain counties, counties contain census tracts, and tracts contain block groups. Each level nests perfectly within the level above it. Census tracts are designed to contain roughly 4,000 people and are the most common unit for neighborhood-level analysis. They’re small enough to capture local variation but large enough to have reasonably reliable ACS estimates.
Setting Up Census API Access

To use the Census API, you need a free API key. Visit https://api.census.gov/data/key_signup.html and request one. You’ll receive the key by email within a few minutes.
In Python, the census package provides a clean interface to the API. Install it along with us, a helper package for state codes:
!pip install census usThen set up your connection:
from census import Census
import us
# Replace with your actual API key
api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)Understanding Census Variable Codes
Census data is organized into thousands of variables, each with a unique code. Finding the right code is often the hardest part of working with Census data. The codes follow a pattern: a prefix indicating the table, followed by numbers indicating the specific variable.
For ACS data, common prefixes include B01001 (age and sex), B19013 (median household income), B17001 (poverty status), and C27001 (health insurance coverage). The letter prefix indicates the table type: B tables (Base) provide the most detailed breakdowns, C tables (Collapsed) combine categories for smaller margins of error, and S tables (Subject) present pre-calculated percentages and summary statistics. For most research purposes, B tables offer the granularity you need, but C tables are useful when sample sizes are small.
The full variable code includes an underscore and a suffix, like B19013_001E, where E indicates an estimate (as opposed to M for margin of error).
The Census Bureau provides a complete list of variables at https://api.census.gov/data/2023/acs/acs5/variables.html (the year can be changed as you see fit). This page is searchable and includes descriptions of what each variable measures. If it loads too slow, you can download and use this excel sheet.
Here are some variables commonly used in social science research:
- Total population
- Median household income
- Population below poverty level
- No health insurance coverage
- Renter-occupied housing units
Requesting Data for Specific Geographies
Let’s request median household income for all census tracts in Washtenaw County, Michigan. Each geography requires specific codes: Michigan’s state FIPS code is 26, and Washtenaw County’s code is 161.
from census import Census
api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)
# Request median household income for Washtenaw County census tracts
# Using 2022 ACS 5-year estimates
washtenaw_income = c.acs5.state_county_tract(
fields=['NAME', 'B19013_001E'],
state_fips='26',
county_fips='161',
tract='*' # asterisk means "all tracts"
)
# Convert to DataFrame
import pandas as pd
df = pd.DataFrame(washtenaw_income)
df.columns = ['name', 'median_income', 'state', 'county', 'tract']
print(df.head())The state_county_tract method requests data at the tract level. The fields parameter specifies which variables to retrieve. The asterisk in tract='*' means “give me all tracts in this county.”

The result is a list of dictionaries that we convert to a pandas DataFrame. Each row represents one census tract, with columns for the tract name, the requested variable, and geographic identifiers.
Building a GEOID for Merging with Shapefiles
Census shapefiles identify each geographic unit with a GEOID, a concatenation of state, county, and tract codes. To merge our Census data with a shapefile, we need to construct matching GEOIDs:
# Create GEOID by concatenating state + county + tract
df['GEOID'] = df['state'] + df['county'] + df['tract']
print(df[['name', 'median_income', 'GEOID']].head())The GEOID for a tract in Washtenaw County looks like “26161400100”, where 26 is Michigan, 161 is Washtenaw County, and 400100 is the tract number.
Merging Census Data with Shapefiles
Now we can combine our Census data with geographic boundaries:
import geopandas as gpd
# Download tract boundaries for Michigan
tracts_url = "https://www2.census.gov/geo/tiger/TIGER2023/TRACT/tl_2023_26_tract.zip"
michigan_tracts = gpd.read_file(tracts_url)
# Filter to Washtenaw County
washtenaw_tracts = michigan_tracts[michigan_tracts['COUNTYFP'] == '161'].copy()
# Merge Census data with shapefile
# The shapefile uses 'GEOID' as the column name
washtenaw_merged = washtenaw_tracts.merge(df, on='GEOID', how='left')
print(washtenaw_merged[['GEOID', 'median_income']].head())The merge function joins the two datasets on the GEOID column. We use how='left' to keep all tracts from the shapefile even if some don’t have matching Census data.

Creating a Community Data Profile
Let’s build a more complete community profile by requesting multiple variables:
from census import Census
import pandas as pd
import geopandas as gpd
api_key = 'YOUR_API_KEY_HERE'
c = Census(api_key)
# Request multiple variables for Detroit metro counties
# Wayne=163, Oakland=125, Macomb=099, Washtenaw=161
metro_data = []
for county_fips in ['163', '125', '099', '161']:
data = c.acs5.state_county_tract(
fields=[
'NAME',
'B01001_001E', # Total population
'B19013_001E', # Median household income
'B17001_002E', # Population below poverty
'B06007_002E', # Speak only English
'B06007_003E', # Speak Spanish
],
state_fips='26',
county_fips=county_fips,
tract='*'
)
metro_data.extend(data)
# Convert to DataFrame
df = pd.DataFrame(metro_data)
df.columns = ['name', 'total_pop', 'median_income', 'poverty_pop',
'english_only', 'spanish_speakers', 'state', 'county', 'tract']
# Create GEOID
df['GEOID'] = df['state'] + df['county'] + df['tract']
# Convert to numeric
for col in ['total_pop', 'median_income', 'poverty_pop', 'english_only', 'spanish_speakers']:
df[col] = pd.to_numeric(df[col], errors='coerce')
# Calculate derived measures
df['poverty_rate'] = (df['poverty_pop'] / df['total_pop'] * 100).round(1)
df['non_english'] = df['total_pop'] - df['english_only']
df['non_english_rate'] = (df['non_english'] / df['total_pop'] * 100).round(1)
df['spanish_rate'] = (df['spanish_speakers'] / df['total_pop'] * 100).round(1)
print(df[['name', 'total_pop', 'median_income', 'poverty_rate', 'non_english_rate', 'spanish_rate']].head(10))This code loops through four counties in the Detroit metro area, requesting population, income, poverty, and language data for each tract. We then calculate derived measures like poverty rate and the percentage of residents with limited English proficiency (LEP).
Visualizing Census Data on Maps
Now we can create choropleth maps showing how these variables are distributed:
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx
# Load and filter tract boundaries
tracts_url = "https://www2.census.gov/geo/tiger/TIGER2023/TRACT/tl_2023_26_tract.zip"
michigan_tracts = gpd.read_file(tracts_url)
metro_counties = ['163', '125', '099', '161']
metro_tracts = michigan_tracts[michigan_tracts['COUNTYFP'].isin(metro_counties)].copy()
# Merge with Census data
metro_merged = metro_tracts.merge(df, on='GEOID', how='left')
# Handle missing and invalid values
# Census uses negative values (like -666666666) to indicate missing data
metro_merged['median_income'] = pd.to_numeric(metro_merged['median_income'], errors='coerce')
metro_merged.loc[metro_merged['median_income'] < 0, 'median_income'] = None
# Convert to Web Mercator for basemap
metro_merged = metro_merged.to_crs(epsg=3857)
# Create map of median household income
fig, ax = plt.subplots(figsize=(12, 12))
metro_merged.plot(
column='median_income',
ax=ax,
legend=True,
legend_kwds={'label': 'Median Household Income ($)', 'shrink': 0.6},
cmap='Blues',
edgecolor='white',
linewidth=0.2,
missing_kwds={'color': 'lightgray', 'label': 'No data'}
)
cx.add_basemap(ax, source=cx.providers.CartoDB.PositronNoLabels, alpha=0.5)
ax.set_title('Median Household Income by Census Tract\nDetroit Metro Area (2022 ACS 5-Year Estimates)', fontsize=14)
ax.set_axis_off()
plt.tight_layout()
plt.show()A few notes on the code above. The Census Bureau uses special codes like -666666666 to indicate that data is unavailable for a tract, often because the sample size is too small for a reliable estimate. We convert these to None so they display as gray (“No data”) rather than throwing off the color scale. The missing_kwds parameter in GeoPandas controls how these missing values appear on the map.

The column parameter tells GeoPandas which variable to use for coloring the map. The cmap parameter specifies the color scheme. The missing_kwds parameter handles tracts where data is missing.
Collecting Data Across Multiple Years
The examples above request data for a single year. But we often requires tracking changes over time: how has poverty shifted? Are language access gaps widening or narrowing? The Census API lets you request historical ACS data by changing the year in your API call. Python command – for loop – can make your life easier too.
Here’s how to collect state-level SNAP (food stamps) data across multiple years using the requests library directly:
import requests
import pandas as pd
API_KEY = 'YOUR_API_KEY_HERE'
YEARS = range(2013, 2023) # ACS 5-year available from 2009
VARIABLES = "NAME,B22001_002E,B01001_001E" # SNAP recipients, total population
all_data = []
for year in YEARS:
url = f"https://api.census.gov/data/{year}/acs/acs5"
params = {
"get": VARIABLES,
"for": "state:*",
"key": API_KEY
}
response = requests.get(url, params=params)
if response.status_code == 200:
data = response.json()
df = pd.DataFrame(data[1:], columns=data[0])
df["year"] = year
all_data.append(df)
print(f"{year} done")
else:
print(f"{year} failed: {response.status_code}")
# Combine all years
result = pd.concat(all_data, ignore_index=True)
# Clean up column names and types
result = result.rename(columns={
"NAME": "state_name",
"state": "state_fips",
"B22001_002E": "snap_recipients",
"B01001_001E": "total_pop"
})
result[["snap_recipients", "total_pop"]] = result[["snap_recipients", "total_pop"]].apply(pd.to_numeric)
result["pct_snap"] = (result["snap_recipients"] / result["total_pop"] * 100).round(2)
# Reorder columns
result = result[["year", "state_fips", "state_name", "total_pop", "snap_recipients", "pct_snap"]]
print(result.head(10))Output:
year state_fips state_name total_pop snap_recipients pct_snap
0 2013 01 Alabama 4799918 856692 17.85
1 2013 02 Alaska prior720936 72633 10.07
2 2013 04 Arizona 6479703 997711 15.40
3 2013 05 Arkansas 2933369 457716 15.60
4 2013 06 California 37966138 4378304 11.53
5 2013 08 Colorado 5183562 479875 9.26
6 2013 09 Connecticut 3583561 392792 10.96
7 2013 10 Delaware 916229 126829 13.84
8 2013 11 Washington DC 633736 99470 15.70
9 2013 12 Florida 19259543 3290845 17.09
This approach uses the requests library directly instead of the census package, which gives you more control over the API URL structure. The pattern https://api.census.gov/data/{year}/acs/acs5 lets you swap in any year.
A note on year availability:
| Dataset | Years Available | Notes |
|---|---|---|
| ACS 5-Year | 2009-present | Most complete coverage for small areas |
| ACS 1-Year | 2005-present | Only areas with 65,000+ population |
| Decennial Census | 2000, 2010, 2020 | Complete count, limited variables |
With time series data, you can create trend visualizations, identify communities where conditions are improving or worsening, and provide temporal context for current disparities.
Overlaying Facility Locations on Demographic Data
The real power of combining Census data with facility locations is the ability to identify mismatches between where services are and where they’re needed. This approach operationalizes what researchers call “spatial accessibility” (Guagliardo, 2004). Studies of mental health service distribution have consistently found that facilities tend to cluster in higher-income areas, even when need is greater in lower-income communities (Cook et al., 2017).
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx
# Load facility data
facilities = pd.read_csv('MI_MH_facilities_detroit_metro.csv')
facilities_gdf = gpd.GeoDataFrame(
facilities,
geometry=gpd.points_from_xy(facilities.longitude, facilities.latitude),
crs="EPSG:4326"
).to_crs(epsg=3857)
# Filter to facilities with Spanish services
spanish_facilities = facilities_gdf[facilities_gdf['spanish'] == 1]
# Create map showing LEP population with Spanish-service facilities
fig, ax = plt.subplots(figsize=(12, 12))
# Layer 1: Tracts colored by Spanish LEP population
metro_merged.plot(
column='spanish_lep',
ax=ax,
legend=True,
legend_kwds={'label': 'Spanish-Speaking LEP Population', 'shrink': 0.6},
cmap='Oranges',
edgecolor='white',
linewidth=0.2,
missing_kwds={'color': 'lightgray'}
)
# Layer 2: Facilities with Spanish services
spanish_facilities.plot(
ax=ax,
color='#2563eb',
markersize=100,
edgecolor='white',
linewidth=1.5,
zorder=5,
label='Facilities with Spanish services'
)
cx.add_basemap(ax, source=cx.providers.CartoDB.PositronNoLabels, alpha=0.4)
ax.set_title('Spanish-Speaking LEP Population and Mental Health Facilities\nwith Spanish Language Services', fontsize=14)
ax.legend(loc='lower right', frameon=True, facecolor='white')
ax.set_axis_off()
plt.tight_layout()
plt.show()This map overlays facilities offering Spanish-language services on top of census tracts shaded by their Spanish-speaking population. If blue points cluster in lightly-shaded areas while heavily-shaded areas have few points, we have evidence of a service mismatch.

Handling Common Issues
- Missing data. Some census tracts have missing values, often because the population is too small for reliable estimates or the tract contains only group quarters (prisons, nursing homes). Use the
missing_kwdsparameter in GeoPandas to handle these gracefully. - Margins of error. ACS estimates come with uncertainty. For any variable ending in E (estimate), there’s a corresponding variable ending in M (margin of error). When differences between areas are small, check whether they exceed the margins of error before drawing conclusions.
- Variable availability. Not all variables are available at all geographic levels. Some detailed tables are only available at the county or state level. The API will return an error if you request an unavailable variable.
- Rate limits. The Census API has rate limits. If you’re making many requests, add delays between calls or request multiple variables in a single call rather than making separate calls for each variable.
Resources
The Census Bureau’s API documentation (https://www.census.gov/data/developers/data-sets.html) lists all available datasets and their variables.
The census package documentation (https://github.com/datamade/census) provides examples for different types of geographic queries.
For finding variable codes, the Census Reporter website (https://censusreporter.org/) offers a more user-friendly interface than the official variable list.
Social Explorer (https://www.socialexplorer.com/) provides an alternative interface for exploring Census data, with some features available for free.
