[Python] GIS (3): Choropleth Maps using matplotlib and geopandas

Now we focus specifically on choropleth maps: maps where geographic areas are shaded according to a data variable. These are among the most common visualizations in social work research, used to show poverty rates by neighborhood, service utilization by county, or health outcomes by region.

Choropleth maps introduce challenges beyond standard data visualization. The choices you make about geographic boundaries and classification methods can dramatically change what patterns readers perceive, in ways that bar charts and scatter plots do not. This post covers the technical skills to create choropleth maps in Python and the conceptual knowledge to make responsible choices about how to represent spatial data. For a refresher on general color principles and accessibility in data visualization, see the earlier post on matplotlib and seaborn fundamentals.

Choosing Geographic Boundaries

The first decision in any choropleth map is the geographic unit. Common options include states, metropolitan statistical areas (MSAs), counties, zip codes, census tracts, and census block groups. Each has tradeoffs, and the choice affects both what patterns you can detect and what conclusions readers draw (Fotheringham & Wong, 1991).

Census geographies follow a nested hierarchy: the nation contains states, states contain counties, counties contain census tracts, and tracts contain block groups. This means you can aggregate data upward (combining tracts into counties, counties into states) or request data at finer levels depending on your analysis needs.

Geographic Unit	Population Size	Pros	Cons
States	Millions	Useful for national comparisons, policy analysis	Far too large for local variation
Metropolitan Statistical Areas (MSAs)	Varies (50,000+ urban core)	Captures functional economic regions, useful for metro-level comparisons	Can cross state lines, boundaries change over time
Counties	Tens of thousands to millions	Good for state-level analyses, stable boundaries	Too large to capture neighborhood variation
Census tracts	~4,000 people	Small enough for local variation, large enough for reliable ACS estimates, designed to be demographically homogeneous	Homogeneity goal not always achieved
Zip codes	Varies widely	Familiar to general audiences	Designed for mail delivery, not analysis; boundaries don’t align with census units; change over time; can lead to different conclusions than census geographies (Krieger et al., 2002)
Block groups	600-3,000 people	Finest geographic detail available in ACS	Larger margins of error in ACS estimates

Please also refer to this post on matching across geographic units in Python:

[Python] How to Match State, County, and MSA Codes from Census Tract GEOIDs in Python

One critical concept is the modifiable areal unit problem (MAUP): the same underlying phenomenon can appear different depending on how you draw boundaries (Openshaw, 1984). Aggregating data to larger units tends to smooth out variation and can obscure important local patterns. Dark and Bram (2007) demonstrated how MAUP affects analysis of social service accessibility, finding that conclusions about service deserts changed substantially depending on the geographic unit used. There’s no perfect solution, but being aware of MAUP helps you interpret maps critically and communicate limitations to readers.

Anatomy of a Choropleth Map

Before diving into classification methods and color scales, let’s understand the visual components that make up a choropleth map. We build these maps using GeoPandas and matplotlib together: GeoPandas handles spatial data and provides the .plot() method, but under the hood it renders everything using matplotlib. This means all the matplotlib customization skills from the earlier visualization post apply directly here.

We use multiple Python packages to create the choropleth map:

matplotlib creates the figure and axes (fig, ax = plt.subplots())
GeoPandas draws polygons on those axes (gdf.plot(ax=ax, ...))
contextily adds basemap tiles to the same axes (cx.add_basemap(ax, ...))
matplotlib handles titles, legends, and final rendering

Because everything shares the same ax object, you can mix and match: add GeoPandas layers, then matplotlib annotations, then more GeoPandas layers.

Python

import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx

fig, ax = plt.subplots(figsize=(12, 10))

# 1. POLYGONS: The geographic shapes filled with color based on data values
metro_merged.plot(
    column='median_income',      # Data variable that determines fill color
    ax=ax,
    cmap='Blues',                # Color map (light-to-dark color progression)
    edgecolor='white',           # Border color between polygons
    linewidth=0.3,               # Border thickness
    alpha=0.8,                   # Transparency (0=invisible, 1=opaque)
    legend=True,
    legend_kwds={'label': 'Median Income ($)', 'shrink': 0.7}
)

# 2. BASEMAP: Background map tiles showing roads, water, labels for context
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, alpha=0.5)

# 3. TITLE: Describes what the map shows
ax.set_title('Median Household Income by Census Tract', fontsize=14, fontweight='bold')

# 4. AXIS: Usually turned off for maps (coordinates aren't meaningful to readers)
ax.set_axis_off()

# 5. DATA SOURCE: Attribution for transparency
ax.text(0.02, 0.02, 'Source: ACS 2022 5-Year Estimates', 
        transform=ax.transAxes, fontsize=8, color='gray')

plt.tight_layout()
plt.show()

import geopandas as gpd
import matplotlib.pyplot as plt
import contextily as cx

fig, ax = plt.subplots(figsize=(12, 10))

# 1. POLYGONS: The geographic shapes filled with color based on data values
metro_merged.plot(
    column='median_income',      # Data variable that determines fill color
    ax=ax,
    cmap='Blues',                # Color map (light-to-dark color progression)
    edgecolor='white',           # Border color between polygons
    linewidth=0.3,               # Border thickness
    alpha=0.8,                   # Transparency (0=invisible, 1=opaque)
    legend=True,
    legend_kwds={'label': 'Median Income ($)', 'shrink': 0.7}
)

# 2. BASEMAP: Background map tiles showing roads, water, labels for context
cx.add_basemap(ax, source=cx.providers.CartoDB.Positron, alpha=0.5)

# 3. TITLE: Describes what the map shows
ax.set_title('Median Household Income by Census Tract', fontsize=14, fontweight='bold')

# 4. AXIS: Usually turned off for maps (coordinates aren't meaningful to readers)
ax.set_axis_off()

# 5. DATA SOURCE: Attribution for transparency
ax.text(0.02, 0.02, 'Source: ACS 2022 5-Year Estimates', 
        transform=ax.transAxes, fontsize=8, color='gray')

plt.tight_layout()
plt.show()

The key components are:

Component	Purpose	Key Parameters
Polygons	Geographic areas filled with colors based on data	`column` (data variable), `cmap` (colors), `edgecolor`, `linewidth`, `alpha`
Legend	Shows what colors mean	`legend=True`, `legend_kwds` for label and size
Basemap	Provides geographic context (roads, water, place names)	`cx.add_basemap()`, `source` for tile style, `alpha` for transparency
Title	Tells readers what they’re looking at	`ax.set_title()`
Data source	Credits data origin, supports reproducibility	`ax.text()` positioned with `transform=ax.transAxes`

The ax object is central to building layered maps. Every element is added to the same axes, which is why you pass ax=ax to each plotting function.

Classification Methods

Raw data values must be grouped into classes for display. The method you choose affects which patterns are visible. This is not merely a technical choice; Brewer and Pickle (2002) found that different classification schemes led map readers to draw different conclusions about the same public health data.

Method	How It Works	Pros	Cons
Equal interval	Divides data range into equal-sized bins (e.g., $20K increments from $20K-$120K)	Intuitive, easy to explain	Empty classes when data is skewed
Quantiles	Equal number of observations in each class (e.g., 20% per class)	Ensures visual variation across the map	Can group very different values together or separate similar values
Natural breaks (Jenks)	Algorithm finds “natural” groupings by minimizing within-class variance (Jenks, 1967)	Often produces intuitive-looking maps	Breaks are data-dependent, hard to compare across maps
Standard deviation	Shows distance from the mean	Highlights unusually high or low areas	Requires readers to understand standard deviation
Manual breaks	You specify meaningful thresholds (e.g., poverty at 10%, 20%, 30%)	Supports policy-relevant interpretation	Requires domain knowledge

Let’s see these in practice:

Python

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import mapclassify as mc

# Assume metro_merged is our GeoDataFrame with Census data from Post 2 (https://nariyoo.com/python-how-to-match-state-county-and-msa-codes-from-census-tract-geoids-in-python/)
# Clean the data first
metro_merged['median_income'] = pd.to_numeric(metro_merged['median_income'], errors='coerce')
metro_merged.loc[metro_merged['median_income'] < 0, 'median_income'] = None

# Remove rows with missing values for classification comparison
data_for_classification = metro_merged.dropna(subset=['median_income'])

# Compare different classification methods
fig, axes = plt.subplots(2, 2, figsize=(14, 14))

methods = [
    ('Equal Interval', 'equalinterval'),
    ('Quantiles', 'quantiles'),
    ('Natural Breaks', 'naturalbreaks'),
    ('Standard Deviation', 'stdmean')
]

for ax, (title, scheme) in zip(axes.flatten(), methods):
    data_for_classification.plot(
        column='median_income',
        ax=ax,
        scheme=scheme,
        k=5,  # number of classes
        cmap='Blues',
        edgecolor='white',
        linewidth=0.2,
        legend=True,
        legend_kwds={'loc': 'lower right', 'fontsize': 8}
    )
    ax.set_title(f'{title} Classification', fontsize=12)
    ax.set_axis_off()

plt.suptitle('Same Data, Different Classification Methods', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
import mapclassify as mc

# Assume metro_merged is our GeoDataFrame with Census data from Post 2 (https://nariyoo.com/python-how-to-match-state-county-and-msa-codes-from-census-tract-geoids-in-python/)
# Clean the data first
metro_merged['median_income'] = pd.to_numeric(metro_merged['median_income'], errors='coerce')
metro_merged.loc[metro_merged['median_income'] < 0, 'median_income'] = None

# Remove rows with missing values for classification comparison
data_for_classification = metro_merged.dropna(subset=['median_income'])

# Compare different classification methods
fig, axes = plt.subplots(2, 2, figsize=(14, 14))

methods = [
    ('Equal Interval', 'equalinterval'),
    ('Quantiles', 'quantiles'),
    ('Natural Breaks', 'naturalbreaks'),
    ('Standard Deviation', 'stdmean')
]

for ax, (title, scheme) in zip(axes.flatten(), methods):
    data_for_classification.plot(
        column='median_income',
        ax=ax,
        scheme=scheme,
        k=5,  # number of classes
        cmap='Blues',
        edgecolor='white',
        linewidth=0.2,
        legend=True,
        legend_kwds={'loc': 'lower right', 'fontsize': 8}
    )
    ax.set_title(f'{title} Classification', fontsize=12)
    ax.set_axis_off()

plt.suptitle('Same Data, Different Classification Methods', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

Notice how the same data produces different visual patterns. Equal interval may show most tracts in one or two classes if income is skewed. Quantiles spread color evenly but may group a $40,000 tract with a $70,000 tract. Natural breaks finds clusters in the data. Standard deviation highlights outliers.

Choosing Color Scales

Color choice affects both aesthetics and interpretation. Three types of color scales serve different purposes.

Sequential scales use a single hue progressing from light to dark. These are appropriate when data has a natural order from low to high: income, population, rates. Light colors typically represent low values, dark colors high values. Examples include Blues, Greens, and Oranges in matplotlib.

Diverging scales use two hues meeting at a neutral midpoint. These are appropriate when data has a meaningful center point: deviation from average, change from baseline, values above and below a threshold. The midpoint should represent something meaningful. Examples include RdBu (red-blue) and PRGn (purple-green).

Qualitative scales use distinct hues without implying order. These are appropriate for categorical data: land use types, racial/ethnic plurality, service categories. Don’t use these for continuous data.

Accessibility matters. Approximately 8% of men and 0.5% of women have some form of color vision deficiency, most commonly difficulty distinguishing red from green (Brewer, 2005). Avoid red-green combinations for sequential data. The ColorBrewer website (colorbrewer2.org) provides palettes designed for cartography with options for colorblind-safe schemes.

https://colorbrewer2.org/#type=sequential&scheme=YlOrRd&n=3

Python

import matplotlib.pyplot as plt

# Sequential scale for income (low to high)
fig, axes = plt.subplots(1, 3, figsize=(16, 6))

# Blues - good sequential scale
metro_merged.plot(
    column='median_income',
    ax=axes[0],
    scheme='quantiles',
    k=5,
    cmap='Blues',
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[0].set_title('Sequential: Blues\n(Good for income)', fontsize=11)
axes[0].set_axis_off()

# Diverging scale - appropriate if showing deviation from median
median_income = metro_merged['median_income'].median()
metro_merged['income_diff'] = metro_merged['median_income'] - median_income

metro_merged.plot(
    column='income_diff',
    ax=axes[1],
    scheme='quantiles',
    k=5,
    cmap='RdBu',
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[1].set_title('Diverging: RdBu\n(Good for deviation from median)', fontsize=11)
axes[1].set_axis_off()

# Colorblind-safe sequential
metro_merged.plot(
    column='median_income',
    ax=axes[2],
    scheme='quantiles',
    k=5,
    cmap='YlGnBu',  # Yellow-Green-Blue, colorblind safe
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[2].set_title('Colorblind-Safe: YlGnBu\n(Accessible alternative)', fontsize=11)
axes[2].set_axis_off()

plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt

# Sequential scale for income (low to high)
fig, axes = plt.subplots(1, 3, figsize=(16, 6))

# Blues - good sequential scale
metro_merged.plot(
    column='median_income',
    ax=axes[0],
    scheme='quantiles',
    k=5,
    cmap='Blues',
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[0].set_title('Sequential: Blues\n(Good for income)', fontsize=11)
axes[0].set_axis_off()

# Diverging scale - appropriate if showing deviation from median
median_income = metro_merged['median_income'].median()
metro_merged['income_diff'] = metro_merged['median_income'] - median_income

metro_merged.plot(
    column='income_diff',
    ax=axes[1],
    scheme='quantiles',
    k=5,
    cmap='RdBu',
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[1].set_title('Diverging: RdBu\n(Good for deviation from median)', fontsize=11)
axes[1].set_axis_off()

# Colorblind-safe sequential
metro_merged.plot(
    column='median_income',
    ax=axes[2],
    scheme='quantiles',
    k=5,
    cmap='YlGnBu',  # Yellow-Green-Blue, colorblind safe
    edgecolor='white',
    linewidth=0.2,
    legend=True,
    missing_kwds={'color': 'lightgray'}
)
axes[2].set_title('Colorblind-Safe: YlGnBu\n(Accessible alternative)', fontsize=11)
axes[2].set_axis_off()

plt.tight_layout()
plt.show()

Here’s what each parameter controls:

Parameter	What It Does	Example Values
`column`	Which data variable determines the fill color	`'median_income'`, `'poverty_rate'`
`ax`	Which subplot to draw on (for multi-panel figures)	`axes[0]`, `axes[1]`, `ax`
`scheme`	Classification method for grouping values	`'quantiles'`, `'equalinterval'`, `'naturalbreaks'`
`k`	Number of color classes	`5` is standard, `3-7` is typical range
`cmap`	Color palette name	Sequential: `'Blues'`, `'Oranges'`; Diverging: `'RdBu'`, `'PRGn'`; Colorblind-safe: `'YlGnBu'`, `'viridis'`
`edgecolor`	Border color between polygons	`'white'`, `'black'`, `'none'`
`linewidth`	Border thickness	`0.2` (thin), `0.5` (medium), `1.0` (thick)
`legend`	Whether to show the color legend	`True` or `False`
`missing_kwds`	How to display areas with no data	`{'color': 'lightgray', 'label': 'No data'}`

The fig, axes = plt.subplots(1, 3) line creates a figure with three side-by-side panels. The axes variable becomes a list, so axes[0] is the first panel, axes[1] is the second, and so on. This lets you compare different visualizations of the same data.

Interpreting Spatial Patterns Responsibly

Choropleth maps invite causal interpretation. When we see that high-poverty areas have fewer mental health facilities, it’s tempting to conclude that poverty causes service deserts or that service deserts cause poverty. But correlation on a map is not causation, and several pitfalls await the unwary.

Ecological fallacy occurs when we draw conclusions about individuals from aggregate data. If a census tract has high poverty and few services, we cannot conclude that poor individuals in that tract lack access. Some residents may travel to adjacent areas; some facilities may serve clients from multiple tracts. Aggregate patterns suggest hypotheses but don’t prove individual-level relationships (Robinson, 1950).

Visual area bias affects perception. Large geographic areas draw attention even if they contain few people. Rural tracts in western Washtenaw County are geographically large but sparsely populated; small tracts in Detroit contain far more people. Consider population-weighted representations or cartograms when population distribution matters.

Boundary effects arise because administrative boundaries don’t match how people actually move through space. A facility just across a tract boundary serves residents of both tracts. Simple visual inspection can’t capture these cross-boundary flows.

Missing context is perhaps the biggest risk. A map showing facility locations and income doesn’t show transportation networks, cultural factors affecting service utilization, quality of care, or historical patterns of disinvestment. Maps simplify reality; responsible interpretation requires acknowledging what’s not shown.

A Note on Choropleth Maps

The pitfalls above aren’t reasons to avoid choropleth maps. These visualizations communicate spatial patterns at a glance. A well-designed map can reveal that poverty concentrates in specific neighborhoods, that mental health services cluster in affluent suburbs, or that environmental hazards disproportionately affect communities of color. This visual immediacy makes choropleth maps effective for advocacy, policy briefs, and community presentations.

But this same immediacy creates risks. Research on map interpretation (Monmonier, 1996) has documented how design choices shape perception. The same underlying data can tell different stories depending on how it’s visualized. Classification methods that emphasize variation in one part of the distribution obscure variation elsewhere. Color schemes that work for one audience may be inaccessible to colorblind readers. Geographic boundaries that seem natural are often administrative artifacts that don’t reflect how communities experience space.

Resources

Choosing Colormaps in Matplotlib: https://matplotlib.org/stable/users/explain/colors/colormaps.html

ColorBrewer (https://colorbrewer2.org/) provides color palettes designed for cartography, with filters for colorblind-safe and print-friendly options.

The mapclassify documentation (https://pysal.org/mapclassify/) explains the algorithms behind different classification schemes.

Monmonier’s How to Lie with Maps (1996) remains the classic introduction to critical map reading and responsible cartography.

January 9, 2026

[Python] GIS (3): Choropleth Maps using matplotlib and geopandas

Choosing Geographic Boundaries

Anatomy of a Choropleth Map

Classification Methods

Choosing Color Scales

Interpreting Spatial Patterns Responsibly

A Note on Choropleth Maps

Resources

Related Posts

Leave a ReplyCancel reply

Translate this page into:

Categories

[Python] GIS (3): Choropleth Maps using matplotlib and geopandas

Choosing Geographic Boundaries

Anatomy of a Choropleth Map

Classification Methods

Choosing Color Scales

Interpreting Spatial Patterns Responsibly

A Note on Choropleth Maps

Resources

Share this:

Related Posts

Leave a ReplyCancel reply

Translate this page into:

Categories