[Python] Visualizing Change Over Time: Temporal Data (Time Series Analysis)

In previous posts, we covered visualizing continuous outcomes across groups (box plots, violin plots) and categorical outcomes (count plots, stacked bars, crosstabs). Many questions in social science research involve a third dimension: time. How have caseloads changed over the past year? Did the new intake procedure reduce wait times? What happened to service utilization after the policy change?

This post covers how to work with datetime data in pandas, visualize trends with line plots, aggregate data at different time intervals, and annotate your charts with key events like policy changes or program launches.

Types of Temporal Data

It helps to think about what kind of temporal pattern you’re looking for.

Pattern	Definition	Social Work Examples
Trend	Long-term movement in a particular direction	Caseloads increasing over several years; average session lengths declining; completion rates improving steadily
Seasonality	Patterns that repeat at regular intervals	Crisis calls spiking during winter holidays; youth program enrollment dropping every summer; housing applications peaking at the start of each month
Event	One-time occurrence that causes a shift in the data	New policy takes effect; program launches; pandemic begins; funding gets cut
Noise	Random variation with no underlying pattern	Day-to-day fluctuations in service volume that look dramatic but reflect nothing systematic

The challenge of time series visualization is showing the patterns you care about while not being misled by noise.

Working with Datetime Data in Pandas

Pandas has excellent support for datetime data, but only once your dates are stored as a datetime type. When you load data from a CSV, date columns often come in as strings, so you need to convert them to datetime objects.

import pandas as pd
import numpy as np

# Create mock data with dates as strings
data = pd.DataFrame({
    'date': ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18', '2024-01-19'],
    'clients_served': [45, 52, 48, 55, 41]
})

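# Check the data type
print(data['date'].dtype)  # Typically object, meaning plain strings (newer pandas releases may report a dedicated string dtype)

# Convert to datetime
data['date'] = pd.to_datetime(data['date'])
print(data['date'].dtype)  # datetime64[ns] (the unit in brackets can vary by pandas version)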

When reading a CSV, you can convert dates during import:

Python

# Convert during import
data = pd.read_csv('service_data.csv', parse_dates=['date'])
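
Real exports are rarely as tidy as the mock data above. The following is a minimal sketch, assuming a hypothetical export with US-style month/day/year strings and a couple of unusable entries; the column names and values are made up for illustration. It uses the format and errors parameters of pd.to_datetime so that bad entries become NaT (pandas' missing-date marker) instead of stopping the script.

```python
import pandas as pd

# Hypothetical raw export: US-style dates, with two unusable entries
raw = pd.DataFrame({
    "date": ["01/15/2024", "01/16/2024", "not recorded", "01/18/2024", ""],
    "clients_served": [45, 52, 48, 55, 41],
})

# An explicit format avoids day/month ambiguity; errors="coerce" turns
# anything that does not match into NaT instead of raising an exception
raw["date"] = pd.to_datetime(raw["date"], format="%m/%d/%Y", errors="coerce")

# Count the rows that failed to parse before deciding what to do with them
n_bad = raw["date"].isna().sum()
print(f"{n_bad} rows could not be parsed")

# One option: drop them (only after checking why they failed)
clean = raw.dropna(subset=["date"]).reset_index(drop=True)
print(clean)
```

Coercing silently discards information, so it is worth looking at the failed rows before dropping them; a systematic problem (for example, a second date format) is better fixed than filtered out.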

Once you have datetime columns, you can extract components like year, month, day of week:

Python

data['year'] = data['date'].dt.year
data['month'] = data['date'].dt.month
data['day_of_week'] = data['date'].dt.dayofweek  # 0 = Monday, 6 = Sunday
data['week_of_year'] = data['date'].dt.isocalendar().week
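
These extracted components are useful for spotting repeating patterns. As a rough sketch (the data is mock, and the weekend dip is built in on purpose), the example below averages client volume by day of the week using dt.day_name() and groupby.

```python
import numpy as np
import pandas as pd

# Mock data: 8 weeks of daily counts (values are illustrative only)
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=56, freq="D")
visits = pd.DataFrame({
    "date": dates,
    "clients_served": rng.integers(40, 60, size=len(dates)),
})

# Weekends are quieter in this made-up example (5 = Saturday, 6 = Sunday)
is_weekend = visits["date"].dt.dayofweek >= 5
visits.loc[is_weekend, "clients_served"] -= 25

# Average volume by day of week, ordered Monday to Sunday
visits["day_name"] = visits["date"].dt.day_name()
order = ["Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Saturday", "Sunday"]
by_weekday = (
    visits.groupby("day_name")["clients_served"]
    .mean()
    .reindex(order)
)
print(by_weekday.round(1))
```

The reindex step matters because groupby sorts day names alphabetically, which is rarely the order you want in a table or chart.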

Basic Line Plots

The line plot is the workhorse of time series visualization. It shows how a value changes over time, with time on the x-axis and the measure on the y-axis.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data: daily client volume over 3 months
np.random.seed(42)
dates = pd.date_range(start='2024-01-01', end='2024-03-31', freq='D')
base = 50 + np.arange(len(dates)) * 0.1  # Slight upward trend
noise = np.random.normal(0, 5, len(dates))
clients = base + noise

data = pd.DataFrame({
    'date': dates,
    'clients_served': clients.astype(int)
})

plt.figure(figsize=(10, 5))
plt.plot(data['date'], data['clients_served'], color='#0077BB', linewidth=1.5)

plt.xlabel('Date', fontsize=11)
plt.ylabel('Clients Served', fontsize=11)
plt.title('Daily Client Volume (Q1 2024)', fontsize=13)

# Clean up the axes
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The daily data shows a lot of noise. You can see the general upward trend, but day-to-day variation makes it hard to read.

Temporal Aggregation: Smoothing the Noise

One solution is to aggregate to a coarser time interval. Instead of daily data, show weekly or monthly totals or averages.

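Pandas makes this easy with the resample() method. By default it groups on the DataFrame's index, so the index must hold datetimes; the example below sets the date column as the index first. Alternatively, you can leave the index alone and pass the column name through the on= parameter.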

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data
np.random.seed(42)
dates = pd.date_range(start='2024-01-01', end='2024-03-31', freq='D')
base = 50 + np.arange(len(dates)) * 0.1
noise = np.random.normal(0, 5, len(dates))
clients = base + noise

data = pd.DataFrame({
    'date': dates,
    'clients_served': clients.astype(int)
})

# Set date as index for resampling
data = data.set_index('date')

# Resample to weekly totals
weekly = data.resample('W').sum()

# Resample to weekly averages
weekly_avg = data.resample('W').mean()

plt.figure(figsize=(10, 5))
plt.plot(weekly.index, weekly['clients_served'], 
         color='#0077BB', linewidth=2, marker='o', markersize=4)

plt.xlabel('Week', fontsize=11)
plt.ylabel('Clients Served (Weekly Total)', fontsize=11)
plt.title('Weekly Client Volume (Q1 2024)', fontsize=13)

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The weekly aggregation smooths out the day-to-day noise, making the trend easier to see.
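
One thing to watch with weekly totals is incomplete weeks at the start or end of your data. The mock data above happens to begin on a Monday and end on a Sunday, so every week is full, but that is not guaranteed in practice. The sketch below, using made-up data that starts mid-week, resamples through the on= parameter and counts the observations in each week so partial weeks can be flagged.

```python
import numpy as np
import pandas as pd

# Mock daily data that starts mid-week (Wednesday, 2024-01-03)
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-03", "2024-02-04", freq="D")
daily = pd.DataFrame({
    "date": dates,
    "clients_served": rng.integers(40, 60, size=len(dates)),
})

# on="date" resamples by a column, so set_index() is not required
weekly = daily.resample("W", on="date")["clients_served"].agg(["sum", "count"])

# A week with fewer than 7 observations is incomplete; its total is not
# comparable with the totals of full weeks
weekly["complete"] = weekly["count"] == 7
print(weekly)

# Keep only full weeks for plotting totals
full_weeks = weekly[weekly["complete"]]
```

Whether to drop partial weeks or switch to weekly averages depends on the question; the point is simply that a dip at the edge of a chart may be a counting artifact.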

Common resample frequencies include ‘D’ for daily, ‘W’ for weekly, ‘ME’ for month-end, ‘MS’ for month-start, ‘QE’ for quarter-end, and ‘YE’ for year-end. You can also prefix a number: ‘2W’ for every two weeks, ‘3ME’ for every three months. The ‘ME’, ‘QE’, and ‘YE’ aliases were introduced in pandas 2.2; older versions use ‘M’, ‘Q’, and ‘Y’ instead.
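
To get a feel for what the aliases do, it can help to run the same series through several of them. This is a small sketch on mock data; it sticks to aliases that behave the same across pandas versions ('QS', quarter-start, is not mentioned above but is the quarterly counterpart of 'MS').

```python
import numpy as np
import pandas as pd

# Mock data: one year of daily counts (illustrative values only)
rng = np.random.default_rng(2)
idx = pd.date_range("2023-01-01", "2023-12-31", freq="D")
daily = pd.Series(rng.integers(40, 60, size=len(idx)), index=idx,
                  name="clients_served")

# The same series summarized at different intervals.
# "MS" and "QS" label each period by its first day; the end-of-period
# aliases "ME", "QE" and "YE" need pandas 2.2 or later.
summaries = {
    "W": daily.resample("W").sum(),
    "2W": daily.resample("2W").sum(),
    "MS": daily.resample("MS").sum(),
    "QS": daily.resample("QS").sum(),
}

for alias, s in summaries.items():
    print(f"{alias:>3}: {len(s)} periods, first label {s.index[0].date()}")
```

Every version adds up to the same grand total; only the number of points on the eventual chart changes.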

Choosing the Right Aggregation

The choice of aggregation level depends on your question and your audience.

Daily data shows maximum detail but can be noisy. Use it when you need to identify specific days with unusual values, or when your audience needs precision.
Weekly data smooths short-term noise while preserving monthly patterns. Good for operational dashboards and regular reporting.
Monthly data reveals seasonal patterns and long-term trends but hides within-month variation. Good for reports to leadership or funders.
Quarterly or yearly data shows only the biggest patterns. Appropriate for strategic planning or historical analysis spanning many years.

There’s a tradeoff: aggregation reduces noise but also hides real variation. A monthly average might mask that one week was a crisis and three weeks were normal.
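
The tradeoff is easy to demonstrate with a constructed example. In the sketch below (all numbers are invented), a crisis line receives 20 calls a day throughout February 2024 except for one week at 60 calls a day. The monthly average and the weekly averages are then compared.

```python
import pandas as pd

# Mock data: ordinary days at 20 calls, one crisis week at 60 calls
idx = pd.date_range("2024-02-01", "2024-02-29", freq="D")
calls = pd.Series(20.0, index=idx, name="crisis_calls")
calls.loc["2024-02-12":"2024-02-18"] = 60.0  # hypothetical one-week surge

monthly_mean = calls.resample("MS").mean()
weekly_mean = calls.resample("W").mean()

print(f"Monthly average: {monthly_mean.iloc[0]:.1f} calls/day")
print(f"Highest weekly average: {weekly_mean.max():.1f} calls/day")
print(f"Lowest weekly average: {weekly_mean.min():.1f} calls/day")
```

The monthly figure comes out at roughly 29.7 calls a day, a level that did not occur on any single day, while the weekly view shows the surge clearly at 60 against a baseline of 20.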

Comparing Multiple Time Series

Often you want to compare trends across groups: service utilization by program, caseloads by region, outcomes by cohort. Plot multiple lines on the same axes.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data: monthly caseloads for three programs
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=24, freq='MS')

data = pd.DataFrame({
    'month': months,
    'Program A': 120 + np.cumsum(np.random.normal(2, 5, 24)),
    'Program B': 80 + np.cumsum(np.random.normal(1, 4, 24)),
    'Program C': 100 + np.cumsum(np.random.normal(0, 6, 24))
})

plt.figure(figsize=(10, 5))

plt.plot(data['month'], data['Program A'], 
         color='#0077BB', linewidth=2, label='Program A')
plt.plot(data['month'], data['Program B'], 
         color='#EE7733', linewidth=2, label='Program B')
plt.plot(data['month'], data['Program C'], 
         color='#009988', linewidth=2, label='Program C')

plt.xlabel('Month', fontsize=11)
plt.ylabel('Active Caseload', fontsize=11)
plt.title('Monthly Caseload by Program (2023-2024)', fontsize=13)
plt.legend(frameon=False)

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

When comparing multiple series, use colors that are distinguishable for colorblind viewers (like our blue/orange/teal palette) and keep the number of lines manageable. More than four or five lines on one chart becomes hard to read.
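
The example above starts from a wide table with one column per program. Data exported from a case management system often arrives in long format instead, with one row per program per month. A minimal sketch of reshaping it with pivot() and plotting each column in a loop follows; the column names and values are assumptions for illustration, and the Agg backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Mock long-format data: one row per program per month
rng = np.random.default_rng(3)
months = pd.date_range("2023-01-01", periods=6, freq="MS")
long = pd.DataFrame({
    "month": list(months) * 3,
    "program": np.repeat(["Program A", "Program B", "Program C"], len(months)),
    "caseload": rng.integers(80, 140, size=3 * len(months)),
})

# Reshape to wide: one column per program, indexed by month
wide = long.pivot(index="month", columns="program", values="caseload")

# One color per series, reusing the blue/orange/teal palette
palette = {"Program A": "#0077BB", "Program B": "#EE7733", "Program C": "#009988"}

fig, ax = plt.subplots(figsize=(10, 5))
for program, color in palette.items():
    ax.plot(wide.index, wide[program], color=color, linewidth=2, label=program)

ax.set_xlabel("Month", fontsize=11)
ax.set_ylabel("Active Caseload", fontsize=11)
ax.legend(frameon=False)
for side in ("top", "right"):
    ax.spines[side].set_visible(False)
fig.tight_layout()
```

Keeping the palette in a dictionary ties each program to a fixed color, which helps when the same programs appear in several charts of one report.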

Absolute vs. Percent Change

Sometimes you care about absolute values: how many clients did we serve? Other times, you care about relative change: how much did utilization increase?

Percent change from a baseline can be more informative when comparing groups of different sizes. A program that grew from 50 to 75 clients (a 50% increase) and one that grew from 200 to 225 (a 12.5% increase) each gained the same 25 clients in absolute terms, but their relative growth is very different.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')

data = pd.DataFrame({
    'month': months,
    'Small Program': 50 + np.cumsum(np.random.normal(2, 3, 12)),
    'Large Program': 200 + np.cumsum(np.random.normal(2, 8, 12))
})

# Calculate percent change from first month
data['Small Program (% change)'] = (
    (data['Small Program'] - data['Small Program'].iloc[0]) 
    / data['Small Program'].iloc[0] * 100
)
data['Large Program (% change)'] = (
    (data['Large Program'] - data['Large Program'].iloc[0]) 
    / data['Large Program'].iloc[0] * 100
)

fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# Absolute values
axes[0].plot(data['month'], data['Small Program'], 
             color='#0077BB', linewidth=2, label='Small Program')
axes[0].plot(data['month'], data['Large Program'], 
             color='#EE7733', linewidth=2, label='Large Program')
axes[0].set_xlabel('Month', fontsize=11)
axes[0].set_ylabel('Clients', fontsize=11)
axes[0].set_title('Absolute Caseload', fontsize=12)
axes[0].legend(frameon=False)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)

# Percent change
axes[1].plot(data['month'], data['Small Program (% change)'], 
             color='#0077BB', linewidth=2, label='Small Program')
axes[1].plot(data['month'], data['Large Program (% change)'], 
             color='#EE7733', linewidth=2, label='Large Program')
axes[1].axhline(y=0, color='gray', linestyle='--', linewidth=0.8)
axes[1].set_xlabel('Month', fontsize=11)
axes[1].set_ylabel('% Change from January', fontsize=11)
axes[1].set_title('Percent Change from Baseline', fontsize=12)
axes[1].legend(frameon=False)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The left panel shows that Large Program serves many more clients. The right panel shows that Small Program is growing faster in relative terms.
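
The percent-change calculation can be written once and applied to every column. The sketch below uses a hypothetical helper, pct_change_from_baseline (not a pandas function), and the 50-to-75 and 200-to-225 figures from the text to check the arithmetic.

```python
import pandas as pd

# The two hypothetical programs from the text: both gain 25 clients
caseloads = pd.DataFrame(
    {"Small Program": [50, 60, 75], "Large Program": [200, 210, 225]},
    index=pd.date_range("2023-01-01", periods=3, freq="MS"),
)

def pct_change_from_baseline(df: pd.DataFrame) -> pd.DataFrame:
    """Percent change of every column relative to its first row."""
    baseline = df.iloc[0]
    return (df - baseline) / baseline * 100

absolute_change = caseloads.iloc[-1] - caseloads.iloc[0]
relative_change = pct_change_from_baseline(caseloads).iloc[-1]

print(absolute_change)   # both programs: +25 clients
print(relative_change)   # Small Program: 50.0, Large Program: 12.5
```

Note that the result depends entirely on which row is treated as the baseline; an unusually low or high first month will distort every later value, so the choice deserves a sentence in the chart caption.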

Annotating Events and Interventions

One of the most important uses of time series visualization in social work is showing what happened before and after a key event: a policy change, a new program launch, a funding cut, or an external shock like the pandemic.

Use axvline() to draw a vertical line at a specific date, and text() or annotate() to add a label.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data: service wait times before and after a process change
np.random.seed(42)
dates = pd.date_range(start='2023-07-01', end='2024-06-30', freq='W')

# Wait times drop after the intervention on 2024-01-15
intervention_date = pd.Timestamp('2024-01-15')

wait_times = []
for d in dates:
    if d < intervention_date:
        wait_times.append(np.random.normal(21, 4))  # ~21 days before
    else:
        wait_times.append(np.random.normal(14, 3))  # ~14 days after

data = pd.DataFrame({
    'week': dates,
    'wait_days': wait_times
})

plt.figure(figsize=(10, 5))

plt.plot(data['week'], data['wait_days'], 
         color='#0077BB', linewidth=1.5)

# Add vertical line at intervention
plt.axvline(x=intervention_date, color='#CC3311', linestyle='--', linewidth=2)

# Add text annotation
plt.text(intervention_date + pd.Timedelta(days=7), 28, 
         'New intake\nprocedure\nimplemented',
         fontsize=10, color='#CC3311', verticalalignment='top')

plt.xlabel('Week', fontsize=11)
plt.ylabel('Wait Time (Days)', fontsize=11)
plt.title('Average Wait Time for Initial Assessment', fontsize=13)

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The vertical line draws the eye to the moment of change. Viewers can immediately see that wait times dropped after the new procedure was implemented.
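
A chart like this is usually accompanied by a few summary numbers. As a sketch under the same assumptions as the mock data above (weekly wait times averaging about 21 days before and 14 days after a 2024-01-15 change), the example below labels each week as before or after and summarizes both periods. It also builds the mock series with np.where rather than a loop, which is only a stylistic alternative.

```python
import numpy as np
import pandas as pd

# Mock weekly wait times (illustrative values only)
rng = np.random.default_rng(4)
weeks = pd.date_range("2023-07-01", "2024-06-30", freq="W")
intervention_date = pd.Timestamp("2024-01-15")

# np.where picks the pre- or post-intervention parameters for each week
center = np.where(weeks < intervention_date, 21.0, 14.0)
spread = np.where(weeks < intervention_date, 4.0, 3.0)
waits = pd.DataFrame({
    "week": weeks,
    "wait_days": rng.normal(center, spread),
})

# Label each week, then summarize each period
waits["period"] = np.where(waits["week"] < intervention_date, "before", "after")
summary = (
    waits.groupby("period")["wait_days"]
    .agg(["count", "mean", "median", "std"])
    .loc[["before", "after"]]
)
print(summary.round(1))
```

Reporting the count and spread alongside the mean lets readers judge how much of the difference could be ordinary week-to-week variation.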

Multiple Annotations

Sometimes you need to mark multiple events. Be careful not to clutter the chart.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data: crisis calls over 2 years
np.random.seed(42)
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='W')

# Base level with some variation
calls = 100 + np.random.normal(0, 10, len(dates))

# Add effects of events
for i, d in enumerate(dates):
    # Spike during pandemic surge (early 2022)
    if pd.Timestamp('2022-01-01') <= d <= pd.Timestamp('2022-03-31'):
        calls[i] += 30
    # Drop after new crisis team starts (mid 2022)
    if d >= pd.Timestamp('2022-07-01'):
        calls[i] -= 15
    # Spike during winter holidays
    if d.month == 12:
        calls[i] += 20

data = pd.DataFrame({
    'week': dates,
    'crisis_calls': calls
})

# Key events
events = [
    (pd.Timestamp('2022-01-15'), 'Omicron\nsurge'),
    (pd.Timestamp('2022-07-01'), 'Mobile crisis\nteam launches'),
]

plt.figure(figsize=(11, 5))

plt.plot(data['week'], data['crisis_calls'], 
         color='#0077BB', linewidth=1.5)

# Add vertical lines and labels for events
colors = ['#CC3311', '#009988']
y_positions = [155, 70]

for (date, label), color, y_pos in zip(events, colors, y_positions):
    plt.axvline(x=date, color=color, linestyle='--', linewidth=1.5, alpha=0.8)
    plt.text(date + pd.Timedelta(days=10), y_pos, label,
             fontsize=9, color=color, verticalalignment='center')

plt.xlabel('Week', fontsize=11)
plt.ylabel('Crisis Line Calls', fontsize=11)
plt.title('Weekly Crisis Line Volume (2022-2023)', fontsize=13)

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()
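
The hard-coded y_positions in the example above have to be adjusted by hand whenever the data range changes. One alternative, sketched here on mock data, is to place labels with the transform returned by ax.get_xaxis_transform(), which reads x in data units (dates) and y as a fraction of the axes height. The Agg backend line is only there so the sketch runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Mock weekly series (illustrative values only)
rng = np.random.default_rng(5)
weeks = pd.date_range("2022-01-01", "2023-12-31", freq="W")
calls = 100 + rng.normal(0, 10, len(weeks))

events = [
    (pd.Timestamp("2022-01-15"), "Omicron\nsurge", "#CC3311"),
    (pd.Timestamp("2022-07-01"), "Mobile crisis\nteam launches", "#009988"),
]

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(weeks, calls, color="#0077BB", linewidth=1.5)

# Blended transform: x in data units (dates),
# y as a fraction of the axes height (0 = bottom, 1 = top)
trans = ax.get_xaxis_transform()

for date, label, color in events:
    ax.axvline(x=date, color=color, linestyle="--", linewidth=1.5, alpha=0.8)
    ax.text(date + pd.Timedelta(days=10), 0.97, label, transform=trans,
            fontsize=9, color=color, verticalalignment="top")

ax.set_xlabel("Week", fontsize=11)
ax.set_ylabel("Crisis Line Calls", fontsize=11)
fig.tight_layout()
```

With this approach every label sits just below the top of the plot regardless of the y-axis range. If two events fall close together, the labels can still overlap, so staggering the height (for example 0.97 and 0.80) may be needed.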

Shaded Regions for Time Periods

Sometimes an intervention isn’t a single moment but a period. Use axvspan() to shade a region.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Create mock data
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=24, freq='MS')

enrollment = 150 + np.cumsum(np.random.normal(0, 5, 24))
# Add effect of pilot period
enrollment[6:12] += np.linspace(0, 20, 6)  # Gradual increase during pilot
enrollment[12:] += 20  # Sustained after pilot

data = pd.DataFrame({
    'month': months,
    'enrollment': enrollment
})

plt.figure(figsize=(10, 5))

# Shade the pilot period
pilot_start = pd.Timestamp('2023-07-01')
pilot_end = pd.Timestamp('2023-12-31')

plt.axvspan(pilot_start, pilot_end, alpha=0.2, color='#EE7733', label='Pilot period')

plt.plot(data['month'], data['enrollment'], 
         color='#0077BB', linewidth=2)

plt.xlabel('Month', fontsize=11)
plt.ylabel('Program Enrollment', fontsize=11)
plt.title('Monthly Enrollment with Pilot Period Highlighted', fontsize=13)
plt.legend(frameon=False, loc='upper left')

plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

The shaded region immediately draws attention to the pilot period, making it easy to compare before, during, and after.
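
To put numbers on that before/during/after comparison, each month can be assigned to a phase and averaged. The sketch below assumes the same pilot dates as above; the enrollment values are mock data with independent noise rather than the random walk used in the chart, so the phase averages are easier to read.

```python
import numpy as np
import pandas as pd

# Mock monthly enrollment (illustrative values only)
rng = np.random.default_rng(6)
months = pd.date_range("2023-01-01", periods=24, freq="MS")
enrollment = pd.Series(150 + rng.normal(0, 5, 24), index=months)
enrollment.iloc[6:12] += np.linspace(0, 20, 6)  # gradual rise during pilot
enrollment.iloc[12:] += 20                      # sustained after pilot

pilot_start = pd.Timestamp("2023-07-01")
pilot_end = pd.Timestamp("2023-12-31")

# Assign each month to a phase based on the pilot dates
phase = np.where(
    months < pilot_start, "before",
    np.where(months <= pilot_end, "during", "after"),
)

by_phase = (
    enrollment.groupby(phase)
    .agg(["count", "mean"])
    .loc[["before", "during", "after"]]
)
print(by_phase.round(1))
```

The count column is a useful sanity check that the date boundaries captured the months you intended (here 6, 6, and 12).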

Dealing with Scale Differences

When comparing series with very different scales (e.g., caseload in hundreds vs. completion rate in percentages), you have several options.

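Option 1: Dual y-axes. Use twinx() to create a second y-axis that shares the same x-axis. This works, but readers can misattribute a line to the wrong axis, and where the two lines appear to cross or diverge depends on the axis ranges you happen to choose.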

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')

data = pd.DataFrame({
    'month': months,
    'caseload': 200 + np.cumsum(np.random.normal(5, 10, 12)),
    'completion_rate': 0.65 + np.cumsum(np.random.normal(0.01, 0.02, 12))
})

fig, ax1 = plt.subplots(figsize=(10, 5))

# First y-axis: caseload
color1 = '#0077BB'
ax1.set_xlabel('Month', fontsize=11)
ax1.set_ylabel('Active Caseload', color=color1, fontsize=11)
ax1.plot(data['month'], data['caseload'], color=color1, linewidth=2, label='Caseload')
ax1.tick_params(axis='y', labelcolor=color1)

# Second y-axis: completion rate
ax2 = ax1.twinx()
color2 = '#EE7733'
ax2.set_ylabel('Completion Rate', color=color2, fontsize=11)
ax2.plot(data['month'], data['completion_rate'], color=color2, linewidth=2, linestyle='--', label='Completion Rate')
ax2.tick_params(axis='y', labelcolor=color2)

plt.title('Caseload and Completion Rate (2023)', fontsize=13)

ax1.spines['top'].set_visible(False)
ax2.spines['top'].set_visible(False)

fig.tight_layout()
plt.show()

Option 2: Separate panels. Use subplots to show each series in its own panel. This avoids confusion about which axis applies to which line.

Python

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')

data = pd.DataFrame({
    'month': months,
    'caseload': 200 + np.cumsum(np.random.normal(5, 10, 12)),
    'completion_rate': 0.65 + np.cumsum(np.random.normal(0.01, 0.02, 12))
})

fig, axes = plt.subplots(2, 1, figsize=(10, 7), sharex=True)

axes[0].plot(data['month'], data['caseload'], color='#0077BB', linewidth=2)
axes[0].set_ylabel('Active Caseload', fontsize=11)
axes[0].set_title('Caseload and Completion Rate (2023)', fontsize=13)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)

axes[1].plot(data['month'], data['completion_rate'], color='#EE7733', linewidth=2)
axes[1].set_ylabel('Completion Rate', fontsize=11)
axes[1].set_xlabel('Month', fontsize=11)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)

plt.tight_layout()
plt.show()

Separate panels are generally clearer, especially when presenting to audiences who might not be familiar with dual-axis charts.
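
A small refinement for the separate-panel version: the completion rate in these examples is stored as a fraction (0.65 rather than 65), so the axis ticks read 0.65, 0.70, and so on. The sketch below uses matplotlib's PercentFormatter to display them as percentages without changing the data. The mock values are illustrative, and the clipping step is an added safeguard so the made-up rate stays between 0 and 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line in a notebook
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
months = pd.date_range("2023-01-01", periods=12, freq="MS")
data = pd.DataFrame({
    "month": months,
    "caseload": 200 + np.cumsum(rng.normal(5, 10, 12)),
    # Stored as a fraction between 0 and 1; clipped so mock values stay valid
    "completion_rate": np.clip(
        0.65 + np.cumsum(rng.normal(0.01, 0.02, 12)), 0, 1
    ),
})

fig, axes = plt.subplots(2, 1, figsize=(10, 7), sharex=True)

axes[0].plot(data["month"], data["caseload"], color="#0077BB", linewidth=2)
axes[0].set_ylabel("Active Caseload", fontsize=11)

axes[1].plot(data["month"], data["completion_rate"], color="#EE7733", linewidth=2)
axes[1].set_ylabel("Completion Rate", fontsize=11)
axes[1].set_xlabel("Month", fontsize=11)
# xmax=1 tells the formatter that 1.0 corresponds to 100%
axes[1].yaxis.set_major_formatter(PercentFormatter(xmax=1, decimals=0))

for ax in axes:
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
fig.tight_layout()
```

If your rates are already stored on a 0 to 100 scale, use xmax=100 instead.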

A Note on Causation

https://sketchplanations.com/correlation-is-not-causation

Annotating a policy change and showing that outcomes improved afterward does not prove the policy caused the improvement. Correlation is not causation, and this is especially true with time series data where many things change simultaneously.

When visualizing before/after comparisons, be honest about the limitations. Other factors might explain the change. The trend might have been happening anyway. Seasonal effects might be at play. Proper causal inference requires more rigorous methods (difference-in-differences, interrupted time series analysis with statistical controls, randomized experiments), but visualization is still valuable for exploration and communication.
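
The point that the trend might have been happening anyway can be illustrated with a deliberately constructed case. In the sketch below, the mock wait times fall at a constant rate all year and there is no intervention effect at all; the intervention date is hypothetical. A plain before/after comparison still shows a drop, while projecting the pre-intervention trend forward (a simple straight-line fit with np.polyfit, not a substitute for the formal methods named above) shows that the drop is fully accounted for by the existing trend.

```python
import numpy as np
import pandas as pd

# Mock data with NO intervention effect: wait times fall steadily all year
weeks = pd.date_range("2023-07-02", periods=52, freq="W")
t = np.arange(len(weeks))
wait_days = pd.Series(25.0 - 0.2 * t, index=weeks)

intervention_date = pd.Timestamp("2024-01-01")  # hypothetical date
pre = weeks < intervention_date

# Naive comparison: the "after" average is clearly lower
naive_drop = wait_days[pre].mean() - wait_days[~pre].mean()

# Fit a straight line to the pre-intervention weeks only, then project it
slope, intercept = np.polyfit(t[pre], wait_days[pre].to_numpy(), deg=1)
projected_after = intercept + slope * t[~pre]
gap_vs_trend = projected_after.mean() - wait_days[~pre].mean()

print(f"Before/after difference: {naive_drop:.1f} days")
print(f"Difference from projected pre-trend: {gap_vs_trend:.1f} days")
```

Here the before/after difference is about 5 days, yet the gap relative to the projected trend is essentially zero. Real data is noisier and trends are rarely linear, so treat this as an exploratory check rather than evidence for or against an effect.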

Good practice: describe what the chart shows (“Wait times decreased after the new procedure was implemented”) rather than claiming causation (“The new procedure reduced wait times”).

Resources

Pandas datetime documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html

Pandas resample documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html

Matplotlib annotations tutorial: https://matplotlib.org/stable/tutorials/text/annotations.html

Matplotlib axvline documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvline.html

Resample frequency aliases:

Alias	Meaning
D	Calendar day
W	Week (ending Sunday)
ME	Month end
MS	Month start
QE	Quarter end
YE	Year end
h	Hour
min	Minute

January 6, 2026
