[Python] Visualizing Change Over Time: Temporal Data (Time Series Analysis)
In previous posts, we covered visualizing continuous outcomes across groups (box plots, violin plots) and categorical outcomes (count plots, stacked bars, crosstabs). Many questions in social science research involve a third dimension: time. How have caseloads changed over the past year? Did the new intake procedure reduce wait times? What happened to service utilization after the policy change?
This post covers how to work with datetime data in pandas, visualize trends with line plots, aggregate data at different time intervals, and annotate your charts with key events like policy changes or program launches.
Types of Temporal Data
It helps to think about what kind of temporal pattern you’re looking for.
| Pattern | Definition | Social Work Examples |
|---|---|---|
| Trend | Long-term movement in a particular direction | Caseloads increasing over several years; average session lengths declining; completion rates improving steadily |
| Seasonality | Patterns that repeat at regular intervals | Crisis calls spiking during winter holidays; youth program enrollment dropping every summer; housing applications peaking at the start of each month |
| Event | One-time occurrence that causes a shift in the data | New policy takes effect; program launches; pandemic begins; funding gets cut |
| Noise | Random variation with no underlying pattern | Day-to-day fluctuations in service volume that look dramatic but reflect nothing systematic |
The challenge of time series visualization is showing the patterns you care about while not being misled by noise.
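To make the four patterns concrete, here is a minimal sketch that builds a mock weekly series by adding the components together. The dates, magnitudes, and column names are all made up for illustration; real data will not decompose this cleanly.

```python
import numpy as np
import pandas as pd

# Two years of weekly observations (all values are invented)
rng = np.random.default_rng(0)
weeks = pd.date_range(start='2022-01-02', periods=104, freq='W')

trend = 100 + 0.2 * np.arange(len(weeks))                       # steady long-term growth
seasonality = np.where(weeks.month == 12, 20, 0)                # repeating December spike
event = np.where(weeks >= pd.Timestamp('2023-01-01'), -15, 0)   # one-time shift
noise = rng.normal(0, 5, len(weeks))                            # random variation

components = pd.DataFrame({
    'trend': trend,
    'seasonality': seasonality,
    'event': event,
    'noise': noise,
}, index=weeks)

# The series you would actually observe is the sum of all four
components['observed'] = components.sum(axis=1)

print(components.head())
```

Plotting the `observed` column next to each component is a quick way to see how much of a chart's movement each pattern can account for.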
Working with Datetime Data in Pandas
Pandas has excellent support for datetime data, but your dates need to be in the right format. When you load data from a CSV, date columns often come in as strings. You need to convert them to datetime objects.
import pandas as pd
import numpy as np
# Create mock data with dates as strings
data = pd.DataFrame({
'date': ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18', '2024-01-19'],
'clients_served': [45, 52, 48, 55, 41]
})
# Check the data type
print(data['date'].dtype) # Output: object (meaning string)
# Convert to datetime
data['date'] = pd.to_datetime(data['date'])
print(data['date'].dtype) # Output: datetime64[ns]
When reading a CSV, you can convert dates during import:
# Convert during import
data = pd.read_csv('service_data.csv', parse_dates=['date'])
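Real exports often contain a few entries that are not valid dates. One way to handle them, sketched here with a made-up 'not recorded' entry, is to pass errors='coerce' to pd.to_datetime() so that unparseable values become NaT (pandas' missing-date marker) instead of raising an error. The explicit format string assumes your dates look like 2024-01-15.

```python
import pandas as pd

raw = pd.DataFrame({
    'date': ['2024-01-15', '2024-01-16', 'not recorded', '2024-01-18'],
    'clients_served': [45, 52, 48, 55],
})

# errors='coerce' turns unparseable entries into NaT instead of raising
raw['date'] = pd.to_datetime(raw['date'], format='%Y-%m-%d', errors='coerce')

# Count the rows that failed to parse before deciding what to do with them
n_bad = raw['date'].isna().sum()
print(f'{n_bad} row(s) could not be parsed')

# Here we simply drop them; you may prefer to fix them by hand
clean = raw.dropna(subset=['date'])
```

Checking the count first matters: silently dropping rows can distort a time series if the bad dates cluster in one period.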
Once you have datetime columns, you can extract components like year, month, day of week:
data['year'] = data['date'].dt.year
data['month'] = data['date'].dt.month
data['day_of_week'] = data['date'].dt.dayofweek # 0 = Monday, 6 = Sunday
data['week_of_year'] = data['date'].dt.isocalendar().week
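Extracted components become useful once you group by them. As a small sketch with made-up numbers, this computes the average number of clients served on each day of the week, using dt.day_name() for readable labels and dt.dayofweek to keep the days in calendar order.

```python
import pandas as pd

# Two weeks of mock daily counts, starting on a Monday
data = pd.DataFrame({
    'date': pd.date_range(start='2024-01-01', periods=14, freq='D'),
    'clients_served': [45, 52, 48, 55, 41, 12, 9, 47, 50, 49, 53, 44, 15, 8],
})

data['day_of_week'] = data['date'].dt.dayofweek   # 0 = Monday, 6 = Sunday
data['day_name'] = data['date'].dt.day_name()

# Group on both columns so the result sorts Monday through Sunday
by_day = (
    data.groupby(['day_of_week', 'day_name'])['clients_served']
    .mean()
    .reset_index()
    .sort_values('day_of_week')
)
print(by_day)
```

In this mock data the weekend averages are much lower than the weekday ones, which is the kind of weekly rhythm that shows up as noise in a daily line plot.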
Basic Line Plots
The line plot is the workhorse of time series visualization. It shows how a value changes over time, with time on the x-axis and the measure on the y-axis.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data: daily client volume over 3 months
np.random.seed(42)
dates = pd.date_range(start='2024-01-01', end='2024-03-31', freq='D')
base = 50 + np.arange(len(dates)) * 0.1 # Slight upward trend
noise = np.random.normal(0, 5, len(dates))
clients = base + noise
data = pd.DataFrame({
'date': dates,
'clients_served': clients.astype(int)
})
plt.figure(figsize=(10, 5))
plt.plot(data['date'], data['clients_served'], color='#0077BB', linewidth=1.5)
plt.xlabel('Date', fontsize=11)
plt.ylabel('Clients Served', fontsize=11)
plt.title('Daily Client Volume (Q1 2024)', fontsize=13)
# Clean up the axes
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
The daily data shows a lot of noise. You can see the general upward trend, but day-to-day variation makes it hard to read.

Temporal Aggregation: Smoothing the Noise
One solution is to aggregate to a coarser time interval. Instead of daily data, show weekly or monthly totals or averages.
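Pandas makes this easy with the resample() method. By default it groups on the DataFrame index, which must be a datetime index, so the example below sets the date column as the index first. Alternatively, you can leave the index alone and pass on='date' to resample a datetime column directly.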
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data
np.random.seed(42)
dates = pd.date_range(start='2024-01-01', end='2024-03-31', freq='D')
base = 50 + np.arange(len(dates)) * 0.1
noise = np.random.normal(0, 5, len(dates))
clients = base + noise
data = pd.DataFrame({
'date': dates,
'clients_served': clients.astype(int)
})
# Set date as index for resampling
data = data.set_index('date')
# Resample to weekly totals
weekly = data.resample('W').sum()
# Resample to weekly averages
weekly_avg = data.resample('W').mean()
plt.figure(figsize=(10, 5))
plt.plot(weekly.index, weekly['clients_served'],
color='#0077BB', linewidth=2, marker='o', markersize=4)
plt.xlabel('Week', fontsize=11)
plt.ylabel('Clients Served (Weekly Total)', fontsize=11)
plt.title('Weekly Client Volume (Q1 2024)', fontsize=13)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
The weekly aggregation smooths out the day-to-day noise, making the trend easier to see.

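Common resample frequencies include ‘D’ for daily, ‘W’ for weekly, ‘ME’ for month-end, ‘MS’ for month-start, ‘QE’ for quarter-end, and ‘YE’ for year-end. The ‘ME’, ‘QE’, and ‘YE’ spellings require pandas 2.2 or later; older versions use ‘M’, ‘Q’, and ‘Y’. You can also prefix an alias with a number: ‘2W’ for every two weeks, ‘3ME’ for every three months.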
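As a quick sketch of how these aliases behave, the following resamples the same mock daily series at three frequencies. It uses the on= parameter to resample by a column rather than the index, and sticks to ‘W’, ‘2W’, and ‘MS’ so it should run on older pandas versions too. The data is random and only illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
data = pd.DataFrame({
    'date': pd.date_range(start='2024-01-01', end='2024-03-31', freq='D'),
    'clients_served': rng.integers(40, 60, size=91),
})

# on='date' resamples by a column without changing the index
weekly = data.resample('W', on='date')['clients_served'].sum()
biweekly = data.resample('2W', on='date')['clients_served'].sum()

# agg() computes several summaries per bin in one pass
monthly = data.resample('MS', on='date')['clients_served'].agg(['sum', 'mean'])

print(len(weekly), len(biweekly), len(monthly))
print(monthly)
```

Whatever frequency you pick, the totals across bins should add up to the same grand total, which makes a handy sanity check after resampling.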
Choosing the Right Aggregation
The choice of aggregation level depends on your question and your audience.
- Daily data shows maximum detail but can be noisy. Use it when you need to identify specific days with unusual values, or when your audience needs precision.
- Weekly data smooths short-term noise while preserving monthly patterns. Good for operational dashboards and regular reporting.
- Monthly data reveals seasonal patterns and long-term trends but hides within-month variation. Good for reports to leadership or funders.
- Quarterly or yearly data shows only the biggest patterns. Appropriate for strategic planning or historical analysis spanning many years.
There’s a tradeoff: aggregation reduces noise but also hides real variation. A monthly average might mask that one week was a crisis and three weeks were normal.
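The tradeoff is easy to see with a small worked example. This sketch uses made-up numbers: four weeks of daily crisis calls where the third week is three times busier than the others.

```python
import pandas as pd

# 28 days starting on a Monday: three normal weeks and one crisis week
days = pd.date_range(start='2024-04-01', periods=28, freq='D')
calls = [20] * 7 + [20] * 7 + [60] * 7 + [20] * 7
daily = pd.Series(calls, index=days, name='crisis_calls')

weekly_mean = daily.resample('W').mean()
monthly_mean = daily.resample('MS').mean()

print(weekly_mean)    # weekly averages: 20, 20, 60, 20
print(monthly_mean)   # a single monthly average of 30
```

The monthly average of 30 calls per day describes none of the four weeks accurately: it overstates the normal weeks and hides the crisis entirely.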
Comparing Multiple Time Series
Often you want to compare trends across groups: service utilization by program, caseloads by region, outcomes by cohort. Plot multiple lines on the same axes.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data: monthly caseloads for three programs
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=24, freq='MS')
data = pd.DataFrame({
'month': months,
'Program A': 120 + np.cumsum(np.random.normal(2, 5, 24)),
'Program B': 80 + np.cumsum(np.random.normal(1, 4, 24)),
'Program C': 100 + np.cumsum(np.random.normal(0, 6, 24))
})
plt.figure(figsize=(10, 5))
plt.plot(data['month'], data['Program A'],
color='#0077BB', linewidth=2, label='Program A')
plt.plot(data['month'], data['Program B'],
color='#EE7733', linewidth=2, label='Program B')
plt.plot(data['month'], data['Program C'],
color='#009988', linewidth=2, label='Program C')
plt.xlabel('Month', fontsize=11)
plt.ylabel('Active Caseload', fontsize=11)
plt.title('Monthly Caseload by Program (2023-2024)', fontsize=13)
plt.legend(frameon=False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
When comparing multiple series, use colors that are distinguishable for colorblind viewers (like our blue/orange/teal palette) and keep the number of lines manageable. A chart with more than four or five lines becomes hard to read.
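The example above assumes one column per program. Data exported from a case management system is often in long format instead, with one row per program per month. A possible approach, sketched here with made-up values and hypothetical column names, is to reshape with pivot() and then plot each column in a loop.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Long format: one row per program per month (values are invented)
long = pd.DataFrame({
    'month': pd.to_datetime(['2023-01-01', '2023-02-01', '2023-03-01'] * 3),
    'program': ['Program A'] * 3 + ['Program B'] * 3 + ['Program C'] * 3,
    'caseload': [120, 124, 131, 80, 83, 81, 100, 97, 104],
})

# Wide format: months as the index, one column per program
wide = long.pivot(index='month', columns='program', values='caseload')

palette = ['#0077BB', '#EE7733', '#009988']
fig, ax = plt.subplots(figsize=(10, 5))
for column, color in zip(wide.columns, palette):
    ax.plot(wide.index, wide[column], color=color, linewidth=2, label=column)

ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('Active Caseload', fontsize=11)
ax.legend(frameon=False)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
```

Call plt.show() to display the figure. The loop keeps the plotting code the same length no matter how many programs you have, though the advice about limiting the number of lines still applies.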

Absolute vs. Percent Change
Sometimes you care about absolute values: how many clients did we serve? Other times, you care about relative change: how much did utilization increase?
Percent change from a baseline can be more informative when comparing groups of different sizes. A program that went from 50 to 75 clients (a 50% increase) and one that went from 200 to 225 (a 12.5% increase) both gained 25 clients, so they look identical in absolute terms but very different in relative terms.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')
data = pd.DataFrame({
'month': months,
'Small Program': 50 + np.cumsum(np.random.normal(2, 3, 12)),
'Large Program': 200 + np.cumsum(np.random.normal(2, 8, 12))
})
# Calculate percent change from first month
data['Small Program (% change)'] = (
(data['Small Program'] - data['Small Program'].iloc[0])
/ data['Small Program'].iloc[0] * 100
)
data['Large Program (% change)'] = (
(data['Large Program'] - data['Large Program'].iloc[0])
/ data['Large Program'].iloc[0] * 100
)
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
# Absolute values
axes[0].plot(data['month'], data['Small Program'],
color='#0077BB', linewidth=2, label='Small Program')
axes[0].plot(data['month'], data['Large Program'],
color='#EE7733', linewidth=2, label='Large Program')
axes[0].set_xlabel('Month', fontsize=11)
axes[0].set_ylabel('Clients', fontsize=11)
axes[0].set_title('Absolute Caseload', fontsize=12)
axes[0].legend(frameon=False)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)
# Percent change
axes[1].plot(data['month'], data['Small Program (% change)'],
color='#0077BB', linewidth=2, label='Small Program')
axes[1].plot(data['month'], data['Large Program (% change)'],
color='#EE7733', linewidth=2, label='Large Program')
axes[1].axhline(y=0, color='gray', linestyle='--', linewidth=0.8)
axes[1].set_xlabel('Month', fontsize=11)
axes[1].set_ylabel('% Change from January', fontsize=11)
axes[1].set_title('Percent Change from Baseline', fontsize=12)
axes[1].legend(frameon=False)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
The left panel shows that Large Program serves many more clients. The right panel shows that Small Program is growing faster in relative terms.
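The percent-change calculation is repeated for each program in the code above, so it can be worth wrapping in a small helper. This is a sketch with a hypothetical function name, checked against the 50-to-75 and 200-to-225 example from earlier in this section.

```python
import pandas as pd

def percent_change_from_baseline(series):
    """Percent change of every value relative to the first value."""
    baseline = series.iloc[0]
    return (series - baseline) / baseline * 100

small = pd.Series([50, 60, 75])
large = pd.Series([200, 210, 225])

print(percent_change_from_baseline(small).round(1).tolist())  # [0.0, 20.0, 50.0]
print(percent_change_from_baseline(large).round(1).tolist())  # [0.0, 5.0, 12.5]
```

Both programs gained 25 clients, but the helper shows the small program growing four times as fast in relative terms. Note that the result depends entirely on which row comes first, so sort by date before calling it.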

Annotating Events and Interventions
One of the most important uses of time series visualization in social work is showing what happened before and after a key event: a policy change, a new program launch, a funding cut, or an external shock like the pandemic.
Use axvline() to draw a vertical line at a specific date, and text() or annotate() to add a label.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data: service wait times before and after a process change
np.random.seed(42)
dates = pd.date_range(start='2023-07-01', end='2024-06-30', freq='W')
# Wait times drop after the intervention on 2024-01-15
intervention_date = pd.Timestamp('2024-01-15')
wait_times = []
for d in dates:
    if d < intervention_date:
        wait_times.append(np.random.normal(21, 4)) # ~21 days before
    else:
        wait_times.append(np.random.normal(14, 3)) # ~14 days after
data = pd.DataFrame({
'week': dates,
'wait_days': wait_times
})
plt.figure(figsize=(10, 5))
plt.plot(data['week'], data['wait_days'],
color='#0077BB', linewidth=1.5)
# Add vertical line at intervention
plt.axvline(x=intervention_date, color='#CC3311', linestyle='--', linewidth=2)
# Add text annotation
plt.text(intervention_date + pd.Timedelta(days=7), 28,
'New intake\nprocedure\nimplemented',
fontsize=10, color='#CC3311', verticalalignment='top')
plt.xlabel('Week', fontsize=11)
plt.ylabel('Wait Time (Days)', fontsize=11)
plt.title('Average Wait Time for Initial Assessment', fontsize=13)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
The vertical line draws the eye to the moment of change. Viewers can immediately see that wait times dropped after the new procedure was implemented.
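It often helps to put numbers next to the picture. The sketch below rebuilds similar mock wait-time data and summarizes it before and after the intervention date. The 'period' column name is an assumption for illustration, and a difference in means here is descriptive only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
weeks = pd.date_range(start='2023-07-01', end='2024-06-30', freq='W')
intervention_date = pd.Timestamp('2024-01-15')

# Mock wait times: about 21 days before the change, about 14 after
is_after = weeks >= intervention_date
wait_days = np.where(is_after,
                     rng.normal(14, 3, len(weeks)),
                     rng.normal(21, 4, len(weeks)))

data = pd.DataFrame({'week': weeks, 'wait_days': wait_days})

# Label each week relative to the intervention, then summarize by label
data['period'] = np.where(data['week'] < intervention_date, 'before', 'after')
summary = data.groupby('period')['wait_days'].agg(['count', 'mean', 'std']).round(1)

print(summary.loc[['before', 'after']])
```

Reporting the count alongside the mean lets readers judge how many weeks each average is based on.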

Multiple Annotations
Sometimes you need to mark multiple events. Be careful not to clutter the chart.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data: crisis calls over 2 years
np.random.seed(42)
dates = pd.date_range(start='2022-01-01', end='2023-12-31', freq='W')
# Base level with some variation
calls = 100 + np.random.normal(0, 10, len(dates))
# Add effects of events
for i, d in enumerate(dates):
    # Spike during pandemic surge (early 2022)
    if pd.Timestamp('2022-01-01') <= d <= pd.Timestamp('2022-03-31'):
        calls[i] += 30
    # Drop after new crisis team starts (mid 2022)
    if d >= pd.Timestamp('2022-07-01'):
        calls[i] -= 15
    # Spike during winter holidays
    if d.month == 12:
        calls[i] += 20
data = pd.DataFrame({
'week': dates,
'crisis_calls': calls
})
# Key events
events = [
(pd.Timestamp('2022-01-15'), 'Omicron\nsurge'),
(pd.Timestamp('2022-07-01'), 'Mobile crisis\nteam launches'),
]
plt.figure(figsize=(11, 5))
plt.plot(data['week'], data['crisis_calls'],
color='#0077BB', linewidth=1.5)
# Add vertical lines and labels for events
colors = ['#CC3311', '#009988']
y_positions = [155, 70]
for (date, label), color, y_pos in zip(events, colors, y_positions):
    plt.axvline(x=date, color=color, linestyle='--', linewidth=1.5, alpha=0.8)
    plt.text(date + pd.Timedelta(days=10), y_pos, label,
             fontsize=9, color=color, verticalalignment='center')
plt.xlabel('Week', fontsize=11)
plt.ylabel('Crisis Line Calls', fontsize=11)
plt.title('Weekly Crisis Line Volume (2022-2023)', fontsize=13)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
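The examples so far place labels with text(). When a label has to sit away from the line to avoid clutter, annotate() can draw an arrow from the label back to the point it describes. Here is a minimal sketch on a simplified, made-up version of the crisis call data; the label position is a guess that you would adjust for your own chart.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
weeks = pd.date_range(start='2022-01-01', end='2022-12-31', freq='W')
calls = 100 + rng.normal(0, 10, len(weeks))
launch = pd.Timestamp('2022-07-01')
calls[weeks >= launch] -= 15   # mock drop after the launch

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(weeks, calls, color='#0077BB', linewidth=1.5)
ax.axvline(x=launch, color='#009988', linestyle='--', linewidth=1.5)

# xy is the point being annotated; xytext is where the label sits
ax.annotate('Mobile crisis\nteam launches',
            xy=(launch, 100),
            xytext=(launch + pd.Timedelta(days=45), 125),
            arrowprops=dict(arrowstyle='->', color='#009988'),
            fontsize=9, color='#009988')

ax.set_xlabel('Week', fontsize=11)
ax.set_ylabel('Crisis Line Calls', fontsize=11)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
```

Call plt.show() to display the figure. Both xy and xytext are given in data coordinates here, so the label stays attached to the right date if the axis limits change.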

Shaded Regions for Time Periods
Sometimes an intervention isn’t a single moment but a period. Use axvspan() to shade a region.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
# Create mock data
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=24, freq='MS')
enrollment = 150 + np.cumsum(np.random.normal(0, 5, 24))
# Add effect of pilot period
enrollment[6:12] += np.linspace(0, 20, 6) # Gradual increase during pilot
enrollment[12:] += 20 # Sustained after pilot
data = pd.DataFrame({
'month': months,
'enrollment': enrollment
})
plt.figure(figsize=(10, 5))
# Shade the pilot period
pilot_start = pd.Timestamp('2023-07-01')
pilot_end = pd.Timestamp('2023-12-31')
plt.axvspan(pilot_start, pilot_end, alpha=0.2, color='#EE7733', label='Pilot period')
plt.plot(data['month'], data['enrollment'],
color='#0077BB', linewidth=2)
plt.xlabel('Month', fontsize=11)
plt.ylabel('Program Enrollment', fontsize=11)
plt.title('Monthly Enrollment with Pilot Period Highlighted', fontsize=13)
plt.legend(frameon=False, loc='upper left')
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
The shaded region immediately draws attention to the pilot period, making it easy to compare before, during, and after.
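The same technique extends to periods that recur, such as the winter holiday spike in the crisis call example. This sketch, on made-up data, calls axvspan() once per December and gives only the first span a legend label so the legend does not repeat itself.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
weeks = pd.date_range(start='2022-01-01', end='2023-12-31', freq='W')
calls = 100 + rng.normal(0, 10, len(weeks))
calls[weeks.month == 12] += 20   # mock holiday spike

fig, ax = plt.subplots(figsize=(11, 5))
ax.plot(weeks, calls, color='#0077BB', linewidth=1.5)

# One shaded span per December; labels starting with '_' are left out of the legend
for i, year in enumerate([2022, 2023]):
    ax.axvspan(pd.Timestamp(year=year, month=12, day=1),
               pd.Timestamp(year=year, month=12, day=31),
               alpha=0.2, color='#EE7733',
               label='Winter holidays' if i == 0 else '_nolegend_')

ax.set_xlabel('Week', fontsize=11)
ax.set_ylabel('Crisis Line Calls', fontsize=11)
ax.legend(frameon=False, loc='upper left')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
```

Call plt.show() to display the figure. Shading every occurrence makes it easier to judge whether a spike is seasonal or a one-time event.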

Dealing with Scale Differences
When comparing series with very different scales (e.g., caseload in hundreds vs. completion rate in percentages), you have several options.
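Option 1: Dual y-axes. Use twinx() to create a second y-axis. This works, but readers can easily misread which line belongs to which axis, and where the lines appear to cross or diverge depends on how each axis happens to be scaled.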
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')
data = pd.DataFrame({
'month': months,
'caseload': 200 + np.cumsum(np.random.normal(5, 10, 12)),
'completion_rate': 0.65 + np.cumsum(np.random.normal(0.01, 0.02, 12))
})
fig, ax1 = plt.subplots(figsize=(10, 5))
# First y-axis: caseload
color1 = '#0077BB'
ax1.set_xlabel('Month', fontsize=11)
ax1.set_ylabel('Active Caseload', color=color1, fontsize=11)
ax1.plot(data['month'], data['caseload'], color=color1, linewidth=2, label='Caseload')
ax1.tick_params(axis='y', labelcolor=color1)
# Second y-axis: completion rate
ax2 = ax1.twinx()
color2 = '#EE7733'
ax2.set_ylabel('Completion Rate', color=color2, fontsize=11)
ax2.plot(data['month'], data['completion_rate'], color=color2, linewidth=2, linestyle='--', label='Completion Rate')
ax2.tick_params(axis='y', labelcolor=color2)
plt.title('Caseload and Completion Rate (2023)', fontsize=13)
ax1.spines['top'].set_visible(False)
ax2.spines['top'].set_visible(False)
fig.tight_layout()
plt.show()

Option 2: Separate panels. Use subplots to show each series in its own panel. This avoids confusion about which axis applies to which line.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
np.random.seed(42)
months = pd.date_range(start='2023-01-01', periods=12, freq='MS')
data = pd.DataFrame({
'month': months,
'caseload': 200 + np.cumsum(np.random.normal(5, 10, 12)),
'completion_rate': 0.65 + np.cumsum(np.random.normal(0.01, 0.02, 12))
})
fig, axes = plt.subplots(2, 1, figsize=(10, 7), sharex=True)
axes[0].plot(data['month'], data['caseload'], color='#0077BB', linewidth=2)
axes[0].set_ylabel('Active Caseload', fontsize=11)
axes[0].set_title('Caseload and Completion Rate (2023)', fontsize=13)
axes[0].spines['top'].set_visible(False)
axes[0].spines['right'].set_visible(False)
axes[1].plot(data['month'], data['completion_rate'], color='#EE7733', linewidth=2)
axes[1].set_ylabel('Completion Rate', fontsize=11)
axes[1].set_xlabel('Month', fontsize=11)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)
plt.tight_layout()
plt.show()
Separate panels are generally clearer, especially when presenting to audiences who might not be familiar with dual-axis charts.
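In both options the completion rate is stored as a proportion (0.65 rather than 65), so the axis ticks show decimals. If your audience expects percentages, one option is matplotlib's PercentFormatter, sketched here on made-up values. Setting xmax=1.0 tells the formatter that 1.0 corresponds to 100%.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter

months = pd.date_range(start='2023-01-01', periods=12, freq='MS')
completion_rate = np.linspace(0.65, 0.78, 12)   # invented proportions

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(months, completion_rate, color='#EE7733', linewidth=2)

# Show tick labels as whole percentages instead of decimals
ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0, decimals=0))
ax.set_ylim(0, 1)

ax.set_xlabel('Month', fontsize=11)
ax.set_ylabel('Completion Rate', fontsize=11)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
fig.tight_layout()
```

Call plt.show() to display the figure. Fixing the y-axis to the full 0 to 100% range is a design choice: it avoids exaggerating small changes, at the cost of making them harder to see.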

A Note on Causation

Annotating a policy change and showing that outcomes improved afterward does not prove the policy caused the improvement. Correlation is not causation, and this is especially true with time series data where many things change simultaneously.
When visualizing before/after comparisons, be honest about the limitations. Other factors might explain the change. The trend might have been happening anyway. Seasonal effects might be at play. Proper causal inference requires more rigorous methods (difference-in-differences, interrupted time series analysis with statistical controls, randomized experiments), but visualization is still valuable for exploration and communication.
Good practice: describe what the chart shows (“Wait times decreased after the new procedure was implemented”) rather than claiming causation (“The new procedure reduced wait times”).
Resources
Pandas datetime documentation: https://pandas.pydata.org/docs/user_guide/timeseries.html
Pandas resample documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html
Matplotlib annotations tutorial: https://matplotlib.org/stable/tutorials/text/annotations.html
Matplotlib axvline documentation: https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.axvline.html
Resample frequency aliases:
| Alias | Meaning |
|---|---|
| D | Calendar day |
| W | Week, ending Sunday |
| ME | Month end |
| MS | Month start |
| QE | Quarter end |
| YE | Year end |
| h | Hour |
| min | Minute |
