[Python] Comparing Groups: Visualizing Distributions for categorical variables (matplotlib/seaborn)
In the previous post, we covered visualizing distributions of continuous outcomes across groups: box plots, violin plots, and strip plots for comparing measures like PHQ-9 scores or service hours between different client populations. But many outcomes in social science research are not continuous.
Let’s think about some examples. Did the client complete treatment? What was their discharge status? Which service pathway did they follow? These are categorical outcomes, and visualizing them requires different tools.
This post covers how to visualize the distribution of categorical variables, compare categorical outcomes across groups, and examine associations between two categorical variables. We’ll use seaborn’s countplot(), grouped and stacked bar charts, crosstabs, and heatmaps.
Understanding Categorical Variables
Categorical outcomes (also called qualitative outcomes) are variables whose values fall into distinct categories rather than along a numerical scale. In social work data, you’ll encounter these constantly.
- Binary outcomes have exactly two categories: completed/not completed, eligible/ineligible, housed/unhoused, screened positive/negative.
- Nominal outcomes have multiple unordered categories: discharge status (completed, dropped out, transferred, referred elsewhere), service type (case management, counseling, crisis intervention), referral source (self, family, court, hospital).
- Ordinal outcomes have multiple ordered categories: satisfaction rating (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied), severity level (mild, moderate, severe). Ordinal data sits between nominal and continuous; for visualization purposes, we often treat it as categorical.
The goal with categorical outcomes is usually to show how many observations fall into each category, what proportion of a group has a particular outcome, or whether there’s an association between two categorical variables.
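One way to record these distinctions in pandas is the Categorical dtype. The sketch below uses made-up column names and values (assumptions for illustration only) and declares an explicit order for the ordinal variable so that sorting and plotting respect it.

```python
import pandas as pd

# Hypothetical client records; column names and values are illustrative
clients = pd.DataFrame({
    'completed': ['Yes', 'No', 'Yes', 'Yes'],                 # binary
    'referral_source': ['Self', 'Court', 'Family', 'Court'],  # nominal
    'severity': ['Moderate', 'Mild', 'Severe', 'Mild'],       # ordinal
})

# Nominal: categories with no inherent order
clients['referral_source'] = clients['referral_source'].astype('category')

# Ordinal: declare the order explicitly
severity_levels = ['Mild', 'Moderate', 'Severe']
clients['severity'] = pd.Categorical(
    clients['severity'], categories=severity_levels, ordered=True
)

print(clients.dtypes)
print(clients.sort_values('severity')['severity'].tolist())
```

With an ordered categorical, sorting follows the declared levels rather than alphabetical order, which matters when the labels do not happen to sort correctly on their own.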
Counting Categories with countplot()
The most basic task is counting how many observations fall into each category. Seaborn’s countplot() does this automatically.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data: client discharge status
np.random.seed(42)
n = 200
data = pd.DataFrame({
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred', 'Referred'],
        size=n,
        p=[0.45, 0.30, 0.15, 0.10]
    )
})
plt.figure(figsize=(8, 5))
# Note: seaborn 0.13+ warns when palette is passed without hue; on those
# versions, assign hue='discharge_status' (with a matching hue_order)
# and set legend=False
sns.countplot(data=data, x='discharge_status',
              order=['Completed', 'Dropped Out', 'Transferred', 'Referred'],
              palette=['#0077BB', '#EE7733', '#009988', '#CC3311'])
plt.xlabel('Discharge Status', fontsize=11)
plt.ylabel('Number of Clients', fontsize=11)
plt.title('Client Discharge Status', fontsize=13)
plt.tight_layout()
plt.show()

The order parameter controls the sequence of bars. Without it, seaborn uses the order in which categories first appear in the data (or the declared category order, if the column is a pandas Categorical). For discharge status, a meaningful order (completed first, then less desirable outcomes) helps viewers interpret the results.
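When there is no natural order, sorting bars by frequency is a common alternative. A minimal sketch, reusing the mock data from above: value_counts() returns categories from most to least frequent, and its index can be passed to order. The variable name frequency_order is just an illustrative choice.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame({
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred', 'Referred'],
        size=200, p=[0.45, 0.30, 0.15, 0.10])
})

# value_counts() sorts from most to least frequent by default
counts = data['discharge_status'].value_counts()
frequency_order = counts.index.tolist()
print(counts)

# Then pass the list to countplot, e.g.:
# sns.countplot(data=data, x='discharge_status', order=frequency_order)
```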
Comparing Counts Across Groups with Hue
When you want to compare categorical outcomes across different groups, add a hue parameter to split each category by another variable.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data: discharge status by program type
np.random.seed(42)
n = 300
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard'], n // 2),
    'discharge_status': np.concatenate([
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=n // 2, p=[0.55, 0.30, 0.15]),
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=n // 2, p=[0.40, 0.40, 0.20])
    ])
})
plt.figure(figsize=(9, 5))
sns.countplot(data=data, x='discharge_status', hue='program',
              order=['Completed', 'Dropped Out', 'Transferred'],
              palette=['#0077BB', '#EE7733'])
plt.xlabel('Discharge Status', fontsize=11)
plt.ylabel('Number of Clients', fontsize=11)
plt.title('Discharge Status by Program Type', fontsize=13)
plt.legend(title='Program')
plt.tight_layout()
plt.show()

This grouped bar chart shows raw counts: how many clients in each program had each discharge status. But raw counts can be misleading when group sizes differ. If the Intensive program served 200 clients and the Standard program served 100, we’d expect higher counts across all categories for Intensive, even if the proportions were identical.
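To make the arithmetic concrete, here is a small sketch with hypothetical counts for a 200-client and a 100-client program. The numbers are invented so that both programs have exactly the same outcome mix.

```python
import pandas as pd

# Hypothetical counts: 200 Intensive clients, 100 Standard clients
counts = pd.DataFrame(
    {'Intensive': [100, 60, 40], 'Standard': [50, 30, 20]},
    index=['Completed', 'Dropped Out', 'Transferred']
)

# Divide each column by its own total to get within-program proportions
proportions = counts / counts.sum()

print(counts)
print(proportions)
```

Every Intensive count is twice the Standard count, yet the proportions in the two columns are identical. A grouped bar chart of the raw counts would suggest a difference that is purely a matter of group size.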
Proportions Over Counts: Normalized Bar Charts
Often the question isn’t “how many” but “what proportion.” What percentage of clients in each program completed treatment? This requires calculating proportions before plotting.
Seaborn’s countplot has no option to normalize within each hue group (the stat parameter added in seaborn 0.13 normalizes across the whole plot, not per group), so we need to calculate proportions ourselves using pandas.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data with different group sizes
np.random.seed(42)
intensive = pd.DataFrame({
    'program': 'Intensive',
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred'],
        size=180, p=[0.55, 0.30, 0.15])
})
standard = pd.DataFrame({
    'program': 'Standard',
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred'],
        size=120, p=[0.40, 0.40, 0.20])
})
data = pd.concat([intensive, standard], ignore_index=True)
# Calculate proportions within each program
proportions = (data.groupby(['program', 'discharge_status'])
                   .size()
                   .reset_index(name='count'))
# Add total per program and calculate proportion
totals = proportions.groupby('program')['count'].transform('sum')
proportions['proportion'] = proportions['count'] / totals
plt.figure(figsize=(9, 5))
sns.barplot(data=proportions, x='discharge_status', y='proportion', hue='program',
            order=['Completed', 'Dropped Out', 'Transferred'],
            palette=['#0077BB', '#EE7733'])
plt.xlabel('Discharge Status', fontsize=11)
plt.ylabel('Proportion of Clients', fontsize=11)
plt.title('Discharge Status by Program Type (Proportions)', fontsize=13)
plt.legend(title='Program')
plt.ylim(0, 0.7)
plt.tight_layout()
plt.show()
Proportions make the comparison fair regardless of group size. We can now see that a higher proportion of Intensive program clients completed treatment compared to Standard program clients.
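The same within-group proportions can be computed more compactly. The sketch below is one alternative, not the only approach: it uses value_counts(normalize=True) on a grouped column, and renames the result before reset_index() so it behaves the same across pandas versions. The mock data here uses equal outcome probabilities for simplicity.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard'], [180, 120]),
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred'], size=300)
})

# Proportion of each discharge status within each program
proportions = (data.groupby('program')['discharge_status']
                   .value_counts(normalize=True)
                   .rename('proportion')
                   .reset_index())
print(proportions)
```

The resulting long-format table has one row per program and status, which is the shape sns.barplot expects for x, y, and hue.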
Stacked Bar Charts: Part-to-Whole Comparisons
Stacked bar charts show how categories compose the whole for each group. Each bar represents one group (like a program), and the segments within the bar represent the categories (like discharge status).
Seaborn doesn’t have a native stacked bar chart function, but we can create one using pandas plotting with seaborn styling.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data
np.random.seed(42)
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard', 'Brief'], 100),
    'discharge_status': np.concatenate([
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=100, p=[0.55, 0.30, 0.15]),
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=100, p=[0.40, 0.40, 0.20]),
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=100, p=[0.50, 0.35, 0.15])
    ])
})
# Create crosstab and normalize by row (program)
ct = pd.crosstab(data['program'], data['discharge_status'], normalize='index')
ct = ct[['Completed', 'Dropped Out', 'Transferred']]  # Reorder columns
# Set seaborn style
sns.set_style('whitegrid')
# Plot stacked bar chart
ax = ct.plot(kind='bar', stacked=True,
             color=['#0077BB', '#EE7733', '#009988'],
             figsize=(8, 5),
             edgecolor='white',
             linewidth=1)
plt.xlabel('Program Type', fontsize=11)
plt.ylabel('Proportion of Clients', fontsize=11)
plt.title('Discharge Status Composition by Program', fontsize=13)
plt.legend(title='Discharge Status', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()

Stacked bar charts make it easy to compare the overall composition across groups. The Intensive program bar shows more “Completed” (blue) than the Standard program bar. But comparing the middle segments (Dropped Out) is harder because they don’t share a common baseline. This is a known limitation of stacked bar charts.
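One way to soften the baseline problem is to print each segment’s share directly on the bar. The sketch below assumes matplotlib 3.4 or later, where Axes.bar_label() is available, and uses simplified mock data with equal outcome probabilities.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard', 'Brief'], 100),
    'discharge_status': np.random.choice(
        ['Completed', 'Dropped Out', 'Transferred'], size=300)
})
ct = pd.crosstab(data['program'], data['discharge_status'], normalize='index')

ax = ct.plot(kind='bar', stacked=True, figsize=(8, 5),
             color=['#0077BB', '#EE7733', '#009988'], edgecolor='white')

# One container per discharge status; label each segment with its share
for container in ax.containers:
    labels = [f'{bar.get_height():.0%}' for bar in container]
    ax.bar_label(container, labels=labels, label_type='center', color='white')

plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
```

With the percentages printed, readers no longer have to judge the height of a floating middle segment by eye.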
100% Stacked Bar Charts
The chart above is a 100% stacked bar chart: because we passed normalize='index' to the crosstab, every bar has the same height and the segments show proportions. To stack raw counts instead, remove that parameter.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data with different group sizes
np.random.seed(42)
data = pd.DataFrame({
    'program': np.concatenate([
        np.repeat('Intensive', 180),
        np.repeat('Standard', 120),
        np.repeat('Brief', 60)
    ]),
    'discharge_status': np.concatenate([
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=180, p=[0.55, 0.30, 0.15]),
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=120, p=[0.40, 0.40, 0.20]),
        np.random.choice(['Completed', 'Dropped Out', 'Transferred'],
                         size=60, p=[0.50, 0.35, 0.15])
    ])
})
# Crosstab with raw counts (not normalized)
ct_counts = pd.crosstab(data['program'], data['discharge_status'])
ct_counts = ct_counts[['Completed', 'Dropped Out', 'Transferred']]
sns.set_style('whitegrid')
ax = ct_counts.plot(kind='bar', stacked=True,
                    color=['#0077BB', '#EE7733', '#009988'],
                    figsize=(8, 5),
                    edgecolor='white',
                    linewidth=1)
plt.xlabel('Program Type', fontsize=11)
plt.ylabel('Number of Clients', fontsize=11)
plt.title('Discharge Status by Program (Counts)', fontsize=13)
plt.legend(title='Discharge Status', bbox_to_anchor=(1.02, 1), loc='upper left')
plt.xticks(rotation=0)
plt.tight_layout()
plt.show()
With counts, bar heights reflect group sizes. The Intensive bar is taller than Brief because more clients were served. This can be informative when you want to show both composition and volume.
Crosstabs: The Foundation for Categorical Analysis
Before visualizing, it’s useful to create a crosstab (cross-tabulation), which is simply a table showing the frequency of each combination of two categorical variables.
import pandas as pd
import numpy as np
# Create mock data
np.random.seed(42)
data = pd.DataFrame({
    'referral_source': np.random.choice(
        ['Self', 'Family', 'Court', 'Hospital'],
        size=400, p=[0.30, 0.25, 0.25, 0.20]),
    'completed': np.random.choice(
        ['Yes', 'No'],
        size=400, p=[0.50, 0.50])
})
# Basic crosstab: counts
ct = pd.crosstab(data['referral_source'], data['completed'])
print("Counts:")
print(ct)
print()
# Crosstab with row proportions
ct_row = pd.crosstab(data['referral_source'], data['completed'], normalize='index')
print("Row proportions (what % of each referral source completed):")
print(ct_row.round(3))
print()
# Crosstab with column proportions
ct_col = pd.crosstab(data['referral_source'], data['completed'], normalize='columns')
print("Column proportions (what % of completers came from each source):")
print(ct_col.round(3))

Row proportions answer: “Of clients referred from court, what proportion completed treatment?” Column proportions answer: “Of clients who completed treatment, what proportion were court-referred?”
These are different questions, and the choice depends on what you’re trying to understand.
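A quick way to check which question a table answers is to look at what sums to 1. The sketch below uses simplified mock data and also shows margins=True, which appends row and column totals to a crosstab of counts.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
data = pd.DataFrame({
    'referral_source': np.random.choice(
        ['Self', 'Family', 'Court', 'Hospital'], size=400),
    'completed': np.random.choice(['Yes', 'No'], size=400)
})

# margins=True appends an 'All' row and column holding the totals
ct_totals = pd.crosstab(data['referral_source'], data['completed'], margins=True)
print(ct_totals)

# Row proportions sum to 1 across each row (one row per referral source)
ct_row = pd.crosstab(data['referral_source'], data['completed'], normalize='index')
print(ct_row.sum(axis=1))

# Column proportions sum to 1 down each column (one column per outcome)
ct_col = pd.crosstab(data['referral_source'], data['completed'], normalize='columns')
print(ct_col.sum(axis=0))
```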
Heatmaps for Crosstabs
When you have many categories, a heatmap can be more readable than a bar chart. The color intensity shows the magnitude of each cell.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Create mock data: service type by region
np.random.seed(42)
data = pd.DataFrame({
    'region': np.random.choice(
        ['North', 'South', 'East', 'West', 'Central'],
        size=500),
    'service_type': np.random.choice(
        ['Case Management', 'Counseling', 'Crisis Intervention',
         'Housing Support', 'Employment Services'],
        size=500)
})
# Create crosstab
ct = pd.crosstab(data['service_type'], data['region'])
plt.figure(figsize=(9, 6))
sns.heatmap(ct, annot=True, fmt='d', cmap='Blues',
            linewidths=0.5, linecolor='white')
plt.xlabel('Region', fontsize=11)
plt.ylabel('Service Type', fontsize=11)
plt.title('Service Utilization by Region', fontsize=13)
plt.tight_layout()
plt.show()
The annot=True parameter displays the count in each cell. fmt='d' formats these as integers. The color gradient helps viewers quickly spot which combinations have high or low counts.

For proportions, use a normalized crosstab:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(42)
data = pd.DataFrame({
    'region': np.random.choice(
        ['North', 'South', 'East', 'West', 'Central'],
        size=500),
    'service_type': np.random.choice(
        ['Case Management', 'Counseling', 'Crisis Intervention',
         'Housing Support', 'Employment Services'],
        size=500)
})
# Normalized by column: within each region, what's the service distribution?
ct_norm = pd.crosstab(data['service_type'], data['region'], normalize='columns')
plt.figure(figsize=(9, 6))
sns.heatmap(ct_norm, annot=True, fmt='.1%', cmap='Blues',
            linewidths=0.5, linecolor='white')
plt.xlabel('Region', fontsize=11)
plt.ylabel('Service Type', fontsize=11)
plt.title('Service Distribution Within Each Region', fontsize=13)
plt.tight_layout()
plt.show()
The fmt='.1%' formats values as percentages with one decimal place.
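As far as I can tell, these fmt strings follow Python’s standard format specification, so you can preview what a given code will produce with the built-in format() function before passing it to heatmap. The values below are arbitrary examples.

```python
# Preview how a format code renders a value
integer_label = format(137, 'd')       # whole number, as used for counts
percent_label = format(0.2368, '.1%')  # percentage with one decimal place
decimal_label = format(0.2368, '.2f')  # plain decimal with two places

print(integer_label)
print(percent_label)
print(decimal_label)
```

One practical consequence: the 'd' code only accepts integers, so it raises an error on a normalized crosstab, whose cells are floats. Use a float code such as '.1%' or '.2f' there.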

Testing for Association: Chi-Square Test
When examining two categorical variables, a natural question is whether they’re associated. Does discharge status differ by program type, or are the differences we see just random variation?
The chi-square test of independence answers this question. It compares observed frequencies to what we’d expect if the two variables were independent.
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
# Create mock data
np.random.seed(42)
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard'], 150),
    'completed': np.concatenate([
        np.random.choice(['Yes', 'No'], size=150, p=[0.60, 0.40]),
        np.random.choice(['Yes', 'No'], size=150, p=[0.45, 0.55])
    ])
})
# Create contingency table
ct = pd.crosstab(data['program'], data['completed'])
print("Contingency Table:")
print(ct)
print()
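# Perform chi-square test
# (for 2x2 tables, scipy applies Yates' continuity correction by default;
# pass correction=False for the uncorrected Pearson statistic)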
chi2, p_value, dof, expected = chi2_contingency(ct)
print(f"Chi-square statistic: {chi2:.3f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {p_value:.4f}")
print()
print("Expected frequencies (if independent):")
print(pd.DataFrame(expected,
                   index=ct.index,
                   columns=ct.columns).round(1))
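A small p-value (conventionally below 0.05) means counts like the ones observed would be unlikely if the two variables were independent, which we take as evidence of an association. The expected frequencies show what we’d see if program type had no relationship with completion.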
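The expected frequencies and the test statistic can be reproduced by hand, which helps demystify the output. The sketch below uses a hypothetical 2x2 table (the counts are invented for illustration) and turns off the continuity correction so scipy’s result matches the textbook formula.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts (values chosen for illustration)
ct = pd.DataFrame({'No': [60, 85], 'Yes': [90, 65]},
                  index=['Intensive', 'Standard'])

# Expected count = (row total * column total) / grand total
observed = ct.to_numpy()
row_totals = observed.sum(axis=1)
col_totals = observed.sum(axis=0)
n = observed.sum()
expected_manual = np.outer(row_totals, col_totals) / n

# Pearson chi-square: sum of (observed - expected)^2 / expected
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# correction=False disables the Yates continuity correction that scipy
# applies by default to 2x2 tables
chi2, p_value, dof, expected = chi2_contingency(ct, correction=False)

print(f"Manual chi-square: {chi2_manual:.3f}")
print(f"scipy chi-square:  {chi2:.3f}")
```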

Note that chi-square tells you whether an association exists, not how strong it is or what direction it takes. For that, you need to look at the actual proportions or use measures like Cramér’s V.
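Cramér’s V rescales the chi-square statistic to a 0 to 1 range: V = sqrt(chi2 / (n * (k - 1))), where n is the total count and k is the smaller of the number of rows and columns. A minimal sketch follows; the helper name cramers_v and the example tables are my own, and the uncorrected chi-square is used so the formula applies directly.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramér's V for a contingency table of counts (0 = none, 1 = perfect)."""
    observed = np.asarray(table)
    chi2 = chi2_contingency(observed, correction=False)[0]
    n = observed.sum()
    min_dim = min(observed.shape) - 1
    return np.sqrt(chi2 / (n * min_dim))

# Hypothetical tables: rows are programs, columns are No / Yes
moderate = [[60, 90], [85, 65]]   # completion differs somewhat by program
perfect = [[50, 0], [0, 50]]      # program fully determines completion
none = [[30, 30], [20, 20]]       # identical completion rates

for name, table in [('moderate', moderate), ('perfect', perfect), ('none', none)]:
    print(f"{name}: V = {cramers_v(table):.3f}")
```

Unlike the p-value, V does not grow just because the sample gets larger, which makes it a more useful companion when judging whether an association matters in practice.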
Visualizing Expected vs. Observed
You can visualize the difference between observed and expected frequencies to see where the association is concentrated:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
np.random.seed(42)
data = pd.DataFrame({
    'program': np.repeat(['Intensive', 'Standard'], 150),
    'completed': np.concatenate([
        np.random.choice(['Yes', 'No'], size=150, p=[0.60, 0.40]),
        np.random.choice(['Yes', 'No'], size=150, p=[0.45, 0.55])
    ])
})
ct = pd.crosstab(data['program'], data['completed'])
chi2, p_value, dof, expected = chi2_contingency(ct)
# Calculate residuals (observed - expected)
expected_df = pd.DataFrame(expected, index=ct.index, columns=ct.columns)
residuals = ct - expected_df
plt.figure(figsize=(7, 4))
sns.heatmap(residuals, annot=True, fmt='.1f', cmap='RdBu_r',
            center=0, linewidths=0.5, linecolor='white')
plt.xlabel('Completed Treatment', fontsize=11)
plt.ylabel('Program Type', fontsize=11)
plt.title('Observed - Expected Frequencies\n(Positive = more than expected)', fontsize=12)
plt.tight_layout()
plt.show()
The diverging color palette (RdBu_r) centered at zero shows positive residuals in red (more than expected) and negative in blue (fewer than expected). This reveals where the association comes from: Intensive has more completions than expected, Standard has fewer.
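Raw residuals depend on how large each cell is, so a common refinement is the Pearson residual, which divides by the square root of the expected count. The sketch below uses a hypothetical table of counts; the result can be passed to the same heatmap call in place of the raw residuals.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts (values chosen for illustration)
ct = pd.DataFrame({'No': [60, 85], 'Yes': [90, 65]},
                  index=['Intensive', 'Standard'])
chi2, p_value, dof, expected = chi2_contingency(ct, correction=False)
expected_df = pd.DataFrame(expected, index=ct.index, columns=ct.columns)

# Pearson residual: (observed - expected) / sqrt(expected)
pearson_residuals = (ct - expected_df) / np.sqrt(expected_df)
print(pearson_residuals.round(2))

# Squared Pearson residuals add up to the uncorrected chi-square statistic
print((pearson_residuals ** 2).to_numpy().sum(), chi2)
```

Because the squared residuals sum to the chi-square statistic, each cell’s value shows how much that cell contributes to the overall result.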

A Note on Sample Size and Statistical Significance
Statistical significance depends heavily on sample size. With a large enough sample, even tiny, practically meaningless differences become “statistically significant.” Conversely, with small samples, real and meaningful differences might not reach significance.
When interpreting chi-square tests or any statistical test with categorical data, always look at the actual proportions. A statistically significant association between program type and completion is only meaningful if the difference in completion rates (say, 60% vs. 45%) matters for your clients and your organization.
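To see the sample size effect directly, the sketch below runs the same test on two hypothetical tables with identical proportions (52% vs. 48% completion) but very different numbers of clients. The counts are invented for illustration.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical completion tables: 52% vs. 48% completion in both cases
small = np.array([[52, 48],
                  [48, 52]])
large = small * 100   # same proportions, 100 times as many clients

p_values = {}
for label, table in [('n = 200', small), ('n = 20,000', large)]:
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    p_values[label] = p_value
    print(f"{label}: chi-square = {chi2:.2f}, p = {p_value:.4f}")
```

The four-point gap in completion rates is the same in both tables. Only the larger sample produces a p-value below 0.05, so the test result alone cannot tell you whether that gap matters.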
Choosing the Right Visualization
Here’s a guide for selecting the appropriate visualization for your categorical data:
| Question | Visualization |
|---|---|
| How many observations in each category? | countplot or bar chart |
| How do category counts compare across groups? | Grouped bar chart (countplot with hue) |
| What proportion of each group falls into each category? | Normalized grouped bar chart or 100% stacked bar |
| How does composition differ across groups? | Stacked bar chart |
| How are two categorical variables related? (many categories) | Heatmap of crosstab |
| Where is the association concentrated? | Heatmap of residuals |
Resources
Seaborn categorical tutorial: https://seaborn.pydata.org/tutorial/categorical.html
Seaborn countplot documentation: https://seaborn.pydata.org/generated/seaborn.countplot.html
Pandas crosstab documentation: https://pandas.pydata.org/docs/reference/api/pandas.crosstab.html
Scipy chi-square test: https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chi2_contingency.html
Statsmodels mosaic plot (for advanced categorical visualization): https://www.statsmodels.org/stable/generated/statsmodels.graphics.mosaicplot.mosaic.html
