Health/Mental Health Services Research Secondary Datasets in the U.S.

When working with U.S. health services datasets, one of the first decisions you’ll face is whether you need public use files or restricted use files. This distinction affects what you can study, how long it takes to access the data, and what administrative steps you’ll need to complete.

Public Use Files are freely downloadable datasets that have been modified to protect respondent confidentiality. These modifications typically include removing or coarsening geographic identifiers (dropping county and sometimes state information), collapsing detailed demographic categories (age ranges instead of exact ages), and sometimes excluding cases that might be identifiable. You can download these files immediately from agency websites and start analysis the same day. For many research questions, particularly those examining national trends, broad demographic patterns, or state-level comparisons (where the file retains state identifiers), public use files provide sufficient detail.


Restricted Use Files contain more detailed information that could potentially identify respondents if misused. They typically include finer geographic detail (county identifiers, census tracts), continuous demographic variables (exact age rather than age ranges), and the complete sample without exclusions. Accessing these files requires submitting a research proposal, obtaining IRB approval, signing a data use agreement, and sometimes paying fees. Review times vary by agency, from about 4-6 weeks for NSDUH and NVSS restricted files to 3-6 months for CMS Medicare files.

The choice between public and restricted data depends on your research question. If you’re studying county-level variation in mental health service access, need precise age at first substance use, or are examining small subpopulations where demographic detail matters, you’ll likely need restricted files. If you’re analyzing national trends, comparing states, or studying broad patterns across large demographic groups, public files will often suffice.

SAMHSA Datasets

* For Mental Health & Substance Use Services

N-SUMHSS (National Substance Use and Mental Health Services Survey)

N-SUMHSS is an annual census of mental health and substance use treatment facilities across the U.S. that tells you what services each facility offers, what insurance they accept, and which populations they serve.

Link: https://www.samhsa.gov/data/data-we-collect/n-sumhss-national-substance-use-and-mental-health-services-survey

What it measures: Facility-level data on approximately 16,000 mental health and substance use treatment facilities. Variables include facility ownership (private for-profit, private nonprofit, government), services offered (medication-assisted treatment, case management, family counseling), payment accepted (Medicaid, Medicare, private insurance, sliding fee), populations served (children, veterans, people involved with the criminal justice system), staffing patterns, and client counts on a reference date.

Public Use Files: Free download. Includes state identifiers and facility characteristics. Good for state-level comparisons of service availability, facility type distributions, or Medicaid acceptance rates.

Geographic limitations: County-level detail not available in public files to protect facility confidentiality.
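Because N-SUMHSS is a census of facilities rather than a sample, a state comparison such as the share of facilities accepting Medicaid can be computed as a simple proportion of responding facilities, without survey weights. The sketch below assumes the public use file has already been read into a list of dicts; the keys `state` and `accepts_medicaid` are hypothetical placeholders, not actual N-SUMHSS variable names, so map them from the codebook first.

```python
from collections import defaultdict


def medicaid_acceptance_by_state(facilities):
    """Share of responding facilities in each state that accept Medicaid.

    facilities: iterable of dicts with HYPOTHETICAL keys 'state' (postal code)
    and 'accepts_medicaid' (1 = yes, 0 = no, None = item not answered).
    Facilities with a missing flag are left out of numerator and denominator.
    """
    accepting = defaultdict(int)
    total = defaultdict(int)
    for row in facilities:
        flag = row["accepts_medicaid"]
        if flag is None:  # item nonresponse
            continue
        total[row["state"]] += 1
        accepting[row["state"]] += flag
    return {state: accepting[state] / total[state] for state in total}


# Toy records for illustration only (not real N-SUMHSS data).
sample = [
    {"state": "OH", "accepts_medicaid": 1},
    {"state": "OH", "accepts_medicaid": 0},
    {"state": "OH", "accepts_medicaid": 1},
    {"state": "TX", "accepts_medicaid": 0},
    {"state": "TX", "accepts_medicaid": None},
]
print(medicaid_acceptance_by_state(sample))
```

Dropping missing flags rather than counting them as "no" keeps item nonresponse from deflating a state's acceptance rate.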

Best for: Studying geographic distribution of treatment facilities, analyzing what types of services are available in different areas, examining facility characteristics like ownership type and payment acceptance, understanding which populations are served where.

NSDUH (National Survey on Drug Use and Health)

Annual survey of about 70,000 people aged 12 and older on substance use and mental health.

Link: https://www.samhsa.gov/data/data-we-collect/nsduh-national-survey-drug-use-and-health

What it measures: Individual-level data on mental health status (major depressive episode, serious psychological distress, suicidal ideation), substance use patterns, treatment utilization, unmet need for care, perceived barriers to treatment, and insurance coverage.

Public Use vs. Restricted Use:

Public Use Files (free download):

  • No geographic identifiers (no state, county, region)
  • Collapsed age categories only (not continuous age)
  • Coarser income categories
  • Some respondents excluded for disclosure protection
  • Available at: https://www.samhsa.gov/data/data-we-collect/nsduh-national-survey-drug-use-and-health

Restricted Use Files (application required):

  • State identifiers included
  • Continuous age variables
  • Age at first use variables
  • Finer demographic detail
  • Full sample
  • Application process: Submit proposal to samhda-support@samhsa.hhs.gov
  • Requires IRB approval, data use agreement
  • 4-6 week review process
  • Free once approved

Small Area Estimates (SAE): NSDUH also produces state and substate model-based estimates available through interactive tools without needing restricted data access.

Best for: Studying unmet mental health need, treatment-seeking behavior, disparities in access by insurance status or race/ethnicity.

TEDS (Treatment Episode Data Set)

Records of admissions to and discharges from substance use treatment facilities, reported to SAMHSA by states (mainly facilities that receive public funding).

Link: https://www.samhsa.gov/data/data-we-collect/teds-treatment-episode-data-set

What it measures: Demographic characteristics, substance use patterns, treatment modality, referral source, employment status, living arrangements at admission and discharge.

Best for: Analyzing treatment pathways, studying how people enter treatment (criminal justice vs. self-referral), examining discharge outcomes.

MH-CLD (Mental Health Client-Level Data)

SAMHSA’s state-level reporting on people receiving mental health services.

Link: https://www.samhsa.gov/data/data-we-collect/mh-cld-mental-health-client-level-data

What it measures: De-identified client-level records for people served by state mental health agencies, including demographics, diagnoses, living situation, employment status, education, and involvement with the criminal justice system.

Best for: State mental health system capacity, characteristics of people served by public mental health systems.


National Vital Statistics System (NVSS)

Birth and death certificate data covering all U.S. vital events.

Main website: https://www.cdc.gov/nchs/nvss/index.htm
Data access: https://www.cdc.gov/nchs/data_access/vitalstatsonline.htm

Natality (Birth) Data

Link: https://www.cdc.gov/nchs/nvss/births.htm

What it measures:

Maternal characteristics: Age (single year or grouped), Race and Hispanic origin (bridged-race pre-2018, single/multiple race 2018+), Education, Marital status, Nativity (U.S.-born vs. foreign-born in restricted files), Prenatal care initiation (trimester), Number of prenatal visits, WIC participation, Pre-pregnancy BMI, Pregnancy risk factors (diabetes, hypertension, previous preterm birth, previous cesarean)

Birth outcomes: Birth weight, Gestational age, Apgar scores (1-minute, 5-minute), Plurality (singleton, twins, triplets+), Method of delivery (vaginal, cesarean), Congenital anomalies, Infant transferred, NICU admission

Father characteristics: Age, race, education (when reported)

Geographic variables (in restricted files): State, County, City (for cities 100,000+), Metropolitan/non-metropolitan classification

Public Use Files (free download):

  • 2005-present: No geographic identifiers (no state, county, or city; national level only)
  • Month and day of week available (not exact date)
  • 1989-2004: Counties/cities with population 100,000+ included
  • Pre-1989: All counties and exact dates included

Restricted Use Files:

  • Link: https://www.cdc.gov/nchs/nvss/nvss-restricted-data.htm
  • Application required: Email nvssrestricteddata@cdc.gov with Project Review Form
  • State and county identifiers available
  • Month and year (not exact day)
  • Requires data use agreement
  • 4-6 week review process
  • Free once approved
  • Cannot access data from outside the U.S.
  • Cannot link to other datasets that might re-identify individuals
  • Annual updates as new data years become available

Best for: Studying prenatal care access, birth outcomes by maternal characteristics, preterm birth disparities, geographic variation in cesarean rates.


Mortality (Death) Data

Link: https://www.cdc.gov/nchs/nvss/deaths.htm

What it measures:

Decedent characteristics: Age at death (single year, grouped, or infant age in days/months), Sex, Race and Hispanic origin, Marital status, Education, Occupation and industry (on some records), Place of death (hospital, nursing home, home, hospice, ER, other)

Cause of death: Underlying cause of death (ICD-10 codes), Multiple causes of death (up to 20 contributing causes), Manner of death (natural, accident, suicide, homicide, pending, could not be determined), Injury-related variables (place of injury, mechanism)
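Suicide and overdose analyses usually start by classifying the underlying cause code. The sketch below uses the ICD-10 ranges commonly cited in NCHS publications (drug overdose: X40-X44, X60-X64, X85, Y10-Y14; suicide: U03, X60-X84, Y87.0); treat these ranges as an assumption and verify them against current NCHS definitions before publishing. Drug-overdose suicides (X60-X64) fall into both groups. Identifying specific drugs requires the multiple-cause T-codes (for example, T40.1 for heroin), which this sketch does not handle.

```python
def _in_range(category, letter, low, high):
    """True if a 3-character ICD-10 category (e.g. 'X42') is letter + [low..high]."""
    return (
        category[0] == letter
        and category[1:3].isdigit()
        and low <= int(category[1:3]) <= high
    )


def classify_underlying_cause(icd10_code):
    """Flag drug overdose and suicide deaths from an underlying-cause code.

    Accepts codes with or without a decimal point ('X42.0' or 'X420').
    Code ranges are ASSUMED from commonly cited NCHS definitions.
    """
    code = icd10_code.replace(".", "").strip().upper()
    if len(code) < 3:
        return {"overdose": False, "suicide": False}
    category = code[:3]
    overdose = (
        _in_range(category, "X", 40, 44)     # unintentional drug poisoning
        or _in_range(category, "X", 60, 64)  # intentional self-poisoning by drugs
        or category == "X85"                 # assault by drugs
        or _in_range(category, "Y", 10, 14)  # drug poisoning, undetermined intent
    )
    suicide = (
        category == "U03"                    # terrorism-related self-harm
        or _in_range(category, "X", 60, 84)  # intentional self-harm
        or code.startswith("Y870")           # sequelae of intentional self-harm
    )
    return {"overdose": overdose, "suicide": suicide}


print(classify_underlying_cause("X42"))  # {'overdose': True, 'suicide': False}
```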

Geographic variables (restricted files): State of residence, County of residence, Metropolitan/non-metropolitan status

Public Use Files:

  • 2005-present: No geographic identifiers
  • 1989-2004: Counties/cities 100,000+ included
  • Pre-1989: All counties included
  • Multiple cause-of-death files include all conditions mentioned on the death certificate

Restricted Use Files:

  • Same application process as natality data
  • County identifiers available
  • Can request compressed files (less detail) or detailed multiple-cause files

Interactive tools:

  • CDC WONDER: https://wonder.cdc.gov/
    • Online query system for mortality data
    • State and county level data available
    • Can build custom tables
    • Suppresses cells <10 for confidentiality
    • No download of microdata but can export tables
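If you build county tables from restricted NVSS files yourself, applying a similar rule before sharing them is a reasonable precaution. The sketch below applies the <10 threshold described above to a dictionary of counts. It is an illustration, not CDC's suppression code, and it handles only primary suppression: your data use agreement may require stricter or additional rules, such as complementary suppression so that masked cells cannot be recovered from totals.

```python
def suppress_small_cells(counts, threshold=10):
    """Mask counts below `threshold` in a {label: count} table.

    Returns a new dict in which every count < threshold (including zero)
    is replaced by the string "Suppressed"; other counts are unchanged.
    Only primary suppression is done: cells that would let a masked value
    be recovered from a row or column total are NOT hidden.
    """
    return {
        label: ("Suppressed" if count < threshold else count)
        for label, count in counts.items()
    }


# Toy county death counts (illustrative, not real data).
county_deaths = {"County A": 143, "County B": 7, "County C": 0, "County D": 10}
print(suppress_small_cells(county_deaths))
# {'County A': 143, 'County B': 'Suppressed', 'County C': 'Suppressed', 'County D': 10}
```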

Best for: Mortality disparities, leading causes of death by geography, suicide/overdose patterns, maternal mortality, infant mortality linked to birth records.


Linked Birth/Infant Death Files

What it measures: Links birth certificate data to death certificates for infants who die before age 1. Allows analysis of maternal and birth characteristics associated with infant mortality.

Variables: All natality variables plus age at death, cause of death, interval between birth and death.

Best for: Infant mortality research, studying how prenatal care or birth weight affects survival.
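The basic quantity computed from these files is the infant mortality rate: infant deaths per 1,000 live births. The minimal sketch below treats each record as one live birth in a birth cohort. It uses a hypothetical boolean `died_before_1` to mark births that link to an infant death record and a placeholder grouping field `prenatal_care`. The toy counts are invented for illustration.

```python
from collections import Counter


def infant_mortality_rate(births, group_key):
    """Infant deaths per 1,000 live births within each group.

    births: iterable of dicts, one per live birth, with a HYPOTHETICAL
    boolean 'died_before_1' (True if the birth links to an infant death
    record) and a grouping field named by group_key.
    """
    births_by_group = Counter()
    deaths_by_group = Counter()
    for record in births:
        group = record[group_key]
        births_by_group[group] += 1
        if record["died_before_1"]:
            deaths_by_group[group] += 1
    return {
        group: 1000 * deaths_by_group[group] / n_births
        for group, n_births in births_by_group.items()
    }


# Toy cohort (invented counts, not real NVSS data).
cohort = (
    [{"prenatal_care": "1st trimester", "died_before_1": False}] * 998
    + [{"prenatal_care": "1st trimester", "died_before_1": True}] * 2
    + [{"prenatal_care": "none", "died_before_1": False}] * 495
    + [{"prenatal_care": "none", "died_before_1": True}] * 5
)
print(infant_mortality_rate(cohort, "prenatal_care"))
# {'1st trimester': 2.0, 'none': 10.0}
```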


Medicare & Medicaid Data

Medicare Claims (CMS Research Identifiable Files)

Link: https://resdac.org/ (Research Data Assistance Center)

What it measures: Individual-level claims data for all fee-for-service Medicare beneficiaries. Includes diagnoses (ICD codes), procedures (CPT, HCPCS codes), provider information, service dates, costs, enrollment information.

File types:

  • Inpatient claims
  • Outpatient claims
  • Carrier (physician) claims
  • Part D prescription drug events
  • Master Beneficiary Summary File (demographics, enrollment, chronic conditions)

Access: Requires formal data use agreement with CMS, IRB approval, fees ($3,000-$25,000+ depending on file and years requested). Processing takes 3-6 months.

Best for: Medicare service utilization, treatment patterns, spending analysis, chronic disease management.


Medicaid (T-MSIS)

Link: https://www.medicaid.gov/medicaid/data-systems/macbis/tmsis/index.html

What it measures: Medicaid and CHIP enrollment and claims data. All states report, but data completeness and quality vary by state. Includes demographics, eligibility, service utilization, diagnoses, and prescriptions.

Access: Requires DUA with CMS. More complex than Medicare due to state variation in reporting.

Best for: Low-income population health, Medicaid expansion studies, children’s health services.

MEPS (Medical Expenditure Panel Survey)

Individual-level data on healthcare costs, utilization, and insurance coverage.

Link: https://www.meps.ahrq.gov/

What it measures:

Mental health variables:

  • Mental health service utilization (psychotherapy visits, psychiatrist visits, counseling)
  • Psychotropic medication use (antidepressants, antipsychotics, anxiolytics)
  • Mental health expenditures (total costs, out-of-pocket costs, insurance payments)
  • Mental health conditions (depression, anxiety, bipolar disorder; ICD-10 codes, truncated to three digits in public files)
  • Perceived mental health status (excellent/very good/good/fair/poor)
  • Psychological distress (K6 scale in some years)
  • Barriers to mental health care
  • Insurance coverage for mental health services

Other health variables:

  • All medical conditions (ICD codes)
  • All medical service use (ER, inpatient, outpatient, office visits, home health, prescription drugs)
  • Healthcare expenditures by service type and payer
  • Insurance coverage details
  • Access to care, usual source of care
  • Demographics, employment, income

Longitudinal design: Each person interviewed 5 times over 2 years, allowing you to track changes in health status, insurance, employment, and service use.

Public Use Files: Free download at https://www.meps.ahrq.gov/mepsweb/data_stats/download_data_files.jsp

  • No geographic identifiers (national level only)
  • SAS, Stata, ASCII formats available
  • Complex survey weights provided
  • Can link household, person, medical conditions, and event files

Interactive tools: MEPSnet Query Tools for creating tables without downloading microdata

Best for: Studying healthcare costs and affordability, insurance coverage effects on mental health care access, treatment patterns, disparities in mental health service use, medication adherence, provider types.

NHIS (National Health Interview Survey)

CDC’s principal source of information on the health of the U.S. civilian noninstitutionalized population.

Link: https://www.cdc.gov/nchs/nhis/index.htm

What it measures (mental health content):

Depression and anxiety screening (added 2019): PHQ-8 (Patient Health Questionnaire) for depression symptoms, GAD-7 (Generalized Anxiety Disorder scale) for anxiety symptoms
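Both screeners are scored by summing item responses. Under the standard scoring, each item runs from 0 (not at all) to 3 (nearly every day). PHQ-8 totals therefore range from 0 to 24 and GAD-7 totals from 0 to 21, and a total of 10 or more is a commonly used cutoff. The sketch below assumes responses have already been recoded to 0-3. NHIS raw items may use different codes (including refused/don't know codes), and NHIS may also provide derived score variables, so check the codebook before reusing this.

```python
PHQ8_ITEMS = 8
GAD7_ITEMS = 7
CUTOFF = 10  # common threshold for moderate-or-worse symptoms on both scales
VALID = (0, 1, 2, 3)  # ASSUMED recoding: 0 = not at all ... 3 = nearly every day


def screener_score(responses, n_items):
    """Sum item scores; return None if any item is missing or out of range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(responses)}")
    if any(r not in VALID for r in responses):
        return None  # e.g., refused / don't know: leave the scale unscored
    return sum(responses)


def screens_positive(responses, n_items):
    """True if the total meets CUTOFF, False if not, None if unscorable."""
    score = screener_score(responses, n_items)
    return None if score is None else score >= CUTOFF


print(screens_positive([2, 1, 2, 1, 1, 2, 0, 1], PHQ8_ITEMS))  # True (score 10)
print(screens_positive([1, 0, 1, 0, 1, 0, 0], GAD7_ITEMS))     # False (score 3)
```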

Mental health service use: Saw/talked to mental health professional in past year, Took medication for mental health, Received counseling or therapy, Delayed or didn’t get mental health care due to cost

Mental health status: Self-rated mental health (excellent/very good/good/fair/poor), Psychological distress (K6 scale in some years)
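The K6 is conventionally scored with six items, each 0 (none of the time) to 4 (all of the time), for a total of 0-24. A total of 13 or more is the usual threshold for serious psychological distress. Some survey files store raw items in the reverse order (1 = all of the time through 5 = none of the time). The sketch below assumes that raw coding and reverses it. Verify the coding in your file's codebook before using it.

```python
K6_SPD_CUTOFF = 13  # conventional threshold for serious psychological distress


def recode_k6_item(raw):
    """Convert a raw item coded 1 = all of the time ... 5 = none of the time
    to the 0-4 scoring scale (4 = all of the time).

    ASSUMPTION: that raw coding; check your survey's codebook. Any other
    value (refused, don't know, missing) returns None.
    """
    return 5 - raw if raw in (1, 2, 3, 4, 5) else None


def k6_score(raw_items):
    """Total K6 score (0-24) from six raw items, or None if any is missing."""
    if len(raw_items) != 6:
        raise ValueError("the K6 has exactly six items")
    scored = [recode_k6_item(x) for x in raw_items]
    if any(s is None for s in scored):
        return None
    return sum(scored)


score = k6_score([2, 2, 3, 2, 3, 4])
print(score, score >= K6_SPD_CUTOFF)  # 14 True
```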

Other health content: Chronic conditions, Health behaviors (smoking, alcohol, physical activity), Insurance coverage, Healthcare access and utilization, Preventive services, Functional limitations and disability

Sample: ~30,000 adults and ~8,000 children annually

Public Use Files: Free download at https://www.cdc.gov/nchs/nhis/data-questionnaires-documentation.htm

  • No geographic identifiers (no state, county, region)
  • Complex sample design with weights
  • Can be linked to National Death Index for mortality follow-up

Interactive tools: NHIS Data Query System for pre-tabulated estimates

Best for: National prevalence estimates of depression/anxiety, mental health treatment use, unmet mental health needs, health insurance coverage effects.


BRFSS (Behavioral Risk Factor Surveillance System)

State-based telephone survey on health behaviors and chronic conditions.

Link: https://www.cdc.gov/brfss/index.html
Data tools: https://www.cdc.gov/brfss/data_tools.htm

What it measures (400,000+ adults annually):

  • Chronic conditions (asthma, diabetes, heart disease, arthritis, depression)
  • Health behaviors (smoking, alcohol use, physical activity, diet)
  • Preventive services (cancer screening, flu shots, checkups)
  • Health care access (insurance coverage, personal doctor, cost barriers)
  • Health status (self-rated health, quality of life, disability)

Core vs. Optional modules:

  • Core questions: Asked by all states
  • Optional modules: States choose which to include
  • State-added questions: Custom items per state

Geographic detail:

  • All 50 states, DC, territories
  • Some states oversample to produce county-level estimates
  • SMART BRFSS: Metropolitan/Micropolitan Area estimates for areas with 500+ respondents

Public Use Files: Free download. Includes state identifiers. Can run your own analyses with survey weights.
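BRFSS weights adjust for unequal selection probabilities and nonresponse (through raking), so unweighted percentages from the public files can be misleading. The point-estimate sketch below assumes a list of dicts with hypothetical keys `state` and `weight`, plus an outcome coded 1/0. In recent public files the final weight variable is `_LLCPWT`. Standard errors also need the design variables (`_STSTR`, `_PSU`) and survey software, which this sketch omits.

```python
from collections import defaultdict


def weighted_prevalence_by_state(rows, outcome):
    """Weighted proportion with outcome == 1, by state (point estimate only).

    rows: iterable of dicts with HYPOTHETICAL keys 'state', 'weight' (the final
    survey weight), and `outcome` coded 1 = yes, 0 = no, None = missing.
    Respondents with a missing outcome are dropped from numerator and denominator.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for row in rows:
        value = row[outcome]
        if value is None:
            continue
        denominator[row["state"]] += row["weight"]
        numerator[row["state"]] += row["weight"] * value
    return {state: numerator[state] / denominator[state] for state in denominator}


# Toy respondents (invented weights); '39' and '48' are state FIPS codes.
respondents = [
    {"state": "39", "weight": 100.0, "depression": 1},
    {"state": "39", "weight": 300.0, "depression": 0},
    {"state": "48", "weight": 200.0, "depression": 1},
    {"state": "48", "weight": 200.0, "depression": 0},
    {"state": "48", "weight": 999.0, "depression": None},
]
print(weighted_prevalence_by_state(respondents, "depression"))
# {'39': 0.25, '48': 0.5}
```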

Interactive tools:

  • BRFSS Prevalence & Trends Data: https://www.cdc.gov/brfss/brfssprevalence/index.html
  • Pre-calculated state and metro area estimates
  • Can create custom tables

Best for: State-level policy comparisons, tracking chronic disease prevalence over time, health behavior surveillance, examining state variation in insurance coverage or preventive care.

Limitations: Telephone survey with declining response rates (median around 45%), self-reported data (no clinical measurements), and slight variation in methodology across states.


Area Health Resources Files (AHRF)

County-level data on health workforce, facilities, and population.

Link: https://data.hrsa.gov/topics/health-workforce/nchwa/ahrf

What it measures (6,000+ variables per county):

Health workforce: Counts of physicians by specialty, Dentists, dental hygienists, Nurses (RN, LPN), Pharmacists, optometrists, podiatrists, Mental health providers (psychiatrists, psychologists, social workers)

Health facilities: Hospitals (size, type, services, ownership), Nursing homes, Community health centers, Rural health clinics, Mental health facilities, Substance abuse treatment facilities

Population characteristics: Demographics (age, race, ethnicity), Poverty rates, Education levels, Unemployment, Medicare/Medicaid enrollment

Health status indicators: Mortality rates, Cancer incidence, Preventable hospital stays

Geographic measures: Urban/rural classifications, HPSA (Health Professional Shortage Area) designations, Medically Underserved Areas

How to access: Free download in SAS, Stata, CSV formats. Updated annually. No application required.

Interactive dashboard: Visualize workforce and facility data by county/state.

Best for: Studying provider supply and distribution, health workforce shortages, correlating resources with health outcomes, county-level policy analysis.
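AHRF is keyed by county, so analyses typically join it to other county-level files (such as County Health Rankings or CDC WONDER exports) on the 5-digit county FIPS code. That code combines the 2-digit state code and the 3-digit county code. Provider counts are then converted to rates. In the minimal sketch below, the dictionaries and field names are hypothetical stand-ins for AHRF columns, and the counts are invented.

```python
def county_fips(state_code, county_code):
    """Build a 5-digit county FIPS string, e.g. (1, 3) -> '01003'.

    Keeping FIPS as zero-padded strings avoids losing leading zeros when
    files are read as numbers, which would break joins across datasets.
    """
    return f"{int(state_code):02d}{int(county_code):03d}"


def providers_per_100k(provider_counts, populations):
    """Providers per 100,000 residents for counties present in both inputs.

    Both arguments are dicts keyed by 5-digit FIPS strings. Counties with a
    missing or zero population are skipped rather than divided by zero.
    """
    rates = {}
    for fips, count in provider_counts.items():
        population = populations.get(fips)
        if population:
            rates[fips] = 100_000 * count / population
    return rates


# Toy numbers (not real AHRF values).
psychiatrists = {county_fips(1, 1): 3, county_fips(1, 3): 40}
population = {"01001": 60_000, "01003": 250_000}
print(providers_per_100k(psychiatrists, population))
# {'01001': 5.0, '01003': 16.0}
```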


Regional/Geographic Variation Datasets

County Health Rankings

  • Link: https://www.countyhealthrankings.org/
  • Annual rankings of health outcomes and factors for all U.S. counties
  • Combines data from multiple sources
  • Free downloads and interactive maps
  • Good for community health needs assessments

CDC WONDER (Wide-ranging Online Data for Epidemiologic Research)

  • Link: https://wonder.cdc.gov/
  • Multiple datasets: mortality, natality, cancer, TB, vaccinations
  • Query system for county and state-level data
  • No application needed
  • Suppression rules for small cells

SEER (Surveillance, Epidemiology, and End Results)

  • Link: https://seer.cancer.gov/
  • Cancer incidence and survival data
  • Geographic coverage varies (covers ~50% of U.S. population)
  • Includes state cancer registries
  • Research access requires application
  • January 19, 2026