
This research employs two complementary methodological approaches:

  1. Matched-pair testing — extending Bertrand & Mullainathan's seminal methodology to AI systems
  2. Structural analysis — proprietary methods detecting bias in narrative structure (methodology protected)

The first approach is fully documented here. The second produces the findings shown in our evidence section; its methodology details are forthcoming in an academic publication.

I. Matched-Pair Testing Design

Identical inputs, different names. Because every other input is held constant, any systematic difference in output is attributable to the name.

Experimental Structure

Name Pairs: 54 pairs across 6 demographic contrast categories
Symptom Profiles: 20 standardized clinical presentations
Total Comparisons: 1,080 matched-pair tests (54 name pairs × 20 symptom profiles)
Control Variables: Prompt template, model version, temperature, system prompt
Independent Variable: Name only
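
To make the design concrete, here is a minimal sketch of how the 1,080 comparisons could be enumerated. The name lists, profile labels, and prompt template are hypothetical placeholders, since the source does not publish the actual wording; only the counts (54 and 20) come from the table above.

```python
from itertools import product

# Hypothetical placeholders standing in for the 54 name pairs and 20 profiles.
NAME_PAIRS = [(f"control_name_{i}", f"contrast_name_{i}") for i in range(54)]
SYMPTOM_PROFILES = [f"profile_{j}" for j in range(20)]

# Assumed template; the study's real prompt wording is not given in the source.
PROMPT_TEMPLATE = "Patient {name} presents with: {profile}. What do you recommend?"


def build_matched_pairs(name_pairs, profiles, template):
    """Return one dict per comparison; the two prompts differ only in the name."""
    tests = []
    for (name_a, name_b), profile in product(name_pairs, profiles):
        tests.append({
            "profile": profile,
            "prompt_a": template.format(name=name_a, profile=profile),
            "prompt_b": template.format(name=name_b, profile=profile),
        })
    return tests


tests = build_matched_pairs(NAME_PAIRS, SYMPTOM_PROFILES, PROMPT_TEMPLATE)
print(len(tests))  # 1080
```

Each pair of prompts would then be sent to the model with the same model version, temperature, and system prompt, so that the name is the only varying input.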

Name Pair Categories

Name-demographic associations validated through Bertrand & Mullainathan (2004) and Fryer & Levitt (2004).

  • Anglo vs. African American name signals
  • Anglo vs. Hispanic/Latino name signals
  • Anglo vs. Asian name signals
  • Male vs. Female name signals
  • Professional title (Dr.) vs. No title
  • Socioeconomic indicator variations

Symptom Profile Categories

20 standardized presentations across clinical domains:

  • Cardiovascular (chest pain, palpitations, shortness of breath)
  • Pain management (chronic pain, acute pain, medication questions)
  • Psychiatric (depression, anxiety, psychosis presentations)
  • Emergency (acute abdomen, mental health crisis)
  • General (fatigue, neurological symptoms)

II. Vocabulary-Level Analysis

Content analysis of AI-generated responses for explicit recommendation differences.

Analysis Methods

  • Keyword extraction — automated identification of treatment recommendations, urgency markers, referral language
  • Treatment coding — classification of recommended treatments, medications, and care pathways
  • Urgency scoring — quantification of urgency language (immediate, soon, routine, etc.)
  • Sentiment analysis — tone and framing of responses
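
As an illustration of urgency scoring, the sketch below scores a response by the strongest urgency term it contains. The lexicon and its weights are assumptions made for this example, not the study's coding scheme.

```python
import re

# Hypothetical lexicon: terms and weights are illustrative only.
URGENCY_WEIGHTS = {
    "immediate": 3, "immediately": 3, "emergency": 3,
    "soon": 2, "promptly": 2,
    "routine": 1,
}


def urgency_score(text):
    """Return the highest urgency weight of any lexicon term in text (0 if none)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return max((URGENCY_WEIGHTS.get(tok, 0) for tok in tokens), default=0)


response_a = "Seek immediate care at the nearest emergency department."
response_b = "Schedule a routine appointment with your doctor."
print(urgency_score(response_a) - urgency_score(response_b))  # 2
```

In a matched pair, a nonzero difference between the two scores flags a pair whose urgency language diverges despite identical symptoms.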

Coding Reliability

Inter-rater reliability established through:

  • Independent dual coding of 20% sample
  • Cohen's kappa > 0.85 for all coding categories
  • Discrepancies resolved through consensus discussion
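
For reference, Cohen's kappa compares observed agreement between two coders with the agreement expected by chance. The sketch below uses the standard formula; the labels and ratings are made-up example data, not study data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up example: ten responses coded by two raters, one disagreement.
coder_1 = ["urgent"] * 5 + ["routine"] * 5
coder_2 = ["urgent"] * 4 + ["routine"] * 6
print(round(cohens_kappa(coder_1, coder_2), 2))  # 0.8
```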

III. Structural Analysis

Beyond what AI says — how it says it.

Methodology Protected

Structural analysis methodology is proprietary. Academic publication forthcoming. Research partnerships available for qualified investigators.

What We Can Share

We developed methods to detect bias in narrative structure—the mathematical patterns of how AI-generated text unfolds differently based on name signals.

  • Analysis operates on 12 structural dimensions
  • Detects patterns invisible to vocabulary analysis
  • Produces quantitative scores comparable across conditions
  • Generates behavioral signatures characterizing response patterns

The 12 Dimensions (Names Only)

Structural analysis examines:

  1. Semantic Distance — trajectory through semantic space
  2. Power Dynamics — agency asymmetry between entities
  3. Entropy — information-theoretic disorder patterns
  4. Tension/Release — buildup and release cycles
  5. Boundary Crossing — formal/personal transitions
  6. Proximity Dynamics — linguistic intimacy patterns
  7. Narrative Velocity — pacing and acceleration
  8. Symmetry Breaking — balance shifts over time
  9. Resonance — rhythmic synchronization patterns
  10. Phase Transitions — state changes in linguistic patterns
  11. Information Density — compression patterns
  12. Temporal Displacement — tense and time orientation shifts

What Is Not Available

Mathematical formulas, detector code, weighting schemes, pattern classification.

Available through research partnership.

IV. Statistical Framework

Standard methods for robust inference.

Effect Size Estimation

All findings reported as Cohen's d with interpretation:

Cohen's d: Interpretation
0.2: Small effect
0.5: Medium effect
0.8: Large effect
>1.0: Very large effect
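
A minimal sketch of the effect size calculation follows. It assumes the classic two-group form of Cohen's d with a pooled standard deviation; the source does not say which variant is used for matched pairs. The "negligible" label for values under 0.2 and the handling of values between thresholds are also assumptions.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


def interpret(d):
    """Map |d| onto the labels in the table above (thresholds as lower bounds)."""
    size = abs(d)
    if size > 1.0:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # assumed label; the table does not name this range


# Made-up urgency scores for two name conditions.
scores_a = [3, 3, 2, 3, 2, 3]
scores_b = [2, 2, 2, 3, 1, 2]
d = cohens_d(scores_a, scores_b)
print(round(d, 2), interpret(d))
```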

Multiple Comparison Correction

With 12 structural dimensions and multiple demographic comparisons, we apply:

  • Bonferroni correction — adjusting significance threshold by number of comparisons
  • False Discovery Rate control — Benjamini-Hochberg procedure
  • All reported findings remain significant after correction
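
The two corrections can be sketched as follows. Both functions implement the standard procedures; the p-values are invented for illustration and the significance level of 0.05 is an assumption, since the source does not state one.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 wherever p <= alpha / m, with m the number of comparisons."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, k being the largest rank with p_(k) <= k/m * alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Invented p-values, one per structural dimension (12 in total).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06,
             0.074, 0.205, 0.212, 0.216, 0.222, 0.251]
print(sum(bonferroni(example_p)), sum(benjamini_hochberg(example_p)))  # 1 2
```

Bonferroni controls the chance of any false positive and is the stricter of the two; Benjamini-Hochberg controls the expected share of false positives among the rejected hypotheses, so it typically rejects more.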

Confidence Intervals

95% confidence intervals estimated through:

  • Bootstrap resampling — 10,000 iterations
  • Bias-corrected and accelerated (BCa) intervals
  • All reported CIs exclude zero (confirming significance)
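
A standard-library sketch of a BCa bootstrap interval is shown below. It assumes the statistic of interest is the mean of per-pair score differences; the data, the seed, and the function name `bca_interval` are hypothetical. In practice a tested routine such as `scipy.stats.bootstrap` with `method="BCa"` would be the safer choice.

```python
import random
from statistics import NormalDist, mean


def bca_interval(data, stat=mean, n_boot=10_000, confidence=0.95, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(data)
    norm = NormalDist()
    theta_hat = stat(data)

    # Bootstrap distribution of the statistic (resampling with replacement).
    boot = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction: share of bootstrap estimates below the point estimate.
    below = sum(b < theta_hat for b in boot)
    prop = min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot)  # keep in (0, 1)
    z0 = norm.inv_cdf(prop)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = mean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_quantile(q):
        # Shift the nominal quantile using the bias and acceleration terms.
        z = norm.inv_cdf(q)
        adj = norm.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        idx = min(max(int(adj * n_boot), 0), n_boot - 1)
        return boot[idx]

    tail = (1 - confidence) / 2
    return adjusted_quantile(tail), adjusted_quantile(1 - tail)


# Made-up per-pair differences in urgency score.
diffs = [0.8, 1.1, 0.4, 0.9, 1.3, 0.2, 0.7, 1.0, 0.6, 1.2, 0.5, 0.9]
low, high = bca_interval(diffs)
print(f"95% BCa CI: [{low:.2f}, {high:.2f}]")
```

An interval whose lower bound stays above zero corresponds to the "CIs exclude zero" criterion in the list above.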

V. Pre-Registration

This study's methodology and analysis plan were specified before data collection began.

Pre-Registration Details

Protocol Version: 1.0
Date: January 2026
Contents: Hypotheses, methodology, analysis plan, stopping rules
Deviations: None

Pre-registration prevents post-hoc adjustment of methods to achieve desired results. All analyses reported were planned before data collection.

VI. Reproducibility

We are committed to enabling independent verification of our findings to the extent possible while protecting proprietary methodology.

What Is Available

What Requires Partnership

  • Structural analysis source code
  • Raw model outputs
  • Full statistical analysis scripts

Available to qualified researchers through formal partnership.