Methodology
Rigorous design. Transparent where possible. Protected where necessary.
This research employs two complementary methodological approaches:
- Matched-pair testing — extending Bertrand & Mullainathan's seminal methodology to AI systems
- Structural analysis — proprietary methods detecting bias in narrative structure (methodology protected)
The first approach is fully documented here. The second produces the findings shown in our evidence section; its methodology details will appear in a forthcoming academic publication.
I. Matched-Pair Testing Design
Identical inputs, different names. Because every other variable is held constant, any systematic difference in output can be attributed to the name.
Experimental Structure
| Parameter | Value |
|---|---|
| Name Pairs | 54 pairs across 6 demographic contrast categories |
| Symptom Profiles | 20 standardized clinical presentations |
| Total Comparisons | 1,080 matched-pair tests (54 pairs × 20 profiles) |
| Control Variables | Prompt template, model version, temperature, system prompt |
| Independent Variable | Name only |
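A minimal Python sketch of how the 54 × 20 cross product could be assembled so that only the name differs within each pair. The prompt template, the `NamePair`/`MatchedTest` types, and the placeholder names are hypothetical illustrations, not the study's published materials; the other control variables (model version, temperature, system prompt) would be fixed in whatever request configuration sends these prompts.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical prompt template; the study's actual template is not published here.
PROMPT_TEMPLATE = "Patient {name} reports: {symptoms}. What do you recommend?"


@dataclass(frozen=True)
class NamePair:
    category: str  # one of the six demographic contrast categories
    name_a: str
    name_b: str


@dataclass(frozen=True)
class MatchedTest:
    pair: NamePair
    profile_id: str
    prompt_a: str
    prompt_b: str


def build_tests(pairs, profiles):
    """Cross every name pair with every symptom profile.

    prompt_a and prompt_b share the same template and symptom text,
    so the name is the only thing that varies within a test.
    """
    tests = []
    for pair, (profile_id, symptoms) in product(pairs, profiles.items()):
        tests.append(MatchedTest(
            pair=pair,
            profile_id=profile_id,
            prompt_a=PROMPT_TEMPLATE.format(name=pair.name_a, symptoms=symptoms),
            prompt_b=PROMPT_TEMPLATE.format(name=pair.name_b, symptoms=symptoms),
        ))
    return tests


# Placeholder data sized like the design: 54 pairs x 20 profiles.
pairs = [NamePair("placeholder", f"NameA{i}", f"NameB{i}") for i in range(54)]
profiles = {f"P{j:02d}": f"symptom description {j}" for j in range(20)}
tests = build_tests(pairs, profiles)
print(len(tests))  # 1080
```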
Name Pair Categories
Name-demographic associations are validated against Bertrand & Mullainathan (2004) and Fryer & Levitt (2004).
- Anglo vs. African American name signals
- Anglo vs. Hispanic/Latino name signals
- Anglo vs. Asian name signals
- Male vs. Female name signals
- Professional title (Dr.) vs. no title
- Socioeconomic indicator variations
Symptom Profile Categories
20 standardized presentations across clinical domains:
- Cardiovascular (chest pain, palpitations, shortness of breath)
- Pain management (chronic pain, acute pain, medication questions)
- Psychiatric (depression, anxiety, psychosis presentations)
- Emergency (acute abdomen, mental health crisis)
- General (fatigue, neurological symptoms)
II. Vocabulary-Level Analysis
Content analysis of AI-generated responses to identify explicit differences in recommendations.
Analysis Methods
- Keyword extraction — automated identification of treatment recommendations, urgency markers, referral language
- Treatment coding — classification of recommended treatments, medications, and care pathways
- Urgency scoring — quantification of urgency language (immediate, soon, routine, etc.)
- Sentiment analysis — tone and framing of responses
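Urgency scoring could be approximated with a keyword lexicon, as in this minimal sketch. The phrases and their levels (1 to 3) are assumptions for illustration; the study's actual coding scheme is not published here. Plain keyword matching also ignores negation ("not an emergency" would still score 3), a known weakness of purely lexical methods.

```python
import re

# Hypothetical urgency lexicon: phrase -> urgency level (3 = most urgent).
# The study's actual keyword lists and weights are not published here.
URGENCY_LEVELS = {
    "immediately": 3,
    "emergency": 3,
    "right away": 3,
    "urgent": 3,
    "soon": 2,
    "within 24 hours": 2,
    "routine": 1,
    "monitor": 1,
}


def urgency_score(text: str) -> int:
    """Return the highest urgency level whose phrase appears as whole words (0 if none)."""
    lowered = text.lower()
    hits = [
        level
        for phrase, level in URGENCY_LEVELS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    ]
    return max(hits, default=0)


def urgency_gap(response_a: str, response_b: str) -> int:
    """Positive when the response for name A is scored as more urgent than for name B."""
    return urgency_score(response_a) - urgency_score(response_b)


print(urgency_gap("Go to the emergency department immediately.",
                  "Schedule a routine check-up."))  # 2
```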
Coding Reliability
Inter-rater reliability established through:
- Independent dual coding of 20% sample
- Cohen's kappa > 0.85 for all coding categories
- Discrepancies resolved through consensus discussion
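The kappa criterion above can be checked with a few lines of standard-library Python. This is the textbook two-coder Cohen's kappa; the label names in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
    p_e is the agreement expected by chance from each coder's label frequencies.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same, non-empty set of items")
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        # Both coders used one identical label throughout; kappa is undefined,
        # treated here as perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["urgent", "urgent", "routine", "routine"]
b = ["urgent", "routine", "routine", "routine"]
print(cohens_kappa(a, b))  # 0.5
```

Applied per coding category, the reported criterion corresponds to every category's kappa exceeding 0.85 on the dual-coded sample.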
III. Structural Analysis
Beyond what AI says — how it says it.
Methodology Protected
Structural analysis methodology is proprietary. Academic publication forthcoming. Research partnerships available for qualified investigators.
What We Can Share
We developed methods to detect bias in narrative structure: the mathematical patterns in how AI-generated text unfolds differently depending on the name signal.
- Analysis operates on 12 structural dimensions
- Detects patterns invisible to vocabulary analysis
- Produces quantitative scores comparable across conditions
- Generates behavioral signatures characterizing response patterns
The 12 Dimensions (Names Only)
Structural analysis examines:
- Semantic Distance — trajectory through semantic space
- Power Dynamics — agency asymmetry between entities
- Entropy — information-theoretic disorder patterns
- Tension/Release — buildup and release cycles
- Boundary Crossing — formal/personal transitions
- Proximity Dynamics — linguistic intimacy patterns
- Narrative Velocity — pacing and acceleration
- Symmetry Breaking — balance shifts over time
- Resonance — rhythmic synchronization patterns
- Phase Transitions — state changes in linguistic patterns
- Information Density — compression patterns
- Temporal Displacement — tense and time orientation shifts
What Is Not Available
Mathematical formulas, detector code, weighting schemes, pattern classification.
Available through research partnership.
IV. Statistical Framework
Standard methods for robust inference.
Effect Size Estimation
All findings are reported as Cohen's d, interpreted against these benchmarks:
| Cohen's d | Interpretation |
|---|---|
| 0.2 | Small effect |
| 0.5 | Medium effect |
| 0.8 | Large effect |
| >1.0 | Very large effect |
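The section does not state which variant of Cohen's d is used. For matched pairs, one common choice is d_z: the mean of the paired differences divided by their standard deviation. This sketch assumes that variant and maps |d| onto the table's benchmarks, labelling values below 0.2 as "negligible", a category the table itself does not define.

```python
from statistics import fmean, stdev


def cohens_d_paired(scores_a, scores_b):
    """Cohen's d_z for matched pairs: mean paired difference / SD of the differences.

    scores_a[i] and scores_b[i] must come from the same name pair and symptom profile.
    Raises ZeroDivisionError if every difference is identical (SD of zero).
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need at least two matched pairs of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return fmean(diffs) / stdev(diffs)


def interpret_d(d):
    """Label |d| using the benchmark table; 'negligible' is an assumed extra label."""
    size = abs(d)
    if size > 1.0:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"


# Invented scores for illustration only.
d = cohens_d_paired([3, 4, 5, 6], [2, 2, 4, 4])
print(round(d, 3), interpret_d(d))  # 2.598 very large
```

Other variants (for example, dividing by the pooled SD of the two conditions) give different magnitudes for the same data, so the variant matters when comparing against the benchmarks.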
Multiple Comparison Correction
With 12 structural dimensions and multiple demographic comparisons, we apply:
- Bonferroni correction — adjusting significance threshold by number of comparisons
- False Discovery Rate control — Benjamini-Hochberg procedure
- All reported findings remain significant after correction
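Both corrections can be sketched in plain Python. Bonferroni rejects a hypothesis when p ≤ α/m; Benjamini-Hochberg rejects the k smallest p-values, where k is the largest rank with p_(k) ≤ (k/m)·q. The example p-values are invented for illustration and are not study results.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m, where m is the number of comparisons."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank  # largest rank passing its threshold so far
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
print(bonferroni(p))          # only the first hypothesis is rejected
print(benjamini_hochberg(p))  # the first two hypotheses are rejected
```

Bonferroni controls the family-wise error rate and is the more conservative of the two: any finding that survives Bonferroni at α also survives Benjamini-Hochberg at q = α.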
Confidence Intervals
95% confidence intervals estimated through:
- Bootstrap resampling — 10,000 iterations
- Bias-corrected and accelerated (BCa) intervals
- All reported CIs exclude zero (confirming significance)
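A standard-library sketch of the BCa interval described above, applied to a generic list of per-pair scores. The function name, default seed, simple quantile rule, and example data are assumptions for illustration; an established implementation such as SciPy's `scipy.stats.bootstrap` with `method="BCa"` is preferable in practice.

```python
import random
from statistics import NormalDist, fmean


def bca_interval(data, stat=fmean, n_resamples=10_000, confidence=0.95, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval for stat(data)."""
    data = list(data)
    n = len(data)
    if n < 3:
        raise ValueError("need at least three observations")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    norm = NormalDist()  # standard normal: mean 0, sd 1
    theta_hat = stat(data)

    # 1. Bootstrap replicates: recompute the statistic on resamples drawn with replacement.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))

    # 2. Bias correction z0 (replicates tied with theta_hat count as half below).
    below = sum(r < theta_hat for r in reps) + 0.5 * sum(r == theta_hat for r in reps)
    prop = below / n_resamples
    if not 0 < prop < 1:
        raise ValueError("all replicates fall on one side of the estimate")
    z0 = norm.inv_cdf(prop)

    # 3. Acceleration a from jackknife (leave-one-out) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den if den else 0.0

    # 4. Shift the nominal tail probabilities using z0 and a.
    def adjusted(alpha):
        z = norm.inv_cdf(alpha)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(prob):
        # Read the prob-th quantile off the sorted replicates (simple index rule).
        return reps[min(n_resamples - 1, max(0, int(prob * n_resamples)))]

    tail = (1 - confidence) / 2
    return pick(adjusted(tail)), pick(adjusted(1 - tail))


# Invented per-pair differences, for illustration only.
diffs = [0.8, 1.1, 0.4, 1.5, 0.9, 1.2, 0.7, 1.0, 1.3, 0.6]
low, high = bca_interval(diffs)
print(f"95% BCa CI: [{low:.3f}, {high:.3f}]; excludes zero: {low > 0 or high < 0}")
```

An interval "excludes zero" when both bounds share a sign, which is the check in the final line.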
V. Pre-Registration
This study's methodology and analysis plan were specified before data collection began.
Pre-Registration Details
| Item | Detail |
|---|---|
| Protocol Version | 1.0 |
| Date | January 2026 |
| Contents | Hypotheses, methodology, analysis plan, stopping rules |
| Deviations | None |
Pre-registration prevents post-hoc adjustment of methods to achieve desired results. All analyses reported were planned before data collection.
VI. Reproducibility
We are committed to enabling independent verification of our findings to the extent possible while protecting proprietary methodology.
What Is Available
- Name pairs dataset (54 pairs, 6 categories; JSON)
- Symptom profiles (20 standardized presentations; JSON)
- Healthcare findings summary (JSON)
- Structural findings summary (results only; JSON)
What Requires Partnership
- Structural analysis source code
- Raw model outputs
- Full statistical analysis scripts
Available to qualified researchers through formal partnership.