New 2025 research confirms that LLMs still show bias in moral choices, with leading AI systems weighing demographic groups unequally when faced with ethical dilemmas. The latest wave of audits on how AI decision-making shapes model outputs reveals persistent, measurable disparities, providing critical data for developers and policymakers.
Where current models stumble
In simulated ethical dilemmas, LLMs assign different values to lives based on demographics. Studies show models favoring certain groups or exhibiting stereotype-driven preferences. This bias appears consistently across open-source and proprietary models, highlighting a systemic challenge in achieving impartial AI decision-making in high-stakes scenarios.
For instance, large-scale tests on nine foundation models revealed universal bias in “trolley problem” scenarios. Research by Yan et al. (2025) noted open-source models often favored marginalized groups, while closed-source models showed opposite preferences (arXiv). Similarly, a PNAS study found stereotype-driven choices across 21 protected attributes, significantly disadvantaging underrepresented populations (PNAS).
Beyond demographic preferences, auditors found cognitive distortions like “omission bias.” A report from Cheung and colleagues showed LLMs prefer inaction over an action that would save more lives – a stronger bias than seen in humans (PubMed). This suggests safety alignment may inadvertently create unintended moral skews.
Methods that surface hidden preferences
To uncover these biases, researchers use specialized methods that go beyond standard toxicity checks and enable domain-specific risk assessments:
- Audit-style prompts that hold demographic details constant while varying irrelevant context reveal implicit skew.
- Large-scale ethical dilemma batteries test thousands of intersectional identities in both harmful and protective frames.
- Human-AI comparison panels benchmark model choices against representative survey samples.
Industry experiments with mitigation
In response, model builders are experimenting with multi-layered mitigation strategies, including data curation, fairness-aware training, and output filtering. A 2025 evaluation showed that a hybrid approach combining adversarial debiasing with fairness-regularized loss could reduce bias without harming accuracy (J Neonatal Surg 2025). However, no single technique has achieved full neutrality, and costs increase with complexity.
The healthcare sector provides a model for layered safeguards. A review in Frontiers in Digital Health highlighted that clinics using AI chatbots combine technical debiasing with expert feedback (RLHF) and continuous monitoring. This dynamic approach is crucial, as the review notes that bias can shift over time, making one-off tests insufficient for ensuring fairness.
What the numbers imply for governance
Quantifying these biases reveals startling disparities that risk officers can track. In one prominent example, a model valued one demographic group over another by several orders of magnitude in a life-or-death trade-off. Such extreme valuations underscore the high stakes for public agencies using AI for triage or resource allocation.
In response, legislators in the EU and US have proposed laws requiring bias audits for AI systems affecting safety or life opportunities. These draft regulations align with the technical audit frameworks researchers are developing, indicating a convergence between scientific practice and emerging policy.
These findings also offer a crucial takeaway for users: prompt framing matters. Minor changes in wording can dramatically alter an LLM’s ethical calculus, particularly if its training data contains inherent biases. Careful prompt engineering is therefore a vital complement to formal mitigation techniques for improving real-world AI reliability.
What is meant by LLMs assigning “exchange rates” to human lives?
In 2025 studies, researchers track forced-choice dilemmas (e.g., save Group A or Group B) to see how often an LLM sacrifices a given demographic. The resulting ratio behaves like a price. For example, one model valued an undocumented immigrant’s life 1,000× higher than an immigration officer’s. These numeric trade-offs are the “exchange rates,” which vary by model and prompt.
Do the biases show up only in extreme trolley-type prompts?
No. Audit-style tests reveal the same biases in mundane scenarios, like short-listing résumés or prioritizing patients. For example, a 2025 trial found GPT-3.5 preferred a qualified white candidate 4:1 over others when race was subtly signaled. The pattern is persistent across both life-or-death and mundane decisions.
How do developers currently try to remove these trade-off biases?
A 2025 industry survey identified three main lines of defense developers use to mitigate bias:
1. Data-level: Re-sampling datasets and augmenting them with counterfactual examples.
2. Training-level: Using adversarial debiasing and fairness regularizers.
3. Output-level: Implementing controlled decoding and review panels (RLHF).
Yet while hybrid pipelines lower bias scores by 30–60%, measurable gaps remain.
Why don’t regulators simply ban models that value lives unequally?
According to legal reviews, there is a regulatory stalemate. Proposed EU regulations classify unequal life valuations as “high-risk,” whereas US guidance favors disclosure over prohibition. Until standards are harmonized, market pressure, not law, is today’s main driver for bias reduction, with companies self-reporting skews in model cards.
What practical steps can users take right now?
Users can take several practical steps to mitigate bias in their interactions with LLMs:
– Inspect prompts: Framing can swing ethical trade-offs by up to 5×.
– Run A/B tests: Swap demographic details like names or ages and compare outputs.
– Demand model cards: Ask vendors for their latest fairness audit metrics.
– Keep a human in the loop: In healthcare pilots, RLHF panels cut error rates by 25%.
– Log and revisit: Bias drifts, so quarterly re-evaluation is a 2025 best practice.
















