Quantitative Health Sciences Calendar
“Opportunities to Advance Health Equity through Implementation Science”
Wednesday, January 7, 2026
Event Description
Large language models (LLMs) are increasingly used in healthcare, yet most evaluations rely on clean, exam-style datasets that fail to capture the complexity of real-world clinical data, such as electronic health records (EHRs), and rarely keep pace with rapidly evolving LLMs. In this talk, we will present BRIDGE, a multilingual benchmark constructed from real clinical tasks and over one million EHR-derived samples, on which we evaluated 95 leading LLMs through 24,000+ experiments and 39 million predictions. Our findings show wide variation across model families, tasks, and languages, with several open-source models matching proprietary ones. We also observe that chain-of-thought prompting often lowers accuracy on these clinical tasks, and we provide the first large-scale analysis of stigmatizing language generated during model reasoning.
Click here to join or call 1 301 7158592, Meeting ID: 96956824446#, Password: 840948