In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors

The story

A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
From the source
In Harvard study, AI offered more accurate emergency room diagnoses than two human doctors Anthony Ha 11:00 AM PDT · May 3, 2026 A new study examines how large language models perform in a variety of medical contexts, including real emergency room cases — where at least one model seemed to be more accurate than human doctors.
The study was published this week in Science and comes from a research team led by physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. The researchers said they conducted a variety of experiments to measure how OpenAI’s models compared to human physicians.
In one experiment, researchers focused on 76 patients who came into the Beth Israel emergency room, comparing the diagnoses offered by two internal medicine attending physicians to those generated by OpenAI’s o1 and 4o models. These diagnoses were assessed by two other attending physicians, who did not know which ones came from humans and which came from AI.
Who and what
Key names and topics in this story: Harvard.
Where to follow next
- Read the full piece at techcrunch.com
- More from our AI & prompts coverage

Related stories

A Coding Implementation to Explore and Analyze the TaskTrove Dataset with Streaming Parsing Visualization and Verifier Detection
In this tutorial, we take a deep dive into the TaskTrove dataset on Hugging Face and build a complete, practical workflow to efficiently explore it. Instead of downloading the full multi-gigabyte dataset, we stream it directly and work with individual samples in real time. We beg

‘This is fine’ creator says AI startup stole his art
The ad comes from Artisan, the AI startup behind billboards urging businesses to "stop hiring humans."

A Developer’s Guide to Systematic Prompting: Mastering Negative Constraints, Structured JSON Outputs, and Multi-Hypothesis Verbalized Sampling
Most developers treat prompting as an afterthought—write something reasonable, observe the output, and iterate if needed. That approach works until reliability becomes critical. As LLMs move into production systems, the difference between a prompt that usually works and one that

A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Rollout Generation Speedup at 8B and Projects 2.5× End-to-End Speedup at 235B
A new paper from NVIDIA Research integrates speculative decoding directly into NeMo RL with a vLLM backend, delivering lossless rollout acceleration at both 8B and projected 235B model scales. The post A New NVIDIA Research Shows Speculative Decoding in NeMo RL Achieves 1.8× Roll