Open-Source AI Matches Top Proprietary Model in Complex Medical Diagnoses
An open-source AI model has matched the performance of a leading proprietary AI tool in solving challenging medical cases that require complex clinical reasoning, according to a new NIH-funded study led by researchers at Harvard Medical School.

The research, published March 14 in JAMA Health Forum, shows that the open-source AI tool Llama 3.1 405B performed on par with GPT-4, a leading proprietary model, when tested on 92 diagnostically challenging clinical cases from The New England Journal of Medicine.
"To our knowledge, this is the first time an open-source AI model has matched the performance of GPT-4 on such challenging cases as assessed by physicians," said senior author Arjun Manrai, assistant professor of biomedical informatics in the Blavatnik Institute at HMS. "It really is stunning that the Llama models caught up so quickly with the leading proprietary model."
The open-source model's suggested diagnoses included the correct one in 70 percent of cases, compared with 64 percent for GPT-4. It also ranked the correct diagnosis as its first suggestion 41 percent of the time, versus 37 percent for GPT-4.
Open-source models offer several advantages over closed-source alternatives: they can run on a hospital's own computers, keeping patient data in-house, and they can be customized and fine-tuned with local data to meet specific clinical needs.
"The open-source model is likely to be more appealing to many chief information officers, hospital administrators, and physicians since there's something fundamentally different about data leaving the hospital for another entity, even a trusted one," said the study's lead author, Thomas Buckley, a doctoral student in the AI in Medicine track at HMS.
Each year, approximately 795,000 patients in the United States die or suffer permanent disability due to diagnostic error, according to a 2023 report. When responsibly incorporated into healthcare systems, AI tools could serve as valuable diagnostic aids that improve both the accuracy and speed of diagnosis.
By Mark Gaige, Harvard Medical School