Duke researchers examine AI safety in a health care setting

Health care professionals are increasingly turning to artificial intelligence in their day-to-day work, especially for time-consuming tasks like taking medical notes.
- In response, Duke researchers are now developing tools to evaluate how well these AI tools are performing in the hospital.
Why it matters: Health systems across the country are investing heavily in AI as a way to reduce burnout for providers and potentially improve care, Axios previously reported.
- Still, there are concerns that the tools are imperfect, sometimes producing errors known as hallucinations that could have negative outcomes for patients.
The big picture: The allure of using artificial intelligence in health care is clear, with one study finding AI is reducing note-taking time by 20% and after-hours work by 30%.
- And nearly two-thirds of physicians now use some form of AI in their day-to-day working lives, according to a survey by the American Medical Association.
- However, a mistake in an AI note could have "downstream consequences" for a patient, if, for example, a transcription indicates the wrong medication, Michael Pencina, chief data scientist at Duke Health, told Axios.
Driving the news: In two studies published this month, Duke researchers unveiled a new framework to assess AI models and monitor how well they perform over time.
- The internal tool combines human evaluations, automated metric scoring and simulated edge-case scenarios. One study examined how well AI is performing at taking medical notes; the other looked at generating replies to patients in Epic, an electronic health record software.
- The goal is to ensure that these tools are accurate, convey information fluently and avoid bias, according to co-authors Pencina and Chuan Hong, a biostatistics professor at Duke.
Zoom in: The study of medical note-taking tools found that AI generally produced notes that were fluent and clear, but would occasionally produce inaccuracies.
- Its performance declined significantly when dealing with new drugs and medications, for example.
- In the study of drafting replies on Epic, researchers found the AI-generated replies largely acceptable, with minimal changes needed by providers.
- But the tools occasionally omitted critical information, requiring providers to make significant edits.
What they're saying: Pencina told Axios that health care systems need a way to keep up with how quickly AI tools are changing and developing.
- "We really need to build a good post-market monitoring system, where there is continuous monitoring of these solutions when they're deployed in the real world," he said.
- "The technology is moving and evolving fast," he added, "and part of the reason we conducted this study is to know what are the efficient metrics that we can have running on an ongoing basis to capture any issues that occur."
What's next: Pencina said the goal is to roll out the AI monitoring framework within Duke Health first; once researchers are satisfied with its performance, they could share it with other health systems.
