Nat Med: Adapted large language models outperform medical experts in clinical text summarization
This article was translated and compiled by the Xiaoka robot (小咖机器人).
Journal: Nat Med
Original article link: https://doi.org/10.1038/s41591-024-02855-5
The abstract is as follows:
Analyzing vast textual data and summarizing key information from electronic health records imposes a substantial burden on how clinicians allocate their time. Although large language models (LLMs) have shown promise in natural language processing (NLP) tasks, their effectiveness on a diverse range of clinical summarization tasks remains unproven. Here we applied adaptation methods to eight LLMs, spanning four distinct clinical summarization tasks: radiology reports, patient questions, progress notes and doctor-patient dialogue. Quantitative assessments with syntactic, semantic and conceptual NLP metrics reveal trade-offs between models and adaptation methods. A clinical reader study with 10 physicians evaluated summary completeness, correctness and conciseness; in most cases, summaries from our best-adapted LLMs were deemed either equivalent (45%) or superior (36%) compared with summaries from medical experts. The ensuing safety analysis highlights challenges faced by both LLMs and medical experts, as we connect errors to potential medical harm and categorize types of fabricated information. Our research provides evidence of LLMs outperforming medical experts in clinical text summarization across multiple tasks. This suggests that integrating LLMs into clinical workflows could alleviate documentation burden, allowing clinicians to focus more on patient care.
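The abstract refers to syntactic, semantic and conceptual NLP metrics without naming them here. As an illustration only, the sketch below scores a candidate LLM summary against an expert reference using two commonly used stand-ins: ROUGE-L for syntactic overlap and BERTScore for semantic similarity. The example texts are invented placeholders, not data from the study.

```python
# Illustrative sketch: scoring a candidate summary against a reference summary.
# Requires: pip install rouge-score bert-score
from rouge_score import rouge_scorer
from bert_score import score as bert_score

# Hypothetical expert reference and LLM candidate summaries (placeholders).
reference = "No acute cardiopulmonary abnormality."
candidate = "No acute cardiopulmonary process identified."

# Syntactic metric: ROUGE-L measures longest-common-subsequence overlap.
rouge = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
rouge_l = rouge.score(reference, candidate)["rougeL"].fmeasure

# Semantic metric: BERTScore compares contextual embeddings of the two texts.
_, _, f1 = bert_score([candidate], [reference], lang="en")

print(f"ROUGE-L F1:   {rouge_l:.3f}")
print(f"BERTScore F1: {f1.item():.3f}")
```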
