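Nat Med: A vision-language foundation model for echocardiogram interpretation
This article was translated and compiled by the Xiaoka bot
Journal: Nat Med
Original article: https://doi.org/10.1038/s41591-024-02959-y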
Abstract
The development of robust artificial intelligence models for echocardiography has been limited by the availability of annotated clinical data. Here, to address this challenge and improve the performance of cardiac imaging models, we developed EchoCLIP, a vision-language foundation model for echocardiography, that learns the relationship between cardiac ultrasound images and the interpretations of expert cardiologists across a wide range of patients and indications for imaging. After training on 1,032,975 cardiac ultrasound videos and corresponding expert text, EchoCLIP performs well on a diverse range of benchmarks for cardiac image interpretation, despite not having been explicitly trained for individual interpretation tasks. EchoCLIP can assess cardiac function (mean absolute error of 7.1% when predicting left ventricular ejection fraction in an external validation dataset) and identify implanted intracardiac devices (area under the curve (AUC) of 0.84, 0.92 and 0.97 for pacemakers, percutaneous mitral valve repair and artificial aortic valves, respectively). We also developed a long-context variant (EchoCLIP-R) using a custom tokenizer based on common echocardiography concepts. EchoCLIP-R accurately identified unique patients across multiple videos (AUC of 0.86), identified clinical transitions such as heart transplants (AUC of 0.79) and cardiac surgery (AUC 0.77) and enabled robust image-to-text search (mean cross-modal retrieval rank in the top 1% of candidate text reports). These capabilities represent a substantial step toward understanding and applying foundation models in cardiovascular imaging for preliminary interpretation of echocardiographic findings.
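The "mean cross-modal retrieval rank" reported for EchoCLIP-R can be made concrete with a small sketch. This is not the authors' evaluation code: the function names (`cosine`, `mean_retrieval_rank`) and the toy two-dimensional embeddings are assumptions made for illustration only. The idea is that each image embedding is compared with every candidate report embedding, and we record where the true paired report lands in the similarity ordering.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_retrieval_rank(image_embs, text_embs):
    """Mean rank (1 = best) of each image's paired report among all candidates.

    Assumes image_embs[i] and text_embs[i] come from the same study.
    """
    ranks = []
    for i, img in enumerate(image_embs):
        sims = [cosine(img, txt) for txt in text_embs]
        # Rank = 1 + number of candidate reports scoring strictly higher
        # than the true paired report.
        ranks.append(1 + sum(s > sims[i] for s in sims))
    return sum(ranks) / len(ranks)


# Toy embeddings (made up); real embeddings would come from the
# model's image and text encoders.
image_embs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_embs = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.0]]

mean_rank = mean_retrieval_rank(image_embs, text_embs)
rank_fraction = mean_rank / len(text_embs)  # "top 1%" would mean <= 0.01
print(mean_rank, rank_fraction)
```

With a large pool of candidate reports, a rank fraction at or below 0.01 corresponds to the "top 1%" figure quoted in the abstract; the three-item toy set here is far too small to reach that and only shows the mechanics.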
