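Nat Med: Generative models improve fairness of medical classifiers under distribution shifts
This article was translated and compiled by the Xiaoka robot.
Journal: Nat Med
Original article: https://doi.org/10.1038/s41591-024-02838-6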
Abstract
Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and 'labeling' by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair in-distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
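The abstract describes two steps: enriching the training set with synthetic examples for underrepresented subgroups, and judging fairness by diagnostic accuracy within those subgroups. The Python sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The function names (`synthetic_budget`, `subgroup_accuracy`, `accuracy_gap`) are hypothetical. The generator that would actually produce the synthetic images, for example a label- and attribute-conditioned diffusion model, is deliberately left out. The max-minus-min accuracy gap is only one simple fairness measure and may differ from the metrics reported in the paper.

```python
from collections import Counter


def synthetic_budget(group_labels, target=None):
    """Hypothetical helper: how many synthetic samples to request per subgroup.

    Assumption: every subgroup is topped up to `target` real+synthetic
    examples, defaulting to the size of the largest subgroup.
    """
    counts = Counter(group_labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Groups already at or above the target need no synthetic samples.
    return {g: max(0, target - n) for g, n in counts.items()}


def subgroup_accuracy(y_true, y_pred, groups):
    """Accuracy computed separately within each subgroup."""
    correct, total = Counter(), Counter()
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups):
    """Illustrative fairness gap: best minus worst subgroup accuracy."""
    acc = subgroup_accuracy(y_true, y_pred, groups)
    return max(acc.values()) - min(acc.values())


if __name__ == "__main__":
    groups = ["A"] * 6 + ["B"] * 2 + ["C"]
    print(synthetic_budget(groups))  # {'A': 0, 'B': 4, 'C': 5}
    print(accuracy_gap([1, 1, 0, 0], [1, 1, 0, 1], ["A", "A", "B", "B"]))  # 0.5
```

With six images from group A, two from B and one from C, `synthetic_budget` requests 0, 4 and 5 synthetic images respectively. Topping every group up to the largest one is only one possible policy. A real pipeline might instead cap the fraction of synthetic data per group or per class.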
