ManiNeg: Manifestation-guided multimodal pretraining for mammography screening.

IF 6.3 | Zone 2, Medicine | JCR Q1, Biology

Computers in Biology and Medicine. Pub Date: 2025-03-01. Epub Date: 2025-01-26. DOI: 10.1016/j.compbiomed.2024.109628

Xujun Li, Xin Wei, Jing Jiang, Danxiang Chen, Wei Zhang, Jinpeng Li

Volume 186, article 109628. Publication type: Journal Article.


Abstract

Breast cancer poses a significant health threat worldwide. Contrastive learning has emerged as an effective method to extract critical lesion features from mammograms, thereby offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling, where the selection of hard negative samples is essential for driving representations to retain detailed lesion information. In large-scale contrastive learning applied to natural images, it is often assumed that extracted features can sufficiently capture semantic content, and that each mini-batch inherently includes ideal hard negative samples. However, the unique characteristics of breast lumps challenge these assumptions when dealing with mammographic data. In response, we introduce ManiNeg, a novel approach that leverages manifestations as proxies to select hard negative samples. As a condensed representation of a physician's domain knowledge, manifestations represent observable symptoms or signs of a disease and can provide a robust basis for choosing hard negative samples. This approach benefits from its invariance to model optimization, facilitating efficient sampling. We tested ManiNeg on the task of distinguishing between benign and malignant breast lumps. Our results demonstrate that ManiNeg not only improves representation in both unimodal and multimodal contexts but also offers benefits that extend to datasets beyond the initial pretraining phase. To support ManiNeg and future research endeavors, we have developed the MVKL mammographic dataset. This dataset includes multi-view mammograms, corresponding reports, meticulously annotated manifestations, and pathologically confirmed benign-malignant outcomes for each case. The MVKL dataset and our code are publicly available at https://github.com/wxwxwwxxx/ManiNeg to foster further research within the community.
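The idea of using manifestations as proxies for hard negative selection can be illustrated with a minimal sketch. The abstract does not specify the distance measure or the sampling rule, so everything below is an assumption made for illustration: manifestations are encoded as integer category vectors, similarity is measured by Hamming distance, and the function name `sample_hard_negatives` is hypothetical. It is not the authors' implementation; see the linked repository for that.

```python
import numpy as np


def sample_hard_negatives(manifestations, anchor_idx, k):
    """Pick k negatives whose manifestation vectors are closest to the anchor's.

    Assumptions (not from the paper): each row is an integer-coded
    manifestation vector, and closeness is the Hamming distance.
    """
    m = np.asarray(manifestations)
    # Hamming distance from the anchor to every sample.
    dist = (m != m[anchor_idx]).sum(axis=1)
    # Exclude the anchor itself and samples with identical manifestations,
    # since those are not informative negatives.
    candidates = np.flatnonzero(dist > 0)
    # A stable sort keeps the original order among equally distant samples.
    order = candidates[np.argsort(dist[candidates], kind="stable")]
    return order[:k]


# Toy data: 6 cases, 4 hypothetical manifestation attributes each.
toy = np.array([
    [1, 0, 2, 1],  # anchor
    [1, 0, 2, 1],  # identical to the anchor, excluded
    [1, 0, 2, 0],  # differs in one attribute
    [0, 1, 0, 0],  # differs in all four attributes
    [1, 1, 2, 1],  # differs in one attribute
    [0, 0, 2, 0],  # differs in two attributes
])
hard = sample_hard_negatives(toy, anchor_idx=0, k=3)
print(hard.tolist())  # [2, 4, 5]
```

Because the distances in this sketch depend only on the annotated manifestations and not on learned features, they could be computed once before training, which is one way to read the abstract's remark about invariance to model optimization.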


Source journal

Computers in Biology and Medicine (Engineering and Technology - Engineering: Biomedical)

CiteScore: 11.70

Self-citation rate: 10.40%

Articles published: 1086

Review time: 74 days

Journal description: Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.