ManiNeg: Manifestation-guided multimodal pretraining for mammography screening
Xujun Li, Xin Wei, Jing Jiang, Danxiang Chen, Wei Zhang, Jinpeng Li
Computers in Biology and Medicine, vol. 186, article 109628 (journal article). Epub 2025-01-26; publication date 2025-03-01.
DOI: 10.1016/j.compbiomed.2024.109628
Citations: 0
Abstract
Breast cancer poses a significant health threat worldwide. Contrastive learning has emerged as an effective method for extracting critical lesion features from mammograms, offering a potent tool for breast cancer screening and analysis. A crucial aspect of contrastive learning is negative sampling: selecting hard negative samples is essential for driving representations to retain detailed lesion information. In large-scale contrastive learning on natural images, it is often assumed that the extracted features sufficiently capture semantic content and that each mini-batch inherently contains ideal hard negative samples. However, the unique characteristics of breast lumps challenge these assumptions for mammographic data. In response, we introduce ManiNeg, a novel approach that uses manifestations as proxies for selecting hard negative samples. Manifestations are the observable symptoms or signs of a disease; as a condensed representation of a physician's domain knowledge, they provide a robust basis for choosing hard negative samples. The approach benefits from the invariance of manifestations to model optimization, which makes sampling efficient. We evaluated ManiNeg on the task of distinguishing benign from malignant breast lumps. Our results demonstrate that ManiNeg improves representations in both unimodal and multimodal settings, and that its benefits extend to datasets beyond the one used for pretraining. To support ManiNeg and future research, we have developed the MVKL mammographic dataset, which includes multi-view mammograms, corresponding reports, meticulously annotated manifestations, and pathologically confirmed benign or malignant outcomes for each case. The MVKL dataset and our code are publicly available at https://github.com/wxwxwwxxx/ManiNeg to foster further research within the community.
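To make the idea of manifestation-guided negative sampling concrete, here is a minimal sketch. It is not the authors' implementation (see the repository above for that). It assumes that each case's manifestations are encoded as a binary attribute vector, that a "hard" negative is another case whose manifestations differ from the anchor's in as few attributes as possible (Hamming distance), and that the loss is a standard InfoNCE over cosine similarities. All function names are hypothetical. Because the manifestation annotations are fixed, the distance matrix is computed once before training instead of being re-mined from embeddings at every step, which is the efficiency property the abstract describes.

```python
import math
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of manifestation attributes on which two cases disagree."""
    return sum(x != y for x, y in zip(a, b))


def manifestation_distance_matrix(manifest: List[Sequence[int]]) -> List[List[int]]:
    # Computed once before training: manifestations are fixed annotations,
    # so these distances do not change as the encoder is optimized.
    n = len(manifest)
    return [[hamming(manifest[i], manifest[j]) for j in range(n)] for i in range(n)]


def hard_negatives(dist: List[List[int]], anchor: int, k: int) -> List[int]:
    # Pick the k other cases whose manifestations are closest to the anchor's
    # (ties broken by case index) and treat them as hard negatives.
    candidates = [j for j in range(len(dist)) if j != anchor]
    candidates.sort(key=lambda j: (dist[anchor][j], j))
    return candidates[:k]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature: float = 0.1) -> float:
    # InfoNCE: negative log-softmax score of the positive among all candidates.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Hypothetical binary manifestation vectors for four cases.
    manifest = [
        [1, 0, 1, 0],  # case 0 (anchor)
        [1, 0, 1, 1],  # case 1: differs from case 0 in one attribute
        [0, 1, 0, 1],  # case 2: differs in all four attributes
        [1, 0, 0, 0],  # case 3: differs in one attribute
    ]
    dist = manifestation_distance_matrix(manifest)
    print(hard_negatives(dist, anchor=0, k=2))  # cases 1 and 3
```

In a multimodal setting, the anchor could be a case's image embedding, the positive the embedding of the same case's report, and the negatives the report embeddings of the cases returned by `hard_negatives`. All of these encoders are assumptions that the sketch leaves out.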
Journal description:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.