Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Alireza Rezaei, Hugo Le Boité, Ramin Tadayoni, Pascal Massin, Béatrice Cochener, Ikram Brahim, Gwenolé Quellec, Mathieu Lamard
L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression prediction
Computers in Biology and Medicine, volume 185, article 109508. Published 2024-12-16. DOI: 10.1016/j.compbiomed.2024.109508 (https://doi.org/10.1016/j.compbiomed.2024.109508)
Citation count: 0
Abstract
Pre-training strategies based on self-supervised learning (SSL) have demonstrated success as pretext tasks for downstream tasks in computer vision. However, while SSL methods are often domain-agnostic, their direct application to medical imaging is challenging due to the distinct nature of medical images, including specific anatomical and temporal patterns relevant to disease progression. Additionally, traditional SSL pretext tasks often lack the contextual knowledge that is essential for clinical decision support. In this paper, we developed a longitudinal masked auto-encoder (MAE) that builds on the Transformer-based MAE architecture, specifically introducing a time-aware position embedding and a disease progression-aware masking strategy. Unlike traditional sequential approaches, our method incorporates the actual time intervals between examinations, allowing for better capture of temporal trends. Furthermore, the masking strategy evolves in alignment with disease progression during follow-up exams to capture pathological changes, improving disease progression assessments. Using the OPHDIAT dataset, a large-scale longitudinal screening dataset for diabetic retinopathy (DR), we evaluated our pre-trained model by predicting the severity level at the next visit within three years, based on past examination series. Our findings demonstrate that both the time-aware position embedding and the disease progression-informed masking significantly enhance predictive accuracy. Compared to conventional baseline models and standard longitudinal Transformers, these simple yet effective adaptations substantially improve the predictive power of deep classification models in this domain.
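The abstract does not specify how the time-aware position embedding is computed. As a minimal sketch of the general idea, assuming a standard sinusoidal encoding evaluated at the elapsed time of each examination rather than at its index in the sequence, one possible form is shown below. The function name `time_aware_position_embedding`, the unit (days since the first visit), and the `max_period` constant are illustrative assumptions, not the authors' implementation.

```python
import math


def time_aware_position_embedding(times_in_days, dim, max_period=10000.0):
    """Sinusoidal embedding evaluated at real elapsed time (hypothetical helper).

    times_in_days: elapsed time of each visit, e.g. days since the first exam.
    dim: embedding size (must be even: one sin/cos pair per frequency).
    max_period: assumed scaling constant, borrowed from the usual
        Transformer positional encoding.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    embeddings = []
    for t in times_in_days:
        row = []
        for i in range(0, dim, 2):
            # Frequencies decrease geometrically across embedding dimensions.
            freq = max_period ** (-i / dim)
            row.append(math.sin(t * freq))
            row.append(math.cos(t * freq))
        embeddings.append(row)
    return embeddings


# A conventional sequential encoding sees visits as positions 0, 1, 2.
index_based = time_aware_position_embedding([0, 1, 2], dim=8)
# A time-aware encoding sees the actual, irregular gaps between visits.
time_based = time_aware_position_embedding([0, 180, 400], dim=8)
```

With this formulation, two examination series with the same number of visits but different gaps between them receive different embeddings, which is the property the abstract attributes to using actual time intervals.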
About the journal:
Computers in Biology and Medicine is an international forum for sharing groundbreaking advancements in the use of computers in bioscience and medicine. This journal serves as a medium for communicating essential research, instruction, ideas, and information regarding the rapidly evolving field of computer applications in these domains. By encouraging the exchange of knowledge, we aim to facilitate progress and innovation in the utilization of computers in biology and medicine.