Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography

Thomas Z Li, Kaiwen Xu, Riqiang Gao, Yucheng Tang, Thomas A Lasko, Fabien Maldonado, Kim L Sandler, Bennett A Landman

Proceedings of SPIE--the International Society for Optical Engineering, Vol. 12464, February 2023 (Epub 2023-04-03)
DOI: 10.1117/12.2653911
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10353776/pdf/nihms-1858277.pdf

Abstract
Features learned from single radiologic images are unable to provide information about whether and how much a lesion may be changing over time. Time-dependent features computed from repeated images can capture those changes and help identify malignant lesions by their temporal behavior. However, longitudinal medical imaging presents the unique challenge of sparse, irregular time intervals in data acquisition. While self-attention has been shown to be a versatile and efficient learning mechanism for time series and natural images, its potential for interpreting temporal distance between sparse, irregularly sampled spatial features has not been explored. In this work, we propose two interpretations of a time-distance vision transformer (ViT) by using (1) vector embeddings of continuous time and (2) a temporal emphasis model to scale self-attention weights. The two algorithms are evaluated based on benign versus malignant lung cancer discrimination of synthetic pulmonary nodules and lung screening computed tomography studies from the National Lung Screening Trial (NLST). Experiments evaluating the time-distance ViTs on synthetic nodules show a fundamental improvement in classifying irregularly sampled longitudinal images when compared to standard ViTs. In cross-validation on screening chest CTs from the NLST, our methods (0.785 and 0.786 AUC respectively) significantly outperform a cross-sectional approach (0.734 AUC) and match the discriminative performance of the leading longitudinal medical imaging algorithm (0.779 AUC) on benign versus malignant classification. This work represents the first self-attention-based framework for classifying longitudinal medical images. Our code is available at https://github.com/tom1193/time-distance-transformer.
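To make the two interpretations in the abstract concrete, below is a minimal, hypothetical sketch in PyTorch, not the authors' implementation (which is available at the repository linked above). It shows (1) a sinusoidal embedding of continuous acquisition times, analogous to positional encodings but over real-valued time, and (2) a self-attention layer whose weights are damped by a temporal-emphasis term; the exponential-decay form of that term and all function and parameter names are assumptions made for illustration only.

# Illustrative sketch of the two mechanisms described in the abstract.
# Assumptions: sinusoidal continuous-time embedding and an exponential
# temporal-emphasis model exp(-lambda * |t_i - t_j|); see the authors'
# repository (https://github.com/tom1193/time-distance-transformer)
# for the actual method.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def continuous_time_embedding(times, dim):
    """Sinusoidal embedding of continuous scan times (e.g., years since the
    first scan), analogous to positional encodings over real-valued time."""
    # times: (batch, num_scans) float tensor
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, device=times.device).float() / half
    )  # (half,)
    angles = times.unsqueeze(-1) * freqs  # (batch, num_scans, half)
    return torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)


class TimeDistanceSelfAttention(nn.Module):
    """Single-head self-attention whose weights are damped by a learnable
    exponential temporal-emphasis term exp(-lambda * |t_i - t_j|)."""

    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.log_lambda = nn.Parameter(torch.zeros(1))  # learnable decay rate
        self.scale = dim ** -0.5

    def forward(self, x, times):
        # x: (batch, num_scans, dim) token features, one token per scan
        # times: (batch, num_scans) acquisition times in consistent units
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale  # (b, n, n)
        time_dist = (times.unsqueeze(-1) - times.unsqueeze(-2)).abs()  # (b, n, n)
        emphasis = torch.exp(-self.log_lambda.exp() * time_dist)
        attn = F.softmax(scores, dim=-1) * emphasis
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-8)  # renormalize
        return self.proj(torch.matmul(attn, v))


if __name__ == "__main__":
    batch, num_scans, dim = 2, 3, 64
    feats = torch.randn(batch, num_scans, dim)               # per-scan image features
    times = torch.tensor([[0.0, 1.1, 2.8], [0.0, 0.9, 3.5]])  # irregular scan times (years)
    feats = feats + continuous_time_embedding(times, dim)     # interpretation (1)
    out = TimeDistanceSelfAttention(dim)(feats, times)        # interpretation (2)
    print(out.shape)  # torch.Size([2, 3, 64])

In this toy usage, each scan in a longitudinal study contributes one feature token, and the irregular intervals between scans enter the model both through the time embedding added to the tokens and through the emphasis term that down-weights attention between temporally distant scans.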