Cross Attention Transformers for Multi-modal Unsupervised Whole-Body PET Anomaly Detection

Ashay Patel, Petru-Daniel Tudosiu, Walter H.L. Pinaya, Gary Cook, Vicky Goh, Sebastien Ourselin, M. Jorge Cardoso
The Journal of Machine Learning for Biomedical Imaging. Journal article, published 2023-04-19. DOI: https://doi.org/10.59275/j.melba.2023-18c1

Abstract

Cancer is a highly heterogeneous condition that can occur almost anywhere in the human body. [18F]fluorodeoxyglucose Positron Emission Tomography (18F-FDG PET) is an imaging modality commonly used to detect cancer due to its high sensitivity and clear visualisation of the pattern of metabolic activity. Nonetheless, as cancer is highly heterogeneous, it is challenging to train general-purpose discriminative cancer detection models, with data availability and disease complexity often cited as limiting factors. Unsupervised learning methods, more specifically anomaly detection models, have been suggested as a putative solution. These models learn a healthy representation of tissue and detect cancer by predicting deviations from the healthy norm, which requires models capable of accurately learning long-range interactions between organs, their imaging patterns, and other abstract features with high levels of expressivity. Such characteristics are suitably satisfied by transformers, which have been shown to generate state-of-the-art results in unsupervised anomaly detection by training on normal data. This work expands upon such approaches by introducing multi-modal conditioning of the transformer via cross-attention, i.e. supplying anatomical reference information from paired CT images to aid the PET anomaly detection task. Furthermore, we show the importance and impact of codebook sizing within a Vector Quantized Variational Autoencoder on the ability of the transformer network to fulfill the task of anomaly detection. Using 294 whole-body PET/CT samples containing various cancer types, we show that our anomaly detection method is robust and capable of achieving accurate cancer localization results even in cases where normal training data is unavailable. In addition, we show the efficacy of this approach on out-of-sample data, showcasing its generalizability even with limited training data. Lastly, we propose to combine model uncertainty with a new kernel density estimation approach, and show that it provides clinically and statistically significant improvements in accuracy and robustness when compared to the classic residual-based anomaly maps. Overall, superior performance is demonstrated against leading state-of-the-art alternatives, drawing attention to the potential of these approaches.
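
As a rough illustration of the cross-attention conditioning mentioned in the abstract, the sketch below shows single-head scaled dot-product cross-attention in plain Python, where queries come from PET token embeddings and keys and values come from paired CT token embeddings. It is not the authors' implementation: the function names, the single-head design, the token counts, the embedding sizes, and the random weights are all assumptions chosen only to keep the example small.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(pet_tokens, ct_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (hypothetical helper).

    Queries are projected from the PET tokens; keys and values are
    projected from the CT tokens, so each PET position gathers
    anatomical context from the CT sequence.
    """
    q = matmul(pet_tokens, w_q)                # (n_pet, d_head)
    k = matmul(ct_tokens, w_k)                 # (n_ct, d_head)
    v = matmul(ct_tokens, w_v)                 # (n_ct, d_head)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]       # (d_head, n_ct)
    scores = matmul(q, k_t)                    # (n_pet, n_ct)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights         # (n_pet, d_head), (n_pet, n_ct)


def random_matrix(rows, cols, rng):
    """Gaussian random matrix standing in for learned embeddings or weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes (assumed): 5 PET tokens, 7 CT tokens, 16-dim embeddings, 8-dim head.
rng = random.Random(0)
d_model, d_head = 16, 8
pet = random_matrix(5, d_model, rng)
ct = random_matrix(7, d_model, rng)
w_q = random_matrix(d_model, d_head, rng)
w_k = random_matrix(d_model, d_head, rng)
w_v = random_matrix(d_model, d_head, rng)

out, attn = cross_attention(pet, ct, w_q, w_k, w_v)
print(len(out), len(out[0]), len(attn), len(attn[0]))
```

Each row of `attn` is a probability distribution over the CT tokens, which is what lets every PET position weight the anatomical reference information differently.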