Lei Bi; Xiaohang Fu; Qiufang Liu; Shaoli Song; David Dagan Feng; Michael Fulham; Jinman Kim
{"title":"通过级联 CNN 变换器网络共同学习多模态 PET-CT 特征","authors":"Lei Bi;Xiaohang Fu;Qiufang Liu;Shaoli Song;David Dagan Feng;Michael Fulham;Jinman Kim","doi":"10.1109/TRPMS.2024.3417901","DOIUrl":null,"url":null,"abstract":"<italic>Background:</i>\n Automated segmentation of multimodality positron emission tomography—computed tomography (PET-CT) data is a major challenge in the development of computer-aided diagnosis systems (CADs). In this context, convolutional neural network (CNN)-based methods are considered as the state-of-the-art. These CNN-based methods, however, have difficulty in co-learning the complementary PET-CT image features and in learning the global context when focusing solely on local patterns. \n<italic>Methods:</i>\n We propose a cascaded CNN-transformer network (CCNN-TN) tailored for PET-CT image segmentation. We employed a transformer network (TN) because of its ability to establish global context via self-attention and embedding image patches. We extended the TN definition by cascading multiple TNs and CNNs to learn the global and local contexts. We also introduced a hyper fusion branch that iteratively fuses the separately extracted complementary image features. We evaluated our approach, when compared to current state-of-the-art CNN methods, on three datasets: two nonsmall cell lung cancer (NSCLC) and one soft tissue sarcoma (STS). \n<italic>Results:</i>\n Our CCNN-TN method achieved a dice similarity coefficient (DSC) score of 72.25% (NSCLC), 67.11% (NSCLC), and 66.36% (STS) for segmentation of tumors. Compared to other methods the DSC was higher for our CCNN-TN by 4.5%, 1.31%, and 3.44%. \n<italic>Conclusion:</i>\n Our experimental results demonstrate that CCNN-TN, when compared to the existing methods, achieved more generalizable results across different datasets and has consistent performance across various image fusion strategies and network backbones.","PeriodicalId":46807,"journal":{"name":"IEEE Transactions on Radiation and Plasma Medical Sciences","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Co-Learning Multimodality PET-CT Features via a Cascaded CNN-Transformer Network\",\"authors\":\"Lei Bi;Xiaohang Fu;Qiufang Liu;Shaoli Song;David Dagan Feng;Michael Fulham;Jinman Kim\",\"doi\":\"10.1109/TRPMS.2024.3417901\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<italic>Background:</i>\\n Automated segmentation of multimodality positron emission tomography—computed tomography (PET-CT) data is a major challenge in the development of computer-aided diagnosis systems (CADs). In this context, convolutional neural network (CNN)-based methods are considered as the state-of-the-art. These CNN-based methods, however, have difficulty in co-learning the complementary PET-CT image features and in learning the global context when focusing solely on local patterns. \\n<italic>Methods:</i>\\n We propose a cascaded CNN-transformer network (CCNN-TN) tailored for PET-CT image segmentation. We employed a transformer network (TN) because of its ability to establish global context via self-attention and embedding image patches. We extended the TN definition by cascading multiple TNs and CNNs to learn the global and local contexts. We also introduced a hyper fusion branch that iteratively fuses the separately extracted complementary image features. 
We evaluated our approach, when compared to current state-of-the-art CNN methods, on three datasets: two nonsmall cell lung cancer (NSCLC) and one soft tissue sarcoma (STS). \\n<italic>Results:</i>\\n Our CCNN-TN method achieved a dice similarity coefficient (DSC) score of 72.25% (NSCLC), 67.11% (NSCLC), and 66.36% (STS) for segmentation of tumors. Compared to other methods the DSC was higher for our CCNN-TN by 4.5%, 1.31%, and 3.44%. \\n<italic>Conclusion:</i>\\n Our experimental results demonstrate that CCNN-TN, when compared to the existing methods, achieved more generalizable results across different datasets and has consistent performance across various image fusion strategies and network backbones.\",\"PeriodicalId\":46807,\"journal\":{\"name\":\"IEEE Transactions on Radiation and Plasma Medical Sciences\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":4.6000,\"publicationDate\":\"2024-06-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Radiation and Plasma Medical Sciences\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10570071/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Radiation and Plasma Medical Sciences","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10570071/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Co-Learning Multimodality PET-CT Features via a Cascaded CNN-Transformer Network
Background:
Automated segmentation of multimodality positron emission tomography–computed tomography (PET-CT) data is a major challenge in the development of computer-aided diagnosis (CAD) systems. In this context, convolutional neural network (CNN)-based methods are considered the state of the art. These CNN-based methods, however, struggle to co-learn the complementary PET-CT image features and, because they focus solely on local patterns, to capture global context.
Methods:
We propose a cascaded CNN-transformer network (CCNN-TN) tailored for PET-CT image segmentation. We employed a transformer network (TN) because of its ability to establish global context via self-attention over embedded image patches. We extended the standard TN design by cascading multiple TNs and CNNs to learn both global and local contexts. We also introduced a hyper-fusion branch that iteratively fuses the complementary image features extracted separately from each modality. We evaluated our approach against current state-of-the-art CNN methods on three datasets: two non-small cell lung cancer (NSCLC) datasets and one soft-tissue sarcoma (STS) dataset.
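To make the cascaded design concrete, below is a minimal PyTorch-style sketch: per-modality CNN branches extract local features, patch-embedding transformer branches add global context via self-attention, and a fusion branch iteratively mixes the two streams. All module names, sizes, and the specific fusion scheme are illustrative assumptions, not the authors' released CCNN-TN implementation.

```python
# Minimal sketch of a cascaded CNN-transformer with iterative fusion.
# Module names (SimpleCNN, PatchTransformer, CascadedCNNTN) and all
# hyperparameters are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Small CNN branch that extracts local features from one modality."""
    def __init__(self, in_ch, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class PatchTransformer(nn.Module):
    """Transformer branch: embeds image patches and applies self-attention,
    so every patch can attend to every other patch (global context)."""
    def __init__(self, dim, patch=8, depth=2, heads=4):
        super().__init__()
        self.patch = patch
        self.embed = nn.Conv2d(dim, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
    def forward(self, x):
        b, c, h, w = x.shape
        tokens = self.embed(x).flatten(2).transpose(1, 2)   # (B, N, C)
        tokens = self.encoder(tokens)
        hp, wp = h // self.patch, w // self.patch
        out = tokens.transpose(1, 2).reshape(b, c, hp, wp)
        return nn.functional.interpolate(out, size=(h, w),
                                         mode="bilinear", align_corners=False)

class CascadedCNNTN(nn.Module):
    """Cascade of CNN (local) and transformer (global) stages per modality,
    with a fusion branch that iteratively fuses PET and CT features."""
    def __init__(self, dim=32, n_classes=2):
        super().__init__()
        self.pet_cnn, self.ct_cnn = SimpleCNN(1, dim), SimpleCNN(1, dim)
        self.pet_tn, self.ct_tn = PatchTransformer(dim), PatchTransformer(dim)
        # 1x1 convs that repeatedly mix the two feature streams
        self.fuse1 = nn.Conv2d(2 * dim, dim, 1)
        self.fuse2 = nn.Conv2d(2 * dim, dim, 1)
        self.head = nn.Conv2d(dim, n_classes, 1)
    def forward(self, pet, ct):
        fp, fc = self.pet_cnn(pet), self.ct_cnn(ct)          # local features
        fused = self.fuse1(torch.cat([fp, fc], dim=1))       # first fusion
        gp, gc = self.pet_tn(fp), self.ct_tn(fc)             # global context
        fused = self.fuse2(torch.cat([fused + gp, gc], dim=1))  # refine fusion
        return self.head(fused)                              # per-pixel logits

logits = CascadedCNNTN()(torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
print(logits.shape)  # torch.Size([1, 2, 64, 64])
```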
Results:
Our CCNN-TN method achieved Dice similarity coefficient (DSC) scores of 72.25% (NSCLC), 67.11% (NSCLC), and 66.36% (STS) for tumor segmentation. Compared with the other methods, the DSC of our CCNN-TN was higher by 4.5%, 1.31%, and 3.44% on the three datasets, respectively.
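For reference, the DSC reported above measures voxel overlap between a predicted mask A and a ground-truth mask B as 2|A∩B| / (|A| + |B|). A minimal NumPy sketch (the helper name is ours, not from the paper):

```python
# Dice similarity coefficient (DSC): 2 * |A ∩ B| / (|A| + |B|),
# where A and B are binary tumor masks of the same shape.
import numpy as np

def dice_similarity(pred: np.ndarray, truth: np.ndarray) -> float:
    """Return the DSC between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:   # both masks empty: treat as perfect agreement
        return 1.0
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy example: one of two predicted voxels overlaps the ground truth.
pred = np.array([[1, 1, 0], [0, 0, 0]])
truth = np.array([[1, 0, 0], [1, 0, 0]])
print(f"DSC = {dice_similarity(pred, truth):.2%}")  # DSC = 50.00%
```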
Conclusion:
Our experimental results demonstrate that CCNN-TN, when compared with existing methods, achieves more generalizable results across different datasets and performs consistently across various image fusion strategies and network backbones.