{"title":"Multi-modality multiorgan image segmentation using continual learning with enhanced hard attention to the task.","authors":"Ming-Long Wu, Yi-Fan Peng","doi":"10.1002/mp.17842","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Enabling a deep neural network (DNN) to learn multiple tasks using the concept of continual learning potentially better mimics human brain functions. However, current continual learning studies for medical image segmentation are mostly limited to single-modality images at identical anatomical locations.</p><p><strong>Purpose: </strong>To propose and evaluate a continual learning method termed eHAT (enhanced hard attention to the task) for performing multi-modality, multiorgan segmentation tasks using a DNN.</p><p><strong>Methods: </strong>Four public datasets covering the lumbar spine, heart, and brain acquired by magnetic resonance imaging (MRI) and computed tomography (CT) were included to segment the vertebral bodies, the right ventricle, and brain tumors, respectively. Three-task (spine CT, heart MRI, and brain MRI) and four-task (spine CT, heart MRI, brain MRI, and spine MRI) models were tested for eHAT, with the three-task results compared with state-of-the-art continual learning methods. The effectiveness of multitask performance was measured using the forgetting rate, defined as the average difference in Dice coefficients and Hausdorff distances between multiple-task and single-task models. The ability to transfer knowledge to different tasks was evaluated using backward transfer (BWT).</p><p><strong>Results: </strong>The forgetting rates were -2.51% to -0.60% for the three-task eHAT models with varying task orders, substantially better than the -18.13% to -3.59% using original hard attention to the task (HAT), while those in four-task models were -2.54% to -1.59%. In addition, four-task U-net models with eHAT using only half the number of channels (1/4 parameters) yielded nearly equal performance with or without regularization. A retrospective model comparison showed that eHAT with fixed or automatic regularization had significantly superior BWT (-3% to 0%) compared to HAT (-22% to -4%).</p><p><strong>Conclusion: </strong>We demonstrate for the first time that eHAT effectively achieves continual learning of multi-modality, multiorgan segmentation tasks using a single DNN, with improved forgetting rates compared with HAT.</p>","PeriodicalId":94136,"journal":{"name":"Medical physics","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Medical physics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/mp.17842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Background: Enabling a deep neural network (DNN) to learn multiple tasks using the concept of continual learning potentially better mimics human brain functions. However, current continual learning studies for medical image segmentation are mostly limited to single-modality images at identical anatomical locations.
Purpose: To propose and evaluate a continual learning method termed eHAT (enhanced hard attention to the task) for performing multi-modality, multiorgan segmentation tasks using a DNN.
Methods: Four public datasets covering the lumbar spine, heart, and brain, acquired by magnetic resonance imaging (MRI) and computed tomography (CT), were included to segment the vertebral bodies, the right ventricle, and brain tumors, respectively. Three-task (spine CT, heart MRI, and brain MRI) and four-task (spine CT, heart MRI, brain MRI, and spine MRI) models were tested for eHAT, and the three-task results were compared with state-of-the-art continual learning methods. Multitask performance was quantified using the forgetting rate, defined as the average difference in Dice coefficients and Hausdorff distances between the multiple-task and single-task models. The ability to transfer knowledge across tasks was evaluated using backward transfer (BWT).
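As a concrete reading of these two metrics (a sketch only: the abstract gives verbal definitions, the notation below is ours, and BWT is taken in its standard continual-learning form), let $D_i^{\text{multi}}$ and $D_i^{\text{single}}$ denote the Dice coefficient on task $i$ for the multiple-task and single-task models, and let $R_{j,i}$ denote performance on task $i$ after training through task $j$ of $T$ tasks:

$$\mathrm{FR} = \frac{1}{T}\sum_{i=1}^{T}\bigl(D_i^{\text{multi}} - D_i^{\text{single}}\bigr), \qquad \mathrm{BWT} = \frac{1}{T-1}\sum_{i=1}^{T-1}\bigl(R_{T,i} - R_{i,i}\bigr)$$

Under these conventions, values near 0% indicate little forgetting and more negative values indicate greater degradation on earlier tasks; an analogous forgetting rate can be formed from Hausdorff distances, with the sign interpretation reversed because smaller distances are better.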
Results: The forgetting rates were -2.51% to -0.60% for the three-task eHAT models with varying task orders, substantially better than the -18.13% to -3.59% obtained with the original hard attention to the task (HAT), while those of the four-task models were -2.54% to -1.59%. In addition, four-task U-Net models with eHAT using only half the number of channels (one quarter of the parameters) yielded nearly equal performance with or without regularization. A retrospective model comparison showed that eHAT with fixed or automatic regularization had significantly superior BWT (-3% to 0%) compared with HAT (-22% to -4%).
Conclusion: We demonstrate for the first time that eHAT effectively achieves continual learning of multi-modality, multiorgan segmentation tasks using a single DNN, with improved forgetting rates compared with HAT.