Exploiting temporal information in echocardiography for improved image segmentation
Jieyu Hu, E. Smistad, I. M. Salte, H. Dalen, L. Løvstakken
2022 IEEE International Ultrasonics Symposium (IUS), published 2022-10-10
DOI: 10.1109/IUS54386.2022.9958670
Citations: 5
Abstract
Echocardiography is based on evaluating cineloops, in which temporal information is important for diagnosis. This information is seldom fully utilized in deep-learning-based image analysis because of the massive manual annotation work required. In this work, we investigate the use of temporal information for left-heart segmentation throughout the cardiac cycle, both to enhance the training of simpler networks and to ensure temporally consistent segmentation with spatiotemporal neural networks. Fully annotated cineloops were obtained in a semi-supervised manner, using pseudo-labeling from a network trained on limited annotations from the cardiac cycle. A temporal outlier removal method was developed to avoid artefact annotations. The study used $N=174$ recordings with A2C, A3C, and A4C views annotated at 7 frames, targeted at ES/ED and challenging cardiac cycle time points, with a testing set of $N=25$. We compared the performance of non-temporal U-Net segmentation trained with and without fully annotated cineloops, and the effect of adding convLSTM layers in various configurations (encoder/decoder) to improve temporal consistency. Compared to the baseline U-Net trained at ES/ED, adding extra annotations targeted at time points with typical issues (e.g. valve opening) reduced outliers significantly and improved the average Dice score. The fully automated pseudo-labeling exploited all frames, reduced outliers, and increased Dice to the same level as extra manual annotations. This approach also enabled the training of spatiotemporal networks. Adding convLSTM layers at each level in the encoder provided the best results.
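The abstract does not specify the exact criterion used for temporal outlier removal. As a minimal illustrative sketch (not the paper's method), one plausible approach is to flag pseudo-labeled frames whose segmented area deviates strongly from the median area in a sliding temporal window, since the left-heart area should vary smoothly over the cardiac cycle; the function name, window size, and tolerance below are assumptions for illustration only.

```python
from statistics import median

def temporal_outliers(areas, window=5, rel_tol=0.2):
    """Flag frames whose segmented area deviates more than rel_tol
    from the median area within a sliding temporal window.

    areas: one value per frame, e.g. the pixel count of the
    pseudo-labeled left-ventricle mask in that frame.
    Returns a list of booleans, True where the frame is an outlier.
    """
    half = window // 2
    flags = []
    for i, a in enumerate(areas):
        lo, hi = max(0, i - half), min(len(areas), i + half + 1)
        m = median(areas[lo:hi])  # local trend of the area curve
        flags.append(abs(a - m) > rel_tol * m)
    return flags

# A smooth cardiac-cycle-like area trace with one corrupted frame:
areas = [100, 110, 120, 30, 130, 125, 115]
print(temporal_outliers(areas))
# → [False, False, False, True, False, False, False]
```

Flagged frames would simply be excluded from the pseudo-labeled training set, so that a single artefact segmentation does not pollute the fully annotated cineloops used to train the spatiotemporal networks.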