{"title":"The effect of data diversity on the performance of deep learning models for predicting early gastric cancer under endoscopy","authors":"Conghui Shi, Jia Li, Lianlian Wu","doi":"10.55976/jdh.1202214319-24","DOIUrl":null,"url":null,"abstract":" \nAims: This study aimed to explore the effect of training set diversity on the performance of deep learning models for predicting early gastric cancer (EGC) under endoscopy.\nMethods: Images of EGC and non-cancerous lesions under narrow-band imaging (ME-NBI) and magnifying blue laser imaging (ME-BLI) were retrospectively collected. Training set 1 was composed of 150 non-cancerous and 309 EGC ME-NBI images, training set 2 was composed of 1505 non-cancerous and 309 EGC ME-BLI images, and training set 3 was the combination of training set 1 and 2. Test set 1 was composed of 376 non-cancerous and 1052 EGC ME-NBI images, test set 2 consisted of 529 non-cancerous and 71 EGC ME-BLI images, and test set 3 was the combination of test set 1 and test set 2. Three deep learning models, convolutional neural network (CNN) 1, CNN 2 and CNN 3 (CNN 1, CNN 2 and CNN 3 were independently trained using training set 1, training set 2 and training set 3, respectively), were constructed, and their performances on each test set were respectively evaluated. One hundred and thirty-eight ME-NBI videos and 17 ME-BLI videos were further collected to evaluate and compare the performance of each model in real time.\nResults: On the whole, the performance of CNN 3 was the best. The accuracy (Acc), sensitivity (Sn), specificity (Sp) and area under the curve (AUC) of test set 1 in CNN 3 were 87.89% (1255/1428), 90.96% (342/376), 86.79% (913/1052) and 94.60%, respectively. The Acc, Sn, Sp and AUC of test set 2 in CNN 3 were 95% (570/600), 97.92% (518/529), 73.24% (52/71) and 90.93% respectively. The Acc, Sn, Sp and AUC of test set 3 in CNN 3 were 89.99% (1825/2028), 95.03% (860/905), 85.93% (965/1123) and 94.89%, respectively. The performance of CNN 3 was also the best in videos test set. The Acc, Sn and Sp of videos test set in CNN 3 were 91.03% (142/156), 90.58% (125/138) and 94.44% (17/18), respectively.\nConclusions: The deep learning model with the most diverse training data has the best diagnostic effect.","PeriodicalId":131334,"journal":{"name":"Journal of Digital Health","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Digital Health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.55976/jdh.1202214319-24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Aims: This study aimed to explore the effect of training set diversity on the performance of deep learning models for predicting early gastric cancer (EGC) under endoscopy.
Methods: Images of EGC and non-cancerous lesions under magnifying endoscopy with narrow-band imaging (ME-NBI) and magnifying endoscopy with blue laser imaging (ME-BLI) were retrospectively collected. Training set 1 comprised 150 non-cancerous and 309 EGC ME-NBI images, training set 2 comprised 1505 non-cancerous and 309 EGC ME-BLI images, and training set 3 was the combination of training sets 1 and 2. Test set 1 comprised 376 non-cancerous and 1052 EGC ME-NBI images, test set 2 comprised 529 non-cancerous and 71 EGC ME-BLI images, and test set 3 was the combination of test sets 1 and 2. Three convolutional neural network (CNN) models, CNN 1, CNN 2 and CNN 3, were trained independently on training sets 1, 2 and 3, respectively, and the performance of each model was evaluated on each test set (a training sketch is given below). A further 138 ME-NBI videos and 17 ME-BLI videos were collected to evaluate and compare the real-time performance of each model.
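The abstract does not state the network architecture, framework, or training hyperparameters. The following is a minimal sketch of how a binary classifier (EGC vs. non-cancerous) could be trained on the combined ME-NBI + ME-BLI data (training set 3), assuming PyTorch with an ImageNet-pretrained ResNet-50 and a hypothetical class-per-folder directory layout; none of these choices are confirmed by the paper.

```python
# Sketch only: architecture, paths, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical layout: one sub-folder per class (egc/, non_cancerous/)
nbi_train = datasets.ImageFolder("data/train_me_nbi", transform=transform)  # training set 1
bli_train = datasets.ImageFolder("data/train_me_bli", transform=transform)  # training set 2
combined_train = ConcatDataset([nbi_train, bli_train])                      # training set 3

loader = DataLoader(combined_train, batch_size=32, shuffle=True)

# Assumed backbone: ImageNet-pretrained ResNet-50 with a 2-class output head
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

CNN 1 and CNN 2 would follow the same recipe but draw their loaders from training set 1 or training set 2 alone, which is what makes the three models directly comparable on the shared test sets.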
Results: Overall, CNN 3 performed best. On test set 1, the accuracy (Acc), sensitivity (Sn), specificity (Sp) and area under the curve (AUC) of CNN 3 were 87.89% (1255/1428), 90.96% (342/376), 86.79% (913/1052) and 94.60%, respectively. On test set 2, the Acc, Sn, Sp and AUC of CNN 3 were 95% (570/600), 97.92% (518/529), 73.24% (52/71) and 90.93%, respectively. On test set 3, the Acc, Sn, Sp and AUC of CNN 3 were 89.99% (1825/2028), 95.03% (860/905), 85.93% (965/1123) and 94.89%, respectively. CNN 3 also performed best on the video test set, with an Acc, Sn and Sp of 91.03% (142/156), 90.58% (125/138) and 94.44% (17/18), respectively.
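For reference, the reported per-test-set metrics can be computed from model outputs as sketched below. This assumes scikit-learn and an illustrative decision threshold of 0.5; the paper does not describe how its operating point was chosen, and the function name and argument conventions here are hypothetical.

```python
# Sketch only: threshold and label convention (1 = EGC, 0 = non-cancerous) are assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """y_true: ground-truth labels; y_prob: predicted probability of EGC."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sn = tp / (tp + fn)                     # sensitivity (recall on EGC)
    sp = tn / (tn + fp)                     # specificity (recall on non-cancerous)
    auc = roc_auc_score(y_true, y_prob)     # area under the ROC curve
    return acc, sn, sp, auc
```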
Conclusions: The deep learning model trained on the most diverse data achieved the best diagnostic performance.