{"title":"The effect of data diversity on the performance of deep learning models for predicting early gastric cancer under endoscopy","authors":"Conghui Shi, Jia Li, Lianlian Wu","doi":"10.55976/jdh.1202214319-24","DOIUrl":null,"url":null,"abstract":" \nAims: This study aimed to explore the effect of training set diversity on the performance of deep learning models for predicting early gastric cancer (EGC) under endoscopy.\nMethods: Images of EGC and non-cancerous lesions under narrow-band imaging (ME-NBI) and magnifying blue laser imaging (ME-BLI) were retrospectively collected. Training set 1 was composed of 150 non-cancerous and 309 EGC ME-NBI images, training set 2 was composed of 1505 non-cancerous and 309 EGC ME-BLI images, and training set 3 was the combination of training set 1 and 2. Test set 1 was composed of 376 non-cancerous and 1052 EGC ME-NBI images, test set 2 consisted of 529 non-cancerous and 71 EGC ME-BLI images, and test set 3 was the combination of test set 1 and test set 2. Three deep learning models, convolutional neural network (CNN) 1, CNN 2 and CNN 3 (CNN 1, CNN 2 and CNN 3 were independently trained using training set 1, training set 2 and training set 3, respectively), were constructed, and their performances on each test set were respectively evaluated. One hundred and thirty-eight ME-NBI videos and 17 ME-BLI videos were further collected to evaluate and compare the performance of each model in real time.\nResults: On the whole, the performance of CNN 3 was the best. The accuracy (Acc), sensitivity (Sn), specificity (Sp) and area under the curve (AUC) of test set 1 in CNN 3 were 87.89% (1255/1428), 90.96% (342/376), 86.79% (913/1052) and 94.60%, respectively. The Acc, Sn, Sp and AUC of test set 2 in CNN 3 were 95% (570/600), 97.92% (518/529), 73.24% (52/71) and 90.93% respectively. The Acc, Sn, Sp and AUC of test set 3 in CNN 3 were 89.99% (1825/2028), 95.03% (860/905), 85.93% (965/1123) and 94.89%, respectively. The performance of CNN 3 was also the best in videos test set. The Acc, Sn and Sp of videos test set in CNN 3 were 91.03% (142/156), 90.58% (125/138) and 94.44% (17/18), respectively.\nConclusions: The deep learning model with the most diverse training data has the best diagnostic effect.","PeriodicalId":131334,"journal":{"name":"Journal of Digital Health","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Digital Health","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.55976/jdh.1202214319-24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Aims: This study aimed to explore the effect of training set diversity on the performance of deep learning models for predicting early gastric cancer (EGC) under endoscopy.
Methods: Images of EGC and non-cancerous lesions under magnifying endoscopy with narrow-band imaging (ME-NBI) and magnifying endoscopy with blue laser imaging (ME-BLI) were retrospectively collected. Training set 1 comprised 150 non-cancerous and 309 EGC ME-NBI images, training set 2 comprised 1505 non-cancerous and 309 EGC ME-BLI images, and training set 3 was the combination of training sets 1 and 2. Test set 1 comprised 376 non-cancerous and 1052 EGC ME-NBI images, test set 2 comprised 529 non-cancerous and 71 EGC ME-BLI images, and test set 3 was the combination of test sets 1 and 2. Three convolutional neural network (CNN) models, CNN 1, CNN 2 and CNN 3, were trained independently on training sets 1, 2 and 3, respectively, and the performance of each model was evaluated on each test set (a training sketch is given below). A further 138 ME-NBI videos and 17 ME-BLI videos were collected to evaluate and compare the real-time performance of each model.
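The abstract does not state the network architecture, framework, or training hyperparameters. The following is a minimal sketch of how a binary classifier (EGC vs. non-cancerous) could be trained on the combined ME-NBI + ME-BLI data (training set 3), assuming PyTorch with an ImageNet-pretrained ResNet-50 and a hypothetical class-per-folder directory layout; none of these choices are confirmed by the paper.

```python
# Sketch only: architecture, paths, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# Hypothetical layout: one sub-folder per class (egc/, non_cancerous/)
nbi_train = datasets.ImageFolder("data/train_me_nbi", transform=transform)  # training set 1
bli_train = datasets.ImageFolder("data/train_me_bli", transform=transform)  # training set 2
combined_train = ConcatDataset([nbi_train, bli_train])                      # training set 3

loader = DataLoader(combined_train, batch_size=32, shuffle=True)

# Assumed backbone: ImageNet-pretrained ResNet-50 with a 2-class output head
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(10):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

CNN 1 and CNN 2 would follow the same recipe but draw their loaders from training set 1 or training set 2 alone, which is what makes the three models directly comparable on the shared test sets.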
Results: Overall, CNN 3 performed best. On test set 1, the accuracy (Acc), sensitivity (Sn), specificity (Sp) and area under the curve (AUC) of CNN 3 were 87.89% (1255/1428), 90.96% (342/376), 86.79% (913/1052) and 94.60%, respectively. On test set 2, the Acc, Sn, Sp and AUC of CNN 3 were 95% (570/600), 97.92% (518/529), 73.24% (52/71) and 90.93%, respectively. On test set 3, the Acc, Sn, Sp and AUC of CNN 3 were 89.99% (1825/2028), 95.03% (860/905), 85.93% (965/1123) and 94.89%, respectively. CNN 3 also performed best on the video test set, with an Acc, Sn and Sp of 91.03% (142/156), 90.58% (125/138) and 94.44% (17/18), respectively.
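For reference, the reported per-test-set metrics can be computed from model outputs as sketched below. This assumes scikit-learn and an illustrative decision threshold of 0.5; the paper does not describe how its operating point was chosen, and the function name and argument conventions here are hypothetical.

```python
# Sketch only: threshold and label convention (1 = EGC, 0 = non-cancerous) are assumptions.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true: np.ndarray, y_prob: np.ndarray, threshold: float = 0.5):
    """y_true: ground-truth labels; y_prob: predicted probability of EGC."""
    y_pred = (y_prob >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    acc = (tp + tn) / (tp + tn + fp + fn)   # accuracy
    sn = tp / (tp + fn)                     # sensitivity (recall on EGC)
    sp = tn / (tn + fp)                     # specificity (recall on non-cancerous)
    auc = roc_auc_score(y_true, y_prob)     # area under the ROC curve
    return acc, sn, sp, auc
```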
Conclusions: The deep learning model trained on the most diverse data achieved the best diagnostic performance.