Endoscopy-based IBD identification by a quantized deep learning pipeline.

IF 6.1 3区生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Biodata Mining Pub Date : 2023-11-25 DOI:10.1186/s13040-023-00350-0

Massimiliano Datres, Elisa Paolazzi, Marco Chierici, Matteo Pozzi, Antonio Colangelo, Marcello Dorian Donzella, Giuseppe Jurman

{"title":"Endoscopy-based IBD identification by a quantized deep learning pipeline.","authors":"Massimiliano Datres, Elisa Paolazzi, Marco Chierici, Matteo Pozzi, Antonio Colangelo, Marcello Dorian Donzella, Giuseppe Jurman","doi":"10.1186/s13040-023-00350-0","DOIUrl":null,"url":null,"abstract":"Background: Discrimination between patients affected by inflammatory bowel diseases and healthy controls on the basis of endoscopic imaging is an challenging problem for machine learning models. Such task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions enhancing characterising elements such as reproducibility, interpretability, reduced computational workload, bias-free modeling and careful image preprocessing.Results: First, an automatic preprocessing procedure is devised, aimed to remove artifacts from clinical data, feeding then the resulting images to an aggregated per-patient model to mimic the clinicians decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing the low confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need for evaluating the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and the computational cost of the model, allowing its deployment on small power devices with an almost negligible 3% performance degradation. Such quantization procedure holds relevance not only in the context of per-patient models but also for assessing its feasibility in providing real-time support to clinicians even in low-resources environments. The pipeline is demonstrated on a private dataset of endoscopic images of 758 IBD patients and 601 healthy controls, achieving Matthews Correlation Coefficient 0.9 as top performance on test set.Conclusion: We highlighted how a comprehensive pre-processing pipeline plays a crucial role in identifying and removing artifacts from data, solving one of the principal challenges encountered when working with clinical data. Furthermore, we constructively showed how it is possible to emulate clinicians decision process and how it offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we proved that quantization can be a useful tool to reduce the time and resources consumption with an acceptable degradation of the model performs. The quantization study proposed in this work points up the potential development of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.","PeriodicalId":48947,"journal":{"name":"Biodata Mining","volume":"16 1","pages":"33"},"PeriodicalIF":6.1000,"publicationDate":"2023-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10675910/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biodata Mining","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s13040-023-00350-0","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Background: Discrimination between patients affected by inflammatory bowel diseases and healthy controls on the basis of endoscopic imaging is an challenging problem for machine learning models. Such task is used here as the testbed for a novel deep learning classification pipeline, powered by a set of solutions enhancing characterising elements such as reproducibility, interpretability, reduced computational workload, bias-free modeling and careful image preprocessing.

Results: First, an automatic preprocessing procedure is devised, aimed to remove artifacts from clinical data, feeding then the resulting images to an aggregated per-patient model to mimic the clinicians decision process. The predictions are based on multiple snapshots obtained through resampling, reducing the risk of misleading outcomes by removing the low confidence predictions. Each patient's outcome is explained by returning the images the prediction is based upon, supporting clinicians in verifying diagnoses without the need for evaluating the full set of endoscopic images. As a major theoretical contribution, quantization is employed to reduce the complexity and the computational cost of the model, allowing its deployment on small power devices with an almost negligible 3% performance degradation. Such quantization procedure holds relevance not only in the context of per-patient models but also for assessing its feasibility in providing real-time support to clinicians even in low-resources environments. The pipeline is demonstrated on a private dataset of endoscopic images of 758 IBD patients and 601 healthy controls, achieving Matthews Correlation Coefficient 0.9 as top performance on test set.

Conclusion: We highlighted how a comprehensive pre-processing pipeline plays a crucial role in identifying and removing artifacts from data, solving one of the principal challenges encountered when working with clinical data. Furthermore, we constructively showed how it is possible to emulate clinicians decision process and how it offers significant advantages, particularly in terms of explainability and trust within the healthcare context. Last but not least, we proved that quantization can be a useful tool to reduce the time and resources consumption with an acceptable degradation of the model performs. The quantization study proposed in this work points up the potential development of real-time quantized algorithms as valuable tools to support clinicians during endoscopy procedures.

Abstract Image

查看原文本刊更多论文

基于内窥镜的IBD量化深度学习管道识别。

背景:基于内镜成像区分炎症性肠病患者和健康对照者是机器学习模型面临的一个具有挑战性的问题。这种任务在这里被用作新型深度学习分类管道的测试平台，由一组解决方案提供支持，这些解决方案增强了再现性、可解释性、减少计算工作量、无偏见建模和仔细的图像预处理等特征元素。结果:首先，设计了一个自动预处理程序，旨在从临床数据中去除伪影，然后将生成的图像输入到汇总的每个患者模型中，以模拟临床医生的决策过程。预测基于通过重新采样获得的多个快照，通过去除低置信度预测来降低误导性结果的风险。通过返回预测所基于的图像来解释每个患者的结果，支持临床医生验证诊断，而无需评估全套内窥镜图像。作为主要的理论贡献，量化被用于降低模型的复杂性和计算成本，允许其部署在小功率器件上，几乎可以忽略3%的性能下降。这种量化程序不仅在每个患者模型的背景下具有相关性，而且在评估其在低资源环境中为临床医生提供实时支持的可行性时也具有相关性。该管道在一个包含758名IBD患者和601名健康对照者的内窥镜图像的私有数据集上进行了演示，在测试集上达到了马修斯相关系数0.9的最佳性能。结论:我们强调了一个全面的预处理管道如何在识别和去除数据中的伪像方面发挥关键作用，解决了处理临床数据时遇到的主要挑战之一。此外，我们建设性地展示了如何模拟临床医生的决策过程，以及它如何提供显著的优势，特别是在医疗保健环境中的可解释性和信任方面。最后但并非最不重要的是，我们证明了量化可以是一个有用的工具，可以在可接受的模型性能下降的情况下减少时间和资源消耗。在这项工作中提出的量化研究指出了实时量化算法的潜在发展，作为有价值的工具，在内窥镜检查过程中支持临床医生。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Biodata Mining MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

7.90

自引率

0.00%

发文量

审稿时长

23 weeks

期刊介绍： BioData Mining is an open access, open peer-reviewed journal encompassing research on all aspects of data mining applied to high-dimensional biological and biomedical data, focusing on computational aspects of knowledge discovery from large-scale genetic, transcriptomic, genomic, proteomic, and metabolomic data. Topical areas include, but are not limited to: -Development, evaluation, and application of novel data mining and machine learning algorithms. -Adaptation, evaluation, and application of traditional data mining and machine learning algorithms. -Open-source software for the application of data mining and machine learning algorithms. -Design, development and integration of databases, software and web services for the storage, management, retrieval, and analysis of data from large scale studies. -Pre-processing, post-processing, modeling, and interpretation of data mining and machine learning results for biological interpretation and knowledge discovery.