WEO Newsletter: The Impact of Artificial Intelligence on Management of Inflammatory Bowel Disease: An Expert Commentary
Digestive Endoscopy 37(7): 807-809. Journal Article. Published 2025-07-10. DOI: 10.1111/den.15072
https://onlinelibrary.wiley.com/doi/10.1111/den.15072
Citations: 0
Abstract
By Nayantara Coelho-Prabhu, MD FACG AGAF FASGE, Mayo Clinic Rochester
The complexity of inflammatory bowel disease (IBD), which includes both Crohn's disease (CD) and ulcerative colitis (UC), lies in its heterogeneous presentation, unpredictable disease course, and varying responses to therapy. Current approaches rely on a combination of clinical indices, imaging, endoscopy, histology, and biomarkers, many of which are subjective and variably interpreted. This subjectivity makes it difficult to establish standards of care and is often the root cause of complications. There is also an increasing focus on achieving healing across all aspects of the disease, including clinical, radiologic, endoscopic, and histologic targets (STRIDE-II). Achieving this requires standardization across these targets. These challenges present fertile ground for AI applications aimed at improving accuracy, efficiency, and personalization in IBD management.
Endoscopic assessment remains central to IBD diagnosis and monitoring. However, the qualitative nature of inflammation scoring and the interobserver variability inherent in scoring systems such as the Mayo Endoscopic Score and the SES-CD have long plagued clinical and research settings. This has been the impetus for developing automated systems that aim to standardize these scores. The first iteration of these models used still images to train convolutional neural networks (CNNs) and reported their performance in scoring still images held out as test data. These systems used expert scoring as the gold standard and performed excellently in distinguishing Mayo 0-1 from Mayo 2-3 scores, at a level similar to human experts. Next, CNNs were trained to read video segments that had been captured in pharmaceutical randomized trials and scored by central readers. Because the earlier systems were compared against a human gold standard, which itself has low interobserver agreement, the following step was to use disease outcome as the measure of validity. Clinical trial videos were again used, and the CNNs were trained to report a cumulative disease score that correlated with outcomes, which gave more meaningful results. The goal is to distinguish responders from non-responders. AI can also detect subtle visual features on endoscopy that can be harnessed to infer histology without the need for biopsy. Such predictive CNNs have been developed using white light images as well as enhanced imaging techniques, including endocytoscopy, narrow band imaging (vascular patterns), and I-scan. These models can predict relapse from real-time endoscopic imaging with high reported accuracy. In capsule enteroscopy, AI tools have been developed that accurately identify and quantify small bowel ulcerations and significantly reduce capsule reading time for both trainees and experts. These recent AI-driven computer vision tools have demonstrated the ability to automatically segment mucosal features, detect ulcerations, and quantify inflammation with high reproducibility. Deep learning models offer the potential for real-time, standardized disease activity scoring and prediction of future outcomes at the point of care.
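The interobserver agreement problem described above is commonly quantified with Cohen's kappa, which corrects raw agreement for agreement expected by chance. The following is a minimal sketch, not taken from any of the studies discussed: the reader scores are invented for illustration, and `dichotomize` and `cohen_kappa` are hypothetical helper names. It compares two readers on the full Mayo Endoscopic Score (0-3) and on the Mayo 0-1 versus Mayo 2-3 split mentioned in the text.

```python
from collections import Counter


def dichotomize(mayo_score: int) -> int:
    """Map a Mayo Endoscopic Score (0-3) to 0 (Mayo 0-1) or 1 (Mayo 2-3)."""
    if mayo_score not in (0, 1, 2, 3):
        raise ValueError(f"invalid Mayo Endoscopic Score: {mayo_score}")
    return 1 if mayo_score >= 2 else 0


def cohen_kappa(rater_a: list[int], rater_b: list[int]) -> float:
    """Unweighted Cohen's kappa for two raters scoring the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases with identical scores.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented example scores for eight video segments (not real data).
reader_1 = [0, 1, 2, 3, 1, 2, 0, 3]
reader_2 = [0, 2, 2, 3, 1, 1, 0, 3]

kappa_full = cohen_kappa(reader_1, reader_2)
kappa_binary = cohen_kappa(
    [dichotomize(s) for s in reader_1],
    [dichotomize(s) for s in reader_2],
)
print(f"kappa (Mayo 0-3): {kappa_full:.3f}")
print(f"kappa (Mayo 0-1 vs 2-3): {kappa_binary:.3f}")
```

The same function can be applied to a model's output against a central reader, which makes it easy to see why a model validated only against a variable human reference inherits that variability.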
Histological remission is emerging as a critical therapeutic goal in IBD, yet its assessment is labor-intensive and prone to subjectivity. AI algorithms trained on digital pathology slides have begun to automate the quantification of neutrophils, crypt distortion, and epithelial injury, enabling standardized application of indices such as the Nancy Index or the Robarts Histopathology Index. An algorithm that predicts the future phenotypic presentation of Crohn's disease from index biopsies further illustrates the potential of AI in IBD histology. Whole-slide digitization and rapidly expanding computing power for big data are among the factors driving the rapid progress of AI in this field. These tools not only reduce pathologist burden but also enhance sensitivity in detecting subclinical inflammation that may precede relapse, thus guiding therapy intensification. The potential for worldwide application of such algorithms, especially in emerging nations, is exciting. However, it is crucial to include representative data during algorithm development in order to avoid bias.
Patients with longstanding colitis are at increased risk for colorectal dysplasia and cancer. Surveillance colonoscopy with targeted biopsies is standard, but flat and subtle lesions often go undetected. AI-assisted endoscopy, particularly with computer-aided detection (CADe) systems, has been shown to improve adenoma detection in non-IBD screening and surveillance colonoscopy. However, in multiple studies in which CADe systems developed on non-IBD patients were applied to IBD surveillance colonoscopies, they did not perform well. In particular, flat lesions and lesions in fields of active inflammation were missed more frequently. Systems were therefore re-trained using images of dysplastic lesions from IBD surveillance colonoscopies, and these showed marked improvement in dysplasia detection in IBD. Practitioners should nonetheless be cautious about applying the available CADe systems directly to IBD surveillance because, thus far, no commercially available system has been specifically trained or approved for use in IBD surveillance. In the future, systems may be developed that integrate endoscopic and histologic features to stratify dysplasia risk, potentially individualizing surveillance intervals and biopsy strategies.
Cross-sectional imaging plays a vital role in assessing transmural and extramural disease, especially in Crohn's disease. Radiomics, a form of AI that extracts high-dimensional features from radiographic images, has shown promise in characterizing bowel wall thickness, vascularity, and fibrosis. Improvements in automated bowel segmentation have facilitated the automated extraction of Crohn's disease activity measures from both CT and MR enterography, which in turn are used to develop algorithms for standardized reporting. When combined with clinical data, AI models can distinguish inflammatory from fibrotic strictures, a distinction critical to choosing between medical and surgical management. Deep learning tools also assist in identifying complications such as fistulas and abscesses with increased accuracy and reduced interpretation time. As with histology, this application of AI could help democratize high-quality care worldwide, especially in regions that lack resources and expertise.
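To make the idea of radiomic feature extraction concrete, the sketch below computes a few first-order intensity features over a segmented region of interest. It is a toy illustration under stated assumptions, not the pipeline of any study cited here: the image and mask are invented nested lists, `first_order_features` and `bin_width` are hypothetical names, and real radiomics work would typically use a dedicated library and many more feature classes (shape, texture).

```python
import math
from collections import Counter


def first_order_features(image, mask, bin_width=10):
    """Compute simple first-order intensity features inside a binary mask.

    `image` and `mask` are same-shaped 2D lists; a truthy mask value marks a
    pixel as belonging to the region of interest (e.g. segmented bowel wall).
    """
    if len(image) != len(mask) or any(
        len(img_row) != len(mask_row) for img_row, mask_row in zip(image, mask)
    ):
        raise ValueError("image and mask must have the same shape")
    values = [
        pixel
        for img_row, mask_row in zip(image, mask)
        for pixel, inside in zip(img_row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    # Shannon entropy of the intensity histogram (fixed-width bins).
    histogram = Counter(int(v // bin_width) for v in values)
    entropy = -sum((c / n) * math.log2(c / n) for c in histogram.values())
    return {"n_pixels": n, "mean": mean, "variance": variance, "entropy": entropy}


# Invented 2x3 intensity patch and region-of-interest mask (not real data).
patch = [[10, 20, 30], [40, 50, 60]]
roi = [[1, 1, 0], [0, 1, 1]]
features = first_order_features(patch, roi)
print(features)
```

Feature vectors of this kind, computed per bowel segment, are what a downstream model would combine with clinical data, which is why the quality of the automated segmentation matters so much.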
A practice-changing application of AI in IBD lies in natural language processing (NLP), which allows extraction of relevant information from structured data and unstructured clinical narratives in electronic health records (EHRs). Machine learning (ML) tools were first developed using demographic and laboratory data to predict response to, and adverse effects from, thiopurines and later biologic therapies. AI can thus support clinical decision-making by synthesizing patient history, laboratory values, and imaging reports into actionable insights. NLP algorithms can identify disease phenotypes, medication usage, and adverse events more efficiently than manual chart review, thereby enabling large-scale epidemiologic studies and quality improvement efforts. The use of large ML models to synthesize bulky multi-omics data, spanning the microbiome and genetic and transcriptional profiles, is the focus of current and future work in this field.
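The kind of extraction described above can be sketched with a deliberately simple rule-based approach. This is an illustrative assumption, not the method of any cited system, which would more likely use trained language models: the term lists, the negation cues, the clinical note, and the function name `extract_findings` are all hypothetical.

```python
import re

# Illustrative term lists; a real system would use a curated vocabulary.
MEDICATIONS = (
    "azathioprine", "mercaptopurine", "infliximab",
    "adalimumab", "vedolizumab", "ustekinumab",
)
PHENOTYPES = {
    "stricturing": r"\bstrictur\w*",
    "penetrating": r"\b(?:fistula\w*|abscess\w*)",
}
# A finding is treated as negated if a cue precedes it in the same sentence.
NEGATION = re.compile(r"\b(?:no|without|denies|negative for)\b[^.;]*$", re.IGNORECASE)


def extract_findings(note: str) -> dict:
    """Extract medication mentions and non-negated phenotype terms from a note."""
    found = {"medications": set(), "phenotypes": set()}
    for sentence in re.split(r"(?<=[.;])\s+", note):
        lowered = sentence.lower()
        # Medication mentions are collected without negation handling.
        for drug in MEDICATIONS:
            if re.search(rf"\b{re.escape(drug)}\b", lowered):
                found["medications"].add(drug)
        for label, pattern in PHENOTYPES.items():
            for match in re.finditer(pattern, lowered):
                # Skip the match if a negation cue appears earlier in the sentence.
                if not NEGATION.search(lowered[: match.start()]):
                    found["phenotypes"].add(label)
    return found


# Invented clinical note (not real patient data).
note = (
    "Patient with ileal Crohn's disease on infliximab since 2021. "
    "MR enterography shows a terminal ileal stricture. "
    "No fistula or abscess."
)
print(extract_findings(note))
```

Even this toy version shows why negation and context handling dominate the difficulty: "No fistula or abscess" contains the same keywords as a positive finding, and a naive keyword search would misclassify the phenotype.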
Large language models (LLMs) are another AI application likely to transform the way we practice medicine. They are being used in the clinic to help synthesize patient encounters and facilitate accurate, concise medical documentation. Various commercial voice-to-text solutions record patient-provider interactions and generate documentation, helping to reduce administrative burden and provider burnout. LLMs can also be harnessed, in the form of chatbots, to formulate diagnostic and therapeutic conclusions. These can be patient-facing, using generative AI to answer common patient queries, or provider-facing, collating published literature and guidelines to help make recommendations for care. There has been an explosion of these technologies in just the last few years, but thoughtful review of the outputs and considered application are key to preventing harm caused by hallucinations and the production of false data by these systems.
Users of these systems must be aware of several limitations in order to judge their value. Variability in endoscopic image quality, differences in equipment, and inconsistent annotation standards can affect the performance and generalizability of AI systems. Many AI studies in IBD endoscopy have shown moderate to high levels of heterogeneity, which limits the reproducibility and robustness of the results. Most studies have been conducted in controlled settings with limited external datasets, which may not reflect real-world clinical environments. The use of AI in clinical settings also raises ethical and legal issues, such as data privacy, informed consent, and liability in the event of diagnostic errors.
While the promise of AI in IBD is undeniable, widespread adoption will require robust validation, regulatory approval, and integration into clinical workflows. Biases, both recognized and unrecognized, will need to be acknowledged by the developers of these systems so that they can be used in the safest manner. Importantly, the development of transparent, explainable AI models will be critical to ensuring clinician trust and ethical deployment. Cross-disciplinary collaboration among gastroenterologists, data scientists, and engineers will be essential to translate these innovations from bench to bedside.
In conclusion, AI is poised to redefine the management of IBD by enhancing diagnostic accuracy, streamlining workflows, and supporting personalized care. As these technologies mature, they will not replace the clinician but will undoubtedly augment clinical decision-making, ushering in a new era of precision medicine in IBD.
About the journal:
Digestive Endoscopy (DEN) is the official journal of the Japan Gastroenterological Endoscopy Society, the Asian Pacific Society for Digestive Endoscopy and the World Endoscopy Organization. Digestive Endoscopy serves as a medium for presenting original articles that offer significant contributions to knowledge in the broad field of endoscopy. The Journal also includes Reviews, Original Articles, How I Do It, Case Reports (only of exceptional interest and novelty are accepted), Letters, Techniques and Images, abstracts and news items that may be of interest to endoscopists.