Title: MetaPath Chat: multimodal generative artificial intelligence chatbot for clinical pathology
Authors: Haizhu Chen, Ruichong Lin, Yunfang Yu
Journal: MedComm (JCR Q1, Medicine, Research & Experimental; impact factor 10.7)
Published: 2024-10-10
DOI: 10.1002/mco2.769
Full text: https://onlinelibrary.wiley.com/doi/10.1002/mco2.769
Conflict of interest: The authors declare no conflict of interest.
Abstract
Two recent studies, one published in Nature [1] and the other in Cell [2], report advances that could substantially change how artificial intelligence (AI) is used in pathology. The first introduced PathChat, a multimodal generative AI assistant for human pathology [1]. The second unveiled TriPath, a weakly supervised AI model designed for analyzing three-dimensional (3D) pathology samples and predicting patient outcomes [2]. These findings highlight the potential of AI to improve diagnostic and prognostic accuracy and to enable new forms of human–machine collaboration.
Lu et al. [1] introduced PathChat, an AI assistant designed to aid pathologists in diagnostic workflows (Figure 1). PathChat is built on a vision-language model that combines a pretrained vision encoder with a large language model, fine-tuned on 456,916 visual-language instructions comprising 999,202 question–answer turns. The vision encoder, based on the UNI architecture, was pretrained on over 100 million histology image patches from over 100,000 slides and further refined with 1.18 million pathology image-caption pairs.
PathChat was evaluated against state-of-the-art multimodal AI assistants, including LLaVA 1.5, LLaVA-Med, and GPT-4V [1]. The first evaluation used multiple-choice diagnostic questions based on routine H&E whole slide images (WSIs) from both The Cancer Genome Atlas and an in-house pathology archive, covering 54 diagnoses from 11 major pathology practices and organ sites. Questions were posed in two settings: image-only and image with clinical context. PathChat outperformed its counterparts in both. In the image-only setting, PathChat achieved 78.1% accuracy (+52.4% vs. LLaVA 1.5 and +63.8% vs. LLaVA-Med, both p < 0.001). With clinical context, its accuracy rose to 89.5% (+39.0% vs. LLaVA 1.5 and +60.9% vs. LLaVA-Med, both p < 0.001), demonstrating its ability to use multimodal information effectively. PathChat also outperformed GPT-4V in both the image-only (78.8% vs. 25%) and image with clinical context (90.5% vs. 63.5%) settings.
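The margins reported above can be unpacked with simple arithmetic. The sketch below assumes that each "+x%" figure is an absolute difference in percentage points (an interpretation, not something this commentary states explicitly) and derives the accuracies this would imply for the comparison models. The variable and function names are illustrative only.

```python
# Accuracies (%) and margins as quoted in the text above.
PATHCHAT_ACCURACY = {"image_only": 78.1, "image_with_context": 89.5}
MARGIN_OVER = {
    "image_only": {"LLaVA 1.5": 52.4, "LLaVA-Med": 63.8},
    "image_with_context": {"LLaVA 1.5": 39.0, "LLaVA-Med": 60.9},
}


def implied_baselines(setting):
    """Return the baseline accuracies implied by PathChat's accuracy minus each margin.

    Assumption: margins are absolute percentage-point differences.
    """
    top = PATHCHAT_ACCURACY[setting]
    return {model: round(top - gap, 1) for model, gap in MARGIN_OVER[setting].items()}


if __name__ == "__main__":
    for setting in PATHCHAT_ACCURACY:
        print(setting, implied_baselines(setting))
```

Under this reading, the implied accuracies are 25.7% (image-only) and 50.5% (with clinical context) for LLaVA 1.5, and 14.3% and 28.6% for LLaVA-Med.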
Lu and colleagues [1] also assessed the ability of PathChat to generate coherent, clinically relevant responses to open-ended pathology questions. Seven expert pathologists ranked the responses of different models based on relevance, correctness, and explanation quality. PathChat's responses were preferred over those of the other multimodal large language models (MLLMs), with median win rates of 56.5%, 67.7%, and 74.2% against GPT-4V, LLaVA 1.5, and LLaVA-Med, respectively. Importantly, PathChat also supported interactive, multiturn conversations, making it a versatile tool for education, research, and clinical decision-making.
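One way to read a "median win rate" is sketched below. It assumes that each pathologist's ranking is reduced to a head-to-head outcome per question (win, tie, or loss for PathChat against one comparison model), that a win rate is computed per pathologist with ties counted as non-wins, and that the median is then taken across pathologists. This convention, the function names, and the toy data are illustrative assumptions; the original study's exact protocol may differ.

```python
from statistics import median


def win_rate(outcomes):
    """Fraction of head-to-head comparisons won (ties and losses count as non-wins)."""
    if not outcomes:
        raise ValueError("at least one comparison is required")
    return sum(1 for outcome in outcomes if outcome == "win") / len(outcomes)


def median_win_rate(outcomes_by_rater):
    """Median of the per-rater win rates."""
    return median(win_rate(outcomes) for outcomes in outcomes_by_rater.values())


if __name__ == "__main__":
    # Hypothetical outcomes for three raters; not data from the study.
    toy = {
        "rater_a": ["win", "win", "tie", "loss"],
        "rater_b": ["win", "win", "win", "loss"],
        "rater_c": ["win", "tie", "loss", "loss"],
    }
    print(median_win_rate(toy))
```

Taking the median across raters, rather than pooling all comparisons, keeps a single unusually lenient or strict rater from dominating the summary.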
In discussing the clinical contributions of PathChat, Lu and colleagues [1] also scrutinized its limitations and suggested directions for further research. The training data for PathChat, although extensive, were derived from retrospective datasets, which might contain outdated information. Consequently, continuous updates to the training data and model alignment with current practices are necessary to maintain accuracy and relevance. Moreover, future research could enhance PathChat's capabilities by extending support for WSIs, incorporating reinforcement learning from human feedback, and developing functionalities like precise counting or localization of objects within images.
In traditional histopathology, reliance on 2D cross-sections often fails to capture critical spatial information present in 3D structures. Song et al. [2] addressed this limitation by developing TriPath, a weakly supervised deep learning model for analyzing 3D pathology samples (Figure 1). Weakly supervised approaches have been shown to identify critical pathological features with minimal manual labeling, performing on par with, and in some cases better than, fully supervised methods. TriPath employed a combination of convolutional neural networks and transformer architectures to process volumetric data, dividing large tissue volumes into smaller 2D or 3D patches and summarizing them into low-dimensional feature vectors for patient-level risk prediction. Trained on a large dataset of annotated 3D pathology samples, TriPath demonstrated high accuracy in identifying various pathological conditions.
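The patch-to-patient aggregation described above can be illustrated with a generic attention-based pooling step, a common choice in weakly supervised (multiple-instance) learning. This is a minimal sketch and not TriPath's actual implementation: the feature vectors, weight vectors, and function names are hypothetical, and a real model would learn the weights from patient-level labels.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(patch_features, attention_weights):
    """Collapse many patch feature vectors into one vector.

    Each patch gets a score (dot product with attention_weights); the softmax
    of the scores weights the average. Returns (pooled_vector, patch_weights).
    """
    alphas = softmax([dot(f, attention_weights) for f in patch_features])
    dim = len(patch_features[0])
    pooled = [sum(a * f[i] for a, f in zip(alphas, patch_features)) for i in range(dim)]
    return pooled, alphas


def risk_score(patch_features, attention_weights, risk_weights, bias=0.0):
    """Map the pooled vector to a patient-level risk in (0, 1) via a logistic unit."""
    pooled, _ = attention_pool(patch_features, attention_weights)
    return 1.0 / (1.0 + math.exp(-(dot(pooled, risk_weights) + bias)))


if __name__ == "__main__":
    # Hypothetical 2-dimensional features for three patches of one specimen.
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(risk_score(patches, attention_weights=[5.0, 0.0], risk_weights=[1.0, -1.0]))
```

Only a patient-level label is needed to train such a model, which is what makes the supervision "weak"; the per-patch weights returned by `attention_pool` also indicate which regions contributed most to the prediction.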
The study tested the utility of TriPath for risk stratification using prostate cancer specimens imaged with different 3D modalities, including open-top light-sheet microscopy and microcomputed tomography [2]. TriPath consistently outperformed traditional 2D slice-based approaches and even clinical baselines assessed by certified pathologists, effectively reducing variability in risk prediction caused by heterogeneous tissue structures.
The implementation of TriPath could streamline diagnostic processes, enabling more efficient and accurate analysis of 3D samples. By minimizing the need for large volumes of labeled data, this approach could also significantly reduce the resources required for AI training, making it accessible to more institutions. However, TriPath relied on high-quality serial sections, which might not always be available. In addition, the substantial computational resources required for 3D reconstruction may limit its adoption in resource-constrained settings. Future research should focus on improving the accessibility and efficiency of 3D reconstruction techniques, developing methods to handle lower-quality samples, and reducing computational demands.
Novel AI technologies hold immense potential for clinical practice [3, 4]. Pathology practices currently face several challenges, including diagnostic variability, time-intensive manual slide examinations, and the limitations of analyzing 2D cross-sections. These challenges not only affect diagnostic accuracy but can also delay treatment decisions. Integrating AI technologies like PathChat and TriPath into clinical practice offers significant potential to address them. As a multimodal AI assistant, PathChat enhances diagnostic accuracy by integrating visual and textual data. Meanwhile, TriPath, as a weakly supervised AI model, focuses on analyzing 3D pathology samples, providing more comprehensive tissue analysis, facilitating risk stratification, and reducing diagnostic variability.
The implementation of PathChat and TriPath in clinical pathology can optimize workflow by automating routine diagnostic tasks, enabling pathologists to concentrate on more challenging cases. This technological synergy not only increases efficiency but also addresses variability in diagnoses caused by limited experience among pathologists. These AI technologies also help reduce diagnostic turnaround times, facilitating quicker treatment decisions and ultimately improving patient outcomes. Future research should focus on further refining these models, incorporating more diverse datasets, and exploring real-world applications. Incorporating additional data types, such as proteomics, genomics, and radiology, could create even more comprehensive and accurate diagnostic tools [5].
Moreover, integrating multimodal generative AI with weakly supervised AI may create a synergistic effect, leading to more accurate and comprehensive tools. Herein, we propose the concept of a novel model, MetaPath Chat, based on MLLMs (Figure 1). Such an integrated system would be designed to process various forms of pathological images comprehensively and to leverage the strengths of each technology: the ability of multimodal generative AI to provide context-aware and interactive responses, and the efficiency of weakly supervised AI in extracting meaningful features from large datasets without extensive annotations. It may offer more accurate diagnostics, optimized patient prognostication, enhanced educational tools, and accelerated research. Future research should aim to build and apply MetaPath Chat, leveraging the strengths of integrated models to enhance performance and broaden applicability.
Altogether, these two studies represent significant advancements in the application of AI to pathology. PathChat provides a powerful multimodal tool that enhances diagnostic workflows and supports educational and research activities, while TriPath brings a new level of detail and accuracy to 3D tissue analysis. Future research should focus on making these technologies more accessible and efficient, ensuring their broad implementation to improve patient outcomes and advance our understanding of various diseases.
Haizhu Chen and Ruichong Lin conceived, drafted, and revised the manuscript. Yunfang Yu conceived and revised the manuscript. All authors have read and approved the final manuscript.