MetaPath Chat: multimodal generative artificial intelligence chatbot for clinical pathology

IF 10.7 Q1 MEDICINE, RESEARCH & EXPERIMENTAL

MedComm Pub Date : 2024-10-10 DOI:10.1002/mco2.769

Haizhu Chen, Ruichong Lin, Yunfang Yu

Citations: 0

Abstract

Recently, two pivotal studies, one published in Nature¹ and another in Cell,² presented advances that are set to reshape artificial intelligence (AI) in pathology. The first study introduced PathChat, a multimodal generative AI assistant for human pathology.¹ The second study unveiled TriPath, a weakly supervised AI model designed for analyzing three-dimensional (3D) pathology samples and predicting patient outcomes.² These findings highlight the potential of AI to transform pathology by enhancing diagnostic and prognostic accuracy and enabling new forms of human–machine collaboration.

Lu et al.¹ introduced PathChat, an AI assistant designed to aid pathologists in diagnostic workflows (Figure 1). PathChat integrated a vision-language model that combined a pretrained vision encoder with a large language model, fine-tuned on over 456,916 visual-language instructions encompassing 999,202 question–answer turns. The vision encoder, based on the UNI architecture, was pretrained on over 100 million histology image patches from over 100,000 slides and further refined with 1.18 million pathology image-caption pairs.

The performance of PathChat was evaluated against state-of-the-art multimodal AI assistants, including LLaVA 1.5, LLaVA-Med, and GPT-4V.¹ First, evaluations focused on multiple-choice diagnostic questions using routine H&E whole slide images (WSIs) from both The Cancer Genome Atlas and an in-house pathology archive, covering 54 diagnoses from 11 major pathology practices and organ sites. Evaluations were conducted in two settings: image-only and image with clinical context. PathChat outperformed its counterparts in diagnostic accuracy in both. In the image-only setting, PathChat achieved 78.1% accuracy (+52.4% vs. LLaVA 1.5 and +63.8% vs. LLaVA-Med, both p < 0.001). In the image with clinical context setting, its accuracy improved to 89.5% (+39.0% vs. LLaVA 1.5 and +60.9% vs. LLaVA-Med, both p < 0.001), demonstrating its ability to leverage multimodal information effectively. Moreover, PathChat outperformed GPT-4V in both the image-only (78.8% vs. 25%) and image with clinical context (90.5% vs. 63.5%) settings.
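The reported margins can be turned back into approximate baseline accuracies with simple subtraction. The sketch below does this under one loudly stated assumption: that the "+52.4%"-style figures are absolute percentage-point differences rather than relative improvements, which the text does not say explicitly. The variable and function names are illustrative only.

```python
# PathChat accuracies (%) and margins over each baseline, as reported in the text.
# ASSUMPTION: margins are absolute percentage points, not relative gains.
reported = {
    "image_only": {
        "pathchat": 78.1,
        "margins": {"LLaVA 1.5": 52.4, "LLaVA-Med": 63.8},
    },
    "image_plus_context": {
        "pathchat": 89.5,
        "margins": {"LLaVA 1.5": 39.0, "LLaVA-Med": 60.9},
    },
}


def implied_baselines(entry):
    """Subtract each margin from PathChat's accuracy to recover the baseline."""
    return {
        name: round(entry["pathchat"] - margin, 1)
        for name, margin in entry["margins"].items()
    }


implied = {setting: implied_baselines(entry) for setting, entry in reported.items()}

for setting, baselines in implied.items():
    print(setting, baselines)
```

If the assumption holds, the implied baseline accuracies sit well below PathChat's in both settings, and every model benefits from added clinical context.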

Furthermore, Lu and colleagues¹ assessed the ability of PathChat to generate coherent, clinically relevant responses to open-ended pathology-related questions. Seven expert pathologists ranked the responses of different models based on relevance, correctness, and explanation quality. PathChat's responses were preferred over those of other multimodal large language models (MLLMs), with median win rates of 56.5%, 67.7%, and 74.2% against GPT-4V, LLaVA 1.5, and LLaVA-Med, respectively. Importantly, PathChat also supported interactive, multiturn conversations, making it a versatile tool for education, research, and clinical decision-making.
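To make the "median win rate" concrete, here is a minimal sketch of one common way such a figure could be computed from expert rankings. It is not the evaluation protocol of Lu et al.: the tie handling (ties excluded), the per-rater aggregation, and all data below are hypothetical.

```python
from statistics import median


def win_rate(rankings, model, rival):
    """Fraction of non-tied questions where `model` outranks `rival` (rank 1 = best)."""
    wins = losses = 0
    for ranks in rankings:
        if ranks[model] < ranks[rival]:
            wins += 1
        elif ranks[model] > ranks[rival]:
            losses += 1
    decided = wins + losses
    return wins / decided if decided else float("nan")


# Hypothetical rankings: one dict per question, per rater.
evaluators = {
    "rater_1": [{"A": 1, "B": 2}, {"A": 2, "B": 1}, {"A": 1, "B": 2}, {"A": 1, "B": 1}],
    "rater_2": [{"A": 1, "B": 2}, {"A": 1, "B": 2}, {"A": 2, "B": 1}, {"A": 2, "B": 1}],
    "rater_3": [{"A": 1, "B": 2}, {"A": 1, "B": 2}, {"A": 1, "B": 2}, {"A": 2, "B": 1}],
}

# One win rate per rater, then the median across raters.
per_rater = [win_rate(r, "A", "B") for r in evaluators.values()]
median_win_rate = median(per_rater)
print(per_rater, median_win_rate)
```

Taking the median across raters, rather than pooling all judgments, keeps a single unusually harsh or lenient evaluator from dominating the summary.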

In discussing the clinical contributions of PathChat, Lu and colleagues¹ also scrutinized its limitations and suggested directions for further research. The training data for PathChat, although extensive, were derived from retrospective datasets, which might contain outdated information. Consequently, continuous updates to the training data and model alignment with current practices are necessary to maintain accuracy and relevance. Moreover, future research could enhance PathChat's capabilities by extending support for WSIs, incorporating reinforcement learning from human feedback, and developing functionalities like precise counting or localization of objects within images.

In traditional histopathology, the reliance on 2D cross-sections often fails to capture critical spatial information present in 3D structures. Song et al.² addressed this limitation by developing TriPath, a deep learning model leveraging weakly supervised AI for analyzing 3D pathology samples (Figure 1). The weakly supervised AI has been confirmed to successfully identify critical pathological features with minimal manual labeling, showcasing performance on par with, and in some cases superior to, fully supervised methods. TriPath employed a combination of convolutional neural networks and transformer architectures to process volumetric data, segmenting large tissues into smaller 2D or 3D patches and summarizing them into low-dimensional feature vectors for patient-level risk prediction. Trained on a large dataset of annotated 3D pathology samples, TriPath demonstrated high accuracy in identifying various pathological conditions.
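The patch-to-patient pipeline described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not TriPath's architecture: the encoder is a hand-written stand-in (mean and standard deviation) for a learned network, the attention and risk weights are fixed rather than trained, and every function name is hypothetical. It shows only the data flow: volume, then 3D patches, then low-dimensional features, then attention-style pooling, then a single patient-level score.

```python
import math
import random


def split_into_patches(volume, size):
    """Split a D x H x W volume (nested lists) into non-overlapping size^3 patches."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    patches = []
    for z in range(0, depth - size + 1, size):
        for y in range(0, height - size + 1, size):
            for x in range(0, width - size + 1, size):
                patches.append([
                    volume[z + dz][y + dy][x + dx]
                    for dz in range(size)
                    for dy in range(size)
                    for dx in range(size)
                ])
    return patches


def encode_patch(patch):
    """Stand-in encoder: [mean, std] of voxel values (a real model would be learned)."""
    mean = sum(patch) / len(patch)
    var = sum((v - mean) ** 2 for v in patch) / len(patch)
    return [mean, math.sqrt(var)]


def attention_pool(features, attn_weights):
    """Softmax-weighted average of patch features; returns pooled vector and weights."""
    scores = [sum(w * f for w, f in zip(attn_weights, feat)) for feat in features]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(a * feat[i] for a, feat in zip(alphas, features)) for i in range(dim)]
    return pooled, alphas


def risk_score(pooled, risk_weights, bias=0.0):
    """Logistic score in (0, 1) from the pooled patient-level feature vector."""
    logit = sum(w * p for w, p in zip(risk_weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


random.seed(0)
# Synthetic 8 x 8 x 8 "volume" standing in for a 3D pathology scan.
volume = [[[random.random() for _ in range(8)] for _ in range(8)] for _ in range(8)]

patches = split_into_patches(volume, size=4)
features = [encode_patch(p) for p in patches]
pooled, alphas = attention_pool(features, attn_weights=[1.0, -1.0])
score = risk_score(pooled, risk_weights=[0.5, 0.5])
print(len(patches), len(features[0]), score)
```

In a weakly supervised setting, only the final score would be compared with a patient-level label during training; no patch carries its own annotation, which is what keeps the labeling burden low.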

The study tested the utility of TriPath for risk stratification using prostate cancer specimens imaged with different 3D modalities, including open-top light-sheet microscopy and microcomputed tomography.² TriPath consistently outperformed traditional 2D slice-based approaches and even clinical baselines assessed by certified pathologists, effectively reducing variability in risk prediction caused by heterogeneous tissue structures.

The implementation of TriPath could streamline diagnostic processes, enabling more efficient and accurate analysis of 3D samples. By minimizing the need for large volumes of labeled data, this approach could also significantly reduce the resources required for AI training, making it accessible to more institutions. However, TriPath relied on high-quality serial sections, which might not always be available. In addition, the substantial computational resources required for 3D reconstruction may limit its widespread adoption in resource-constrained settings. Future research should focus on improving the accessibility and efficiency of 3D reconstruction techniques, developing methods to handle lower-quality samples, and reducing computational demands.

The application of novel AI technologies in clinical practice holds immense potential.³˒⁴ However, pathology practices currently face several challenges, including diagnostic variability, time-intensive manual slide examinations, and the limitations of analyzing 2D cross-sections. These challenges not only impact diagnostic accuracy but can also delay treatment decisions. The integration of novel AI technologies like PathChat and TriPath into clinical practice offers significant potential to address these challenges. As a multimodal AI assistant, PathChat enhances diagnostic accuracy by integrating visual and textual data. Meanwhile, TriPath, as a weakly supervised AI model, focuses on analyzing 3D pathology samples, providing more comprehensive tissue analysis, facilitating risk stratification, and reducing diagnostic variability.

The implementation of PathChat and TriPath in clinical pathology can optimize workflow by automating routine diagnostic tasks, enabling pathologists to concentrate on more challenging cases. This technological synergy not only increases efficiency but also addresses variability in diagnoses caused by limited experience among pathologists. These novel AI technologies also contribute to reducing diagnostic turnaround times, facilitating quicker treatment decisions, and ultimately improving patient outcomes. Future research should focus on further refining these models, incorporating more diverse datasets, and exploring real-world applications. Incorporating further data types, such as proteomics, genomics, and radiology, could create even more comprehensive and accurate diagnostic tools.⁵

Moreover, integrating multimodal generative AI with weakly supervised AI may create a synergistic effect, leading to more accurate and comprehensive tools. Herein, we propose the concept of a novel model, MetaPath Chat, based on MLLMs (Figure 1). Such an integrated system is designed to comprehensively process various forms of pathological images and can leverage the strengths of each technology: the ability of multimodal generative AI to provide context-aware and interactive responses, and the efficiency of weakly supervised AI in extracting meaningful features from large datasets without extensive annotations. It may offer the potential for more accurate diagnostics, optimized patient prognostication, enhanced educational tools, and accelerated research. Future research should strive to achieve the construction and application of MetaPath Chat, leveraging the strengths of integrated models to enhance performance and broaden its applicability.

Altogether, these two studies represent significant advancements in the application of AI to pathology. PathChat provides a powerful multimodal tool that enhances diagnostic workflows and supports educational and research activities, while TriPath brings a new level of detail and accuracy to 3D tissue analysis. Future research should focus on making these technologies more accessible and efficient, ensuring their broad implementation to improve patient outcomes and advance our understanding of various diseases.

Haizhu Chen and Ruichong Lin conceived, drafted, and revised the manuscript. Yunfang Yu conceived and revised the manuscript. All authors have read and approved the final manuscript.

The authors declare no conflict of interest.

Not available.

Source journal: MedComm

CiteScore: 6.70

Self-citation rate: 0.00%

Review time: 10 weeks