{"title":"评价通用大语言模型作为宫颈细胞学诊断支持工具。","authors":"Thiyaphat Laohawetwanit, Sompon Apornvirat, Aleksandra Asaturova, Hua Li, Kris Lami, Andrey Bychkov","doi":"10.1016/j.prp.2025.156159","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>The application of general-purpose large language models (LLMs) in cytopathology remains largely unexplored. This study aims to evaluate the accuracy and consistency of a custom version of ChatGPT-4 (GPT), ChatGPT o3, and Gemini 2.5 Pro as diagnostic support tools for cervical cytology.</p><p><strong>Materials and methods: </strong>A total of 200 Papanicolaou-stained cervical cytology images were acquired at 40x magnification, each measuring 384 × 384 pixels. These images consisted of 100 cases classified as negative for intraepithelial lesion or malignancy (NILM) and 100 cases across various abnormal categories: 20 low-grade squamous intraepithelial lesion (LSIL), 20 high-grade squamous intraepithelial lesion (HSIL), 20 squamous cell carcinoma (SCC), 20 adenocarcinoma in situ (AIS), and 20 adenocarcinoma (ADC). Diagnostic accuracy and consistency were evaluated by submitting each image to a GPT, ChatGPT o3, and Gemini 2.5 Pro 5-10 times.</p><p><strong>Results: </strong>When distinguishing normal from abnormal cytology, LLMs showed mean sensitivity between 85.4 % and 100 %, and specificity between 67.2 % and 92.7 %. ChatGPT o3 was more accurate in identifying NILM (mean 89.2 % vs. 67.2 %) but less accurate in detecting LSIL (34 % vs. 85 %), HSIL (6 % vs. 63 %), and ADC (28 % vs. 91 %). Chain-of-thought prompting and submitting multiple images of the same diagnosis to ChatGPT o3 and Gemini 2.5 Pro did not significantly improve accuracy. Both models also performed poorly in identifying cervicovaginal infections.</p><p><strong>Conclusions: </strong>ChatGPT o3 and Gemini 2.5 Pro demonstrated complementary strengths in cervical cytology. Due to their low accuracy and inconsistency in abnormal cytology, general-purpose LLMs are not recommended as diagnostic support tools in cervical cytology.</p>","PeriodicalId":19916,"journal":{"name":"Pathology, research and practice","volume":"274 ","pages":"156159"},"PeriodicalIF":3.2000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Evaluation of general-purpose large language models as diagnostic support tools in cervical cytology.\",\"authors\":\"Thiyaphat Laohawetwanit, Sompon Apornvirat, Aleksandra Asaturova, Hua Li, Kris Lami, Andrey Bychkov\",\"doi\":\"10.1016/j.prp.2025.156159\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Introduction: </strong>The application of general-purpose large language models (LLMs) in cytopathology remains largely unexplored. This study aims to evaluate the accuracy and consistency of a custom version of ChatGPT-4 (GPT), ChatGPT o3, and Gemini 2.5 Pro as diagnostic support tools for cervical cytology.</p><p><strong>Materials and methods: </strong>A total of 200 Papanicolaou-stained cervical cytology images were acquired at 40x magnification, each measuring 384 × 384 pixels. These images consisted of 100 cases classified as negative for intraepithelial lesion or malignancy (NILM) and 100 cases across various abnormal categories: 20 low-grade squamous intraepithelial lesion (LSIL), 20 high-grade squamous intraepithelial lesion (HSIL), 20 squamous cell carcinoma (SCC), 20 adenocarcinoma in situ (AIS), and 20 adenocarcinoma (ADC). 
Diagnostic accuracy and consistency were evaluated by submitting each image to a GPT, ChatGPT o3, and Gemini 2.5 Pro 5-10 times.</p><p><strong>Results: </strong>When distinguishing normal from abnormal cytology, LLMs showed mean sensitivity between 85.4 % and 100 %, and specificity between 67.2 % and 92.7 %. ChatGPT o3 was more accurate in identifying NILM (mean 89.2 % vs. 67.2 %) but less accurate in detecting LSIL (34 % vs. 85 %), HSIL (6 % vs. 63 %), and ADC (28 % vs. 91 %). Chain-of-thought prompting and submitting multiple images of the same diagnosis to ChatGPT o3 and Gemini 2.5 Pro did not significantly improve accuracy. Both models also performed poorly in identifying cervicovaginal infections.</p><p><strong>Conclusions: </strong>ChatGPT o3 and Gemini 2.5 Pro demonstrated complementary strengths in cervical cytology. Due to their low accuracy and inconsistency in abnormal cytology, general-purpose LLMs are not recommended as diagnostic support tools in cervical cytology.</p>\",\"PeriodicalId\":19916,\"journal\":{\"name\":\"Pathology, research and practice\",\"volume\":\"274 \",\"pages\":\"156159\"},\"PeriodicalIF\":3.2000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pathology, research and practice\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1016/j.prp.2025.156159\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/8/7 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"PATHOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pathology, research and practice","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.prp.2025.156159","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/8/7 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"PATHOLOGY","Score":null,"Total":0}
Evaluation of general-purpose large language models as diagnostic support tools in cervical cytology.
Introduction: The application of general-purpose large language models (LLMs) in cytopathology remains largely unexplored. This study aims to evaluate the accuracy and consistency of a custom version of ChatGPT-4 (GPT), ChatGPT o3, and Gemini 2.5 Pro as diagnostic support tools for cervical cytology.
Materials and methods: A total of 200 Papanicolaou-stained cervical cytology images were acquired at 40× magnification, each measuring 384 × 384 pixels. These images comprised 100 cases classified as negative for intraepithelial lesion or malignancy (NILM) and 100 cases across abnormal categories: 20 low-grade squamous intraepithelial lesion (LSIL), 20 high-grade squamous intraepithelial lesion (HSIL), 20 squamous cell carcinoma (SCC), 20 adenocarcinoma in situ (AIS), and 20 adenocarcinoma (ADC). Diagnostic accuracy and consistency were evaluated by submitting each image to GPT, ChatGPT o3, and Gemini 2.5 Pro 5-10 times.
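As a rough illustration of the repeated-query protocol described above, the sketch below submits a single image to a vision-capable model several times and collects the replies. It assumes the OpenAI Python SDK; the model name, prompt wording, and file path are placeholders rather than the study's actual configuration.

```python
# Hypothetical sketch of repeated image submission; model name, prompt,
# and path are placeholders, not the study's exact setup.
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def classify_image(path: str, n_repeats: int = 5) -> list[str]:
    """Send the same Pap-stained image n_repeats times and collect the replies."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    replies = []
    for _ in range(n_repeats):
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Classify this cervical cytology image using "
                             "Bethesda categories (NILM, LSIL, HSIL, SCC, AIS, ADC)."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }],
        )
        replies.append(resp.choices[0].message.content)
    return replies
```

Collecting all 5-10 replies per image, rather than a single call, is what makes the consistency analysis possible: agreement across repeats can then be scored alongside accuracy.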
Results: When distinguishing normal from abnormal cytology, the LLMs showed mean sensitivity between 85.4% and 100% and specificity between 67.2% and 92.7%. ChatGPT o3 was more accurate than Gemini 2.5 Pro in identifying NILM (mean 89.2% vs. 67.2%) but less accurate in detecting LSIL (34% vs. 85%), HSIL (6% vs. 63%), and ADC (28% vs. 91%). Neither chain-of-thought prompting nor submitting multiple images of the same diagnosis to ChatGPT o3 and Gemini 2.5 Pro significantly improved accuracy. Both models also performed poorly in identifying cervicovaginal infections.
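To make the reported metrics concrete, here is a minimal, self-contained sketch of how sensitivity and specificity are computed for the binary normal-vs-abnormal task; the truth/prediction lists are invented for illustration, not study data.

```python
# Illustrative computation of sensitivity and specificity for the binary
# normal-vs-abnormal task; the labels below are made up, not study data.
def sensitivity_specificity(truth: list[bool], pred: list[bool]) -> tuple[float, float]:
    """truth/pred: True = abnormal. Returns (sensitivity, specificity)."""
    tp = sum(t and p for t, p in zip(truth, pred))          # abnormal called abnormal
    tn = sum(not t and not p for t, p in zip(truth, pred))  # normal called normal
    fn = sum(t and not p for t, p in zip(truth, pred))      # abnormal missed
    fp = sum(not t and p for t, p in zip(truth, pred))      # normal over-called
    return tp / (tp + fn), tn / (tn + fp)

truth = [True, True, True, False, False, False]
pred  = [True, True, False, False, True, False]
sens, spec = sensitivity_specificity(truth, pred)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")  # 66.7%, 66.7%
```

In this framing, the 67.2% specificity reported above corresponds to how often NILM slides were correctly left uncalled as abnormal, which is why high sensitivity alone does not make a model clinically useful.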
Conclusions: ChatGPT o3 and Gemini 2.5 Pro demonstrated complementary strengths in cervical cytology. Given their low accuracy and inconsistent performance on abnormal cytology, general-purpose LLMs are not recommended as diagnostic support tools in cervical cytology.
About the journal:
Pathology, Research and Practice provides accessible coverage of the most recent developments across the entire field of pathology: Reviews focus on recent progress in pathology, while Comments look at interesting current problems and at hypotheses for future developments in pathology. Original Papers present novel findings on all aspects of general, anatomic and molecular pathology. Rapid Communications inform readers of preliminary findings that may be relevant for further studies and need to be communicated quickly. Teaching Cases look at new aspects or special diagnostic problems of diseases and at case reports relevant to the pathologist's practice.