Systematic comparison of GPT models for the analysis of pathology reports in a low-resource language: A case study for Turkish
Omer Faruk Dilbaz, Muhammet Nusret Ozates, Beyza Bolat, Cigdem Gunduz-Demir, Ibrahim Kulac
American Journal of Clinical Pathology. Published 2025-09-16. DOI: 10.1093/ajcp/aqaf091 (https://doi.org/10.1093/ajcp/aqaf091)
Abstract
Objective: Large language models (LLMs) can process text for various applications, including surgical pathology reports, but existing studies focus primarily on English. LLM performance has not been systematically studied for a low-resource language. To analyze the performance of various LLMs, we selected 759 Turkish pathology reports from 5 different procedures.
Methods: We used 10 examples from every procedure to optimize prompts for OpenAI's GPT-3.5 Turbo, GPT-4o mini, and GPT-4o. The remaining reports were used to test generalizability.
Results: GPT-4o performed best on the Turkish reports (12%-25% higher than GPT-3.5 Turbo, 3%-16% higher than GPT-4o mini). Translating the reports into English improved accuracy, especially for GPT-3.5 Turbo and GPT-4o mini, whereas GPT-4o showed comparable results for Turkish and English. For English-translated reports, a 12% to 22% performance gap was observed between GPT-4o and GPT-3.5 Turbo. Domain-related tips in prompts increased accuracy. For all models, results on the larger test sets paralleled those on the validation set. Overall, GPT-4o was the most accurate, GPT-4o mini was intermediate, and GPT-3.5 Turbo was the least accurate.
Conclusions: To our knowledge, this is the first study in the literature to demonstrate the performance of GPT models on Turkish surgical pathology reports. The results indicate that data extracted by GPT-4o are almost ready for direct application.
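The data split described in Methods can be illustrated with a minimal sketch. It assumes each report is a record with a `procedure` field and that the 10 prompt-tuning examples per procedure are drawn at random; the abstract does not state how the examples were chosen or how many reports each procedure contributed, so the function name, field name, procedure labels, and per-procedure counts below are all hypothetical placeholders. Only the totals (759 reports, 5 procedures, 10 examples per procedure) come from the abstract.

```python
import random
from collections import defaultdict


def split_by_procedure(reports, n_dev=10, seed=0):
    """Hold out n_dev reports per procedure for prompt tuning; the rest form the test set.

    Random selection is an assumption; the abstract does not say how the
    10 examples per procedure were picked.
    """
    by_procedure = defaultdict(list)
    for report in reports:
        by_procedure[report["procedure"]].append(report)

    rng = random.Random(seed)
    dev, test = [], []
    for procedure in sorted(by_procedure):
        items = list(by_procedure[procedure])
        rng.shuffle(items)
        dev.extend(items[:n_dev])
        test.extend(items[n_dev:])
    return dev, test


# Hypothetical per-procedure counts, chosen only so that they sum to 759.
placeholder_counts = {
    "procedure_A": 152,
    "procedure_B": 152,
    "procedure_C": 152,
    "procedure_D": 152,
    "procedure_E": 151,
}
reports = [
    {"procedure": name, "report_id": f"{name}-{i}"}
    for name, count in placeholder_counts.items()
    for i in range(count)
]

dev, test = split_by_procedure(reports)
print(len(dev), len(test))
```

Under these assumptions the split leaves 5 × 10 = 50 reports for prompt optimization and 759 − 50 = 709 reports for testing generalizability.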
About the journal:
The American Journal of Clinical Pathology (AJCP) is the official journal of the American Society for Clinical Pathology and the Academy of Clinical Laboratory Physicians and Scientists. It is a leading international journal for publication of articles concerning novel anatomic pathology and laboratory medicine observations on human disease. AJCP emphasizes articles that focus on the application of evolving technologies for the diagnosis and characterization of diseases and conditions, as well as those that have a direct link toward improving patient care.