Large Language Models as a Rapid and Objective Tool for Pathology Report Data Extraction
Beyza Bolat, Ozgur Can Eren, A Humeyra Dur-Karasayar, Cisel Aydin Mericoz, Cigdem Gunduz-Demir, Ibrahim Kulac
Turkish Journal of Pathology, 2024, pp. 138-141. DOI: 10.5146/tjpath.2024.13256
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11129865/pdf/
Abstract
Medical institutions continuously create a substantial amount of data that is used for scientific research. One of the departments with a great amount of archived data is the pathology department. Pathology archives hold the potential to yield case series of valuable rare entities or large cohorts of common entities. The major problem in the creation of these databases is data extraction, which is still commonly done manually and is laborious and error-prone. For these reasons, we propose using large language models to overcome these challenges. Ten pathology reports of selected resection specimens were retrieved from the electronic archives of Koç University Hospital for the initial set. These reports were de-identified and uploaded to ChatGPT and Google Bard. Both models were asked to convert the reports into a synoptic report format that is easy to export to a data editor such as Microsoft Excel or Google Sheets. Both programs created tables, with Google Bard additionally able to generate a spreadsheet from the data automatically. In conclusion, we propose the use of AI-assisted data extraction for academic research purposes, as it may enhance efficiency and precision compared to manual data entry.
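The study describes uploading de-identified reports to the ChatGPT and Google Bard web interfaces and prompting them to produce a synoptic, spreadsheet-ready table. As a rough illustration of how a comparable extraction step could be scripted rather than done interactively, the sketch below uses the OpenAI Python client; the model name, prompt wording, column set, and file paths are illustrative assumptions and not the authors' exact workflow.

```python
# Minimal sketch (assumed workflow, not the paper's method): prompt an LLM to
# restructure a de-identified pathology report into synoptic CSV rows.
import csv
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Convert the following de-identified pathology report into a synoptic report "
    "as CSV with the columns: Specimen, Diagnosis, Tumor Size, Grade, Margins, "
    "Lymph Nodes. Output only the CSV rows, with no commentary.\n\n"
)

def report_to_rows(report_text: str) -> list[list[str]]:
    """Ask the model to tabulate one report and parse the CSV text it returns."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model; the study used the ChatGPT web interface
        messages=[{"role": "user", "content": PROMPT + report_text}],
    )
    csv_text = response.choices[0].message.content.strip()
    return [row for row in csv.reader(csv_text.splitlines()) if row]

if __name__ == "__main__":
    # Hypothetical input paths; write all extracted rows into one spreadsheet file.
    report_paths = ["report_01.txt"]
    with open("synoptic_reports.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Specimen", "Diagnosis", "Tumor Size", "Grade", "Margins", "Lymph Nodes"])
        for path in report_paths:
            with open(path, encoding="utf-8") as report_file:
                writer.writerows(report_to_rows(report_file.read()))
```

Extracted values would still need verification against the source reports, since the abstract positions the approach as an aid to, not a replacement for, careful manual review.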