Data augmentation based on large language models for radiological report classification
Authors: Jaime Collado-Montañez, María-Teresa Martín-Valdivia, Eugenio Martínez-Cámara
Journal: Knowledge-Based Systems, vol. 308, Article 112745 (impact factor 7.2; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2024-11-20
Publication type: Journal Article
DOI: 10.1016/j.knosys.2024.112745
URL: https://www.sciencedirect.com/science/article/pii/S0950705124013790
Open access: no
Citations: 0
Abstract
The International Classification of Diseases (ICD) is fundamental to healthcare: it provides a standardized framework for classifying and coding medical diagnoses and procedures, which makes it possible to track public health patterns and trends internationally. However, manually classifying medical reports according to this standard is slow, tedious and error-prone, which shows the need for automated systems that relieve healthcare professionals of the task and reduce the number of errors. In this paper, we propose an automated classification system based on Natural Language Processing that analyzes radiological reports and classifies them according to the ICD-10. Because radiological reports use specialized language and collections of medical reports are usually imbalanced, we propose a methodology that leverages large language models to augment the data of underrepresented classes and adapts the classification language models to the specific language of radiological reports. The results show that the proposed methodology improves classification performance on the CARES corpus of radiological reports.
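The abstract does not describe how the augmentation is carried out, so the following is only a minimal sketch of the general idea of topping up underrepresented classes with generated reports. It is not the authors' pipeline. Everything in it is an assumption made for illustration: the single-label `(text, code)` format, the `target` count per class, the function names, and the `stub_generate` placeholder that stands in for a call to a large language model.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical format: (report text, ICD-10 code). Real corpora may be multi-label.
Report = Tuple[str, str]


def augmentation_plan(reports: List[Report], target: int) -> Dict[str, int]:
    """Return, per class, how many synthetic reports are needed to reach `target`."""
    counts = Counter(code for _, code in reports)
    return {code: target - n for code, n in counts.items() if n < target}


def augment(
    reports: List[Report],
    generate: Callable[[str, str], str],
    target: int,
) -> List[Report]:
    """Add generated reports for every class that has fewer than `target` examples."""
    by_code: Dict[str, List[str]] = {}
    for text, code in reports:
        by_code.setdefault(code, []).append(text)

    synthetic: List[Report] = []
    for code, missing in augmentation_plan(reports, target).items():
        seeds = by_code[code]
        for i in range(missing):
            # Cycle through the real reports of this class as seeds for generation.
            seed = seeds[i % len(seeds)]
            synthetic.append((generate(seed, code), code))
    return reports + synthetic


def stub_generate(seed_text: str, code: str) -> str:
    """Placeholder for an LLM call (assumption): it only tags the seed text."""
    return f"[synthetic variant for {code}] {seed_text}"


# Made-up example reports, used only to exercise the functions.
example_reports: List[Report] = [
    ("Consolidation in the right lower lobe.", "J18.9"),
    ("Patchy opacity in the left lung base.", "J18.9"),
    ("Bilateral infiltrates, no effusion.", "J18.9"),
    ("Fracture of the distal radius.", "S52.5"),
]
balanced = augment(example_reports, stub_generate, target=3)
```

In this sketch only the minority class receives generated examples, and the majority class is left unchanged. In practice `stub_generate` would be replaced by a real text-generation call, and the generated reports would need to be checked for clinical plausibility before being used for training.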
About the journal:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.