Livia Lilli, Stefano Patarnello, Carlotta Masciocchi, Antonio Marchetti, Giovanni Arcuri
Studies in health technology and informatics, vol. 332, pp. 12-16. Published 2025-10-02. DOI: 10.3233/SHTI251486
Benchmarking Large Language Models for Italian Medical Text Classification: Are Generative Models the Best Choice?
The extraction of meaningful information from clinical reports has been an area of growing interest, with a variety of studies leveraging natural language processing (NLP) techniques based on BERT architectures and generative large language models (LLMs). However, identifying the most effective approach remains challenging, especially for text classification, where model architecture, data availability, domain-specific nuances, and language all play a crucial role in performance. In this study, we present a benchmark analysis of generative LLMs and BERT-based models for the classification of metastasis in Italian clinical reports of breast cancer patients. Our methodology compares the performance of generative LLMs implemented within a structured-generation framework against BERT-based models fine-tuned on the metastasis classification task, as well as models applied in a zero-shot learning setting. In our experiments, fine-tuned BERT models achieved the most balanced results (F1 = 0.884, AUC = 0.720). Generative LLMs showed promising performance, with potential for improvement through further adaptation. Finally, our study suggests that both BERT-based models and generative LLMs are viable solutions even in low-resource computational settings, making them accessible for real-world clinical applications, particularly in medical text classification.
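The abstract reports F1 and AUC as the evaluation metrics for the binary metastasis-classification task. As a minimal sketch of how such metrics are typically computed (using scikit-learn; the labels and scores below are illustrative placeholders, not the paper's data):

```python
# Hedged sketch: computing F1 and AUC for a binary classifier,
# as commonly done for metastasis classification. All data here
# is a toy placeholder, not taken from the study.
from sklearn.metrics import f1_score, roc_auc_score


def evaluate(y_true, y_score, threshold=0.5):
    """Return (F1, AUC) given gold binary labels and predicted probabilities.

    F1 needs hard predictions, so scores are thresholded first;
    AUC is threshold-free and uses the raw scores directly.
    """
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    return f1_score(y_true, y_pred), roc_auc_score(y_true, y_score)


# Toy example: 1 = metastasis present, 0 = absent.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # hypothetical model probabilities

f1, auc = evaluate(y_true, y_score)
```

Note that a fine-tuned BERT model and a structured-generation LLM pipeline can both be scored this way, as long as each yields a per-report probability or label; this is what makes the head-to-head comparison in the study possible.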