Rukiye Savran Kiziltepe;Ercan Ezin;Ömer Yentür;Arwa M. Basbrain;Murat Karakus
{"title":"使用微调llm对低资源语言进行情感分析:土耳其语客户评论的案例研究","authors":"Rukiye Savran Kiziltepe;Ercan Ezin;Ömer Yentür;Arwa M. Basbrain;Murat Karakus","doi":"10.1109/ACCESS.2025.3566000","DOIUrl":null,"url":null,"abstract":"This study investigates the application of advanced fine-tuned Large Language Models (LLMs) for Turkish Sentiment Analysis (SA), focusing on e-commerce product reviews. Our research utilizes four open-source Turkish SA datasets: Turkish Sentiment Analysis version 1 (TRSAv1), Vitamins and Supplements Customer Review (VSCR), Turkish Sentiment Analysis Dataset (TSAD), and TR Customer Review (TRCR). While these datasets were initially labeled based on star ratings, we implemented a comprehensive relabeling process using state-of-the-art LLMs to enhance data quality. To ensure reliable annotations, we first conducted a comparative analysis of different LLMs using the Cohen’s Kappa agreement metric, which led to the selection of ChatGPT-4o-mini as the best-performing model for dataset annotation. Our methodology then focuses on evaluating the SA capabilities of leading instruction-tuned LLMs through a comparative analysis of zero-shot models and Low-Rank Adaptation (LoRA) fine-tuned LlaMA-3.2-1B-IT and Gemma-2-2B-IT models. Evaluations were conducted on both in-domain and out-domain test sets derived from the original star-ratings-based labels and the newly generated GPT labels. The results demonstrate that our fine-tuned models outperformed leading commercial LLMs by 6% in both in-domain and out-domain evaluations. Notably, models fine-tuned on GPT-generated labels achieved superior performance, with in-domain and out-domain F1-scores reaching 0.912 and 0.9184, respectively. These findings underscore the transformative potential of combining LLM relabeling with LoRA fine-tuning for optimizing SA, demonstrating robust performance across diverse datasets and domains.","PeriodicalId":13079,"journal":{"name":"IEEE Access","volume":"13 ","pages":"77382-77394"},"PeriodicalIF":3.4000,"publicationDate":"2025-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10980352","citationCount":"0","resultStr":"{\"title\":\"Advancing Sentiment Analysis for Low-Resource Languages Using Fine-Tuned LLMs: A Case Study of Customer Reviews in Turkish Language\",\"authors\":\"Rukiye Savran Kiziltepe;Ercan Ezin;Ömer Yentür;Arwa M. Basbrain;Murat Karakus\",\"doi\":\"10.1109/ACCESS.2025.3566000\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This study investigates the application of advanced fine-tuned Large Language Models (LLMs) for Turkish Sentiment Analysis (SA), focusing on e-commerce product reviews. Our research utilizes four open-source Turkish SA datasets: Turkish Sentiment Analysis version 1 (TRSAv1), Vitamins and Supplements Customer Review (VSCR), Turkish Sentiment Analysis Dataset (TSAD), and TR Customer Review (TRCR). While these datasets were initially labeled based on star ratings, we implemented a comprehensive relabeling process using state-of-the-art LLMs to enhance data quality. To ensure reliable annotations, we first conducted a comparative analysis of different LLMs using the Cohen’s Kappa agreement metric, which led to the selection of ChatGPT-4o-mini as the best-performing model for dataset annotation. 
Our methodology then focuses on evaluating the SA capabilities of leading instruction-tuned LLMs through a comparative analysis of zero-shot models and Low-Rank Adaptation (LoRA) fine-tuned LlaMA-3.2-1B-IT and Gemma-2-2B-IT models. Evaluations were conducted on both in-domain and out-domain test sets derived from the original star-ratings-based labels and the newly generated GPT labels. The results demonstrate that our fine-tuned models outperformed leading commercial LLMs by 6% in both in-domain and out-domain evaluations. Notably, models fine-tuned on GPT-generated labels achieved superior performance, with in-domain and out-domain F1-scores reaching 0.912 and 0.9184, respectively. These findings underscore the transformative potential of combining LLM relabeling with LoRA fine-tuning for optimizing SA, demonstrating robust performance across diverse datasets and domains.\",\"PeriodicalId\":13079,\"journal\":{\"name\":\"IEEE Access\",\"volume\":\"13 \",\"pages\":\"77382-77394\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2025-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10980352\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Access\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10980352/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INFORMATION SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Access","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10980352/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Advancing Sentiment Analysis for Low-Resource Languages Using Fine-Tuned LLMs: A Case Study of Customer Reviews in Turkish Language
This study investigates the application of advanced fine-tuned Large Language Models (LLMs) for Turkish Sentiment Analysis (SA), focusing on e-commerce product reviews. Our research utilizes four open-source Turkish SA datasets: Turkish Sentiment Analysis version 1 (TRSAv1), Vitamins and Supplements Customer Review (VSCR), Turkish Sentiment Analysis Dataset (TSAD), and TR Customer Review (TRCR). While these datasets were initially labeled based on star ratings, we implemented a comprehensive relabeling process using state-of-the-art LLMs to enhance data quality. To ensure reliable annotations, we first conducted a comparative analysis of different LLMs using the Cohen's Kappa agreement metric, which led to the selection of ChatGPT-4o-mini as the best-performing model for dataset annotation. Our methodology then focuses on evaluating the SA capabilities of leading instruction-tuned LLMs through a comparative analysis of zero-shot models and Low-Rank Adaptation (LoRA) fine-tuned Llama-3.2-1B-IT and Gemma-2-2B-IT models. Evaluations were conducted on both in-domain and out-of-domain test sets derived from the original star-ratings-based labels and the newly generated GPT labels. The results demonstrate that our fine-tuned models outperformed leading commercial LLMs by 6% in both in-domain and out-of-domain evaluations. Notably, models fine-tuned on GPT-generated labels achieved superior performance, with in-domain and out-of-domain F1-scores reaching 0.912 and 0.9184, respectively. These findings underscore the transformative potential of combining LLM relabeling with LoRA fine-tuning for optimizing SA, demonstrating robust performance across diverse datasets and domains.
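To make the two key steps of the abstract concrete, the sketch below first measures agreement between two candidate LLM annotators with Cohen's kappa (the criterion the authors used to select their annotation model), then attaches LoRA adapters to an instruction-tuned model. This is a minimal sketch, not the authors' code: the Hugging Face checkpoint name, the LoRA hyperparameters (r, lora_alpha, target modules), and the toy label lists are illustrative assumptions.

```python
# Minimal sketch (not the authors' pipeline): annotator agreement with
# Cohen's kappa, followed by LoRA adapter setup for an instruction-tuned LLM.
from sklearn.metrics import cohen_kappa_score
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# --- Step 1: compare candidate annotation models via agreement. ---
# Hypothetical sentiment labels produced by two candidate LLMs on the
# same set of reviews; real lists would cover the full corpus.
labels_model_a = ["positive", "negative", "neutral", "positive", "negative"]
labels_model_b = ["positive", "negative", "negative", "positive", "negative"]
kappa = cohen_kappa_score(labels_model_a, labels_model_b)
print(f"Cohen's kappa: {kappa:.3f}")  # closer to 1.0 = stronger agreement

# --- Step 2: LoRA setup (hyperparameters are illustrative, not the paper's). ---
model_name = "meta-llama/Llama-3.2-1B-Instruct"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_name)  # needed later for SFT
model = AutoModelForCausalLM.from_pretrained(model_name)

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update matrices
    lora_alpha=32,                        # scaling applied to the updates
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```

The actual fine-tuning loop on the relabeled reviews and the F1-score evaluation on in-domain and out-of-domain splits would follow, but those details depend on choices the abstract does not specify.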
IEEE Access | Computer Science, Information Systems; Engineering, Electrical & Electronic
CiteScore: 9.80
Self-citation rate: 7.70%
Articles published: 6673
Review time: 6 weeks
Journal Introduction:
IEEE Access® is a multidisciplinary, open access (OA), applications-oriented, all-electronic archival journal that continuously presents the results of original research or development across all of IEEE's fields of interest.
IEEE Access will publish articles that are of high interest to readers, original, technically correct, and clearly presented. Supported by author publication charges (APC), its hallmarks are a rapid peer review and publication process with open access to all readers. Unlike IEEE's traditional Transactions or Journals, reviews are "binary": to achieve rapid turnaround, reviewers either Accept or Reject an article in the form in which it was submitted. Especially encouraged are submissions on:
Multidisciplinary topics, or applications-oriented articles and negative results that do not fit within the scope of IEEE's traditional journals.
Practical articles discussing new experiments or measurement techniques, and interesting solutions to engineering problems.
Development of new or improved fabrication or manufacturing techniques.
Reviews or survey articles of new or evolving fields oriented to assist others in understanding the new area.