Pitso Walter Khoboko, Vukosi Marivate, Joseph Sefara
{"title":"为低资源语言优化翻译:在大型语言模型中使用自定义提示工程进行有效微调","authors":"Pitso Walter Khoboko , Vukosi Marivate , Joseph Sefara","doi":"10.1016/j.mlwa.2025.100649","DOIUrl":null,"url":null,"abstract":"<div><div>Training large language models (LLMs) can be prohibitively expensive. However, the emergence of new Parameter-Efficient Fine-Tuning (PEFT) strategies provides a cost-effective approach to unlocking the potential of LLMs across a variety of natural language processing (NLP) tasks. In this study, we selected the Mistral 7B language model as our primary LLM due to its superior performance, which surpasses that of LLAMA 2 13B across multiple benchmarks. By leveraging PEFT methods, we aimed to significantly reduce the cost of fine-tuning while maintaining high levels of performance.</div><div>Despite their advancements, LLMs often struggle with translation tasks for low-resource languages, particularly morphologically rich African languages. To address this, we employed customized prompt engineering techniques to enhance LLM translation capabilities for these languages.</div><div>Our experimentation focused on fine-tuning the Mistral 7B model to identify the best-performing ensemble using a custom prompt strategy. The results obtained from the fine-tuned Mistral 7B model were compared against several models: Serengeti, Gemma, Google Translate, and No Language Left Behind (NLLB). Specifically, Serengeti and Gemma were fine-tuned using the same custom prompt strategy as the Mistral model, while Google Translate and NLLB Gemma, which are pre-trained to handle English-to-Zulu and English-to-Xhosa translations, were evaluated directly on the test data set. This comparative analysis allowed us to assess the efficacy of the fine-tuned Mistral 7B model against both custom-tuned and pre-trained translation models.</div><div>LLMs have traditionally struggled to produce high-quality translations, especially for low-resource languages. Our experiments revealed that the key to improving translation performance lies in using the correct prompt during fine-tuning. We used the Mistral 7B model to develop a custom prompt that significantly enhanced translation quality for English-to-Zulu and English-to-Xhosa language pairs. After fine-tuning the Mistral 7B model for 30 GPU days, we compared its performance to the No Language Left Behind (NLLB) model and Google Translator API on the same test dataset. While NLLB achieved the highest scores across BLEU, G-Eval (cosine similarity), and Chrf++ (F1-score), our results demonstrated that Mistral 7B, with the custom prompt, still performed competitively.</div><div>Additionally, we showed that our prompt template can improve the translation accuracy of other models, such as Gemma and Serengeti, when applied to high-quality bilingual datasets. 
This demonstrates that our custom prompt strategy is adaptable across different model architectures, bilingual settings, and is highly effective in accelerating learning for low-resource language translation.</div></div>","PeriodicalId":74093,"journal":{"name":"Machine learning with applications","volume":"20 ","pages":"Article 100649"},"PeriodicalIF":4.9000,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models\",\"authors\":\"Pitso Walter Khoboko , Vukosi Marivate , Joseph Sefara\",\"doi\":\"10.1016/j.mlwa.2025.100649\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Training large language models (LLMs) can be prohibitively expensive. However, the emergence of new Parameter-Efficient Fine-Tuning (PEFT) strategies provides a cost-effective approach to unlocking the potential of LLMs across a variety of natural language processing (NLP) tasks. In this study, we selected the Mistral 7B language model as our primary LLM due to its superior performance, which surpasses that of LLAMA 2 13B across multiple benchmarks. By leveraging PEFT methods, we aimed to significantly reduce the cost of fine-tuning while maintaining high levels of performance.</div><div>Despite their advancements, LLMs often struggle with translation tasks for low-resource languages, particularly morphologically rich African languages. To address this, we employed customized prompt engineering techniques to enhance LLM translation capabilities for these languages.</div><div>Our experimentation focused on fine-tuning the Mistral 7B model to identify the best-performing ensemble using a custom prompt strategy. The results obtained from the fine-tuned Mistral 7B model were compared against several models: Serengeti, Gemma, Google Translate, and No Language Left Behind (NLLB). Specifically, Serengeti and Gemma were fine-tuned using the same custom prompt strategy as the Mistral model, while Google Translate and NLLB Gemma, which are pre-trained to handle English-to-Zulu and English-to-Xhosa translations, were evaluated directly on the test data set. This comparative analysis allowed us to assess the efficacy of the fine-tuned Mistral 7B model against both custom-tuned and pre-trained translation models.</div><div>LLMs have traditionally struggled to produce high-quality translations, especially for low-resource languages. Our experiments revealed that the key to improving translation performance lies in using the correct prompt during fine-tuning. We used the Mistral 7B model to develop a custom prompt that significantly enhanced translation quality for English-to-Zulu and English-to-Xhosa language pairs. After fine-tuning the Mistral 7B model for 30 GPU days, we compared its performance to the No Language Left Behind (NLLB) model and Google Translator API on the same test dataset. While NLLB achieved the highest scores across BLEU, G-Eval (cosine similarity), and Chrf++ (F1-score), our results demonstrated that Mistral 7B, with the custom prompt, still performed competitively.</div><div>Additionally, we showed that our prompt template can improve the translation accuracy of other models, such as Gemma and Serengeti, when applied to high-quality bilingual datasets. 
This demonstrates that our custom prompt strategy is adaptable across different model architectures, bilingual settings, and is highly effective in accelerating learning for low-resource language translation.</div></div>\",\"PeriodicalId\":74093,\"journal\":{\"name\":\"Machine learning with applications\",\"volume\":\"20 \",\"pages\":\"Article 100649\"},\"PeriodicalIF\":4.9000,\"publicationDate\":\"2025-04-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Machine learning with applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2666827025000325\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning with applications","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2666827025000325","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Optimizing translation for low-resource languages: Efficient fine-tuning with custom prompt engineering in large language models

Abstract
Training large language models (LLMs) can be prohibitively expensive. However, the emergence of new Parameter-Efficient Fine-Tuning (PEFT) strategies provides a cost-effective approach to unlocking the potential of LLMs across a variety of natural language processing (NLP) tasks. In this study, we selected the Mistral 7B language model as our primary LLM due to its superior performance, which surpasses that of LLAMA 2 13B across multiple benchmarks. By leveraging PEFT methods, we aimed to significantly reduce the cost of fine-tuning while maintaining high levels of performance.
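As a concrete illustration of the cost-saving idea, the sketch below shows parameter-efficient fine-tuning of Mistral 7B with LoRA adapters via the Hugging Face transformers and peft libraries. The adapter rank, dropout, and target modules are illustrative placeholders, not the configuration reported in the paper.

```python
# Minimal PEFT (LoRA) setup sketch: the 7B base weights stay frozen and only
# small low-rank adapter matrices are trained. Hyperparameters are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL_NAME = "mistralai/Mistral-7B-v0.1"  # base model; access may require authentication

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16,  # half precision to reduce memory cost
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                  # adapter rank (placeholder)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections commonly targeted
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of parameters are trainable
```

With adapters of this size, only a small fraction of the 7B parameters receive gradient updates, which is what makes fine-tuning affordable relative to full-parameter training.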
Despite their advancements, LLMs often struggle with translation tasks for low-resource languages, particularly morphologically rich African languages. To address this, we employed customized prompt engineering techniques to enhance LLM translation capabilities for these languages.
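To make the idea of a customized prompt concrete, here is a hypothetical instruction-style template for English-to-Zulu examples. The paper's actual prompt is not reproduced here; the function name, field labels, and the example sentence pair are illustrative assumptions only.

```python
# Hypothetical prompt template for English-to-isiZulu translation fine-tuning.
# This only sketches the general shape of such a template.
from typing import Optional


def build_prompt(source_sentence: str, target_sentence: Optional[str] = None) -> str:
    """Format one training (or inference) example as a single prompt string."""
    prompt = (
        "### Instruction:\n"
        "Translate the following English sentence into isiZulu.\n\n"
        f"### English:\n{source_sentence}\n\n"
        "### isiZulu:\n"
    )
    if target_sentence is not None:
        # During fine-tuning the reference translation is appended so the model
        # learns to complete the prompt with the target text.
        prompt += target_sentence
    return prompt


# Illustrative example pair (not taken from the paper's dataset).
print(build_prompt("The children are playing outside.", "Izingane zidlala ngaphandle."))
```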
Our experimentation focused on fine-tuning the Mistral 7B model to identify the best-performing ensemble using a custom prompt strategy. The results obtained from the fine-tuned Mistral 7B model were compared against several models: Serengeti, Gemma, Google Translate, and No Language Left Behind (NLLB). Specifically, Serengeti and Gemma were fine-tuned using the same custom prompt strategy as the Mistral model, while Google Translate and NLLB, which are pre-trained to handle English-to-Zulu and English-to-Xhosa translations, were evaluated directly on the test dataset. This comparative analysis allowed us to assess the efficacy of the fine-tuned Mistral 7B model against both custom-tuned and pre-trained translation models.
LLMs have traditionally struggled to produce high-quality translations, especially for low-resource languages. Our experiments revealed that the key to improving translation performance lies in using the correct prompt during fine-tuning. We used the Mistral 7B model to develop a custom prompt that significantly enhanced translation quality for the English-to-Zulu and English-to-Xhosa language pairs. After fine-tuning the Mistral 7B model for 30 GPU days, we compared its performance to the No Language Left Behind (NLLB) model and the Google Translate API on the same test dataset. While NLLB achieved the highest scores across BLEU, G-Eval (cosine similarity), and ChrF++ (F1-score), our results demonstrated that Mistral 7B, with the custom prompt, still performed competitively.
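For readers who want to compute comparable numbers, the sketch below approximates the three reported metrics: BLEU and ChrF++ via sacrebleu, and an embedding cosine-similarity score standing in for G-Eval. The choice of sacrebleu, sentence-transformers, and the LaBSE encoder are assumptions, not the authors' exact evaluation pipeline.

```python
# Evaluation sketch: corpus-level BLEU and ChrF++ plus an embedding cosine similarity.
import sacrebleu
from sentence_transformers import SentenceTransformer, util

hypotheses = ["Izingane ziyadlala ngaphandle."]  # model translations (illustrative)
references = ["Izingane zidlala ngaphandle."]    # gold translations (illustrative)

# BLEU and ChrF++ (chrF with word n-grams of order 2).
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
chrf = sacrebleu.corpus_chrf(hypotheses, [references], word_order=2)

# Cosine similarity between hypothesis and reference embeddings; the multilingual
# LaBSE encoder here is an assumed stand-in for the paper's G-Eval setup.
embedder = SentenceTransformer("sentence-transformers/LaBSE")
hyp_emb = embedder.encode(hypotheses, convert_to_tensor=True)
ref_emb = embedder.encode(references, convert_to_tensor=True)
cosine = util.cos_sim(hyp_emb, ref_emb).diagonal().mean().item()

print(f"BLEU: {bleu.score:.2f}  ChrF++: {chrf.score:.2f}  Cosine: {cosine:.3f}")
```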
Additionally, we showed that our prompt template can improve the translation accuracy of other models, such as Gemma and Serengeti, when applied to high-quality bilingual datasets. This demonstrates that our custom prompt strategy is adaptable across different model architectures and bilingual settings, and is highly effective in accelerating learning for low-resource language translation.