Exploring the Effectiveness of GPT-3 in Translating Specialized Religious Text from Arabic to English: A Comparative Study with Human Translation
Maysa Banat, Yasmine Abu Adla
International Journal of English Language and Translation Studies, published 2023-07-14
DOI: 10.48185/jtls.v4i2.762 (https://doi.org/10.48185/jtls.v4i2.762)
Citations: 0
Abstract
In recent years, Natural Language Processing (NLP) models such as Generative Pre-trained Transformer 3 (GPT-3) have shown remarkable improvements in various language-related tasks, including machine translation. However, most studies evaluating NLP models on translation tasks have focused on general-purpose text, leaving their effectiveness on specialized text relatively unexplored. This study therefore aimed to evaluate the effectiveness of GPT-3 in translating specialized Arabic text into English and to compare its performance with human translation. To this end, ten chapters were selected from a specialized book written in Arabic that covers topics in a specialized religious context. The chapters were translated by a professional human translator and by GPT-3 using its translation Application Programming Interface. The translation performance of GPT-3 was compared with human translation using a qualitative measure, specifically the Direct Assessment method. Additionally, the translations were evaluated using two quantitative metrics, the Bidirectional Encoder Representations from Transformers (BERT) score and the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric, both of which measure the similarity between the translated text and the reference text. The qualitative results show that GPT-3 produced generally understandable translations but failed to capture nuances and cultural context. The quantitative results, on the other hand, showed that GPT-3 achieved a relatively high level of accuracy in translating specialized religious text, with scores comparable to human translations in some cases. Specifically, the BERT score of the GPT-3 translations was 0.83. The study also found that the ROUGE score failed to fully reflect the capabilities of GPT-3 in translating specialized text. Overall, the findings suggest that GPT-3 has promising potential as a translation tool for specialized religious text, but further research is needed to improve its capabilities and address its limitations.
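To make the ROUGE metric mentioned in the abstract more concrete, the following is a minimal sketch of how overlap-based ROUGE scores can be computed. It is not the study's evaluation code: the abstract does not say which ROUGE variant or toolkit was used, so ROUGE-1 and ROUGE-L are assumed here, tokenization is plain whitespace splitting, and the two sentences are hypothetical examples rather than data from the paper. The BERT score is not sketched because it requires a pretrained language model.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification; real toolkits may also stem)."""
    return text.lower().split()


def prf(matches, cand_len, ref_len):
    """Return (precision, recall, F1) for a given number of matched tokens."""
    precision = matches / cand_len if cand_len else 0.0
    recall = matches / ref_len if ref_len else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def rouge_1(candidate, reference):
    """ROUGE-1: unigram overlap, with counts clipped to the reference."""
    cand, ref = tokenize(candidate), tokenize(reference)
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return prf(overlap, len(cand), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L: longest common subsequence between candidate and reference."""
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))


# Hypothetical sentences, not taken from the study's data.
reference = "the scholar explained the ruling with great care"
candidate = "the scholar clarified the verdict very carefully"

print("ROUGE-1 (P, R, F1):", rouge_1(candidate, reference))
print("ROUGE-L (P, R, F1):", rouge_l(candidate, reference))
```

In this toy pair the candidate is a close paraphrase of the reference, yet both scores come out to an F1 of 0.4 because only the exact word matches count. This sensitivity to wording is one plausible reason, offered here as an illustration rather than as the authors' explanation, why a lexical-overlap metric might understate translation quality relative to an embedding-based score.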