{"title":"机器音译方法的进展、限制、挑战、应用和未来方向","authors":"A’la Syauqi , Aji Prasetya Wibawa","doi":"10.1016/j.nlp.2025.100158","DOIUrl":null,"url":null,"abstract":"<div><div>Machine transliteration is critical in natural language processing (NLP), facilitating script conversion while preserving phonetic integrity across diverse languages. Using the PRISMA framework, this review analyzes 73 selected studies on machine transliteration, covering both methodological advancements and its role in NLP applications. Among these, 37 studies focus on transliteration methods (rule-based, statistical, machine learning, hybrid, and semantic), while 32 studies explore their application in NLP tasks such as machine translation, sentiment analysis, and text normalization. Rule-based methods provide structured frameworks but face challenges in adapting to linguistic variability. Statistical techniques demonstrate robustness yet depend heavily on the availability of parallel corpora. Machine learning models leverage neural architectures to achieve high accuracy but are constrained by data scarcity for low-resource languages. Hybrid approaches integrate multiple methodologies, while semantic knowledge-based models enhance accuracy by incorporating linguistic features. The review highlights transliteration’s role in NLP applications such as machine translation, sentiment analysis, and text normalization, which are critical for improving multilingual language accessibility. Findings show that machine learning-based approaches dominate transliteration research (32 of 73 studies), followed by rule-based and hybrid methods. These approaches contribute to improving multilingual accessibility and NLP performance. This study provides actionable insights for researchers and practitioners by synthesizing advancements and identifying challenges. These insights enable the development more efficient and inclusive transliteration systems, ultimately supporting linguistic diversity and advancing multilingual NLP technologies. The review identifies gaps in addressing underrepresented languages like Javanese, where complex character sets, orthographic rules, and scriptio continua remain underexplored.</div></div>","PeriodicalId":100944,"journal":{"name":"Natural Language Processing Journal","volume":"11 ","pages":"Article 100158"},"PeriodicalIF":0.0000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Advances in machine transliteration methods, limitations, challenges, applications and future directions\",\"authors\":\"A’la Syauqi , Aji Prasetya Wibawa\",\"doi\":\"10.1016/j.nlp.2025.100158\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Machine transliteration is critical in natural language processing (NLP), facilitating script conversion while preserving phonetic integrity across diverse languages. Using the PRISMA framework, this review analyzes 73 selected studies on machine transliteration, covering both methodological advancements and its role in NLP applications. Among these, 37 studies focus on transliteration methods (rule-based, statistical, machine learning, hybrid, and semantic), while 32 studies explore their application in NLP tasks such as machine translation, sentiment analysis, and text normalization. Rule-based methods provide structured frameworks but face challenges in adapting to linguistic variability. 
Statistical techniques demonstrate robustness yet depend heavily on the availability of parallel corpora. Machine learning models leverage neural architectures to achieve high accuracy but are constrained by data scarcity for low-resource languages. Hybrid approaches integrate multiple methodologies, while semantic knowledge-based models enhance accuracy by incorporating linguistic features. The review highlights transliteration’s role in NLP applications such as machine translation, sentiment analysis, and text normalization, which are critical for improving multilingual language accessibility. Findings show that machine learning-based approaches dominate transliteration research (32 of 73 studies), followed by rule-based and hybrid methods. These approaches contribute to improving multilingual accessibility and NLP performance. This study provides actionable insights for researchers and practitioners by synthesizing advancements and identifying challenges. These insights enable the development more efficient and inclusive transliteration systems, ultimately supporting linguistic diversity and advancing multilingual NLP technologies. The review identifies gaps in addressing underrepresented languages like Javanese, where complex character sets, orthographic rules, and scriptio continua remain underexplored.</div></div>\",\"PeriodicalId\":100944,\"journal\":{\"name\":\"Natural Language Processing Journal\",\"volume\":\"11 \",\"pages\":\"Article 100158\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Natural Language Processing Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2949719125000342\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Processing Journal","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2949719125000342","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Advances in machine transliteration methods, limitations, challenges, applications and future directions
Machine transliteration is critical in natural language processing (NLP), facilitating script conversion while preserving phonetic integrity across diverse languages. Using the PRISMA framework, this review analyzes 73 selected studies on machine transliteration, covering both methodological advancements and the role of transliteration in NLP applications. Among these, 37 studies focus on transliteration methods (rule-based, statistical, machine learning, hybrid, and semantic), while 32 studies explore their application in NLP tasks such as machine translation, sentiment analysis, and text normalization. Rule-based methods provide structured frameworks but face challenges in adapting to linguistic variability. Statistical techniques demonstrate robustness yet depend heavily on the availability of parallel corpora. Machine learning models leverage neural architectures to achieve high accuracy but are constrained by data scarcity for low-resource languages. Hybrid approaches integrate multiple methodologies, while semantic knowledge-based models enhance accuracy by incorporating linguistic features. The review highlights transliteration’s role in NLP applications such as machine translation, sentiment analysis, and text normalization, which are critical for improving multilingual accessibility. Findings show that machine learning-based approaches dominate transliteration research (32 of 73 studies), followed by rule-based and hybrid methods. These approaches contribute to improving multilingual accessibility and NLP performance. This study provides actionable insights for researchers and practitioners by synthesizing advancements and identifying challenges. These insights enable the development of more efficient and inclusive transliteration systems, ultimately supporting linguistic diversity and advancing multilingual NLP technologies. The review identifies gaps in addressing underrepresented languages like Javanese, where complex character sets, orthographic rules, and scriptio continua remain underexplored.
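To make the method categories in the abstract concrete, the minimal sketch below (not taken from the reviewed paper) illustrates what a rule-based transliterator looks like in practice: a hand-written grapheme mapping applied by greedy longest-match substitution. The mapping table, function name, and example are illustrative assumptions using a toy Latin-to-Cyrillic correspondence.

```python
# Minimal sketch of a rule-based transliterator: greedy longest-match
# substitution over a hand-written grapheme mapping table. The toy
# Latin-to-Cyrillic table below is for illustration only; real systems
# use much larger, language-specific rule sets.

RULES = {
    "shch": "щ",
    "sh": "ш",
    "ch": "ч",
    "kh": "х",
    "a": "а", "b": "б", "k": "к", "m": "м", "o": "о", "t": "т",
}

def transliterate(text: str, rules: dict[str, str] = RULES) -> str:
    """Greedily apply the longest matching rule at each position;
    characters with no rule are copied through unchanged."""
    out, i = [], 0
    keys = sorted(rules, key=len, reverse=True)  # try longer rules first
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(rules[key])
                i += len(key)
                break
        else:
            out.append(text[i])  # no rule matched: pass the character through
            i += 1
    return "".join(out)

if __name__ == "__main__":
    print(transliterate("shashka"))  # -> "шашка"
```

The longest-match ordering is what lets multi-character graphemes ("sh", "shch") win over single-character rules, which is exactly the kind of orthographic knowledge that rule-based systems encode explicitly and that statistical or neural models must instead learn from parallel data.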