Sudeshna Sani, Samudra Vijaya, Suryakanth V Gangashetty
International Journal of Computing and Digital Systems, published 2024-04-01. DOI: https://doi.org/10.12785/ijcds/1501107
A Survey on the MT Methods for Indian Languages: MT Challenges, Availability, and Production of Parallel Corpora, Government Policies and Research Directions
Since 1991, machine translation has been a prominent research area in India; IIT Kanpur pioneered the original work, which has since expanded to several universities. Only 10 percent of India’s 1.3 billion inhabitants can read, write, and speak English with varying degrees of competence, which makes machine translation crucial for overcoming the language barrier to the internet. Local languages strongly influence the Indian market for commercial products and events, so developing and translating region-specific content has become an essential research topic. Direct translation between Indic languages, however, faces several challenges and remains at an experimental stage. Several government-sponsored projects are under way in this area, yet sentence-aligned parallel bitext resources are still limited for the majority of Indian language pairs. This paper presents a detailed survey of current research trends in machine translation between Indian languages and of the challenges encountered over time. It also presents a timeline of recent research and the key findings of surveys conducted over the past decade. In one place, the paper covers sources of data, progress in developing datasets for low-resource Indian languages, various translation models, support from the Indian government, and new research directions.