Priyanka Pandey, Manju Khari, Raghavendra Kumar, Dac-Nhuong Le
Int. J. Nat. Comput. Res., published 2018-04-01. DOI: 10.4018/IJNCR.2018040103 (https://doi.org/10.4018/IJNCR.2018040103)
Automatic Generation of Synsets for Wordnet of Hindi Language
India is a land of 122 languages and numerous dialects. The lack of adequate lexical resources for Indian languages is a ubiquitous problem that negatively affects the development of NLP tools for these languages. Recent advancements such as the IndoWordNet project have contributed significantly to alleviating the scarcity of lexicons, but the pace and coverage of this progress remain a matter of dispute. Bottlenecks such as cost, time, and the shortage of skilled lexicographers further slow the progress. In this article, the authors propose a technique to automate the generation of lexical entries using a machine learning approach, which visibly expedites the construction of lexicons such as WordNet. The reluctance to adopt an automated approach is largely attributed to a lack of accuracy, the inability to capture the regional character of a language, incorrect back-translation, etc. To overcome these issues, the authors use Wikipedia to validate the synsets.
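The abstract does not spell out the pipeline, but one common family of automatic synset-generation methods it alludes to (projecting an English WordNet synset into Hindi via a bilingual dictionary, then rejecting candidates whose back-translation disagrees with the source synset) can be sketched as follows. The tiny `EN_TO_HI`/`HI_TO_EN` dictionaries and the filtering rule are illustrative assumptions for this sketch, not the authors' actual method or resources.

```python
# Sketch: project an English synset into Hindi candidates via a bilingual
# dictionary, then keep only candidates whose back-translations all fall
# inside the source synset (back-translation filtering). The dictionaries
# below are toy stand-ins for real lexical resources.

EN_TO_HI = {
    "water": ["पानी", "जल"],
    "fire": ["आग", "अग्नि"],
}
HI_TO_EN = {
    "पानी": ["water"],
    "जल": ["water"],
    "आग": ["fire", "anger"],   # noisy entry: back-translates ambiguously
    "अग्नि": ["fire"],
}

def project_synset(english_words):
    """Collect all Hindi candidates for the words of an English synset."""
    candidates = set()
    for word in english_words:
        candidates.update(EN_TO_HI.get(word, []))
    return candidates

def back_translation_filter(english_words, candidates):
    """Keep a Hindi candidate only if every one of its back-translations
    lies inside the source English synset, which rejects noisy links."""
    source = set(english_words)
    return {hi for hi in candidates
            if hi in HI_TO_EN and set(HI_TO_EN[hi]) <= source}

synset = ["fire"]
candidates = project_synset(synset)
validated = back_translation_filter(synset, candidates)
print(sorted(candidates))  # both Hindi words are proposed as candidates
print(sorted(validated))   # only the unambiguous candidate survives
```

In a real system, the validation step the abstract mentions would consult an external resource such as Wikipedia instead of (or in addition to) a reverse dictionary, but the filtering logic has the same shape: a candidate is admitted to the synset only when independent evidence agrees with the source sense.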