Inclusion of Wikipedia, a language specific knowledge resource to generate and update a synset in WordNet
Authors: Priyanka Pandey, Amita Jain, Sunny Rai
Journal: International Journal of Technology, Policy and Management (JCR Q4, Business, Management and Accounting)
Publication type: Journal Article
Published: 2019-12-07
DOI: 10.1504/ijtpm.2019.10025766 (https://doi.org/10.1504/ijtpm.2019.10025766)
Citation count: 0
Abstract
A lack of adequate lexical resources hampers the development of natural language processing tools for less widely spoken languages. Recently, projects such as Indo WordNet have significantly reduced the scarcity of lexicons for Indian languages. However, their coverage remains a concern, and the cost and time incurred are further limiting factors. The reluctance to automate lexicon generation is largely attributed to the poor precision of the generated synsets. In this paper, we tackle these issues by incorporating language-specific knowledge resources, which ensure the authenticity of the generated synsets and allow the inclusion of endemic words. We propose a corpus-based approach for automated synset generation which visibly improves the quality of the generated synsets. Experiments on a manually created dataset of Hindi words yield a precision of 81.56% and an F-measure of more than 72%.
About the journal:
IJTPM is a refereed international journal that provides a professional and scholarly forum in the emerging field of decision making and problem solving in the integrated area of technology policy and management at the operational, organisational and public policy levels.