A Fast Algorithm of Computing Word Similarity
Xingyuan Chen, Xia Yang, Bingjun Su
2013 Ninth International Conference on Computational Intelligence and Security, 2013-12-14
DOI: 10.1109/CIS.2013.92 (https://doi.org/10.1109/CIS.2013.92)
Citations: 0
Abstract
Computing distributional similarity is an effective strategy for finding synonyms. The naive nearest-neighbor approach to computing distributional word similarity has time complexity O(n*n*m), which makes it too slow for accurately representing synonymy on a large corpus. We observe a property of the parsed triples: the growth rate of the average number of triples per word levels off as the corpus size increases. Using this property, we design a fast algorithm for computing word similarity whose time complexity is O(n*n). We demonstrate the efficiency of this algorithm on the English Gigaword corpus.
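The complexity argument can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not spell out. It assumes that each word is represented by a sparse vector of counts over (relation, other word) features drawn from triples, and that similarity is cosine similarity; both choices, and the names `build_vectors`, `cosine`, and `all_pairs_similarity`, are hypothetical. If the number of distinct features per word stays bounded as the corpus grows, each pairwise comparison costs a bounded amount of work rather than a scan over all features, so the all-pairs loop is quadratic in the number of words.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def build_vectors(triples):
    """Map each word to a sparse count vector over (relation, other) features.

    `triples` is an iterable of (word, relation, other) tuples; this input
    format is an assumption made for illustration.
    """
    vectors = {}
    for word, relation, other in triples:
        vectors.setdefault(word, Counter())[(relation, other)] += 1
    return vectors


def cosine(u, v, norm_u, norm_v):
    """Cosine similarity of two sparse vectors with precomputed norms."""
    # Iterate over the smaller vector so the cost depends only on the
    # number of features the word actually has, not on the feature space.
    if len(u) > len(v):
        u, v = v, u
    dot = sum(count * v[feature] for feature, count in u.items() if feature in v)
    if dot == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def all_pairs_similarity(vectors):
    """Compute similarity for every unordered pair of words."""
    norms = {w: sqrt(sum(c * c for c in vec.values())) for w, vec in vectors.items()}
    return {
        (a, b): cosine(vectors[a], vectors[b], norms[a], norms[b])
        for a, b in combinations(sorted(vectors), 2)
    }


# Toy triples, invented for illustration only.
triples = [
    ("car", "obj-of", "drive"),
    ("car", "mod", "red"),
    ("automobile", "obj-of", "drive"),
    ("banana", "obj-of", "eat"),
]
sims = all_pairs_similarity(build_vectors(triples))
```

In this toy input, "car" and "automobile" share the feature ("obj-of", "drive") and so receive a nonzero score, while "banana" shares no feature with either and scores zero.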