Jie Mei, Xinxin Kou, Zhimin Yao, A. Rau-Chaplin, Aminul Islam, A. Mohammad, E. Milios
Proceedings of the 2015 ACM Symposium on Document Engineering, 2015-09-08. DOI: 10.1145/2682571.2797088
Efficient Computation of Co-occurrence Based Word Relatedness
Measuring document relatedness with unsupervised, co-occurrence-based word relatedness methods is time- and memory-consuming. This paper applies compact data structures to the efficient computation of word relatedness from corpus statistics. The data structure is used to efficiently look up (1) the corpus statistics, in the Common Word Relatedness Approach, and (2) the pairwise word relatedness scores, in the Algorithm Specific Word Relatedness Approach. Both approaches significantly reduce the processing time of word relatedness methods and the memory needed to hold co-occurrence statistics, making text mining tasks based on word relatedness, such as classification and clustering, practical.
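The abstract does not name the compact data structure or the relatedness measure, so the following is only a minimal sketch of the two lookup modes under stated assumptions. It assumes a sorted typed array of packed 64-bit word-pair keys searched by binary search (a stand-in, not necessarily the authors' structure) and a simple PMI score as the relatedness measure (also an assumption). The names `CompactPairStore`, `pmi`, `relatedness_from_counts` and `precompute_relatedness` are hypothetical, and the toy statistics are made up for illustration.

```python
import bisect
import math
from array import array


class CompactPairStore:
    """Hypothetical compact store for word-pair values.

    Pairs are packed into unsigned 64-bit keys, kept sorted in a typed array,
    and looked up by binary search instead of through a hash table of tuples.
    """

    def __init__(self, pair_values):
        # pair_values: dict mapping (word_id_a, word_id_b) -> number.
        # Packing normalizes order, so (a, b) and (b, a) collapse to one key.
        packed = {self._key(a, b): v for (a, b), v in pair_values.items()}
        ordered = sorted(packed.items())
        self.keys = array("Q", [k for k, _ in ordered])    # sorted 64-bit keys
        self.values = array("d", [v for _, v in ordered])  # parallel doubles

    @staticmethod
    def _key(a, b):
        # Assumes word ids are non-negative and below 2**32.
        lo, hi = (a, b) if a <= b else (b, a)
        return (lo << 32) | hi

    def get(self, a, b, default=0.0):
        # Binary search over the sorted key array: O(log n) per lookup.
        k = self._key(a, b)
        i = bisect.bisect_left(self.keys, k)
        if i < len(self.keys) and self.keys[i] == k:
            return self.values[i]
        return default


def pmi(joint, count_a, count_b, total):
    # Simple PMI variant (assumption): log(c(a,b) * N / (c(a) * c(b))).
    if joint == 0 or count_a == 0 or count_b == 0:
        return 0.0
    return math.log(joint * total / (count_a * count_b))


# (1) Common approach: store raw co-occurrence counts, compute PMI per query.
def relatedness_from_counts(counts, unigram, total, a, b):
    return pmi(counts.get(a, b), unigram[a], unigram[b], total)


# (2) Algorithm-specific approach: precompute one measure's scores for every
# stored pair, so a query becomes a single lookup.
def precompute_relatedness(cooc, unigram, total):
    scores = {(a, b): pmi(c, unigram[a], unigram[b], total)
              for (a, b), c in cooc.items()}
    return CompactPairStore(scores)


# Toy statistics (made up for illustration).
unigram = {0: 10, 1: 8, 2: 5}  # word id -> corpus frequency
cooc = {(0, 1): 4, (1, 2): 1}  # word-id pair -> co-occurrence count
total = 100                    # normalizer N

counts = CompactPairStore(cooc)
related = precompute_relatedness(cooc, unigram, total)
print(relatedness_from_counts(counts, unigram, total, 1, 0))  # log(5) ≈ 1.609
print(related.get(1, 0))  # same value, read straight from the table
```

In this sketch the count store can serve any count-based measure computed at query time, while the precomputed table answers queries for one fixed measure with a single lookup, which mirrors the split between the two approaches named above.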