Jilin Chen, Jun Yan, Benyu Zhang, Qiang Yang, Zheng Chen
Title: Diverse Topic Phrase Extraction through Latent Semantic Analysis
Venue: Sixth International Conference on Data Mining (ICDM'06)
Published: 2006-12-18
DOI: 10.1109/ICDM.2006.61 (https://doi.org/10.1109/ICDM.2006.61)
Citations: 13
Abstract
We propose a novel algorithm for extracting diverse topic phrases in order to summarize large corpora. Previous work often ignores the importance of diversity and thus extracts phrases that crowd around a few hot topics while failing to cover other, less obvious but still important topics. We solve this problem through document re-weighting and phrase diversification based on latent semantic analysis (LSA). Experiments on various datasets show that our new algorithm can improve both the relevance of the extracted topic phrases and their diversity across topics.
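
The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of phrase diversification, not the paper's method. It assumes each candidate phrase already has a relevance score and a vector in a latent topic space (for instance, one obtained from LSA), and it uses a generic greedy selection that penalizes similarity to phrases already chosen. The function names, the `trade_off` parameter, and the toy data are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_diverse(phrases, k, trade_off=0.5):
    """Greedily pick k phrases, trading relevance against redundancy.

    phrases maps a phrase to (relevance, latent_vector). The latent vectors
    are assumed to be precomputed; this sketch does not perform LSA itself.
    """
    selected = []
    candidates = dict(phrases)
    while candidates and len(selected) < k:

        def score(name):
            relevance, vec = candidates[name]
            # Redundancy: highest similarity to any phrase chosen so far.
            redundancy = max(
                (cosine(vec, phrases[s][1]) for s in selected), default=0.0
            )
            return trade_off * relevance - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        del candidates[best]
    return selected


# Toy data (made up): three phrases on one hot topic, one on a minor topic.
phrases = {
    "stock market": (0.90, [1.0, 0.0]),
    "share prices": (0.85, [0.95, 0.05]),
    "stock exchange": (0.80, [0.9, 0.1]),
    "coral reefs": (0.50, [0.0, 1.0]),
}

top_by_relevance = sorted(phrases, key=lambda p: phrases[p][0], reverse=True)[:2]
diverse = select_diverse(phrases, k=2)
print(top_by_relevance)
print(diverse)
```

On this toy input, ranking by relevance alone returns two phrases from the same hot topic, while the greedy selection keeps the top phrase and then picks the phrase from the less prominent topic. This mirrors the crowding problem the abstract describes, under the stated assumptions.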