Using Centroid Keywords and Word Mover's Distance for Single Document Extractive Summarization
Dauken Seitkali, R. Mussabayev
Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval, 2019-06-28
DOI: 10.1145/3342827.3342852 (https://doi.org/10.1145/3342827.3342852)
Citations: 1
Abstract
This paper presents an unsupervised method for single-document extractive summarization. The main idea is to select sentences based on the Word Mover's Distance similarity between each sentence and a set of centroid keywords. This approach leverages both the compositional property of word embeddings and the advantages of a recently introduced, powerful text-to-text distance metric. ROUGE results on the DUC 2002 data set show that the quality of the produced summaries is competitive with well-known state-of-the-art systems. We also discuss the limitations of gold summaries for evaluating the quality of summarization systems.
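The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the embeddings in `EMB` are hypothetical 2-d toy vectors rather than pretrained ones, the centroid keywords are supplied by hand because the abstract does not say how they are derived, the exact Word Mover's Distance is replaced by its cheaper relaxed lower bound (each word moves all of its mass to the nearest word on the other side), and distance is mapped to similarity as 1 / (1 + d), which is one common choice.

```python
import math
from collections import Counter

# Hypothetical 2-d toy embeddings (assumption); a real system would load
# pretrained word vectors instead.
EMB = {
    "cat": (1.0, 0.0),
    "kitten": (0.9, 0.1),
    "dog": (0.8, 0.3),
    "stock": (0.0, 1.0),
    "market": (0.1, 0.9),
    "price": (0.2, 0.8),
}


def nbow(tokens):
    """Normalized bag-of-words weights over in-vocabulary tokens."""
    counts = Counter(t for t in tokens if t in EMB)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def one_sided(src, dst):
    """Cost when every source word sends all its mass to its nearest target word."""
    return sum(
        weight * min(math.dist(EMB[w], EMB[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed WMD: a lower bound on the exact Word Mover's Distance."""
    p, q = nbow(tokens_a), nbow(tokens_b)
    if not p or not q:
        return math.inf  # nothing comparable on one side
    return max(one_sided(p, q), one_sided(q, p))


def wmd_similarity(tokens_a, tokens_b):
    """Map distance to a similarity in [0, 1] (assumed 1 / (1 + d) mapping)."""
    return 1.0 / (1.0 + relaxed_wmd(tokens_a, tokens_b))


def summarize(sentences, keywords, k=1):
    """Return the k sentences most similar to the keyword set, in document order."""
    scored = [
        (wmd_similarity(s.lower().split(), keywords), i)
        for i, s in enumerate(sentences)
    ]
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Calling `summarize(sentences, ["stock", "market"], k=1)` on a list of plain-text sentences returns the single sentence whose words lie closest to the keywords in embedding space. Swapping `relaxed_wmd` for an exact optimal-transport solver would change only the distance function, not the selection logic.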