DRIM: MDL-Based Approach for Fast Diverse Summarization
Authors: N. Vanetik, Marina Litvak
Venue: 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI)
Published: 2018-12-01
DOI: 10.1109/WI.2018.00-17 (https://doi.org/10.1109/WI.2018.00-17)
Automated text summarization extracts essential information from an original text and presents it in a predefined number of words. In this paper, we introduce an unsupervised extractive summarization approach rooted in the SLIM dataset compression algorithm [1], which is based on the Minimum Description Length (MDL) principle [2], [3]. Our approach represents text as a transactional dataset, where sentences are transactions and normalized words are items. We use the SLIM algorithm (SLIM is not an abbreviation; it is the Dutch word for 'smart') to address the main bottleneck of the MDL computation, which is the generation of all frequent itemsets as the first step of model construction. Additionally, we add a diversity constraint to the model in order to reduce the appearance of repeated information in a summary. We introduce the DRIM (Diversed SLIM) algorithm, which performs unsupervised summarization, both generic and query-based, and does not require parameter tuning. We evaluate our summarizer on texts in English, but it can be easily extended to other languages.
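The transactional representation described in the abstract can be illustrated with a minimal Python sketch. It is not the DRIM or SLIM algorithm: the abstract does not give their details, so nothing here performs MDL-based itemset mining. The function names (`to_transactions`, `jaccard`, `select_diverse`), the regex-based sentence splitting, lowercasing as "normalization", and the `max_overlap` threshold are all assumptions made for illustration. In particular, the Jaccard-overlap filter is only a stand-in for the idea of a diversity constraint, and it takes sentences in document order rather than ranking them.

```python
import re


def to_transactions(text):
    """Split text into sentences; map each to a set of lowercased word tokens.

    Sentences play the role of transactions and tokens the role of items.
    Lowercasing is an assumed, simplistic stand-in for word normalization.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    transactions = [frozenset(re.findall(r"[a-z0-9]+", s.lower())) for s in sentences]
    return sentences, transactions


def jaccard(a, b):
    """Jaccard similarity of two item sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_diverse(sentences, transactions, max_words, max_overlap=0.5):
    """Hypothetical diversity filter, NOT the paper's constraint.

    Walks sentences in document order, skipping any sentence that would
    exceed the word budget or that overlaps too much with one already chosen.
    """
    summary, chosen, used = [], [], 0
    for sentence, items in zip(sentences, transactions):
        n_words = len(sentence.split())
        if used + n_words > max_words:
            continue
        if any(jaccard(items, c) > max_overlap for c in chosen):
            continue
        summary.append(sentence)
        chosen.append(items)
        used += n_words
    return summary


if __name__ == "__main__":
    text = "Cats chase mice. Cats chase mice! Dogs bark loudly."
    sents, trans = to_transactions(text)
    print(select_diverse(sents, trans, max_words=6))
```

In this toy input the second sentence has the same item set as the first, so the filter drops it and keeps the third. A real pipeline in the spirit of the paper would instead score sentences using the itemsets selected by SLIM, which this sketch leaves out.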