Extractive Document Summarization Based on Hierarchical GRU
Yong Zhang, Jinzhi Liao, Jiuyang Tang, W. Xiao, Yuheng Wang
2018 International Conference on Robots & Intelligent System (ICRIS), May 2018
DOI: 10.1109/ICRIS.2018.00092 (https://doi.org/10.1109/ICRIS.2018.00092)
Citations: 6
Abstract
Neural networks have provided an efficient approach to extractive document summarization, in which sentences are selected from the text to form the summary. However, conventional methods have two shortcomings: they extract the summary directly from the whole document, which contains considerable redundancy, and they neglect the relations between the summary and the document. This paper proposes TSERNN, a two-stage structure in which a key-sentence extraction stage is followed by a Recurrent Neural Network-based model that performs extractive summarization. In the extraction phase, it introduces a hybrid sentence similarity measure combining sentence vectors with Levenshtein distance and integrates it into a graph model to extract key sentences. In the second phase, it builds the summarization model from GRU blocks and uses an LDA-based representation of the entire document as a feature to support summarization. Finally, the model is evaluated on the CNN/Daily Mail corpus, and the experimental results verify the accuracy and validity of the proposed method.
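The abstract does not give the exact form of the hybrid similarity measure, so the following is only a minimal sketch of one plausible reading: a weighted sum of sentence-embedding cosine similarity and a normalized Levenshtein score. The weighting parameter `alpha`, the character-level edit distance, and the function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def hybrid_similarity(vec_a: np.ndarray, vec_b: np.ndarray,
                      sent_a: str, sent_b: str, alpha: float = 0.5) -> float:
    """Hypothetical hybrid score: alpha * cosine similarity of sentence
    vectors + (1 - alpha) * (1 - normalized Levenshtein distance)."""
    cos = float(vec_a @ vec_b /
                (np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + 1e-8))
    lev = 1.0 - levenshtein(sent_a, sent_b) / max(len(sent_a), len(sent_b), 1)
    return alpha * cos + (1.0 - alpha) * lev
```

In a graph-based extractor such as the one described, scores like this could serve as edge weights between sentence nodes before ranking them to pick key sentences.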