Extractive Summarization using Cohesion Network Analysis and Submodular Set Functions

Valentin Sergiu Cioaca, M. Dascalu, D. McNamara
2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), September 2020
DOI: 10.1109/SYNASC51798.2020.00035
Citations: 3

Abstract

Numerous approaches have been introduced to automate text summarization, but only a few can be easily adapted to multiple languages. This paper introduces a multilingual text processing pipeline integrated in the open-source ReaderBench framework, which can be retrofitted to cover more than 50 languages. Given the extensibility of the approach and the lack of labeled training data for languages other than English, an unsupervised algorithm was preferred for extractive summarization (i.e., selecting the most representative sentences from the original document). Specifically, two approaches relying on text cohesion were implemented: a) a graph-based text representation derived from Cohesion Network Analysis that extends TextRank, and b) a class of submodular set functions. Evaluations were performed on the DUC dataset, using the TextRank implementation from Gensim as the baseline. Our results using the submodular set functions outperform the baseline. In addition, two use cases, in English and Romanian, are presented together with corresponding graphical representations for the two methods.
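The two unsupervised strategies named in the abstract can be illustrated with a small, self-contained sketch. It is not the ReaderBench implementation: ReaderBench scores cohesion with its own semantic models, whereas this sketch assumes plain bag-of-words cosine similarity as a stand-in for Cohesion Network Analysis edge weights. Sentences are then (a) ranked with TextRank-style weighted PageRank, and (b) selected by greedily maximizing a facility-location coverage function, one standard monotone submodular objective (the paper's exact function class may differ). All function names, the damping factor of 0.85, and the sentence budget are illustrative assumptions.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase alphabetic tokens; a crude stand-in for a real NLP pipeline."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    """Pairwise sentence similarities (hypothetical proxy for CNA cohesion edges)."""
    bags = [Counter(tokenize(s)) for s in sentences]
    return [[cosine(x, y) for y in bags] for x in bags]


def textrank(sim, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over the sentence graph.

    Self-loops are ignored; a sentence with no similar neighbours has no
    outgoing edges and simply leaks rank mass, which is acceptable for ranking.
    """
    n = len(sim)
    out_weight = [sum(sim[j][k] for k in range(n) if k != j) for j in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if j != i and out_weight[j] > 0
            )
            for i in range(n)
        ]
    return scores


def coverage(selected, sim):
    """Facility-location function: how well `selected` sentences cover every sentence."""
    if not selected:
        return 0.0
    return sum(max(sim[i][j] for j in selected) for i in range(len(sim)))


def greedy_submodular(sim, budget):
    """Add the sentence with the largest marginal coverage gain until the budget is met.

    For a monotone submodular objective under a cardinality budget, this greedy
    rule is guaranteed to reach at least (1 - 1/e) of the optimum.
    """
    selected = []
    while len(selected) < budget:
        candidates = [j for j in range(len(sim)) if j not in selected]
        if not candidates:
            break
        base = coverage(selected, sim)
        best = max(candidates, key=lambda j: coverage(selected + [j], sim) - base)
        selected.append(best)
    return sorted(selected)  # restore document order for the summary


if __name__ == "__main__":
    doc = [
        "The cat sat on the mat.",
        "A cat was sitting on a mat.",
        "The cat slept on the mat.",
        "Stock markets fell sharply today.",
        "Investors worried as markets fell.",
    ]
    sim = similarity_matrix(doc)
    ranks = textrank(sim)
    top_by_rank = sorted(range(len(doc)), key=lambda i: ranks[i], reverse=True)[:2]
    print("TextRank top-2:", [doc[i] for i in sorted(top_by_rank)])
    print("Submodular pick:", [doc[i] for i in greedy_submodular(sim, budget=2)])
```

On this toy input the two TextRank leaders are the near-duplicate sentences "The cat sat on the mat." and "The cat slept on the mat.", while the coverage objective picks one cat sentence and one market sentence, because a second cat sentence adds little marginal coverage. This diminishing-returns property is what makes submodular objectives attractive for summaries; the sketch says nothing about the paper's DUC results.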