Summary Sentence Classification Using Stylometry

R. Shams, Robert E. Mercer
2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA), published 2015-12-01
DOI: 10.1109/ICMLA.2015.181 (https://doi.org/10.1109/ICMLA.2015.181)
Citations: 1

Abstract

Summary sentence classification is an important step in generating document surrogates known as summary extracts, and the quality of an extract depends heavily on the correctness of this step. We aim to classify potential summary sentences using a statistical learning method that models sentences with stylometry, a linguistic technique that examines writing style. Sentences in documents are represented using a novel set of stylometric attributes. For learning, we set up an innovative two-stage classification scheme that applies two learners in successive steps: k-Nearest Neighbour and Naive Bayes. We train and test the learners on newswire documents collected from two benchmark datasets, CAST and DUC2002. Extensive experimentation strongly suggests that our method performs very well on the single-document summarization task. However, its performance is mixed when classifying summary sentences from multiple documents. Finally, comparisons show that our method performs significantly better than most popular extractive summarization methods.
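
To make the two-stage idea concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract does not list the stylometric attributes or say how the two learners are connected, so everything here is an assumption: the three attributes (word count, mean word length, punctuation count), the function names, and the choice to let k-Nearest Neighbour act as a first-stage filter whose accepted sentences are then judged by a Gaussian Naive Bayes model.

```python
import math
import string
from collections import Counter


def stylometric_features(sentence):
    """Hypothetical attributes: word count, mean word length, punctuation count."""
    words = sentence.split()
    n_words = len(words)
    mean_len = (sum(len(w.strip(string.punctuation)) for w in words) / n_words
                if n_words else 0.0)
    n_punct = sum(ch in string.punctuation for ch in sentence)
    return [float(n_words), mean_len, float(n_punct)]


def knn_predict(train_X, train_y, x, k=3):
    """Stage 1: majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def fit_gaussian_nb(X, y):
    """Stage 2 training: per-class prior, per-attribute mean and variance."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                     for col, m in zip(zip(*rows), means)]
        model[label] = (len(rows) / len(y), means, variances)
    return model


def nb_predict(model, x):
    """Stage 2 prediction: class with the highest Gaussian log-posterior."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))
    return max(model, key=log_posterior)


def two_stage_predict(train_X, train_y, nb_model, x, k=3):
    """Assumed coupling: kNN filters first, Naive Bayes confirms (1 = summary)."""
    if knn_predict(train_X, train_y, x, k) == 0:
        return 0
    return nb_predict(nb_model, x)


# Toy vectors for illustration only; they are not drawn from CAST or DUC2002.
toy_X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
toy_y = [0, 0, 0, 1, 1, 1]
toy_model = fit_gaussian_nb(toy_X, toy_y)
print(two_stage_predict(toy_X, toy_y, toy_model, [5.0, 5.1]))
```

In this sketch a sentence is labelled as a summary sentence only when both stages accept it, which favours precision. Other couplings, such as passing the first-stage vote to the second stage as an extra attribute, would fit the abstract's description equally well.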