Extractive Text Summarization Using Supervised Learning and Natural Language Processing

Sarita Mandal, P. Achary, Shubhada Phalke, K. Poorvaja, Madhuri Kulkarni
2021 International Conference on Intelligent Technologies (CONIT), published 25 June 2021. DOI: 10.1109/CONIT51480.2021.9498322
Citations: 1

Abstract

The amount of textual data we are exposed to grows every day. It is very difficult to browse all the available material to find what is relevant, or to read all of it in order to stay up to date. Keeping pace calls for a tool that automatically reduces the amount of content while retaining the key points and essence of long texts. Automatic text summarization is well suited to this problem, and it is what our proposed model implements. In this paper, an extractive approach based on Natural Language Processing is used to summarize a single document: the summary is assembled by selecting a subset of information-rich sentences from the source document. The approach is supervised: Support Vector Machine, K-Nearest Neighbour and Decision Tree algorithms are used to train models, and their performance is compared using the ROUGE metric. The highest-scoring model is then used to summarize an unseen document. The summary is displayed as text and also converted to audio. The results obtained with the proposed approach are satisfactory, with average F1 scores of 0.706, 0.630 and 0.434 for ROUGE-1, ROUGE-2 and ROUGE-L respectively.
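The abstract does not say which sentence features the classifiers use, how training sentences are labelled, or how many sentences a summary keeps. The following Python sketch is therefore only a hypothetical illustration of supervised extractive selection using one of the three named classifiers, K-Nearest Neighbour: each sentence becomes a feature vector, it is scored by the fraction of its k nearest labelled training sentences that belong to a summary, and the top-scoring sentences are returned in document order. The feature set (relative position, relative length, mean term frequency), the function names `split_sentences`, `sentence_features`, `knn_score` and `summarize`, the default `k=3`, and the toy training points are all assumptions, not the authors' design.

```python
# Hypothetical sketch of supervised extractive summarization with k-NN.
# Features, function names and toy training data are illustrative assumptions.
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def sentence_features(sentences: list[str]) -> list[tuple[float, float, float]]:
    """Assumed features: relative position, relative length, mean term frequency."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_tf = Counter(tok for toks in tokenized for tok in toks)
    max_len = max((len(t) for t in tokenized), default=1) or 1
    max_tf = max(doc_tf.values(), default=1)
    n = len(sentences)
    feats = []
    for i, toks in enumerate(tokenized):
        position = 1.0 - i / n  # earlier sentences get a higher value
        length = len(toks) / max_len  # length relative to the longest sentence
        # Mean document frequency of the sentence's words, scaled to [0, 1].
        tf = sum(doc_tf[t] for t in toks) / (len(toks) * max_tf) if toks else 0.0
        feats.append((position, length, tf))
    return feats


def knn_score(train_x, train_y, x, k=3) -> float:
    """Fraction of the k nearest training sentences labelled 1 (in summary).

    Assumes a non-empty training set; distance is Euclidean.
    """
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def summarize(text, train_x, train_y, num_sentences=2, k=3) -> str:
    """Pick the highest-scoring sentences, then restore document order."""
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    scores = [knn_score(train_x, train_y, f, k) for f in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return " ".join(sentences[i] for i in sorted(top))


if __name__ == "__main__":
    # Made-up training points: (position, length, tf) -> 1 if in a reference summary.
    train_x = [(1.0, 0.9, 0.5), (0.8, 0.9, 0.5), (0.3, 0.6, 0.6), (0.1, 0.7, 0.6)]
    train_y = [1, 1, 0, 0]
    doc = ("Text summarization shortens documents. It keeps the key points. "
           "Weather was nice. Lunch was served late.")
    print(summarize(doc, train_x, train_y))
    # -> Text summarization shortens documents. It keeps the key points.
```

Swapping `knn_score` for a trained Support Vector Machine or Decision Tree scorer would give the other two model families the abstract compares; the selection step stays the same.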
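The reported results are ROUGE F1 scores. As a rough, hypothetical illustration of what these numbers measure (the abstract does not say which ROUGE implementation or preprocessing was used), the following Python sketch computes ROUGE-N F1 from clipped n-gram overlap and ROUGE-L F1 from the longest common subsequence (LCS) between a candidate summary and a single reference. It assumes lowercased whitespace tokenization with no stemming or stopword removal and treats each summary as one token sequence; standard ROUGE toolkits offer stemming and a summary-level ROUGE-L variant, so scores from this sketch are not directly comparable with the paper's figures.

```python
# Simplified ROUGE-N / ROUGE-L F1 (assumption: whitespace tokens, one reference).
from collections import Counter


def _ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def _f1(overlap: int, cand_total: int, ref_total: int) -> float:
    """Harmonic mean of precision and recall; 0.0 if there is no overlap."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate: str, reference: str, n: int) -> float:
    cand = _ngrams(candidate.lower().split(), n)
    ref = _ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped matches: min count per n-gram
    return _f1(overlap, sum(cand.values()), sum(ref.values()))


def _lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return _f1(_lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat is on the mat"
    print(round(rouge_n_f1(cand, ref, 1), 3))  # 0.833
    print(round(rouge_n_f1(cand, ref, 2), 3))  # 0.6
    print(round(rouge_l_f1(cand, ref), 3))     # 0.833
```

In the example, five of six unigrams match (F1 = 5/6), three of five bigrams match (F1 = 0.6), and the LCS "the cat on the mat" has five tokens (F1 = 5/6).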