Extractive Text Summarization Using Supervised Learning and Natural Language Processing

2021 International Conference on Intelligent Technologies (CONIT) Pub Date : 2021-06-25 DOI:10.1109/CONIT51480.2021.9498322

Sarita Mandal, P. Achary, Shubhada Phalke, K. Poorvaja, Madhuri Kulkarni


Citations: 1

Abstract



The volume of textual data we are exposed to grows every day. Browsing all available text to find relevant material, or reading all of it to stay up to date, is very difficult. This creates a need for a tool that automatically reduces the amount of content while retaining the key points and essence of long texts. Automatic text summarization is well suited to this problem, and it is what our proposed model aims to implement. In this paper, a Natural Language Processing based extractive approach is used to summarize a single document. An extractive summary is assembled by selecting a subset of information-rich sentences from the source document. A supervised approach is used: Support Vector Machine, K-Nearest Neighbour and Decision Tree algorithms are used to generate models, whose performance is compared using the ROUGE metric. The highest-scoring model is then used to summarize an unseen document. The summary is displayed as text and also converted to audio. The proposed approach achieves average F1 scores of 0.706, 0.630 and 0.434 for ROUGE-1, ROUGE-2 and ROUGE-L respectively, which we consider sufficiently good.
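To make the reported metric concrete, the following is a minimal sketch of how ROUGE-1, ROUGE-2 and ROUGE-L F1 scores can be computed for one candidate summary against one reference summary. It is an illustration only, not the evaluation code used in the paper: the function names (`rouge_n_f1`, `rouge_l_f1`), the lowercase whitespace tokenization, and the absence of stemming or stop-word handling are all assumptions made here, and standard ROUGE toolkits may differ in those details.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 when nothing matches."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_n_f1(candidate, reference, n):
    """ROUGE-N F1 from the clipped n-gram overlap of two texts.

    Assumption: tokens are lowercase, whitespace-separated words.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 using one LCS over the whole token sequences."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    return f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    candidate = "the cat sat on the mat"   # hypothetical system summary
    reference = "the cat lay on the mat"   # hypothetical reference summary
    print(round(rouge_n_f1(candidate, reference, 1), 3))
    print(round(rouge_n_f1(candidate, reference, 2), 3))
    print(round(rouge_l_f1(candidate, reference), 3))
```

Averaging these per-document F1 values over a test set gives figures comparable in kind to the averages quoted in the abstract. Under this setup, each trained classifier would be scored this way and the one with the best averages chosen to summarize new documents.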
