A Ranking based Language Model for Automatic Extractive Text Summarization

Pooja Gupta, S. Nigam, R. Singh
Venue: 2022 First International Conference on Artificial Intelligence Trends and Pattern Recognition (ICAITPR)
Publication date: 2022-03-10
DOI: 10.1109/ICAITPR51569.2022.9844187 (https://doi.org/10.1109/ICAITPR51569.2022.9844187)
Citations: 2

Abstract

The increased availability of the Internet and social media has created another ‘world of data’ comprising text, audio, and video files. It is very difficult for a user to obtain an accurate summary of the available media or to identify the relevant and important items in it. Moreover, readers or evaluators of these data files are interested only in the relevant content or a summary, which they want to retrieve from the source files in less time. Automatic text summarization (ATS) is the only way to summarize single or multiple documents to obtain relevant content from the source files. Existing ATS systems generate poor summaries and take a lot of time and space for long documents because of inaccurate encoding. Therefore, in this work we introduce an approach for extractive text summarization using sentence ranking. Experiments were performed on BBC and CNN news datasets and evaluated in terms of ROUGE using an N-gram language model. The quantitative values of the metrics show the effectiveness of the proposed approach on news datasets.
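
The abstract does not specify how sentences are scored, so the following is only a minimal sketch of the general idea of ranking-based extractive summarization with an n-gram overlap metric. It assumes a simple word-frequency score for ranking and computes ROUGE-N recall by hand. The function names (rank_sentences, rouge_n_recall), the scoring rule, and the naive sentence splitter are illustrative assumptions, not the authors' method.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_sentences(text, k=2):
    """Return the k highest-scoring sentences in original document order.

    Assumed scoring rule (not taken from the paper): the mean
    document-level frequency of the tokens in each sentence.
    """
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))
    scored = []
    for idx, sentence in enumerate(sentences):
        tokens = tokenize(sentence)
        score = sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0
        scored.append((score, idx))
    # Highest score first; ties are broken in favour of earlier sentences.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    return [sentences[i] for i in sorted(i for _, i in top)]


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-grams."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total
```

In use, rank_sentences(article, k=3) would produce a three-sentence extractive summary, and rouge_n_recall(" ".join(summary), reference_summary, n=2) would score it against a human-written reference with bigram recall. A frequency score is chosen here only because it is easy to follow; any other sentence-ranking function could be dropped in without changing the evaluation step.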