Leading Sentence News TextRank

Phua Yeong Tsann, Yew Kwang Hooi, Mohd Fadzil bin Hassan, Matthew Teow Yok Wooi
Published in: 2021 International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA)
Publication date: 2021-12-01
DOI: 10.1109/ICICyTA53712.2021.9689186 (https://doi.org/10.1109/ICICyTA53712.2021.9689186)
Citations: 1

Abstract

Automatic text summarization is a popular Natural Language Processing task, often used to condense lengthy content into a short summary. This is a tedious and time-consuming task. This study focuses on Malay news articles, with the aim of selecting representative sentences for Malay news headline generation. The dataset used in the experiment is a collection of multi-genre Malay news articles published on Bernama.com between 2017 and 2019. In this study, a leading sentence approach is applied to TextRank, with TF-IDF and Word2Vec as language models, to perform salient sentence extraction. In the experiment, the top-ranking sentences are extracted at 15%, 20%, 25% and 30% of the original news content. The extracted content is evaluated against the original news headline using the ROUGE evaluation metric. The results show that including the first sentence, or the first two sentences, of the news article achieves a significant improvement. This leading sentence approach achieves an improvement of the F1 score from 1.36 to 7.98. In addition, the experiment shows that the ROUGE scores decrease as the percentage of extraction increases. Thus, the proposed method is fast and resource-efficient compared to other state-of-the-art natural language approaches.
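The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "leading sentence" means the first n_lead sentences are always kept and that the rest of the extraction budget is filled by TextRank rank order. It uses TF-IDF only (a Word2Vec variant would replace the vector function), a plain regex tokenizer rather than a Malay-specific one, and a unigram-only ROUGE-1 F1. All function names, parameters, and the toy sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Simple lowercase word tokenizer; a stand-in for a real Malay tokenizer.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(sentences):
    # Each sentence is treated as one "document" for the IDF statistics.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def textrank(sentences, damping=0.85, iters=50):
    # Weighted PageRank by power iteration over the sentence similarity graph.
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] * scores[j] / out[j]
                            for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    return scores


def leading_sentence_extract(sentences, ratio=0.15, n_lead=1):
    # Assumption: the first n_lead sentences are always included, and the
    # remaining budget (ratio of the article) is filled by TextRank order.
    n = len(sentences)
    budget = min(n, max(n_lead, math.ceil(ratio * n)))
    chosen = set(range(min(n_lead, n)))
    scores = textrank(sentences)
    for i in sorted(range(n), key=lambda k: scores[k], reverse=True):
        if len(chosen) >= budget:
            break
        chosen.add(i)
    return [sentences[i] for i in sorted(chosen)]


def rouge1_f1(candidate, reference):
    # Unigram-overlap F1, a simplified form of ROUGE-1.
    c, r = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


# Toy, made-up article and headline, for illustration only.
article = [
    "The ministry announced a new flood relief fund today.",
    "The fund will help flood victims in three states.",
    "Officials said the relief fund totals ten million ringgit.",
    "The weather agency expects more rain in the states this week.",
]
headline = "Ministry announces flood relief fund"
summary = leading_sentence_extract(article, ratio=0.5, n_lead=1)
print(summary)
print(rouge1_f1(" ".join(summary), headline))
```

Raising ratio lengthens the extract, which tends to lower unigram precision against a short headline; that is one plausible reading of why the reported ROUGE scores fall as the extraction percentage grows.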