Indonesian news auto summarization in infrastructure development topic using 5W+1H consideration

Rendra Budi Hutama, Ali Ridho Barakbah, Afrida Helen
Published in: 2017 International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC)
Publication date: 2017-09-01
DOI: 10.1109/KCIC.2017.8228596 (https://doi.org/10.1109/KCIC.2017.8228596)
Citations: 2

Abstract

At an average reading speed of 200–500 words per minute, a person needs at least 2 to 3 minutes to read and understand a single news article in online media. An online outlet can publish many updates within a few minutes, so reading every article in full is time-consuming. Reading a summary that captures the main idea of an article saves time. This study uses the 5W+1H elements to generate news summaries because these elements are central to a news story. A single news article is obtained from an online media page by a scanning and grabbing process, then sanitized, segmented, and tokenized to break it into sentences and words. Each sentence is given a multi-label classification, using previously built training data, that records which 5W+1H elements it contains (What, Who, Where, When, Why, and/or How), or that it contains none. Sentences containing 5W+1H elements are selected as summary sentences. Testing of the summaries shows an average precision of 91%, recall of 67%, and F-measure of 76%.
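The pipeline described in the abstract (segment, tokenize, label each sentence with 5W+1H elements, keep the labeled sentences, then score against a reference summary) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cue-word table CUES is a hypothetical stand-in for their trained multi-label classifier, and the function names, the regex-based segmentation, and the sample sentences are all assumptions made for the example.

```python
import re

# HYPOTHETICAL cue words standing in for the paper's trained multi-label
# classifier; the real system learns its labels from training data.
CUES = {
    "what": {"membangun", "pembangunan", "proyek"},
    "who": {"menteri", "presiden", "gubernur"},
    "where": {"di", "jakarta", "surabaya"},
    "when": {"senin", "kemarin", "besok"},
    "why": {"karena", "sebab"},
    "how": {"dengan", "melalui"},
}


def split_sentences(text):
    """Naive segmentation: split on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased word tokens."""
    return re.findall(r"\w+", sentence.lower())


def label_sentence(sentence):
    """Return the set of 5W+1H labels whose cue words occur in the sentence."""
    tokens = set(tokenize(sentence))
    return {label for label, cues in CUES.items() if tokens & cues}


def summarize(text):
    """Keep every sentence that carries at least one 5W+1H label."""
    return [s for s in split_sentences(text) if label_sentence(s)]


def precision_recall_f(selected, reference):
    """Sentence-level precision, recall and F-measure against a reference."""
    selected, reference = set(selected), set(reference)
    hits = len(selected & reference)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    article = (
        "Pemerintah membangun jalan tol baru di Surabaya. "
        "Cuaca cerah sekali. "
        "Proyek ini dikerjakan melalui kerja sama swasta."
    )
    summary = summarize(article)
    for sentence in summary:
        print(sorted(label_sentence(sentence)), sentence)
    print(precision_recall_f(summary, [summary[0]]))
```

On the reported figures: the harmonic mean of 91% precision and 67% recall is about 77.2%, slightly above the reported 76%. That gap would be consistent with the F-measure being averaged per article rather than computed from the averaged precision and recall, although the abstract does not say which was done.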