4W1H Keyword Extraction based Summarization Model

Seungyeon Lee, Taewon Park, Minho Lee
{"title":"4W1H Keyword Extraction based Summarization Model","authors":"Seungyeon Lee, Taewon Park, Minho Lee","doi":"10.1109/ICEIC51217.2021.9369820","DOIUrl":null,"url":null,"abstract":"In this internet era, with rapidly growing online information, there is a need for automatic summarization of textual documents from plethora of available information, making it an interesting area of research. Automatic keyword extraction and text summarization are Natural Language Processing (NLP) tasks for extracting relevant information from the large text documents. 4W1H (Who, When, Where, What, How) keywords are crucial for sentence generation. Despite the potential of 4W1H keywords, there have not been approaches that utilize the keywords in NLP tasks, particularly summarization. In this paper, we propose a new summarization method based on 4W1H keywords extraction which extracts the answer to a question corresponding to each event in QA format. We apply our methods to BERT and ELECTRA models to generate a summary, which are well-known pre-trained Language Models (LMs) in NLP domain, as a baseline. In experiments, our 4W1H keyword extraction method shows promising performance on AI Hub**https://www.aihub.or.kr/aidata/86 Machine Reading Comprehension (MRC) dataset, recording an extraction performance of an F1-score as 84.93%. Moreover, we show the results of generating a rule-based summarization using keywords extracted with 4W1H.","PeriodicalId":170294,"journal":{"name":"2021 International Conference on Electronics, Information, and Communication (ICEIC)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Electronics, Information, and Communication (ICEIC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICEIC51217.2021.9369820","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In the internet era, with rapidly growing online information, there is a need to automatically summarize textual documents drawn from the plethora of available information, making this an interesting area of research. Automatic keyword extraction and text summarization are Natural Language Processing (NLP) tasks that extract relevant information from large text documents. 4W1H (Who, When, Where, What, How) keywords are crucial for sentence generation. Despite their potential, there have not been approaches that utilize these keywords in NLP tasks, particularly summarization. In this paper, we propose a new summarization method based on 4W1H keyword extraction, which extracts, in QA format, the answer to a question corresponding to each event. We apply our method to BERT and ELECTRA, well-known pre-trained Language Models (LMs) in the NLP domain, as baselines for generating a summary. In experiments, our 4W1H keyword extraction method shows promising performance on the AI Hub (https://www.aihub.or.kr/aidata/86) Machine Reading Comprehension (MRC) dataset, recording an extraction F1-score of 84.93%. Moreover, we show the results of generating a rule-based summary using the extracted 4W1H keywords.
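The abstract frames 4W1H extraction as extractive question answering over a pre-trained LM, followed by rule-based summary generation. Below is a minimal sketch of that idea, not the authors' implementation: it assumes an English extractive-QA checkpoint (the model name "deepset/bert-base-cased-squad2", the per-slot question phrasings, and the score threshold are illustrative choices), whereas the paper uses BERT/ELECTRA baselines on the Korean AI Hub MRC data and its own generation rules.

```python
# Hedged sketch of 4W1H-as-QA extraction plus a template-based summary.
# Model name, questions, and threshold are assumptions for illustration.
from transformers import pipeline

# One natural-language question per 4W1H slot (hypothetical phrasings).
QUESTIONS = {
    "Who":   "Who is involved in the event?",
    "When":  "When did the event happen?",
    "Where": "Where did the event take place?",
    "What":  "What happened?",
    "How":   "How did the event happen?",
}

def extract_4w1h(document: str,
                 qa_model: str = "deepset/bert-base-cased-squad2") -> dict:
    """Ask one extractive-QA question per slot and keep the top answer span."""
    qa = pipeline("question-answering", model=qa_model)
    keywords = {}
    for slot, question in QUESTIONS.items():
        result = qa(question=question, context=document)
        # Keep the span only if the model is reasonably confident.
        keywords[slot] = result["answer"] if result["score"] > 0.1 else None
    return keywords

def rule_based_summary(keywords: dict) -> str:
    """Assemble extracted slots into one sentence with a fixed slot order
    (a stand-in for the paper's rule-based generation, whose exact rules
    are not given in the abstract)."""
    parts = [keywords.get(s) for s in ("When", "Where", "Who", "What", "How")]
    return " ".join(p for p in parts if p) + "."

if __name__ == "__main__":
    doc = ("On Monday, the city council of Springfield approved a new "
           "recycling program by a unanimous vote at the town hall.")
    slots = extract_4w1h(doc)
    print(slots)
    print(rule_based_summary(slots))
```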