4W1H Keyword Extraction based Summarization Model

2021 International Conference on Electronics, Information, and Communication (ICEIC). Pub Date: 2021-01-31. DOI: 10.1109/ICEIC51217.2021.9369820

Seungyeon Lee, Taewon Park, Minho Lee

Citations: 1

Abstract

With online information growing rapidly, there is a need to automatically summarize textual documents drawn from the plethora of available information, which makes summarization an interesting area of research. Automatic keyword extraction and text summarization are Natural Language Processing (NLP) tasks that extract relevant information from large text documents. 4W1H (Who, When, Where, What, How) keywords are crucial for sentence generation. Despite their potential, no existing approaches utilize 4W1H keywords in NLP tasks, particularly summarization. In this paper, we propose a new summarization method based on 4W1H keyword extraction, which extracts the answer to a question corresponding to each event in QA format. As baselines, we apply our method to BERT and ELECTRA, two well-known pre-trained language models (LMs) in the NLP domain, to generate a summary. In experiments, our 4W1H keyword extraction method shows promising performance on the AI Hub (https://www.aihub.or.kr/aidata/86) Machine Reading Comprehension (MRC) dataset, achieving an extraction F1-score of 84.93%. Moreover, we show the results of rule-based summarization using the extracted 4W1H keywords.
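The pipeline the abstract describes (one question per 4W1H slot, answered in QA format, followed by a rule-based summary and an F1 evaluation) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the question templates, the slot order in the summary rule, the `toy_answer_fn` stand-in for a BERT or ELECTRA extractive QA model, and the SQuAD-style token-overlap F1 are all assumptions, since the abstract does not specify them.

```python
from collections import Counter
from typing import Callable, Dict

# Hypothetical question templates, one per 4W1H slot (not taken from the paper).
QUESTIONS: Dict[str, str] = {
    "who": "Who was involved?",
    "when": "When did it happen?",
    "where": "Where did it happen?",
    "what": "What happened?",
    "how": "How did it happen?",
}

# Assumed slot order for the rule-based summary template.
SLOT_ORDER = ["who", "what", "where", "when", "how"]


def extract_4w1h(context: str, answer_fn: Callable[[str, str], str]) -> Dict[str, str]:
    """Ask one question per slot; answer_fn(question, context) stands in
    for an extractive QA model such as fine-tuned BERT or ELECTRA."""
    return {slot: answer_fn(q, context).strip() for slot, q in QUESTIONS.items()}


def rule_based_summary(keywords: Dict[str, str]) -> str:
    """Join the non-empty slots in a fixed order to form one sentence."""
    parts = [keywords[s] for s in SLOT_ORDER if keywords.get(s)]
    return " ".join(parts) + "." if parts else ""


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer span
    (an assumed metric; the paper's exact F1 definition may differ)."""
    p, g = pred.split(), gold.split()
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


# Toy stand-in for a QA model: a fixed lookup keyed by question (illustrative only).
_TOY_ANSWERS = {
    QUESTIONS["who"]: "The city council",
    QUESTIONS["what"]: "approved the new budget",
    QUESTIONS["where"]: "in Daegu",
    QUESTIONS["when"]: "on Monday",
    QUESTIONS["how"]: "by a unanimous vote",
}


def toy_answer_fn(question: str, context: str) -> str:
    return _TOY_ANSWERS[question]


keywords = extract_4w1h("(document text would go here)", toy_answer_fn)
summary = rule_based_summary(keywords)
print(summary)
```

In practice `toy_answer_fn` would be replaced by a call to a trained extractive QA model; keeping it as a plain callable lets the extraction and summary rules be tested independently of any model.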
