Extractive Text Summarization System for News Texts

International Journal of Applied Mathematics Electronics and Computers. Pub Date: 2020-12-31. DOI: 10.18100/ijamec.800905

Fahrettin Horasan, Burhan Bilen


Citations: 2

Abstract

The volume of data available today makes it difficult to obtain information quickly and efficiently. The internet holds a wide variety of text documents, and a good extraction algorithm is essential for retrieving the most relevant information from them. Readers of long texts often want only the main idea or a few useful facts, which is why automatic summarization systems matter. Text summarization systems are either abstractive or extractive. Abstractive systems produce a summary made of new sentences, whereas extractive systems select sentences from the source text and combine them into a summary. The success of a summarization algorithm rises in direct proportion to how well text mining techniques are applied. Text summarization systems score the words and sentences of the source text using various methods and combine the highest-ranked sentences into the summary presented to the user. Many scoring methods have been used for this purpose. Our study uses news data sets. The algorithm is extractive and was evaluated with a task-independent method. The two highest scores obtained in the evaluation are 0.68 for ROUGE-1 and 0.54 for ROUGE-S. Precision, Recall and F-Measure values are reported for every evaluation step so that each step can be examined clearly. This is an open access article under the CC BY-SA 4.0 license. (https://creativecommons.org/licenses/by-sa/4.0/)
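The abstract does not say which scoring method the authors used, so the following Python sketch is only an illustration of the general pipeline it describes: score sentences, keep the highest-ranked ones, then compare the result with a reference summary using ROUGE-1 Precision, Recall and F-Measure. The average word-frequency score, the small stopword list, and every function name here are assumptions made for the example, not the paper's implementation.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on",
             "is", "are", "was", "for", "with"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def summarize(text, n_sentences=2):
    """Extractive summary: keep the n highest-scoring sentences in source order."""
    sentences = split_sentences(text)
    # Document-level frequency of each non-stopword token.
    freq = Counter(w for w in tokenize(text) if w not in STOPWORDS)

    def score(sentence):
        # Assumed scoring rule: mean document frequency of the sentence's words.
        words = [w for w in tokenize(sentence) if w not in STOPWORDS]
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in chosen)


def rouge_1(candidate, reference):
    """Unigram-overlap ROUGE-1 as (precision, recall, f_measure)."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    article = ("The storm hit the coast. "
               "Officials said the storm damaged the coast road. "
               "Schools stayed open.")
    summary = summarize(article, n_sentences=1)
    print(summary)
    print(rouge_1(summary, "The storm damaged the coast road."))
```

The selected sentences are emitted in their original order so that the summary still reads like the source text. ROUGE-S, the other measure mentioned in the abstract, is based on skip-bigrams and is not covered by this sketch.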

