HNTSumm: Hybrid text summarization of transliterated news articles

International Journal of Intelligent Networks. Pub Date: 2023-01-01. DOI: 10.1016/j.ijin.2023.03.001

Padhma Muniraj, K.R. Sabarmathi, R. Leelavathi, Saravana Balaji B

Volume 4, Pages 53-61. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S2666603023000027

Citations: 1

Abstract

Social networking sites, blogs, digital magazines, and news websites together form the largest source of human-generated data. Summarization extracts the crux of a document, a task that is tedious and overwhelming when done manually. Automatic text summarization condenses long documents into a few sentences or words that capture the gist and the principal information of the document. With the growth of social networking sites, eBooks, and e-papers, transliterated words are increasingly prevalent in text corpora. In this paper, we propose HNTSumm, a word-embedding-based algorithm that combines the advantages of unsupervised and supervised learning methods. HNTSumm is a promising method for automatic text summarization of large volumes of data: it learns word embeddings for words transliterated from other languages into English by using weighted word embeddings from a Neural Embedding Model. Further, combining extractive and abstractive approaches yields a concise and unambiguous summary of the text documents, because the extractive approach eliminates redundant information. We employ a hybrid version of sequence-to-sequence models to generate an abstractive summary for the transliterated words. The algorithm was evaluated on two different news summary datasets, and the scores were computed with the ROUGE evaluation metric. Experimental results show that HNTSumm outperforms relevant state-of-the-art algorithms on datasets with transliterated words.
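The abstract reports scores computed with ROUGE but does not say which variant or toolkit was used. As a rough illustration only, the following Python sketch computes ROUGE-N precision, recall, and F1 from clipped n-gram overlap. The function names `ngrams` and `rouge_n`, the lowercasing, and the whitespace tokenization are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n(candidate, reference, n=1):
    """Illustrative ROUGE-N: clipped n-gram overlap between two strings.

    Assumption: text is lowercased and split on whitespace. Official
    ROUGE tooling may tokenize and stem differently.
    """
    cand = Counter(ngrams(candidate.lower().split(), n))
    ref = Counter(ngrams(reference.lower().split(), n))

    # Counter intersection keeps the minimum count of each shared n-gram,
    # so a repeated n-gram is credited at most as often as it appears
    # in the other text.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical candidate and reference summaries.
    print(rouge_n("the cat sat", "the cat sat on the mat", n=1))
```

Because this sketch matches surface tokens exactly, two different spellings of the same transliterated word would count as a mismatch.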


Source journal

International Journal of Intelligent Networks

CiteScore

12.00

Self-citation rate

0.00%

Articles published