Padhma Muniraj , K.R. Sabarmathi , R. Leelavathi , Saravana Balaji B
International Journal of Intelligent Networks, Volume 4 (2023), Pages 53-61. DOI: 10.1016/j.ijin.2023.03.001. Available at: https://www.sciencedirect.com/science/article/pii/S2666603023000027
HNTSumm: Hybrid text summarization of transliterated news articles
Data generated from social networking sites, blogs, digital magazines, and news websites constitutes the largest body of human-generated data. Summarization is the process of extracting the crux of a document, which, when done manually, can be tedious and overwhelming. Automatic text summarization condenses long documents into a few sentences or words while capturing the gist and principal information of the document. With the growth of social networking sites, eBooks, and e-papers, the prevalence of transliterated words in text corpora is also on the rise. In this paper, we propose a word-embedding-based algorithm called HNTSumm that combines the advantages of unsupervised and supervised learning methods. HNTSumm is a method for automatic text summarization of large volumes of data that can learn word embeddings for words transliterated from other languages into English by utilizing weighted word embeddings from a neural embedding model. Further, the combination of extractive and abstractive approaches yields a concise and unambiguous summary, as the extractive stage eliminates redundant information. We employ a hybrid sequence-to-sequence model to generate an abstractive summary for the transliterated words. The feasibility of the algorithm was evaluated on two news summary datasets, with accuracy scores computed using the ROUGE evaluation metric. Experimental results corroborate the performance of the proposed algorithm and show that HNTSumm outperforms relevant state-of-the-art algorithms on datasets containing transliterated words.
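The abstract reports accuracy via the ROUGE evaluation metric. As a minimal illustration of what that metric measures, the sketch below computes ROUGE-N recall (the fraction of reference n-grams that also appear in the candidate summary). The function name, whitespace tokenization, and recall-only formulation are simplifications for illustration, not the paper's implementation, which would typically use a full ROUGE toolkit.

```python
from collections import Counter

def rouge_n(candidate: str, reference: str, n: int = 1) -> float:
    """ROUGE-N recall: overlapping n-grams / total reference n-grams."""
    def ngrams(text: str, n: int) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clipped overlap: each n-gram counts at most as often as it occurs in the reference.
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

print(rouge_n("the cat sat on the mat", "the cat is on the mat"))  # ≈ 0.833
```

Standard ROUGE implementations also report precision and F1 and include variants such as ROUGE-L (longest common subsequence); this recall-only sketch captures the core overlap idea used to score generated summaries against references.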