Automatic Text summarization in Gujarati language

2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC). Publication date: 2022-12-15. DOI: 10.1109/iSSSC56467.2022.10051338

Harsh Mehta, S. Bharti, Nishant Doshi

Citation count: 1

Abstract

Automatic text summarization is an essential part of natural language processing (NLP), a subfield of artificial intelligence. Its widespread use is driven by the pervasive role of the internet in every aspect of life. In this research article, we apply statistical text summarization techniques to Gujarati, one of the resource-poor South Asian languages. We apply the TF-IDF, LSA, and LDA methods to our custom dataset and evaluate the resulting summaries with ROUGE scores at compression ratios of 10%, 20%, and 30%. ROUGE-1, ROUGE-2, ROUGE-W, and ROUGE-L are used to measure accuracy, and LDA achieves the highest ROUGE scores of the three methods. All results are reported in tables that give the individual ROUGE scores and the average ROUGE score for each method. The article aims to analyze the performance of unsupervised methods for automatic summarization of Gujarati text without any pre-processing.

Existing extractive approaches differ in how they select sentences. The concept-based method selects sentences using external information [4], [5]. The topic-based method relies on matching the title against the main text: a sentence whose words match the title words receives a high score; otherwise, it is not included in the summary [6], [7]. The cluster-based method groups similar sentences by topic, and the number of clusters must be specified in advance [8]–[11]. The graph-based method is founded on the notion of similarity: it compares the similarity of all the words and uses the results to determine the best sentences. Graph-based approaches have been the subject of numerous studies [12]–[16].
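The abstract names TF-IDF sentence scoring, compression ratios, and ROUGE evaluation without giving formulas. The Python sketch below shows one common way these pieces fit together; it is not the authors' implementation, and every helper name in it (`split_sentences`, `tokenize`, `tfidf_summary`, `rouge_n`, `rouge_l`) is hypothetical. Assumptions made here, not taken from the paper: sentences end in `.`, `?`, `!` or the danda `।`; tokens are whitespace-separated with edge punctuation stripped and no other pre-processing; each sentence is treated as a "document" for IDF and scored by the mean TF-IDF weight of its distinct terms; and the compression ratio is applied to the sentence count. LSA, LDA and ROUGE-W are not shown.

```python
# Illustrative sketch, not the authors' code. All helper names and scoring
# choices below (sentence splitting, mean TF-IDF scoring, sentence-level
# compression ratio) are assumptions made for this example.
import math
import re
from collections import Counter

# Characters stripped from token edges. Whitespace tokenization is used
# deliberately: Python's \w does not match Gujarati vowel signs (matras),
# so a pattern like \w+ would break Gujarati words apart.
PUNCT = ".,;:!?\"'()[]।"


def split_sentences(text: str) -> list[str]:
    """Split after sentence-final punctuation: '.', '?', '!' or the danda '।'."""
    parts = re.split(r"(?<=[.?!।])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence: str) -> list[str]:
    """Whitespace tokens with edge punctuation removed; no stemming or stop words."""
    tokens = (tok.strip(PUNCT) for tok in sentence.split())
    return [tok for tok in tokens if tok]


def tfidf_summary(text: str, ratio: float) -> list[str]:
    """Keep the top `ratio` share of sentences, ranked by mean TF-IDF weight."""
    sentences = split_sentences(text)
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Document frequency, treating each sentence as one "document".
    df = Counter(term for tokens in token_lists for term in set(tokens))

    def score(tokens: list[str]) -> float:
        if not tokens:
            return 0.0
        tf = Counter(tokens)
        total = sum((count / len(tokens)) * math.log(n / df[term])
                    for term, count in tf.items())
        return total / len(tf)  # mean over distinct terms

    k = max(1, round(ratio * n))  # compression ratio applied to sentence count
    ranked = sorted(range(n), key=lambda i: score(token_lists[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]  # restore original order


def _prf(overlap: int, cand_len: int, ref_len: int) -> dict[str, float]:
    """Precision, recall and F1 from an overlap count."""
    p = overlap / max(cand_len, 1)
    r = overlap / max(ref_len, 1)
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict[str, float]:
    """ROUGE-N from clipped n-gram overlap (Counter & keeps the minimum count)."""
    def grams(tokens: list[str]) -> Counter:
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = grams(tokenize(candidate)), grams(tokenize(reference))
    overlap = sum((cand & ref).values())
    return _prf(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-L from the longest common subsequence (LCS) of the token lists."""
    a, b = tokenize(candidate), tokenize(reference)
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return _prf(dp[len(a)][len(b)], len(a), len(b))
```

Calling `tfidf_summary(article, ratio)` with `ratio` set to 0.1, 0.2 and 0.3 yields summaries at the three compression ratios evaluated above, and each can then be scored against a reference summary with `rouge_n(..., n=1)`, `rouge_n(..., n=2)` and `rouge_l(...)`. Averaging term weights rather than summing them keeps long sentences from ranking highly on length alone; that choice is also an assumption of this sketch.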
