Automatic Text summarization in Gujarati language
Harsh Mehta, S. Bharti, Nishant Doshi
2022 IEEE 2nd International Symposium on Sustainable Energy, Signal Processing and Cyber Security (iSSSC), published 15 December 2022
DOI: https://doi.org/10.1109/iSSSC56467.2022.10051338
Citation count: 1
Abstract
Automatic text summarization is an essential part of natural language processing (NLP), a subfield of artificial intelligence. Its widespread use is driven by the pervasive role of the internet in every aspect of life. In this article, we apply statistical text summarization techniques to Gujarati, a resource-poor South Asian language. We apply TF-IDF, latent semantic analysis (LSA), and latent Dirichlet allocation (LDA) to a custom dataset. The generated summaries are evaluated with ROUGE at compression ratios of 10%, 20%, and 30%. We use ROUGE-1, ROUGE-2, ROUGE-W, and ROUGE-L to measure summary quality, and LDA achieves the highest ROUGE scores of the three methods. The results are presented in tables giving the individual ROUGE scores and the average ROUGE score for each method. The article analyzes the performance of unsupervised methods for automatic summarization of Gujarati text without any pre-processing.

In the concept-based approach, sentences are selected using external knowledge [4], [5]. The topic-based approach matches title words against the body text: a sentence that contains the title words receives a high score, and otherwise it is excluded from the summary [6], [7]. The cluster-based approach groups similar sentences by topic and requires the number of clusters to be specified in advance [8]–[11]. The graph-based approach rests on similarity: it compares sentences by their word-level similarity and selects the best sentences from those scores. Numerous studies have explored graph-based approaches [12]–[16].
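The TF-IDF method named in the abstract can be sketched as an extractive summarizer. This is a minimal illustration under our own assumptions, not the authors' implementation: each sentence is treated as a "document" when computing inverse document frequency, tokens come from plain whitespace splitting (matching the no-pre-processing setting, so punctuation stays attached to words), sentences end at the danda `।` or at `.`, `?`, or `!`, and the compression ratio is read as the fraction of sentences kept. The names `split_sentences` and `tfidf_summary` are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Split after a danda (।), full stop, question mark, or exclamation mark (assumed delimiters)."""
    parts = re.split(r"(?<=[।.?!])\s+", text.strip())
    return [p for p in parts if p]


def tfidf_summary(text: str, ratio: float) -> list[str]:
    """Keep the `ratio` fraction of sentences with the highest length-normalised TF-IDF score."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    # No pre-processing: whitespace tokens, punctuation left attached, no stop-word removal.
    tokenized = [s.split() for s in sentences]
    # Document frequency: number of sentences containing each token.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency is divided by sentence length, so long sentences are not favoured.
        scores.append(sum((c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()))
    # Compression ratio -> number of sentences kept (at least one).
    k = max(1, round(ratio * n))
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]
```

For example, `tfidf_summary(article, 0.1)` would keep about 10% of the sentences of a hypothetical `article` string. Returning sentences in document order keeps the summary readable; the length normalisation is a design choice, since an unnormalised sum tends to reward long sentences simply for containing more terms.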
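The ROUGE evaluation mentioned in the abstract can be illustrated with a simplified scorer. This sketch is our assumption, not the official ROUGE toolkit or the authors' evaluation code: it uses whitespace tokens, no stemming, and a single reference summary. ROUGE-N counts clipped n-gram overlap, ROUGE-L uses the longest common subsequence, and ROUGE-W (weighted LCS) is omitted for brevity, so scores may differ from established packages. All function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Multiset of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> dict[str, float]:
    """Precision, recall and F1 from an overlap count and the two totals."""
    p = overlap / cand_total if cand_total else 0.0
    r = overlap / ref_total if ref_total else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"p": p, "r": r, "f": f}


def rouge_n(candidate: str, reference: str, n: int) -> dict[str, float]:
    cand = ngrams(candidate.split(), n)
    ref = ngrams(reference.split(), n)
    # Counter intersection keeps min(count_cand, count_ref): the "clipped" overlap.
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length, by dynamic programming one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    cand, ref = candidate.split(), reference.split()
    return prf(lcs_length(cand, ref), len(cand), len(ref))
```

Recall (`"r"`) is the figure most often reported for summarization at a fixed compression ratio, while F1 (`"f"`) balances it against precision when summary lengths vary.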
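The topic-based (title-matching) criterion described above can be sketched as a word-overlap score. The scoring rule here is our assumption, not one specified in the source: a sentence's score is the fraction of distinct title words it contains, sentences with no title word are dropped, and the top `k` of the remainder are kept. The names `title_overlap` and `title_based_summary` are hypothetical.

```python
def title_overlap(sentence: str, title: str) -> float:
    """Fraction of distinct title words that also occur in the sentence."""
    title_words = set(title.split())
    if not title_words:
        return 0.0
    return len(title_words & set(sentence.split())) / len(title_words)


def title_based_summary(sentences: list[str], title: str, k: int) -> list[str]:
    """Keep up to k sentences with the highest title overlap, in original order."""
    scored = [(title_overlap(s, title), i) for i, s in enumerate(sentences)]
    # Sentences sharing no word with the title are excluded outright.
    matching = [pair for pair in scored if pair[0] > 0]
    best = sorted(matching, key=lambda pair: pair[0], reverse=True)[:k]
    return [sentences[i] for _, i in sorted(best, key=lambda pair: pair[1])]
```

Because matching is exact on whitespace tokens, inflected Gujarati word forms would not match their title counterparts without stemming, which is one reason such methods usually rely on pre-processing.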
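The graph-based approach can be illustrated with a TextRank-style ranker, chosen here as a representative example rather than the method of any specific cited study. Each sentence is a node, edges are weighted by the cosine similarity of raw word-count vectors, and node scores come from weighted PageRank computed by power iteration with the conventional damping factor 0.85. The function names and the fixed iteration count are assumptions.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_scores(sentences: list[str], d: float = 0.85, iters: int = 50) -> list[float]:
    """Weighted PageRank over the sentence-similarity graph (power iteration)."""
    n = len(sentences)
    if n == 0:
        return []
    vecs = [Counter(s.split()) for s in sentences]
    # Symmetric weighted adjacency matrix; self-loops are excluded.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each node receives a share of its neighbours' scores proportional to edge weight;
        # isolated nodes (out_sum == 0) pass nothing on.
        scores = [
            (1 - d) / n
            + d * sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j])
            for i in range(n)
        ]
    return scores


def graph_summary(sentences: list[str], k: int) -> list[str]:
    """Return the k highest-ranked sentences in their original order."""
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Sentences that share many words with many other sentences accumulate the highest scores, which matches the intuition that central sentences summarize the document; an unrelated sentence keeps only the baseline `(1 - d) / n`.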