Marathi Text Summarization using Extractive Technique

International Journal of Engineering and Advanced Technology Pub Date : 2023-06-30 DOI:10.35940/ijeat.e4200.0612523

Mrs. Kirti Pankaj Kakde, Dr. H. M. Padalikar



Abstract

Multilingualism plays a key role in India, where many people speak and understand more than one language. Marathi, one of the official languages of Maharashtra state, is widely used in sources such as newspapers and blogs. Manually summarizing long Marathi paragraphs or texts for easy comprehension is challenging, so automatic text summarization is needed to make large documents readable and understandable. This research article focuses on single-document text summarization using Natural Language Processing (NLP), a subfield of Artificial Intelligence. Automatic text summarization extracts the relevant information in concise form, and information extraction is particularly useful when condensing a multi-sentence document into three or four sentences. While English text summarization has been researched extensively, Marathi document summarization remains largely unexplored. This paper explores extractive summarization techniques for Marathi documents, using the graph-based LexRank algorithm together with Gensim to generate informative summaries within word-limit constraints. The experiment was conducted on the IndicNLP Marathi news article dataset. The frequency-based method achieved 78% precision, 72% recall, and a 75% F-measure; the LexRank algorithm achieved 78% precision, 78% recall, and a 78% F-measure.
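To make the graph-based approach concrete, the following is a minimal sketch of LexRank-style extractive summarization, together with the F-measure used to report the results. It is not the authors' implementation: the sentence splitter, the whitespace tokenizer, the smoothed IDF weighting, the removal of self-links, and all function names (`split_sentences`, `tokenize`, `lexrank_scores`, `summarize`, `f_measure`) are assumptions chosen for illustration, and no Marathi stemming or stop-word removal is attempted.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Assumption: a sentence ends with '.', '?', '!' or the danda '।'.
    return [s for s in re.split(r"(?<=[.?!।])\s+", text.strip()) if s]


def tokenize(sentence):
    # Assumption: whitespace tokens with edge punctuation stripped.
    tokens = (t.strip(".,;:!?।\"'()") for t in sentence.split())
    return [t for t in tokens if t]


def tfidf_vectors(token_lists):
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption) so every term keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(u, v):
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=100):
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors([tokenize(s) for s in sentences])
    # Sentence-similarity graph; self-links are dropped for simplicity.
    sim = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
            for j in range(n)] for i in range(n)]
    # Row-normalise into transition probabilities; a sentence with no
    # similar neighbour jumps uniformly to any sentence.
    trans = []
    for row in sim:
        total = sum(row)
        trans.append([x / total for x in row] if total else [1.0 / n] * n)
    # Power iteration with damping, as in PageRank.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * trans[i][j] for i in range(n))
                  for j in range(n)]
    return scores


def summarize(text, k=3):
    sentences = split_sentences(text)
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(top)]


def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0
```

Calling `summarize(article_text, k=3)` returns the three highest-ranked sentences in document order. The reported figures are consistent with the harmonic-mean definition: `f_measure(0.78, 0.72)` is about 0.749, matching the 75% F-measure of the frequency-based method, and `f_measure(0.78, 0.78)` gives 0.78 for LexRank.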

