Marathi Text Summarization using Extractive Technique

Mrs. Kirti Pankaj Kakde, Dr. H. M. Padalikar
International Journal of Engineering and Advanced Technology, published 2023-06-30. DOI: 10.35940/ijeat.e4200.0612523
Citations: 0

Abstract

Multilingualism plays a key role in India, where many people speak and understand more than one language. Marathi, one of the official languages of Maharashtra state, is widely used in sources such as newspapers and blogs. However, manually summarizing long Marathi paragraphs or texts for easy comprehension is challenging, so automatic text summarization is essential for making large documents easy to read and understand. This article focuses on single-document text summarization using Natural Language Processing (NLP), a subfield of Artificial Intelligence. Automatic text summarization is used to extract the relevant information concisely; extraction is particularly useful for condensing a document of many sentences into three or four. While English text summarization has been studied extensively, Marathi document summarization remains largely unexplored. This paper explores extractive summarization techniques for Marathi documents, using the graph-based LexRank algorithm along with Gensim to generate informative summaries within word-limit constraints. Experiments on the IndicNLP Marathi news article dataset yield 78% precision, 72% recall, and 75% F-measure with the frequency-based method, and 78% precision, 78% recall, and 78% F-measure with the LexRank algorithm.
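The abstract names a frequency-based method and LexRank but does not describe either implementation, so the following is only a minimal, stdlib-only sketch of how such an extractive pipeline can work, not the authors' code. It assumes: sentences end with the Devanagari danda (।) or Latin end punctuation; a frequency baseline that scores a sentence by the mean normalized frequency of its words; threshold LexRank (PageRank over an idf-modified cosine similarity graph) with an assumed threshold of 0.1, damping of 0.85 and a smoothed idf; greedy sentence selection under a word limit; and a ROUGE-1-style unigram-overlap precision/recall/F-measure. All function names are hypothetical. It deliberately avoids Gensim, whose `summarization` module was removed in Gensim 4.0.

```python
import math
import re
from collections import Counter

# Split after the Devanagari danda (।) or Latin end punctuation.
SENT_SPLIT = re.compile(r"(?<=[।.!?])\s+")
# Python's \w does not match Devanagari vowel signs (matras), so a token is
# any run of characters that is neither whitespace nor common punctuation.
TOKEN = re.compile(r"[^\s।.,!?;:\"'()]+")


def split_sentences(text):
    return [s.strip() for s in SENT_SPLIT.split(text) if s.strip()]


def tokenize(sentence):
    return TOKEN.findall(sentence.lower())


def frequency_scores(sents):
    """Assumed frequency baseline: mean normalized word frequency per sentence."""
    tokens = [tokenize(s) for s in sents]
    freq = Counter(w for t in tokens for w in t)
    top = max(freq.values(), default=1)
    return [sum(freq[w] for w in t) / (top * len(t)) if t else 0.0 for t in tokens]


def lexrank_scores(sents, threshold=0.1, damping=0.85, iters=100):
    """Threshold LexRank: PageRank over an idf-modified cosine similarity graph."""
    n = len(sents)
    if n == 0:
        return []
    tfs = [Counter(tokenize(s)) for s in sents]
    df = Counter(w for tf in tfs for w in tf)
    idf = {w: math.log(n / d) + 1.0 for w, d in df.items()}  # smoothed idf (assumed)

    def cosine(a, b):
        dot = sum(a[w] * b[w] * idf[w] ** 2 for w in a.keys() & b.keys())
        na = math.sqrt(sum((a[w] * idf[w]) ** 2 for w in a))
        nb = math.sqrt(sum((b[w] * idf[w]) ** 2 for w in b))
        return dot / (na * nb) if na and nb else 0.0

    # Unweighted edge wherever similarity exceeds the threshold (self-loops included).
    adj = [[1.0 if cosine(tfs[i], tfs[j]) > threshold else 0.0 for j in range(n)]
           for i in range(n)]
    deg = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):  # power iteration; sentences with no edges spread uniformly
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * (adj[j][i] / deg[j] if deg[j] else 1.0 / n)
                                  for j in range(n))
                  for i in range(n)]
    return scores


def summarize(text, max_words=60, method="lexrank"):
    """Greedily take top-scoring sentences under a word limit, in original order."""
    sents = split_sentences(text)
    if not sents:
        return []
    scores = (lexrank_scores if method == "lexrank" else frequency_scores)(sents)
    chosen, used = [], 0
    for i in sorted(range(len(sents)), key=lambda j: -scores[j]):
        length = len(tokenize(sents[i]))
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sents[i] for i in sorted(chosen)]


def prf(candidate, reference):
    """Unigram-overlap precision, recall and F-measure (ROUGE-1 style)."""
    c, r = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((c & r).values())
    p = overlap / max(sum(c.values()), 1)
    rec = overlap / max(sum(r.values()), 1)
    f = 2 * p * rec / (p + rec) if p + rec else 0.0
    return p, rec, f
```

For example, `summarize(article, max_words=50)` returns a list of sentences that can be joined and scored against a reference summary with `prf(...)`. The F-measure here is the harmonic mean 2PR/(P+R); applied to the reported frequency-based figures, 2 × 0.78 × 0.72 / (0.78 + 0.72) ≈ 0.749, which is consistent with the stated 75%.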