Opinion Summarization of Bangla Texts using Cosine Simillarity Based Graph Ranking and Relevance Based Approach

2019 International Conference on Bangla Speech and Language Processing (ICBSLP) Pub Date : 2019-09-01 DOI:10.1109/ICBSLP47725.2019.201494

Shofi Ullah, Sagar Hossain, K. M. Azharul Hasan

Citations: 4

Abstract

The goal of automatic extractive text or opinion summarization is to find a small, representative subset of the original document that preserves its most important information. Many methods exist for summarizing text in English, Turkish, Arabic, and other languages, but very few attempts have been made for Bangla because of its rich morphology and multifaceted structure. In this paper, we propose a joint approach to summarizing Bangla text that combines cosine similarity based graph ranking with relevance based scoring and ranking. We developed a stemming algorithm based on Parts of Speech (POS) tagging, using around two lakh (200,000) POS tags for Bangla texts. A redundancy removal algorithm is also proposed so that each sentence in the summary represents the most important information in the document. The performance of the proposed approach is evaluated by measuring recall, precision, and F-score based on the ROUGE metric, and the results show that it outperforms other existing summarization methods for Bangla texts.
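The abstract does not give the algorithmic details, so the following is only a minimal sketch of the general technique it names, not the authors' implementation. It assumes whitespace tokenization in place of the POS-based stemmer, plain term-frequency vectors, a PageRank-style iteration over the cosine similarity graph, and a simple similarity threshold for redundancy removal. The relevance based scoring component is not modeled. All function names, the damping factor, and the threshold are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    # Assumption: whitespace tokens stand in for stemmed Bangla words.
    vectors = [Counter(s.split()) for s in sentences]
    n = len(vectors)
    if n == 0:
        return [], []
    # Edge weights are pairwise cosine similarities; self-loops are excluded.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores, vectors


def summarize(sentences, k=2, redundancy_threshold=0.5):
    """Pick up to k top-ranked sentences, skipping near-duplicates."""
    scores, vectors = rank_sentences(sentences)
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = []
    for i in order:
        # Hypothetical redundancy rule: reject a sentence that is too
        # similar to one already selected.
        if all(cosine(vectors[i], vectors[j]) < redundancy_threshold for j in chosen):
            chosen.append(i)
        if len(chosen) == k:
            break
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


def rouge1(candidate: str, reference: str):
    """Unigram-overlap recall, precision, and F-score (a ROUGE-1 style measure)."""
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())
    recall = overlap / sum(r.values()) if r else 0.0
    precision = overlap / sum(c.values()) if c else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return recall, precision, f_score
```

Calling `summarize(sentences, k=2)` on a list of sentence strings returns the selected sentences, and `rouge1(summary_text, reference_text)` returns a `(recall, precision, f_score)` tuple against a reference summary. A real Bangla pipeline would replace `s.split()` with proper tokenization and stemming.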
