{"title":"Data Set For Sentiment Analysis On Bengali News Comments And Its Baseline Evaluation","authors":"Md. Akhter-Uz-Zaman Ashik, S. Shovon, Summit Haque","doi":"10.1109/ICBSLP47725.2019.201497","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201497","url":null,"abstract":"The biggest challenge in Bengali language processing is creating a strong data set for research. The main focus of this paper is to introduce an authentic and credible data set for Bengali sentiment analysis, open to all for educational purposes, in which the data was extracted from user comments on a well-known online news portal. Comments on various news items were scraped, and five sentiment labels were used to capture the true sentiment of the sentences. An online crowdsourcing platform was used for data annotation. To ensure the credibility and validity of the data set, every entry was tagged three times. Three text classification models were used for baseline evaluation to check the validity of the data set. This data set may be of valuable help for future work and research on Bengali sentiment analysis.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124404903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opinion Summarization of Bangla Texts using Cosine Simillarity Based Graph Ranking and Relevance Based Approach","authors":"Shofi Ullah, Sagar Hossain, K. M. Azharul Hasan","doi":"10.1109/ICBSLP47725.2019.201494","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201494","url":null,"abstract":"The main idea of automatic extractive text or opinion summarization is to find the most important, representative small subset of the original document without any loss of important information. There are many existing methods for text summarization of English, Turkish, Arabic and other languages, but very few attempts have been made for the Bangla language because of its rich morphology and multifaceted structure. In this paper, we propose a joint cosine similarity based graph ranking and relevance based scoring and ranking approach for the summarization of Bangla text. We developed a stemming algorithm based on Parts of Speech (POS) tagging consisting of around two lakh POS tags for Bangla texts. A redundancy removal algorithm is also proposed so that each sentence in the summary represents exactly the most important information in the document. The performance of the proposed approach is evaluated by measuring recall, precision and F-score based on the ROUGE metric, and it is shown that the proposed approach outperforms other existing summarization methods for Bangla texts.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122357766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of Bangla Descriptive Answer Script Digitally","authors":"Md Gulzar Hussain, S. Kabir, T. Mahmud, A. Khatun, M. Islam","doi":"10.1109/ICBSLP47725.2019.202042","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.202042","url":null,"abstract":"Answer script evaluation is an essential part of the student evaluation process in the education system. In an exam, students need to answer subjective and objective questions. In educational institutes, instructors evaluate the answer scripts manually to assess the students. In Bangladesh, the number of students and institutes is increasing day by day. For this reason, it is becoming hard for instructors to evaluate the answer scripts accurately. So it is necessary to find a way to evaluate answer scripts automatically. Many techniques have been proposed for the English language, but we did not find any for the Bangla language. Our paper proposes a way to evaluate Bangla subjective answer scripts automatically by keyword matching and linguistic analysis. We tested our proposed model on answer scripts for 20 questions and found a minimum relative error of 1.8%. 15 teachers and 10 students volunteered to evaluate the answer scripts.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132197349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AIBangla: A Benchmark Dataset for Isolated Bangla Handwritten Basic and Compound Character Recognition","authors":"M. Hasan, Mahathir Mohammad Abir, Md. Ibrahim, M. Sayem, Sohaib Abdullah","doi":"10.1109/ICBSLP47725.2019.201481","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201481","url":null,"abstract":"Automatic handwritten Bangla character recognition (HBCR) is a challenging problem in computer vision due to the numerous variations in writing styles for an individual Bangla character and the similarities in shape among different characters. Considering the complexity of the problem, we need to develop a modern convolutional neural network (CNN) for accurate recognition, but unfortunately, at present, very few Bangla handwritten datasets contain a large number of image samples for each character suitable for training deep learning-based methods. In this paper, we present AIBangla, a new benchmark image database for isolated handwritten Bangla characters with detailed usage and a performance baseline. Our dataset contains 80,403 handwritten images of 50 Bangla basic characters and 249,911 handwritten images of 171 Bangla compound characters, written by more than 2,000 unique writers from various institutes across Bangladesh. In addition, we have applied three state-of-the-art deep CNNs to our proposed AIBangla dataset to provide baseline performance. We have achieved a maximum accuracy of 98.13% and 81.83% for basic and compound character classes respectively on the test set of the AIBangla dataset.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130253116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Lexicon-free Bangla Automatic Speech Recognition System","authors":"Md. Hasan, Md. Ariful Islam, Shafkat Kibria, Mohammad Shahidur Rahman","doi":"10.1109/ICBSLP47725.2019.201544","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201544","url":null,"abstract":"This article presents a lexicon-free Automatic Speech Recognition (ASR) system for the Bangla language and investigates a large open-source Bangla ASR corpus, which is provided by OpenSLR. The model has been trained using improved MFCC acoustic features with a deep LSTM as the acoustic model. We tried two decoding techniques in the decoding, or last, part of the ASR: one uses a joint decoder of Connectionist Temporal Classification (CTC) and a statistical Language Model (LM) for beam decoding, and the other is CTC-based greedy decoding. We trained and investigated the performance of our ASR with non-augmented speech as input. The achieved results are outstanding compared to the results obtained in past research that used End-to-End approaches for Bangla ASR. On the test dataset, our End-to-End system obtained different results with the two decoders: 39.61% WER and 18.50% CER using the greedy decoder, and somewhat improved results of 27.89% WER and 12.31% CER using the beam decoder. This achievement is state of the art for continuous Bangla ASR.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"135 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123426694","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Bangla Text-to-Speech System using Deep Neural Networks","authors":"Rajan Saha Raju, Prithwiraj Bhattacharjee, Arif Ahmad, Mohammad Shahidur Rahman","doi":"10.1109/ICBSLP47725.2019.202055","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.202055","url":null,"abstract":"We present a Deep Neural Network (DNN) based statistical parametric Text-to-Speech (TTS) system for Bangla (also known as Bengali). A first step in building a DNN-based TTS system is obtaining a large amount of speech data. Since no good speech dataset for Bangla TTS is publicly available, we created our own dataset for our system. We prepared a phonetically rich studio-quality speech database containing more than 40 hours of speech. The database consists of 12,500 utterances. We also prepared a pronunciation dictionary (lexicon) of 135,000 words for front-end text processing, which, to our knowledge, is the largest lexicon for Bangla. Our system extracts linguistic features from input text. Then it uses deep neural networks to map these linguistic features to acoustic features. We developed two TTS voices using our dataset: one male and one female. Both objective and subjective evaluation tests show that our system performs significantly better than the traditional Bangla TTS systems and is comparable to the best commercially available Bangla TTS system.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129145352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lyricist Identification using Stylometric Features utilizing BanglaMusicStylo Dataset","authors":"A. Marouf, Rafayet Hossian","doi":"10.1109/ICBSLP47725.2019.201534","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201534","url":null,"abstract":"This paper presents a profile-based approach utilizing supervised learning methods to identify the lyricist of Bangla songs written by two legendary poets and novelists, Kazi Nazrul Islam and Rabindranath Tagore. The problem statement for this paper can be considered authorship attribution using stylometric features on Bangla lyrics. We have utilized the BanglaMusicStylo dataset, which consists of 856 and 620 songs by Rabindranath Tagore and Kazi Nazrul Islam, respectively. The traditional authorship attribution works found in the literature are based on novels written by the authors, not Bangla song lyrics. Using Bangla song lyrics makes this a challenging task, as the word choices made by the authors in songs depend on rhythm, completeness, situation and many other factors. In this paper, we have tried to fuse different types of stylometric features, such as lexical, structural and stylistic features. For experimentation, we have designed the prediction model based on supervised learning exploiting Naïve Bayes (NB), Simple Logistic Regression (SLR), Decision Tree (DT), Support Vector Machine (SVM), and Multilayer Perceptron (MLP). The experimental model consists of several steps including data pre-processing, feature extraction, data processing, and classification. After performance evaluation, we obtained approximately 86.29% accuracy from SLR, which is quite satisfactory.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131336450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Analytical Approach for Enhancing the Automatic Detection and Recognition of Skewed Bangla License Plates","authors":"Koushik Roy, Abu Mohammad Shabbir Khan, Mohammad Zariff Ahsham Ali, Sazid Rahman Simanto, Nabeel Mohammed, Muhammad Asif Atick, S. Islam, Kazi Mejbaul Islam","doi":"10.1109/ICBSLP47725.2019.201528","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201528","url":null,"abstract":"Although there has been a huge body of work on Bangla license plate detection and recognition, the successes of these works have largely been limited to correct detection and recognition of undistorted license plates whose images are taken chiefly from the front or the back of vehicles with slight angular variations. As a result, most Bangla automatic license plate recognition (ALPR) systems in practice struggle when the license plates are skewed on the viewing or the image planes of the license plates. In this paper, we address this issue by proposing an analytical approach that can enhance the ALPR of both normal and skewed license plates and can be incorporated into existing Bangla ALPR systems without modifying their internal structures. Specifically, we demonstrate how existing ALPR systems can be treated as black boxes and analyzed to understand what sort of license plate images they work best on and introduce a novel pipeline that combines deep learning and an algorithmic procedure for transforming images of both normal and skewed license plates into formats that are best suited for the ALPR systems. We note that our proposed method can be easily generalized and applied to non-Bangla license plates as well.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130757072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Comprehensive Review on Recognition Techniques for Bangla Handwritten Characters","authors":"Tapotosh Ghosh, M. Abedin, Shayer Mahmud Chowdhury, M. Yousuf","doi":"10.1109/ICBSLP47725.2019.202051","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.202051","url":null,"abstract":"Handwritten character recognition is a challenging task in OCR, and for a cursive and complex character set like Bangla it is even harder to implement. Many researchers have proposed different methods for recognizing the Bangla handwritten character set, either by analyzing the structure of the characters or through a machine learning process. This paper presents an analysis and overview of the existing methods for recognizing handwritten basic and compound characters. The methods, success rates, limitations and future scope are discussed. The purpose of this paper is to identify the areas in which these systems need improvement and to contribute to establishing an ideal Bangla Handwritten Character Recognition System.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"281 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131425531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Designing a Bangla Stemmer using rule based approach","authors":"MD Shahidul Salim Shakib, Tanim Ahmed, K. M. Azharul Hasan","doi":"10.1109/ICBSLP47725.2019.201533","DOIUrl":"https://doi.org/10.1109/ICBSLP47725.2019.201533","url":null,"abstract":"Stemming is a preprocessing task for natural language processing that involves normalizing inflected words that represent the same concept to their original word. Stemming is a text normalization process that has many applications. There are many techniques for stemming inflected words in different languages but very few works on Bangla word stemming. Therefore, stemming Bangla words is an unsolved problem. Many different situations can arise in the Bangla language when stemming words. In this paper, we present a rule-based algorithm to stem Bangla words. We developed rules for detecting verb inflection (বিভক্তি), number inflection (বচন), and other inflections. Using our rules, we developed a system to find the root word of Bangla words and found good performance. Sufficient examples are provided to explain the proposed system.","PeriodicalId":413077,"journal":{"name":"2019 International Conference on Bangla Speech and Language Processing (ICBSLP)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125672428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}