Salma Tabashum, M. M. Hossain, Md. Ariful Islam, Mun Yea Mahafi Taz Zahara, Fahmida Naznin Fami
Title: Performance Analysis of Most Prominent Machine Learning and Deep Learning Algorithms In Classifying Bangla Crime News Articles
DOI: 10.1109/TENSYMP50017.2020.9230785
Venue: 2020 IEEE Region 10 Symposium (TENSYMP), pp. 1273-1277
Published: 2020-06-05
Citations: 3
Abstract
This work addresses Bangla crime type classification, a task on which very little prior work exists. To carry out this research, we first built a Bangla crime dataset of around 24,295 news articles and made most of it publicly available on GitHub. We then built a crime classifier and trained it on this dataset. We evaluated word-vector representations such as bag of words and TF-IDF with state-of-the-art machine learning algorithms, and semantic and syntactic word embeddings such as Word2Vec, GloVe, and fastText with both shallow and deep CNNs and RNNs, in order to select the best word embedding for our classifier module. The experimental results are summarized in tabular form. They show that deep learning algorithms achieve significantly higher accuracy than state-of-the-art machine learning algorithms in classifying Bangla crime data. In the final experiment, the proposed model, a shallow CNN with fastText, achieves 93.70% accuracy.
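To make the TF-IDF baseline mentioned in the abstract concrete, the following is a minimal sketch of TF-IDF weighting followed by a simple classifier. It is not the authors' pipeline: the nearest-centroid classifier is swapped in for illustration (the abstract does not name the specific machine learning algorithms), the smoothed IDF formula is one common variant, and the toy English documents, the labels, and all function names are hypothetical stand-ins for the Bangla dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Whitespace tokenization (assumption): real Bangla text would need
    # a language-appropriate tokenizer and normalization.
    return text.lower().split()


def fit_idf(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf_vector(doc, idf):
    # Term frequency times IDF, L2-normalized; unseen terms are dropped.
    tf = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def fit_centroids(docs, labels, idf):
    # Sum the TF-IDF vectors per class; cosine similarity ignores scale,
    # so the sum points in the same direction as the mean.
    centroids = defaultdict(lambda: defaultdict(float))
    for doc, label in zip(docs, labels):
        for t, v in tfidf_vector(doc, idf).items():
            centroids[label][t] += v
    return centroids


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict(doc, centroids, idf):
    # Assign the class whose centroid is most similar to the document.
    vec = tfidf_vector(doc, idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy data standing in for labeled crime news articles.
train_docs = [
    "police arrested a man for robbery at a bank",
    "armed robbery at a jewellery shop",
    "man killed in knife attack",
    "police recover body after murder",
]
train_labels = ["robbery", "robbery", "murder", "murder"]

idf = fit_idf(train_docs)
centroids = fit_centroids(train_docs, train_labels, idf)

print(predict("bank robbery suspects arrested", centroids, idf))
print(predict("murder victim found", centroids, idf))
```

A bag-of-words variant would skip the IDF weighting and use raw term counts. The embedding-based models in the abstract (Word2Vec, GloVe, fastText with CNNs and RNNs) replace these sparse vectors with dense learned ones and are not covered by this sketch.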