Bengali Crime News Classification Based on Newspaper Headlines using NLP
N. Khan, Md Shamiul Islam, Fuad Chowdhury, Abdur Samad Siham, Nazmus Sakib
2022 25th International Conference on Computer and Information Technology (ICCIT), 2022-12-17. DOI: 10.1109/ICCIT57492.2022.10055391
In our daily lives, newspapers and online news portals have become ubiquitous, keeping us informed about events around the world. Among all the news they carry, crime news is one of the most significant, and readers follow it closely and with considerable curiosity. Yet across the many Bangla newspapers and news sources we read, we found no crime news that was categorized, even though such categorization could help readers. We therefore work on Bengali crime news classification, which could have a considerable impact on the Bengali community. Categorizing crime news from daily newspaper headlines is, however, not an easy task for a human. In this paper, we introduce a practical model that automatically classifies Bengali newspaper headlines into 6 predetermined crime categories. To accomplish this, we use TF-IDF for feature extraction and evaluate 8 different machine learning and language models (including SVM, Decision Tree, Random Forest, LSTM, Bi-LSTM, and BERT); the best result is obtained with Sagor Sarkar's Bangla-Bert-Base. Experiments with 6,293 training and 1,574 test samples show an accuracy of 90.15%. The model and dataset can support further research, such as crime subcategorization, crime status analysis, or judgment analysis. Our dataset will be available upon request at https://tinyurl.com/5n7wwaek.
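The TF-IDF feature extraction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes plain whitespace tokenization and the smoothed IDF that scikit-learn uses by default. To keep the example dependent only on the Python standard library, a nearest-centroid cosine classifier stands in for the SVM, Decision Tree, and Random Forest models. The headlines and the labels `murder`, `theft`, and `drugs` are hypothetical, since the abstract does not name the six categories. `SimpleTfidf`, `NearestCentroid`, and the other helpers are illustrative names. The sketch does not reproduce the best-performing Bangla-Bert-Base model.

```python
import math
from collections import Counter

# Hypothetical toy headlines and labels; NOT from the paper's dataset,
# whose six crime categories are not named in the abstract.
TRAIN = [
    ("রাজধানীতে যুবক খুন", "murder"),
    ("গ্রামে কৃষক খুন", "murder"),
    ("দোকানে চুরি", "theft"),
    ("বাসায় স্বর্ণ চুরি", "theft"),
    ("মাদক সহ গ্রেপ্তার", "drugs"),
    ("ইয়াবা ও মাদক উদ্ধার", "drugs"),
]


def tokenize(text):
    # Assumption: plain whitespace tokenization, no punctuation removal or stemming.
    return text.split()


def l2_normalize(vec):
    # Scale a sparse {term: weight} vector to unit length (an all-zero vector becomes {}).
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


class SimpleTfidf:
    """Minimal TF-IDF vectorizer (hypothetical helper, not a library API)."""

    def fit(self, docs):
        n = len(docs)
        df = Counter()
        for doc in docs:
            df.update(set(tokenize(doc)))  # document frequency per term
        # Smoothed IDF, matching scikit-learn's default: ln((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, doc):
        # Raw term counts times IDF; terms unseen during fit are dropped.
        tf = Counter(t for t in tokenize(doc) if t in self.idf)
        return l2_normalize({t: c * self.idf[t] for t, c in tf.items()})


def dot(a, b):
    # Sparse dot product; equals cosine similarity for unit-length vectors.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


class NearestCentroid:
    """Stand-in classifier: picks the label whose normalized centroid
    (sum of its TF-IDF vectors) has the highest cosine similarity."""

    def fit(self, vectors, labels):
        sums = {}
        for vec, label in zip(vectors, labels):
            sums.setdefault(label, Counter()).update(vec)  # adds weights per term
        self.centroids = {label: l2_normalize(dict(acc)) for label, acc in sums.items()}
        return self

    def predict(self, vec):
        # If every token is unseen, all scores are 0 and max() returns the first label.
        return max(self.centroids, key=lambda label: dot(vec, self.centroids[label]))


headlines = [h for h, _ in TRAIN]
labels = [y for _, y in TRAIN]
vectorizer = SimpleTfidf().fit(headlines)
model = NearestCentroid().fit([vectorizer.transform(h) for h in headlines], labels)

if __name__ == "__main__":
    for headline in ["ঢাকায় ব্যবসায়ী খুন", "বাজারে মোবাইল চুরি"]:
        print(headline, "->", model.predict(vectorizer.transform(headline)))
```

Replacing `NearestCentroid` with a linear SVM over the same vectors would be closer to the setup the abstract describes. For scale, the reported 6,293 training and 1,574 test headlines total 7,867, which is roughly an 80/20 split.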