Irfan Ali Kandhro, Sahar Zafar Jumani, K. Kumar, Abdul Hafeez, F. Ali
Journal article, Advances in Data Science and Adaptive Analysis, published 2020-11-18, pp. 2050008:1-2050008:13. DOI: 10.1142/s2424922x20500084
Roman Urdu Headline News Text Classification Using RNN, LSTM and CNN
This paper presents an automated tool for classifying text into predefined categories. Text classification has long been regarded as a vital method for managing and processing the huge and continuously growing number of documents available in digital form. Most text classification research has targeted Urdu, English, and other languages, whereas little work has been carried out on Roman Urdu data. Technically, text classification proceeds in two steps: first, feature extraction techniques are used to select the main features from all the features available in the text documents; second, classification algorithms are applied to the selected features. The data set was collected with scraping tools from Awaji Awaze and Daily Jhoongar, two of the most popular news websites, and split into training (70%) and testing (30%) sets. In this paper, deep learning models (RNN, LSTM, and CNN) are used to classify Roman Urdu headline news. The models achieve testing accuracies of 81% (RNN), 82% (LSTM), and 79% (CNN), and the experimental results demonstrate that LSTM outperforms both CNN and RNN on this task.
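The abstract describes preprocessing headlines, a 70/30 train/test split, and accuracy as the evaluation metric. As a minimal sketch under stated assumptions, the stdlib-only Python below turns headlines into fixed-length integer sequences of the kind an RNN, LSTM, or CNN embedding layer typically consumes, performs a reproducible 70/30 split, and computes accuracy. The whitespace tokenizer, the `PAD`/`UNK` id scheme, `max_len`, the seed, the majority-class baseline, and the sample headlines are all hypothetical choices for illustration, not the authors' preprocessing; the neural models themselves are not shown.

```python
import random
from collections import Counter

PAD, UNK = 0, 1  # hypothetical reserved ids: padding and out-of-vocabulary tokens


def tokenize(headline):
    """Lower-case a headline and split it on whitespace (deliberately simple)."""
    return headline.lower().split()


def build_vocab(headlines, min_count=1):
    """Give every token seen at least min_count times an integer id starting at 2.

    More frequent tokens get smaller ids; ties keep first-seen order.
    """
    counts = Counter(tok for h in headlines for tok in tokenize(h))
    vocab = {}
    for tok, count in counts.most_common():
        if count >= min_count:
            vocab[tok] = len(vocab) + 2
    return vocab


def encode(headline, vocab, max_len):
    """Map a headline to exactly max_len token ids (truncated or right-padded)."""
    ids = [vocab.get(tok, UNK) for tok in tokenize(headline)][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def train_test_split(items, train_frac=0.7, seed=42):
    """Shuffle a copy of items reproducibly and cut it into train and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predicted labels that equal the reference labels."""
    if not y_true:
        raise ValueError("accuracy is undefined for an empty test set")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical (headline, label) pairs for illustration only; not the paper's data.
    data = [
        ("Pakistan team ne match jeet liya", "sports"),
        ("Cricket series ka schedule jari", "sports"),
        ("Hockey team ki final mein shikast", "sports"),
        ("Petrol ki qeemat mein izafa", "business"),
        ("Stock market mein mandi", "business"),
        ("Sone ki qeemat kam ho gayi", "business"),
    ]
    train, test = train_test_split(data)  # 70% train, 30% test
    vocab = build_vocab([h for h, _ in train])  # vocabulary from the training split only
    x_test = [encode(h, vocab, max_len=6) for h, _ in test]
    print(len(train), "train /", len(test), "test; encoded test:", x_test)

    # A majority-class baseline stands in for a trained RNN/LSTM/CNN here.
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    y_test = [label for _, label in test]
    print("baseline accuracy:", accuracy(y_test, [majority] * len(y_test)))
```

Building the vocabulary from the training split alone keeps test headlines from leaking into preprocessing. In a Keras or PyTorch model, the padded id sequences would typically feed an embedding layer placed ahead of the recurrent or convolutional layers.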