Multilevel Classification of Pakistani News using Machine Learning
Anum Ilyas, S. Obaid, N. Bawany
2021 22nd International Arab Conference on Information Technology (ACIT), 2021-12-21
DOI: 10.1109/acit53391.2021.9677431 (https://doi.org/10.1109/acit53391.2021.9677431)
Citation count: 0
Abstract
The availability of innumerable online news sources has benefited the public, who now have the opportunity to gather news from a diverse set of outlets. However, classifying the huge volume of data generated on a regular basis has never been a simple task. This textual information becomes valuable only when it is processed to maximize its usefulness, which automated text classification makes possible. Natural Language Processing (NLP) and machine learning techniques have been extensively applied to address this challenge. Text classification is helpful in several scenarios, such as product mining and emotion or sentiment analysis. News classification is one such application, in which the content of news is processed and analyzed to assign one or more predefined labels. This research focuses on the classification of Pakistani news obtained from a dataset available on Open Data Pakistan. We applied several machine learning algorithms, including Logistic Regression, Random Forest, Support Vector Machine, and Naïve Bayes, for first-level classification, and Logistic Regression for multilevel classification. A comparative analysis of these algorithms is also presented. We achieved a maximum accuracy of 97.8% with Support Vector Machine in single-level classification and 83% with Logistic Regression in multilevel text classification.
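The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes that "multilevel" means predicting a top-level category first and then a sub-category within it, which the abstract does not spell out. To stay dependency-free it uses a small hand-written multinomial Naïve Bayes at both levels, not the Logistic Regression the authors report for the multilevel task. The class names, the toy headlines, and the label names are all hypothetical and are not drawn from the Open Data Pakistan dataset.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = text.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelClassifier:
    """Predict a top-level category, then a sub-category within it."""

    def fit(self, texts, categories, subcategories):
        self.top = NaiveBayes().fit(texts, categories)
        # Train one second-level model per top-level category.
        grouped = defaultdict(lambda: ([], []))
        for text, cat, sub in zip(texts, categories, subcategories):
            grouped[cat][0].append(text)
            grouped[cat][1].append(sub)
        self.sub = {cat: NaiveBayes().fit(t, s) for cat, (t, s) in grouped.items()}
        return self

    def predict(self, text):
        cat = self.top.predict(text)
        return cat, self.sub[cat].predict(text)


# Hypothetical toy headlines, for illustration only.
TEXTS = [
    "cricket team wins test match",
    "cricket captain scores century",
    "hockey team wins final match",
    "hockey goalkeeper saves penalty",
    "stock market index rises",
    "stock exchange shares fall",
    "bank raises interest rate",
    "central bank cuts interest rate",
]
CATEGORIES = ["sports"] * 4 + ["business"] * 4
SUBCATEGORIES = ["cricket", "cricket", "hockey", "hockey",
                 "markets", "markets", "banking", "banking"]

model = TwoLevelClassifier().fit(TEXTS, CATEGORIES, SUBCATEGORIES)
print(model.predict("cricket match today"))          # ('sports', 'cricket')
print(model.predict("bank interest rate decision"))  # ('business', 'banking')
```

One consequence of this design is that a first-level mistake cannot be recovered at the second level, which may be one reason multilevel accuracy tends to fall below single-level accuracy, as in the figures reported above.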