A Machine Learning Approach of Text Classification for High- and Low-Resource Languages
Muhammad Owais Raza, Naeem Ahmed Mahoto, Asadullah Shaikh, Nazia Pathan, Hani Alshahrani, M. A. Elmagzoub
Computational Intelligence, 41(4). Published 2025-08-16. DOI: 10.1111/coin.70114
https://onlinelibrary.wiley.com/doi/10.1111/coin.70114
Abstract
A large amount of data has been published online in textual form over the last decade, driven by advances in information and communication technologies. Organizing and classifying large amounts of textual data automatically remains an open challenge, especially for languages with limited resources available online. In this study, two approaches are adopted for the experiments. The first is a traditional strategy that uses six classical state-of-the-art classification models (decision tree (DT), logistic regression (LR), support vector machine (SVM), k-nearest neighbour (k-NN), Naive Bayes (NB), and random forest (RF)) along with two ensemble methods (AdaBoost and gradient boosting (GB)). The second is our proposed voting-based ensemble scheme. The data are divided with a 75-25 split: 75% is used for training and 25% for testing. The classification models are evaluated on accuracy, precision, recall, and F1-score. The experimental results show that, for the traditional approach, gradient boosting outperformed the other models on the limited-resource language with a 98.08% F1-score, while SVM performed better (97.34% F1-score) on the resource-rich language.
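The evaluation setup described in the abstract (a 75-25 split, a voting-based ensemble, and accuracy, precision, recall, and F1-score) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation. The abstract does not say whether the voting is hard or soft, how ties are broken, or how per-class scores are averaged, so hard majority voting and single-class (binary) metrics are assumptions here. The function names `split_75_25`, `majority_vote`, and `binary_metrics`, and the labels `"A"` and `"B"`, are hypothetical.

```python
import random
from collections import Counter


def split_75_25(items, seed=0):
    """Shuffle a copy of items and split it 75% train / 25% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.75)
    return shuffled[:cut], shuffled[cut:]


def majority_vote(predictions_per_model):
    """Hard voting (an assumption): pick the most frequent label per sample.

    predictions_per_model holds one equal-length list of labels per model.
    Ties go to the label that appears first among the models' votes.
    """
    voted = []
    for votes in zip(*predictions_per_model):
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall, and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical predictions from three models on four test documents.
    y_true = ["A", "B", "A", "A"]
    model_1 = ["A", "B", "B", "A"]
    model_2 = ["A", "A", "A", "A"]
    model_3 = ["B", "A", "A", "A"]
    voted = majority_vote([model_1, model_2, model_3])
    print(voted)
    print(binary_metrics(y_true, voted, positive="A"))
```

In practice the per-model predictions would come from trained classifiers such as those listed in the abstract. The sketch only shows how their outputs could be combined and scored under the stated assumptions.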
About the journal:
This leading international journal promotes and stimulates research in the field of artificial intelligence (AI). Covering a wide range of issues - from the tools and languages of AI to its philosophical implications - Computational Intelligence provides a vigorous forum for the publication of both experimental and theoretical research, as well as surveys and impact studies. The journal is designed to meet the needs of a wide range of AI workers in academic and industrial research.