Evaristus Didik Madyatmadja, Aldi, Fiona Fheren, Helen Angelica, Hanny Juwitasary, David Jumpa Malem Sembiring
Journal of Computer Science, published 2023-11-01. DOI: 10.3844/jcssp.2023.1333.1344 (https://doi.org/10.3844/jcssp.2023.1333.1344)
Comparative Study: Algorithms for Short Message Service Classification
This research aims to classify Short Message Service (SMS) data into spam and ham (legitimate messages) using classification models trained on SMS data. The models are built with two data mining algorithms: Naive Bayes and Support Vector Machine (SVM). Before either algorithm is applied, the SMS data goes through a text preprocessing stage consisting of data cleaning (whitespace removal, punctuation removal, and number removal), case folding, stemming, tokenizing, and stop word removal. The accuracy of the two methods is then compared to determine which classification algorithm performs best. Several experiments were also conducted, comparing test sets of 20% and 30% of the data and comparing preprocessing with and without stemming. The study found that the SVM algorithm with a 20% test set and the stemming stage applied achieved the highest accuracy, 97.5%.
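The preprocessing stage described in the abstract can be sketched as a small pipeline that follows the same order: data cleaning, case folding, stemming, tokenizing, and stop word removal. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop word list and the suffix-stripping stemmer (`STOP_WORDS`, `SUFFIXES`, `stem_word`) are hypothetical toys, since the abstract does not name the stemmer or stop word list used, and a real system would use a proper stemmer for the corpus language.

```python
import re
import string

# Hypothetical, tiny English stop word list (illustration only).
STOP_WORDS = {"a", "an", "the", "to", "you", "your", "is", "are", "for", "of", "and", "now"}

# Hypothetical suffixes for a toy stemmer; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def clean(text: str) -> str:
    """Data cleaning: remove numbers and punctuation, then collapse whitespace."""
    text = re.sub(r"\d+", " ", text)
    text = text.translate(str.maketrans(string.punctuation, " " * len(string.punctuation)))
    return re.sub(r"\s+", " ", text).strip()


def stem_word(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str, use_stemming: bool = True) -> list[str]:
    text = clean(text)                          # 1. data cleaning
    text = text.lower()                         # 2. case folding
    if use_stemming:                            # 3. stemming (toggled in the experiments)
        # The toy stemmer works word by word, so it splits on whitespace internally.
        text = " ".join(stem_word(w) for w in text.split())
    tokens = text.split()                       # 4. tokenizing
    return [t for t in tokens if t not in STOP_WORDS]  # 5. stop word removal


print(preprocess("WINNER!! You have won 1000 dollars. Call now to claim your prize"))
```

Passing `use_stemming=False` reproduces the "without stemming" variant, which keeps surface forms such as `dollars` distinct from `dollar`.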
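The comparison described in the abstract (Naive Bayes versus SVM, evaluated on 20% and 30% hold-out test sets) can be sketched on toy data. Everything here is an assumption for illustration: the ten messages in `SMS` are invented, not the paper's dataset; features are raw word counts; Naive Bayes is the multinomial variant with add-one smoothing; and the SVM is a linear model trained with Pegasos, a stochastic sub-gradient solver for the hinge loss, with no bias term. The abstract does not say which feature weighting, kernel, or solver the authors used.

```python
import math
import random
from collections import Counter, defaultdict

# Toy labelled SMS data, invented for illustration (not the paper's dataset).
SMS = [
    ("free prize call now", "spam"), ("free cash win", "spam"),
    ("claim free prize", "spam"), ("win free cash now", "spam"),
    ("free entry claim now", "spam"),
    ("see you at home", "ham"), ("lunch at home", "ham"),
    ("home by six", "ham"), ("meeting then home", "ham"),
    ("back home soon", "ham"),
]


def holdout_split(data, test_ratio, seed=0):
    """Shuffle once, then hold out round(test_ratio * n) messages for testing."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    n_test = round(test_ratio * len(rows))
    return rows[n_test:], rows[:n_test]  # (train, test)


def train_nb(rows, alpha=1.0):
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""
    docs = [text.split() for text, _ in rows]
    labels = [label for _, label in rows]
    vocab = {w for d in docs for w in d}
    model = {}
    for c in set(labels):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        counts = Counter(w for d in class_docs for w in d)
        denom = sum(counts.values()) + alpha * len(vocab)
        model[c] = (math.log(len(class_docs) / len(docs)),
                    {w: math.log((counts[w] + alpha) / denom) for w in vocab})
    return model


def predict_nb(model, text):
    def log_posterior(c):
        prior, loglik = model[c]
        # Words never seen in training are skipped for every class.
        return prior + sum(loglik[w] for w in text.split() if w in loglik)
    return max(model, key=log_posterior)


def train_svm(rows, lam=0.01, epochs=20, seed=0):
    """Linear SVM via Pegasos (stochastic sub-gradient on the hinge loss, no bias)."""
    data = [(Counter(text.split()), 1 if label == "spam" else -1) for text, label in rows]
    rng = random.Random(seed)
    w = defaultdict(float)
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(w[f] * v for f, v in x.items())
            for f in w:                  # L2 regularisation: shrink every weight
                w[f] *= 1.0 - eta * lam
            if margin < 1:               # hinge loss active: step towards y * x
                for f, v in x.items():
                    w[f] += eta * y * v
    return dict(w)


def predict_svm(w, text):
    score = sum(w.get(f, 0.0) * v for f, v in Counter(text.split()).items())
    return "spam" if score > 0 else "ham"


def accuracy(predict, model, test_rows):
    return sum(predict(model, text) == label for text, label in test_rows) / len(test_rows)


for ratio in (0.2, 0.3):
    train, test = holdout_split(SMS, ratio)
    nb, svm = train_nb(train), train_svm(train)
    print(f"test={ratio:.0%}  NB={accuracy(predict_nb, nb, test):.2f}  "
          f"SVM={accuracy(predict_svm, svm, test):.2f}")
```

On this toy set the two classes use disjoint vocabularies, so the sketch only demonstrates the mechanics of training, splitting, and scoring; on real SMS data the classes share words and the two models' accuracies would genuinely differ.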
Journal description:
Journal of Computer Science aims to publish research articles on the theoretical foundations of information and computation, and on practical techniques for their implementation and application in computer systems. JCS is published twelve times a year and is a peer-reviewed journal covering the latest and most compelling research of the time.