On Term Weighting for Spam SMS Filtering

Turgut Dogan
Sakarya University Journal of Computer and Information Sciences, published 2020-12-30
DOI: 10.35377/saucis.03.03.735463 (https://doi.org/10.35377/saucis.03.03.735463)

Abstract

Owing to the rapid development of technology, the use of mobile telephones and the short message service (SMS) has become widespread. As a result, the number of spam SMS messages has increased dramatically, and identifying and filtering such messages has become more important. Moreover, because these messages also carry the risk of stealing users' personal information, identifying and filtering spam SMS remains an active problem from the standpoint of information and data security as well. In this study, the classification performance of five term weighting methods is compared on three datasets of SMS messages labeled as spam or legitimate, using two classifiers. The results show that appropriate weighting of SMS content plays an important role in identifying spam SMS messages. It can also be said that the real classification potential of the term weighting schemes is better reflected by feature vectors built from fifty or more terms, especially on the Turkish and English SMS message datasets. In addition, the range of classification results obtained from the term weighting methods is wider on the Turkish SMS message dataset than on the English SMS message datasets.
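To make the idea of weighting terms into a fixed-size feature vector concrete, the following is a minimal sketch only. The abstract does not name the five weighting methods or the two classifiers, so TF-IDF is used here purely as an assumed, widely known example; the function name `tf_idf_vectors`, the `vocab_size` parameter, the document-frequency-based term selection, and the toy messages are all hypothetical and not taken from the study.

```python
import math
from collections import Counter


def tf_idf_vectors(messages, vocab_size=50):
    """Build TF-IDF feature vectors over the vocab_size terms with the
    highest document frequency (an assumed selection rule, for illustration)."""
    tokenized = [m.lower().split() for m in messages]
    # Document frequency: number of messages that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Rank terms by document frequency; break ties alphabetically.
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [term for term, _ in ranked[:vocab_size]]
    n = len(messages)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = raw term frequency * log(N / document frequency).
        vectors.append([tf[t] * math.log(n / df[t]) for t in vocab])
    return vocab, vectors


# Toy messages (invented for this sketch, not from the paper's datasets).
messages = [
    "win a free prize now",
    "free entry win cash",
    "are you coming home now",
    "see you at home",
]
vocab, vectors = tf_idf_vectors(messages, vocab_size=5)
```

Each row of `vectors` could then be passed to any classifier. Raising `vocab_size` corresponds to the abstract's observation that vectors built from fifty or more terms better reflect what a weighting scheme can do.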