The significant effect of feature selection methods in spam risk assessment using dendritic cell algorithm
Authors: Kamahazira Zainal, M. Z. Jali
Published in: 2017 5th International Conference on Information and Communication Technology (ICoIC7), 2017-05-01
DOI: https://doi.org/10.1109/ICOICT.2017.8074688
Citations: 5
Abstract
The vast amount of online documentation and the growth of the Internet, especially mobile technology, have created a pressing need to handle and organize unstructured data appropriately. Information retrieval and even knowledge discovery can be enhanced when properly structured data are available. This paper empirically studies the effect of pre-selected term weighting schemes, namely Term Frequency (TF), Information Gain Ratio (IG Ratio), and Chi-Square (CHI2), on the assessment of a threat's impact loss. The selected features are then fed to the Dendritic Cell Algorithm (DCA), which serves as the classifier, to measure the risk concentration of a spam message. The outcome of this research is expected to help people make decisions once they know the possible impact of a particular spam message. The findings show that TF is the best of the three feature selection methods and is well suited for use with the DCA, yielding a high risk classification accuracy.
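To make two of the term weighting schemes named in the abstract concrete, here is a minimal sketch of TF and chi-square term scoring on a toy corpus. It is not the authors' implementation: the messages, labels, and function names (`term_frequency`, `chi_square`, `top_terms`) are hypothetical, whitespace tokenization is assumed, and neither IG Ratio nor the DCA classifier is covered.

```python
from collections import Counter

# Hypothetical toy corpus of (message, label) pairs; not data from the paper.
MESSAGES = [
    ("win a free prize now", "spam"),
    ("free entry win cash", "spam"),
    ("are we meeting for lunch", "ham"),
    ("call me when you are free", "ham"),
]


def term_frequency(messages):
    """Count how often each term occurs across all messages."""
    counts = Counter()
    for text, _label in messages:
        counts.update(text.split())
    return counts


def chi_square(messages, term, positive="spam"):
    """Chi-square statistic of the 2x2 table: term presence vs. class."""
    a = b = c = d = 0
    for text, label in messages:
        present = term in text.split()
        is_positive = label == positive
        if present and is_positive:
            a += 1  # term present, positive class
        elif present:
            b += 1  # term present, other class
        elif is_positive:
            c += 1  # term absent, positive class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no class signal
    return n * (a * d - b * c) ** 2 / denom


def top_terms(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In a pipeline like the one the abstract describes, the top-scoring terms under each scheme would form the feature set passed to the classifier. The ranking step is kept separate here so that the two scoring functions can be swapped without changing anything else.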