The significant effect of feature selection methods in spam risk assessment using dendritic cell algorithm
Authors: Kamahazira Zainal, M. Z. Jali
Venue: 2017 5th International Conference on Information and Communication Technology (ICoIC7)
Publication date: 2017-05-01
DOI: 10.1109/ICOICT.2017.8074688 (https://doi.org/10.1109/ICOICT.2017.8074688)
Citations: 5
Abstract
The vast amount of online documentation and the rapid growth of the Internet, especially mobile technology, have created a pressing need to handle and organize unstructured data appropriately. Information retrieval, and even knowledge discovery, can be enhanced when properly structured data are available. This paper empirically studies the effect of pre-selected term weighting schemes, namely Term Frequency (TF), Information Gain Ratio (IG Ratio) and Chi-Square (CHI2), on the assessment of a threat's impact loss. The selected features are then fed to the Dendritic Cell Algorithm (DCA), which serves as the classifier that measures the risk concentration of a spam message. The outcome of this research is expected to help people make a decision once they know the possible impact of a particular spam message. The findings show that TF is the best of the three feature selection methods and is well suited to use with the DCA, yielding a high risk classification accuracy.
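To make the term weighting schemes named in the abstract concrete, here is a minimal Python sketch of two of them, TF and CHI2, used to rank terms on a toy corpus. It is an illustration under stated assumptions, not the authors' implementation: the four messages, the `spam`/`ham` labels, and the helper names `term_frequency`, `chi_square` and `top_k` are all hypothetical, and the chi-square score uses the standard 2x2 document-count contingency form. IG Ratio and the DCA classifier are not shown.

```python
from collections import Counter


def term_frequency(docs):
    """Count total occurrences of each term across all tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def chi_square(docs, labels, term, target="spam"):
    """Chi-square score of a term against one class, from a 2x2 table of document counts."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == target
        if present and in_class:
            a += 1  # term present, target class
        elif present:
            b += 1  # term present, other class
        elif in_class:
            c += 1  # term absent, target class
        else:
            d += 1  # term absent, other class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus (assumption: whitespace tokenization, two classes).
docs = [
    "win free prize now".split(),
    "free prize claim now".split(),
    "meeting at noon".split(),
    "lunch at noon tomorrow".split(),
]
labels = ["spam", "spam", "ham", "ham"]

tf = term_frequency(docs)
chi2 = {term: chi_square(docs, labels, term) for term in sorted(tf)}

print("Top terms by TF:  ", top_k(tf, 3))
print("Top terms by CHI2:", top_k(chi2, 3))
```

Note that the chi-square score is symmetric: a term that appears only in non-spam messages scores as high as one that appears only in spam, so a real pipeline would also need to track the direction of the association before passing features to a classifier.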