{"title":"Detecting Worms Using Data Mining Techniques: Learning in the Presence of Class Noise","authors":"I. Ismail, M. N. Marsono, S. Nor","doi":"10.1109/SITIS.2010.41","DOIUrl":null,"url":null,"abstract":"Worms are self-contained programs that spread over the Internet. They cause problems such as loss of information, information theft, and denial-of-service attacks. The first part of the paper evaluates worm detection based on content classification, using all machine learning techniques available in the WEKA data mining toolkit. The four most accurate and reasonably fast classifiers are identified for further analysis: Naive Bayes, J48, SMO, and Winnow. Results show that machine learning classifiers can detect worms with accuracy reaching 99%. In terms of accuracy, J48 performs better than the other algorithms, while Naive Bayes and Winnow are the fastest. The second part of the paper analyzes the accuracy of these four classifiers in the presence of class noise in the learning corpora. When class noise ranging from 0% to 50% is injected into the positive and negative corpora, the simulation results show a gradual decrease in accuracy and an increase in false positives and false negatives for all analyzed techniques. Class noise affects false positives more significantly than false negatives. The results show that worm detection with classification algorithms cannot tolerate class noise in the learning corpora.","PeriodicalId":128396,"journal":{"name":"2010 Sixth International Conference on Signal-Image Technology and Internet Based Systems","volume":"258 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 Sixth International Conference on Signal-Image Technology and Internet Based Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SITIS.2010.41","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Detecting Worms Using Data Mining Techniques: Learning in the Presence of Class Noise
Worms are self-contained programs that spread over the Internet. They cause problems such as loss of information, information theft, and denial-of-service attacks. The first part of the paper evaluates worm detection based on content classification, using all machine learning techniques available in the WEKA data mining toolkit. The four most accurate and reasonably fast classifiers are identified for further analysis: Naive Bayes, J48, SMO, and Winnow. Results show that machine learning classifiers can detect worms with accuracy reaching 99%. In terms of accuracy, J48 performs better than the other algorithms, while Naive Bayes and Winnow are the fastest. The second part of the paper analyzes the accuracy of these four classifiers in the presence of class noise in the learning corpora. When class noise ranging from 0% to 50% is injected into the positive and negative corpora, the simulation results show a gradual decrease in accuracy and an increase in false positives and false negatives for all analyzed techniques. Class noise affects false positives more significantly than false negatives. The results show that worm detection with classification algorithms cannot tolerate class noise in the learning corpora.
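
The class-noise experiment summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' WEKA setup: the function names `inject_class_noise` and `confusion_rates`, the label encoding (1 = worm, 0 = benign), and the choice to flip the same fraction of labels in each class are all assumptions made here for the sake of the example.

```python
import random


def inject_class_noise(labels, rate, seed=0):
    """Return a copy of `labels` with a fraction `rate` of each class flipped.

    Assumption: labels are binary (1 = worm, 0 = benign) and noise is applied
    separately to the positive and the negative corpus at the same rate.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    rng = random.Random(seed)
    noisy = list(labels)
    for cls in (0, 1):
        # Indices of the samples that originally belong to this class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        n_flip = round(rate * len(idx))
        for i in rng.sample(idx, n_flip):
            noisy[i] = 1 - cls  # mislabel the sample as the other class
    return noisy


def confusion_rates(y_true, y_pred):
    """Return (accuracy, false positive rate, false negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr


if __name__ == "__main__":
    # Hypothetical training labels: 10 worm samples and 10 benign samples.
    clean = [1] * 10 + [0] * 10
    for rate in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
        noisy = inject_class_noise(clean, rate)
        flipped = sum(1 for a, b in zip(clean, noisy) if a != b)
        print(f"noise rate {rate:.1f}: {flipped} training labels flipped")
```

In an experiment of this kind, a classifier would be trained on the noisy labels and then scored with `confusion_rates` against a clean test set, once per noise level, so that the trend in accuracy, false positives, and false negatives can be compared across classifiers.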