Personalized Filtering of Polymorphic E-mail Spam
Masaru Takesue
2009 Third International Conference on Emerging Security Information, Systems and Technologies (published 2009-06-18)
DOI: 10.1109/SECURWARE.2009.45
Whether an email is spam depends on the recipient's interests, so it is desirable to filter spam according to each recipient's interests. We store in our filter the fingerprints (FPs) of $k$ portions of each spam's content, and we examine metrics for detecting polymorphic spam, which is crafted specifically to thwart detection. To keep the filter small, we use two Bloom filters (in fact merged into a single one to reduce cache misses) so that the least recently matched spams are replaced by recently matched ones. As metrics we use not only the number $N_t$ ($\le k$) of an incoming email's FPs that match FPs in the filter, but also the greatest number $N_d$ of those $N_t$ FPs that are stored for a single spam. We plot spam and legitimate emails in the $(N_d, N_t)$ plane and detect spam with a piecewise linear function. Experiments with about 4,000 real-world emails show that our filter achieves a false negative rate of about 0.36 with no false positives.
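The abstract leaves the portioning scheme, the replacement mechanics, and the decision boundary unspecified, so the following Python sketch fills them in with labeled assumptions. It is an illustration of the metrics, not the paper's implementation. Each body is cut into $k$ equal-length character slices (hypothetical). Plain dictionaries stand in for the two Bloom filters: a plain Bloom filter only answers membership, so it cannot by itself report which spam a fingerprint came from, and this sketch does not reproduce how the paper obtains $N_d$ from its merged filter. The piecewise linear boundary uses made-up breakpoints rather than the paper's fitted ones. All names (`K`, `fingerprints`, `TwoGenerationStore`, `capacity`, `is_spam`) are hypothetical. The sketch shows how $N_t$ and $N_d$ are computed for an incoming email, and how a two-generation store keeps recently matched spams while dropping those not matched for a whole generation.

```python
import hashlib

K = 8  # hypothetical number of portions fingerprinted per email


def fingerprints(body: str, k: int = K) -> set[int]:
    """Fingerprint k portions of an email body.

    Assumption: portions are k equal-length character slices; the abstract
    does not say how the paper splits the content.
    """
    bounds = [len(body) * i // k for i in range(k + 1)]
    fps: set[int] = set()
    for i in range(k):
        portion = body[bounds[i]:bounds[i + 1]]
        if portion:
            digest = hashlib.sha1(portion.encode("utf-8")).digest()
            fps.add(int.from_bytes(digest[:8], "big"))  # 64-bit FP
    return fps  # a set, so len(fps) <= k


class TwoGenerationStore:
    """Hypothetical stand-in for the paper's two (merged) Bloom filters.

    Plain dicts (spam id -> FPs) are used so that N_d can be computed.
    New and recently matched spams live in `current`; once `current`
    holds `capacity` spams, `previous` (spams not matched for a whole
    generation) is discarded and `current` becomes `previous`.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.current: dict[str, set[int]] = {}
        self.previous: dict[str, set[int]] = {}

    def add_spam(self, spam_id: str, fps: set[int]) -> None:
        if len(self.current) >= self.capacity:
            self.previous, self.current = self.current, {}
        self.previous.pop(spam_id, None)  # keep each spam in one generation
        self.current[spam_id] = fps

    def metrics(self, fps: set[int]) -> tuple[int, int]:
        """Return (N_t, N_d) for an email and refresh the spams it matched."""
        matched: set[int] = set()  # email FPs found anywhere in the store
        n_d = 0  # most email FPs shared with any single stored spam
        refresh = []
        for generation in (self.current, self.previous):
            for spam_id, spam_fps in generation.items():
                common = fps & spam_fps
                if common:
                    matched |= common
                    n_d = max(n_d, len(common))
                    if generation is self.previous:
                        refresh.append((spam_id, spam_fps))
        for spam_id, spam_fps in refresh:  # move matched old spams forward
            self.add_spam(spam_id, spam_fps)
        return len(matched), n_d


def is_spam(n_t: int, n_d: int) -> bool:
    """Piecewise linear boundary in the (N_d, N_t) plane.

    Hypothetical breakpoints: the boundary is N_t = 6 - N_d for N_d <= 4
    and N_t = 2 beyond; emails on or above it are flagged. The paper fits
    its own boundary to its data.
    """
    return n_t >= max(6 - n_d, 2)


if __name__ == "__main__":
    store = TwoGenerationStore(capacity=1000)
    spam = "Cheap watches for sale! Visit our store and save ninety percent today."
    store.add_spam("spam-1", fingerprints(spam))
    ham = "Hi Bob, the project meeting moved to Thursday at ten in room 204."
    for body in (spam, ham):
        n_t, n_d = store.metrics(fingerprints(body))
        print(n_t, n_d, is_spam(n_t, n_d))
```

Keeping whole spams per generation trades memory for an exact $N_d$; the compact Bloom-filter representation described in the abstract is what would keep the filter small in practice. Equal-length slicing is also the weakest assumption here, since inserting a single character shifts every later boundary, which is exactly the kind of change polymorphic spam makes.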