A Reliable Component-Based Architecture for E-Mail Filtering
W. Gansterer, A. Janecek, P. Lechner
The Second International Conference on Availability, Reliability and Security (ARES'07), 2007-04-10
DOI: 10.1109/ARES.2007.20
A three-component architecture for the classification and filtering of unsolicited bulk and commercial e-mail ("spam") is introduced. The first component, an enhanced self-learning variant of greylisting, sets the stage for the subsequent feature extraction and classification components. Because the greylisting component temporarily rejects selected messages, time becomes available for an "offline" in-depth examination of the e-mail content before a message is accepted and delivered to the final recipient. The feature extraction component determines a set of features for each newly arriving e-mail message. The classification engine, which contains an adaptation of a vector space model, then uses these features to categorize the message. Based on this model, an implementation of latent semantic indexing for spam filtering is investigated. The proposed architecture contributes to the goal of minimizing the waste of resources caused by spam and can react to high-load situations (including DoS attacks) through adaptations in the feature extraction and classification components.
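The flow of a message through the three components can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the greylisting shown is the classic triplet-based scheme rather than the enhanced self-learning variant, the feature extractor simply counts tokens, and the classifier is a plain vector space model using cosine similarity, without the latent semantic indexing step. All names (`Greylist`, `extract_features`, `classify`) and the 300-second delay are hypothetical.

```python
import math
from collections import Counter

GREYLIST_DELAY = 300  # seconds a new sender must wait before a retry is accepted (assumed value)


class Greylist:
    """Classic triplet greylisting; the paper's self-learning variant is not modelled."""

    def __init__(self, delay=GREYLIST_DELAY):
        self.delay = delay
        self.first_seen = {}  # (client_ip, sender, recipient) -> time of first attempt

    def check(self, triplet, now):
        """Return 'defer' for a temporary rejection, 'accept' once the delay has passed."""
        first = self.first_seen.setdefault(triplet, now)
        return "accept" if now - first >= self.delay else "defer"


def extract_features(text):
    """Hypothetical feature extractor: lower-cased token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(features, training):
    """Label a message with the class of its most similar training vector."""
    best_label, _ = max(
        ((label, cosine(features, vec)) for label, vec in training),
        key=lambda pair: pair[1],
    )
    return best_label


# Tiny made-up training set, for illustration only.
training = [
    ("spam", extract_features("cheap pills buy now cheap offer")),
    ("ham", extract_features("meeting agenda for the project review")),
]

greylist = Greylist()
triplet = ("192.0.2.1", "sender@example.org", "rcpt@example.net")
first_attempt = greylist.check(triplet, now=0)    # first contact is temporarily rejected
retry = greylist.check(triplet, now=600)          # a retry after the delay is accepted
label = classify(extract_features("buy cheap pills now"), training)
```

In this sketch the interval between `first_attempt` and `retry` stands in for the time window in which the feature extraction and classification components could examine the message offline.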