Authors: Lili Diao, Hao Wang
Published in: 2009 Third IEEE International Conference on Secure Software Integration and Reliability Improvement (2009-07-08)
DOI: 10.1109/SSIRI.2009.66 (https://doi.org/10.1109/SSIRI.2009.66)
A Safe Approach to Shrink Email Sample Set while Keeping Balance between Spam and Normal
To train anti-spam machine learning models that cope with the full range of cases they may encounter, it is crucial to have a safe way to shrink the training sample set: one that removes redundant samples with minimal loss of information for classification and, at the same time, keeps the distribution of spam and normal samples balanced. At present, no such solution exists. In this paper, we propose a safe approach that addresses these problems and improves the quality of the training email sample pool, so that the resulting machine learning models give an anti-spam engine unbiased, high spam detection rates together with low false positive rates.
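The abstract does not describe the procedure itself, so the following is only an illustrative sketch of the two goals it names (removing redundant samples and balancing spam against normal mail). It is not the authors' method. The function names, the `"spam"`/`"normal"` labels, the whitespace-and-case normalization used to detect redundancy, and the random downsampling of the larger class are all assumptions made for this example.

```python
import hashlib
import random
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def shrink_and_balance(samples, seed=0):
    """Hypothetical helper; samples is a list of (text, label) pairs.

    Step 1 drops redundant samples (same label and same normalized text).
    Step 2 downsamples every class to the size of the smallest class.
    """
    seen = set()
    by_label = defaultdict(list)
    for text, label in samples:
        digest = hashlib.sha1(normalize(text).encode("utf-8")).hexdigest()
        key = (label, digest)
        if key not in seen:
            seen.add(key)
            by_label[label].append((text, label))

    if not by_label:
        return []

    # Assumption: balance by random downsampling with a fixed seed.
    target = min(len(group) for group in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], target))
    return balanced


samples = [
    ("Buy now!!", "spam"),
    ("buy   NOW!!", "spam"),  # redundant copy of the first sample
    ("Cheap pills", "spam"),
    ("Win cash", "spam"),
    ("Meeting at 10", "normal"),
    ("Lunch tomorrow?", "normal"),
]
result = shrink_and_balance(samples)
```

Random downsampling is the simplest way to balance classes but it can discard informative samples, which is exactly the kind of information loss the paper aims to avoid; a safer scheme would choose which samples to drop based on how redundant they are.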