Less naive Bayes spam detection
Hongming Yang, Maurice Stassen, Tjalling Tjalkens
2007 Information Theory and Applications Workshop, published 2007-10-22
DOI: 10.1109/ITA.2007.4357608 (https://doi.org/10.1109/ITA.2007.4357608)
Citations: 0
Abstract
We consider a binary classification problem with a high-dimensional feature vector; spam filtering for e-mail is a popular example. A naive Bayes filter assumes that the components of the feature vector are conditionally independent given the class. We use the context tree weighting method, as an application of the minimum description length principle, to allow for dependencies between the feature vector components. It turns out that, because the amount of training data is limited, we must still assume conditional independence between groups of vector components. We consider several ad hoc algorithms to find good groupings and good conditional models.
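The difference between full conditional independence and independence between groups of components can be illustrated with a small sketch. This is not the context tree weighting method described in the abstract: it is a plain count-based classifier with additive smoothing, and the function names, the smoothing constant `alpha = 0.5`, the toy data, and the hand-picked groupings are all assumptions made for illustration. Each group of binary features is modelled jointly, and groups are treated as independent given the class; singleton groups reduce to naive Bayes.

```python
from collections import Counter
from math import log

def train(samples, labels, groups, alpha=0.5):
    """Count, per class, how often each joint pattern of every feature group occurs."""
    counts = {c: [Counter() for _ in groups] for c in set(labels)}
    totals = Counter(labels)
    for x, c in zip(samples, labels):
        for g, idx in enumerate(groups):
            counts[c][g][tuple(x[i] for i in idx)] += 1
    return counts, totals, groups, alpha

def log_scores(model, x):
    """Unnormalised log posterior per class; groups are independent given the class."""
    counts, totals, groups, alpha = model
    n = sum(totals.values())
    scores = {}
    for c in totals:
        s = log(totals[c] / n)  # class prior
        for g, idx in enumerate(groups):
            pattern = tuple(x[i] for i in idx)
            k = 2 ** len(idx)  # number of possible binary patterns in this group
            s += log((counts[c][g][pattern] + alpha) / (totals[c] + alpha * k))
        scores[c] = s
    return scores

def classify(model, x):
    scores = log_scores(model, x)
    return max(scores, key=scores.get)

# Toy data (hypothetical): a message is "spam" exactly when features 0 and 1 agree.
# Feature 2 is uninformative.
samples = [(1, 1, 0), (0, 0, 1), (1, 1, 1), (0, 0, 0),
           (1, 0, 0), (0, 1, 1), (1, 0, 1), (0, 1, 0)]
labels = ["spam"] * 4 + ["ham"] * 4

naive = train(samples, labels, groups=[[0], [1], [2]])   # naive Bayes
grouped = train(samples, labels, groups=[[0, 1], [2]])   # features 0 and 1 modelled jointly
```

On this toy data every single feature has the same marginal distribution in both classes, so the naive model cannot separate them, while the grouped model can. The cost is visible in `k = 2 ** len(idx)`: the number of patterns to estimate grows exponentially with group size, which is one way to see why limited training data forces the groups to stay small.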