André Gustavo Maletzke, Denis Moreira dos Reis, Waqar Hassan, Gustavo E. A. P. A. Batista
Accurately Quantifying under Score Variability
2021 IEEE International Conference on Data Mining (ICDM), published 2021-12-01. DOI: https://doi.org/10.1109/ICDM51629.2021.00149
The objective of quantification is to predict the class distribution of a data sample. The task therefore intrinsically involves a drift in the class distribution, which causes a mismatch between the training and test sets. However, existing quantification approaches assume that the feature distribution is stationary. We analyse for the first time how score-based quantifiers are affected by concept drifts and propose a novel drift-resilient quantifier for binary classes. Our proposal does not model the different types of concept drift. Instead, we model the changes that such drifts cause in the classification scores. This simplifies our analysis, since distribution changes can only increase, decrease or maintain the overlap of the positive and negative classes in the ranking induced by the scores. Our paper has two main contributions. The first is MoSS, a model for synthetic scores. We use this model to show that state-of-the-art quantifiers underperform under any concept drift that changes the score distribution. Our second contribution is a quantifier, DySyn, that uses MoSS to estimate the class distribution. We show that DySyn statistically outperforms state-of-the-art quantifiers in a comprehensive comparison on real-world and benchmark datasets in the presence of concept drifts.
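The effect of a change in score overlap on a score-based quantifier can be illustrated with a small simulation. The sketch below is not the paper's MoSS or DySyn: the abstract does not give their definitions, so the generator `synthetic_scores`, its overlap parameter `m`, and the helper `run_experiment` are hypothetical names chosen for illustration. The quantifier used is the well-known Adjusted Classify and Count (ACC), which corrects the raw positive rate with the true and false positive rates measured on training scores.

```python
import random


def synthetic_scores(n_pos, n_neg, m, rng):
    """Draw synthetic scores in [0, 1] for both classes.

    Hypothetical generator (an assumption, not taken from the paper):
    m in (0, 1] controls class overlap. m = 1 makes both classes uniform
    (full overlap); smaller m pushes positives toward 1 and negatives toward 0.
    """
    pos = [rng.random() ** m for _ in range(n_pos)]
    neg = [1.0 - rng.random() ** m for _ in range(n_neg)]
    return pos, neg


def rate_above(scores, threshold=0.5):
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def acc_estimate(test_scores, tpr, fpr):
    """Adjusted Classify and Count: (cc - fpr) / (tpr - fpr), clipped to [0, 1].

    Assumes tpr != fpr, i.e. the training scores separate the classes at all.
    """
    cc = rate_above(test_scores)
    return min(1.0, max(0.0, (cc - fpr) / (tpr - fpr)))


def run_experiment(m_train, m_test, p_test, n=20000, seed=0):
    """Estimate the positive proportion of a test sample with ACC.

    tpr and fpr come from training scores drawn with overlap m_train;
    the test sample has true positive proportion p_test and overlap m_test.
    """
    rng = random.Random(seed)
    train_pos, train_neg = synthetic_scores(n, n, m_train, rng)
    tpr, fpr = rate_above(train_pos), rate_above(train_neg)
    n_pos = round(p_test * n)
    test_pos, test_neg = synthetic_scores(n_pos, n - n_pos, m_test, rng)
    return acc_estimate(test_pos + test_neg, tpr, fpr)


if __name__ == "__main__":
    # Same overlap at training and test time: only the class prior shifts.
    print("no score drift:", run_experiment(0.3, 0.3, p_test=0.2))
    # Larger overlap at test time: the score distribution has drifted.
    print("score drift:   ", run_experiment(0.3, 0.7, p_test=0.2))
```

Under these assumptions, the first estimate should land close to the true proportion of 0.2, because ACC is designed for a pure class-prior shift. The second should overestimate it substantially, because the training-time rates no longer describe the test scores. This is the failure mode the abstract attributes to drifts that change the score distribution.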