Example-Based Feature Tweaking Using Random Forests

2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI) Pub Date : 2019-07-01 DOI:10.1109/IRI.2019.00022

Tony Lindgren, P. Papapetrou, Isak Samsten, L. Asker


Citations: 1

Abstract

In certain application areas, making an accurate prediction for an example is not enough; it may be more important to change the prediction from an undesired class into a desired class. In this paper we investigate methods for changing the predictions of examples. To this end, we introduce a novel algorithm for changing predictions and compare it to an existing method and a baseline method. In an empirical evaluation we compare the three methods on a total of 22 datasets. The results show that the novel method and the baseline method can change an example from an undesired class into a desired class in more cases than the competitor method (in some cases this difference is statistically significant). We also show that the distance, as measured by the Euclidean norm, is higher for the novel and baseline methods than for the state-of-the-art method (again, in some cases this difference is statistically significant). The methods and their proposed changes are also evaluated subjectively in a medical domain, with interesting results.
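The abstract does not spell out how any of the three methods works, so the following is only a minimal sketch of the general idea of changing a prediction by example, under stated assumptions. It assumes a hypothetical strategy that picks the nearest known example (by Euclidean norm) that the model already assigns to the desired class. The names `tweak_by_nearest_example` and `toy_predict` are illustrative and do not come from the paper, and the toy rule stands in for a trained random forest.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Vector = Sequence[float]


def tweak_by_nearest_example(
    x: Vector,
    candidates: Sequence[Vector],
    predict: Callable[[Vector], str],
    desired: str,
) -> Optional[Tuple[Vector, float]]:
    """Return the candidate closest to x (Euclidean norm) that the model
    assigns to the desired class, together with its distance to x.
    Returns None when no candidate is predicted as the desired class."""
    best: Optional[Tuple[Vector, float]] = None
    for candidate in candidates:
        if predict(candidate) != desired:
            continue  # only examples already in the desired class qualify
        distance = math.dist(x, candidate)  # Euclidean distance
        if best is None or distance < best[1]:
            best = (candidate, distance)
    return best


def toy_predict(v: Vector) -> str:
    """Hypothetical stand-in for a trained classifier:
    class "good" when the first feature is below 5, otherwise "bad"."""
    return "good" if v[0] < 5.0 else "bad"


# Hypothetical known examples and an example with an undesired prediction.
known_examples = [(1.0, 2.0), (4.0, 7.0), (6.0, 6.0), (9.0, 1.0)]
x = (7.0, 7.0)

result = tweak_by_nearest_example(x, known_examples, toy_predict, "good")
print(result)  # ((4.0, 7.0), 3.0)
```

The returned distance corresponds to the kind of quantity the abstract reports (Euclidean norm), and a `None` result corresponds to a case where the prediction could not be changed. In practice `predict` would wrap a trained random forest rather than a hand-written rule.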

