Local Bayes Risk Minimization Based Stopping Strategy for Hierarchical Classification

Yu Wang, Q. Hu, Yucan Zhou, Hong Zhao, Y. Qian, Jiye Liang
{"title":"基于局部贝叶斯风险最小化的分层分类停止策略","authors":"Yu Wang, Q. Hu, Yucan Zhou, Hong Zhao, Y. Qian, Jiye Liang","doi":"10.1109/ICDM.2017.61","DOIUrl":null,"url":null,"abstract":"In large-scale data classification tasks, it is becoming more and more challenging in finding a true class from a huge amount of candidate categories. Fortunately, a hierarchical structure usually exists in these massive categories. The task of utilizing this structure for effective classification is called hierarchical classification. It usually follows a top-down fashion which predicts a sample from the root node with a coarse-grained category to a leaf node with a fine-grained category. However, misclassification is inevitable if the information is insufficient or large uncertainty exists in the prediction process. In this scenario, we can design a stopping strategy to stop the sample at an internal node with a coarser category, instead of predicting a wrong leaf node. Several studies address the problem by improving performance in terms of hierarchical accuracy and informative prediction. However, all of these researches ignore an important issue: when predicting a sample at the current node, the error is inclined to occur if large uncertainty exists in the next lower level children nodes. In this paper, we integrate this uncertainty into a risk problem: when predicting a sample at a decision node, it will take precipitance risk in predicting the sample to a children node in the next lower level on one hand, and take conservative risk in stopping at the current node on the other. We address the risk problem by designing a Local Bayes Risk Minimization (LBRM) framework, which divides the prediction process into recursively deciding to stop or to go down at each decision node by balancing these two risks in a top-down fashion. Rather than setting a global loss function in the traditional Bayes risk framework, we replace it with different uncertainty in the two risks for each decision node. The uncertainty on the precipitance risk and the conservative risk are measured by information entropy on children nodes and information gain from the current node to children nodes, respectively. We propose a Weighted Tree Induced Error (WTIE) to obtain the predictions of minimum risk with different emphasis on the two risks. Experimental results on various datasets show the effectiveness of the proposed LBRM algorithm.","PeriodicalId":254086,"journal":{"name":"2017 IEEE International Conference on Data Mining (ICDM)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Local Bayes Risk Minimization Based Stopping Strategy for Hierarchical Classification\",\"authors\":\"Yu Wang, Q. Hu, Yucan Zhou, Hong Zhao, Y. Qian, Jiye Liang\",\"doi\":\"10.1109/ICDM.2017.61\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In large-scale data classification tasks, it is becoming more and more challenging in finding a true class from a huge amount of candidate categories. Fortunately, a hierarchical structure usually exists in these massive categories. The task of utilizing this structure for effective classification is called hierarchical classification. It usually follows a top-down fashion which predicts a sample from the root node with a coarse-grained category to a leaf node with a fine-grained category. 
However, misclassification is inevitable if the information is insufficient or large uncertainty exists in the prediction process. In this scenario, we can design a stopping strategy to stop the sample at an internal node with a coarser category, instead of predicting a wrong leaf node. Several studies address the problem by improving performance in terms of hierarchical accuracy and informative prediction. However, all of these researches ignore an important issue: when predicting a sample at the current node, the error is inclined to occur if large uncertainty exists in the next lower level children nodes. In this paper, we integrate this uncertainty into a risk problem: when predicting a sample at a decision node, it will take precipitance risk in predicting the sample to a children node in the next lower level on one hand, and take conservative risk in stopping at the current node on the other. We address the risk problem by designing a Local Bayes Risk Minimization (LBRM) framework, which divides the prediction process into recursively deciding to stop or to go down at each decision node by balancing these two risks in a top-down fashion. Rather than setting a global loss function in the traditional Bayes risk framework, we replace it with different uncertainty in the two risks for each decision node. The uncertainty on the precipitance risk and the conservative risk are measured by information entropy on children nodes and information gain from the current node to children nodes, respectively. We propose a Weighted Tree Induced Error (WTIE) to obtain the predictions of minimum risk with different emphasis on the two risks. Experimental results on various datasets show the effectiveness of the proposed LBRM algorithm.\",\"PeriodicalId\":254086,\"journal\":{\"name\":\"2017 IEEE International Conference on Data Mining (ICDM)\",\"volume\":\"83 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Conference on Data Mining (ICDM)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2017.61\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Conference on Data Mining (ICDM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2017.61","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 20

Abstract

In large-scale data classification tasks, it is becoming more and more challenging to find the true class among a huge number of candidate categories. Fortunately, a hierarchical structure usually exists over these massive categories, and the task of exploiting this structure for effective classification is called hierarchical classification. It usually follows a top-down fashion, predicting a sample from the root node, with a coarse-grained category, down to a leaf node, with a fine-grained category. However, misclassification is inevitable if the information is insufficient or large uncertainty exists in the prediction process. In this scenario, we can design a stopping strategy that stops a sample at an internal node with a coarser category instead of predicting a wrong leaf node. Several studies address this problem by improving performance in terms of hierarchical accuracy and informative prediction. However, all of these studies ignore an important issue: when predicting a sample at the current node, errors are likely to occur if large uncertainty exists among the children nodes at the next lower level. In this paper, we cast this uncertainty as a risk problem: when predicting a sample at a decision node, one takes a precipitance risk by pushing the sample down to a child node at the next lower level, and a conservative risk by stopping at the current node. We address this risk problem with a Local Bayes Risk Minimization (LBRM) framework, which turns the prediction process into a recursive decision, at each decision node, to stop or to go down, balancing the two risks in a top-down fashion. Rather than setting a global loss function as in the traditional Bayes risk framework, we assign each decision node its own uncertainty measure for each of the two risks: the uncertainty behind the precipitance risk is measured by the information entropy over the children nodes, and the uncertainty behind the conservative risk by the information gain from the current node to its children. We further propose a Weighted Tree Induced Error (WTIE) to obtain minimum-risk predictions with different emphasis on the two risks. Experimental results on various datasets show the effectiveness of the proposed LBRM algorithm.
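
The abstract states the stop-or-descend rule only at a high level. The following Python sketch shows one plausible reading: entropy over the children's predicted distribution stands in for the precipitance uncertainty, and information gain (read here as entropy reduction relative to a uniform guess over the children) for the conservative uncertainty. The `Node` structure, the `posterior` function, and the trade-off weight `alpha` are all hypothetical; the paper's actual risks are defined through its WTIE loss, which is not spelled out in the abstract.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node of the class hierarchy (hypothetical structure for this sketch)."""
    name: str
    children: list = field(default_factory=list)

def entropy(probs):
    """Shannon entropy (in bits) of a distribution over children nodes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def lbrm_predict(node, posterior, alpha=0.5):
    """Recursively decide, at each decision node, whether to stop or go down.

    posterior(node) is assumed to return the predicted probability
    distribution over node.children for the current sample; alpha is a
    hypothetical trade-off weight between the two risks.
    """
    if not node.children:
        return node  # reached a leaf: nothing left to decide

    child_probs = posterior(node)

    # Uncertainty behind the precipitance risk: entropy over the children's
    # predicted distribution (high entropy makes descending risky).
    h_children = entropy(child_probs)

    # Uncertainty behind the conservative risk: information gain from the
    # current node to its children, read here as the entropy reduction
    # relative to a uniform guess over the children (one plausible reading).
    info_gain = math.log2(len(node.children)) - h_children

    precipitance_risk = alpha * h_children
    conservative_risk = (1.0 - alpha) * info_gain

    if conservative_risk < precipitance_risk:
        return node  # stopping loses less than a precipitant descent

    best_child = node.children[child_probs.index(max(child_probs))]
    return lbrm_predict(best_child, posterior, alpha)
```

The WTIE metric itself is likewise not defined in the abstract. As one hedged illustration of how a tree induced error could weight the two failure modes differently, the sketch below counts the edges between the predicted node and the true node and splits them at their lowest common ancestor; the decomposition and the weights `w_precip` / `w_conserv` are assumptions, not the paper's definition.

```python
def weighted_tie(pred, true, parent, w_precip=1.0, w_conserv=1.0):
    """One plausible weighted tree induced error.

    Edges between the predicted and the true node are split at their lowest
    common ancestor: steps taken into a wrong branch are weighted by
    w_precip, fine-grained steps not reached by w_conserv. `parent` maps
    each node to its parent (the root maps to None).
    """
    def path_to_root(n):
        path = []
        while n is not None:
            path.append(n)
            n = parent[n]
        return path  # the node itself, then its ancestors up to the root

    pred_path, true_path = path_to_root(pred), path_to_root(true)
    true_set = set(true_path)
    # The lowest common ancestor is the first node on the predicted path
    # that also lies on the true path.
    lca = next(n for n in pred_path if n in true_set)
    precip_edges = pred_path.index(lca)    # edges descended into a wrong branch
    conserv_edges = true_path.index(lca)   # edges of specificity given up
    return w_precip * precip_edges + w_conserv * conserv_edges
```

For instance, stopping at the true node's parent gives `precip_edges = 0` and `conserv_edges = 1`, so only the conservative weight applies; with `w_precip = w_conserv = 1` the measure reduces to the ordinary tree induced error.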