Cooperative induction of decision trees

2013 IEEE Symposium on Intelligent Agents (IA). Pub Date: 2013-04-16. DOI: 10.1109/IA.2013.6595190

A. Bazzan

Citations: 3

Abstract

Currently, many problems in data mining and knowledge discovery share two relevant characteristics: their data is distributed over several locations, and they generate large volumes of data that need to be classified online. Examples of such applications arise in bioinformatics, e-commerce, and sensor data. For classification by means of decision trees, some efficient approaches have been proposed; these are centralized and restructure the decision tree as new instances arrive. However, there are some issues. First, most proposed approaches require new instances to be fully labeled. Second, in some environments, the agent in charge of the classification task cannot re-induce the classifier or restructure the decision tree each time it observes a new instance. Moreover, because this agent does not see the whole dataset, the induced classifier is unlikely to be very accurate unless information is exchanged among the agents, each of which is in charge of a piece of the data. Accuracy may therefore decrease because attributes and classes may be misrepresented in the training dataset used so far. Instead of re-inducing the classification model with arbitrary frequency in a centralized way, this paper proposes an approach based on reinforcement learning that allows agents to keep using the existing classifier as a basis for some exploration in the space of possible classifications. We use a quality assessment of the learned model to let each agent decide when it is time to get a new model, either by borrowing one from another agent or by inducing a new classifier. Results on UCI datasets with various characteristics show that this method can serve as a compromise between costly methods that re-induce the classifier at all times and the use of only a static, centralized classification model.
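The abstract does not spell out the state, action, or reward definitions, so the following is only a minimal sketch of the decision it describes: an agent assesses the quality of its current model and, when quality drops, chooses between borrowing a model from another agent and inducing a new one. Everything named here is an assumption made for illustration and is not taken from the paper: the class `ModelChoiceAgent`, the sliding-window accuracy used as the quality measure, the threshold, and the epsilon-greedy value update over the two refresh actions (a simple bandit-style stand-in for the paper's reinforcement learning scheme).

```python
import random
from collections import deque


class ModelChoiceAgent:
    """Hypothetical agent that decides when and how to replace its classifier."""

    def __init__(self, window=20, threshold=0.7, alpha=0.1, epsilon=0.1, seed=0):
        # Sliding window of recent outcomes: 1 = correct, 0 = wrong (assumed quality signal).
        self.recent = deque(maxlen=window)
        self.threshold = threshold  # assumed quality level below which a new model is sought
        self.alpha = alpha          # learning rate for the value update
        self.epsilon = epsilon      # exploration probability
        # Estimated value of each way of obtaining a new model.
        self.q = {"borrow": 0.0, "induce": 0.0}
        self.rng = random.Random(seed)

    def record(self, correct):
        """Store whether the latest classification turned out to be correct."""
        self.recent.append(1 if correct else 0)

    def quality(self):
        """Fraction of correct classifications in the window (1.0 if empty)."""
        return sum(self.recent) / len(self.recent) if self.recent else 1.0

    def needs_new_model(self):
        """True once the window is full and quality is below the threshold."""
        return len(self.recent) == self.recent.maxlen and self.quality() < self.threshold

    def choose_action(self):
        """Epsilon-greedy choice between 'borrow' and 'induce'."""
        actions = sorted(self.q)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=self.q.get)

    def update(self, action, reward):
        """Move the value estimate of `action` toward the observed reward."""
        self.q[action] += self.alpha * (reward - self.q[action])


# Usage: after a run of poor classifications the agent asks for a new model.
agent = ModelChoiceAgent(window=5, threshold=0.6, epsilon=0.0)
for correct in [True, False, False, False, True]:
    agent.record(correct)

if agent.needs_new_model():
    old_quality = agent.quality()
    action = agent.choose_action()
    # ... obtain the new model via `action` and classify further instances ...
    new_quality = 0.8  # placeholder value, not a result from the paper
    agent.update(action, new_quality - old_quality)
```

In this sketch the reward is the change in windowed accuracy after the switch, which is one plausible choice among many; a cost term for inducing a new classifier could be subtracted to reflect the expense of re-induction that the abstract mentions.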
