Cooperative induction of decision trees
A. Bazzan
2013 IEEE Symposium on Intelligent Agents (IA), published 2013-04-16
DOI: 10.1109/IA.2013.6595190 (https://doi.org/10.1109/IA.2013.6595190)
Citations: 3
Abstract
Many current problems in data mining and knowledge discovery share two relevant characteristics: their data are distributed over several locations, and they generate large volumes of data that must be classified online. Such applications arise in bioinformatics, e-commerce, and sensor data analysis. For classification with decision trees, several efficient approaches have been proposed; they are centralized and work by restructuring the decision tree as new instances arrive. However, these approaches have some limitations. First, most of them require new instances to be fully labeled. Second, in some environments the agent in charge of the classification task cannot re-induce the classifier or restructure the decision tree each time it observes a new instance. Moreover, because this agent does not see the whole dataset, its induced classifier is unlikely to be very accurate unless information is exchanged among the agents, each of which is in charge of a portion of the data. Accuracy may thus decrease, because attributes and classes may be misrepresented in the training data seen so far. Instead of re-inducing the classification model centrally at an arbitrary frequency, this paper proposes an approach based on reinforcement learning that lets agents keep using the existing classifier as a basis for exploring the space of possible classifications. We use a quality assessment of the learned model to let each agent decide when to obtain a new model, either by borrowing one from another agent or by inducing a new classifier. Results on UCI datasets with varied characteristics show that this method offers a compromise between costly approaches that re-induce the classifier continually and the use of a single static, centralized classification model.
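The abstract does not spell out the paper's algorithm, so what follows is only a minimal Python sketch of the decision loop it describes, under loudly stated assumptions. Each agent holds a decision stump (a one-level decision tree standing in for a full tree) and tracks its accuracy over a sliding window of labeled feedback as the "quality assessment". When that accuracy falls below a threshold, the agent chooses between borrowing the best peer's model and re-inducing locally, using an epsilon-greedy rule rewarded by the accuracy gain; this bandit is a simple stand-in, not the paper's reinforcement-learning formulation. The names `Agent`, `induce_stump`, `predict`, `accuracy`, `window`, `threshold`, and `epsilon` are hypothetical. Unlike the setting the abstract targets, the sketch also assumes every instance eventually receives its label.

```python
import random
from collections import Counter, deque

# Hypothetical sketch; not the paper's algorithm.
# A decision stump is stored as (feature, threshold, left_label, right_label).
def induce_stump(data):
    """Induce a one-level decision tree from a list of (features, label) pairs."""
    default = Counter(y for _, y in data).most_common(1)[0][0]
    best, best_score = None, -1
    for f in range(len(data[0][0])):
        for t in sorted({x[f] for x, _ in data}):
            left = Counter(y for x, y in data if x[f] <= t)
            right = Counter(y for x, y in data if x[f] > t)
            l_lab = left.most_common(1)[0][0] if left else default
            r_lab = right.most_common(1)[0][0] if right else default
            score = left[l_lab] + right[r_lab]  # training instances classified correctly
            if score > best_score:
                best, best_score = (f, t, l_lab, r_lab), score
    return best


def predict(model, x):
    f, t, l_lab, r_lab = model
    return l_lab if x[f] <= t else r_lab


def accuracy(model, data):
    return sum(predict(model, x) == y for x, y in data) / len(data)


class Agent:
    """Hypothetical agent that keeps its classifier until its windowed quality drops."""

    def __init__(self, window=20, threshold=0.7, epsilon=0.1, seed=0):
        self.buffer = []                     # labeled instances seen locally
        self.recent = deque(maxlen=window)   # 1 per correct recent prediction, else 0
        self.threshold = threshold           # quality below this triggers a refresh
        self.epsilon = epsilon               # exploration rate for the action choice
        self.rng = random.Random(seed)
        self.values = {"borrow": 0.0, "reinduce": 0.0}  # mean reward per action
        self.counts = {"borrow": 0, "reinduce": 0}
        self.model = None

    def quality(self):
        """Accuracy over the sliding window (1.0 while no feedback is recorded)."""
        return sum(self.recent) / len(self.recent) if self.recent else 1.0

    def observe(self, x, y, peers=()):
        if self.model is not None:
            self.recent.append(int(predict(self.model, x) == y))
        self.buffer.append((x, y))
        if self.model is None:
            self.model = induce_stump(self.buffer)
        elif len(self.recent) == self.recent.maxlen and self.quality() < self.threshold:
            self.refresh(peers)

    def refresh(self, peers):
        before = self.quality()
        # Epsilon-greedy choice between borrowing a peer model and re-inducing locally.
        if self.rng.random() < self.epsilon:
            action = self.rng.choice(list(self.values))
        else:
            action = max(self.values, key=self.values.get)
        usable = [p for p in peers if p.model is not None]
        if action == "borrow" and usable:
            candidate = max(usable, key=lambda p: p.quality()).model
        else:  # no peer model available, so fall back to local induction
            action, candidate = "reinduce", induce_stump(self.buffer)
        # Reward: accuracy gain of the candidate on the most recent local data.
        reward = accuracy(candidate, self.buffer[-self.recent.maxlen:]) - before
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]
        self.model = candidate
        self.recent.clear()  # re-estimate quality for the new model
```

For example, feeding an `Agent` a stream of labeled one-dimensional points such as `((v,), int(v > 0.5))` makes it re-induce once its window accuracy drops, while a second agent given the first as a peer borrows that model instead. The action values are running means of the accuracy gain each action produced, so an agent that keeps obtaining good models from peers learns to prefer borrowing over local induction.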