Optimization of inner and general rules
Authors: Beata Zielosko, Mikhail Moshkov, Evans Teiko Tetteh
Journal: Information Sciences, volume 719, Article 122466 (impact factor 6.8; JCR category: Computer Science, Information Systems)
Publication date: 2025-07-02
DOI: 10.1016/j.ins.2025.122466
URL: https://www.sciencedirect.com/science/article/pii/S0020025525005985
Citations: 0
Abstract
The paper addresses the problem of deriving decision rules from distributed data. It examines how to learn general and inner decision rules from a set of decision trees. Inner rules correspond to the routes within decision trees from the root to the leaf nodes, while general rules are arbitrary rules built from attributes found in the set of decision trees. The paper shows that the optimization of general decision rules is an NP-hard problem, so the authors propose heuristics $H_1$ and $H_2$ for it. For the induction and optimization of inner decision rules, an algorithm $A$ is employed. Additionally, an approach based on global optimization relative to length, support, and sequential optimization is proposed. The presented algorithms were studied from two perspectives: (i) knowledge discovery from data and (ii) knowledge representation. The first makes it possible to discover patterns in the data and verify the induced model; the second makes it possible to represent knowledge in a comprehensible and explainable way. These elements are important in an era of heterogeneous, distributed data sources. Experiments were carried out on selected datasets from the UCI ML and Kaggle repositories. To create a distributed data structure, an approach based on reducts induced by a genetic algorithm was employed. The results show that there are cases where the global rule-based classifiers built in the framework of optimization of inner decision rules achieve better accuracy than local models induced directly from subtables. For algorithms $H_1$ and $H_2$, the low complexity of models based on decision rules induced from a set of decision trees is noted.
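To make the notion of an inner rule concrete, the following is a minimal sketch, not the paper's algorithm $A$. It assumes a hypothetical tree encoding (a leaf is a decision label, an internal node is a pair of an attribute and its branches) and assumes that the support of a rule is the number of rows that match all its conditions and carry its decision. The attribute names, the toy table, and the final ranking by length and then support are illustrative choices only.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding (an assumption, not taken from the paper):
# a leaf is a decision label; an internal node is (attribute, {value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Conditions = Tuple[Tuple[str, str], ...]
Rule = Tuple[Conditions, str]  # (conditions on the left-hand side, decision)


def inner_rules(tree: Tree, path: Conditions = ()) -> List[Rule]:
    """Return one rule per root-to-leaf route of the tree."""
    if isinstance(tree, str):  # reached a leaf: the route so far is a rule
        return [(path, tree)]
    attribute, branches = tree
    rules: List[Rule] = []
    for value, subtree in branches.items():
        rules.extend(inner_rules(subtree, path + ((attribute, value),)))
    return rules


def length(rule: Rule) -> int:
    """Number of conditions on the left-hand side of the rule."""
    return len(rule[0])


def support(rule: Rule, rows: List[Dict[str, str]], decision_attr: str = "d") -> int:
    """Assumed definition: rows matching every condition and the rule's decision."""
    conditions, decision = rule
    return sum(
        1
        for row in rows
        if row[decision_attr] == decision
        and all(row[attr] == val for attr, val in conditions)
    )


# Toy decision tree and decision table (illustrative data only).
tree: Tree = (
    "outlook",
    {"sunny": ("windy", {"yes": "stay", "no": "play"}), "rainy": "stay"},
)
rows = [
    {"outlook": "sunny", "windy": "no", "d": "play"},
    {"outlook": "sunny", "windy": "yes", "d": "stay"},
    {"outlook": "rainy", "windy": "no", "d": "stay"},
    {"outlook": "rainy", "windy": "yes", "d": "stay"},
]

rules = inner_rules(tree)
# Illustrative ranking: shortest rule first, ties broken by larger support.
best = min(rules, key=lambda r: (length(r), -support(r, rows)))
```

On this toy input the tree yields three inner rules, and the ranking picks the single-condition rule for `outlook = rainy`. A general rule, by contrast, could combine any of the attributes `outlook` and `windy` without following a route of a particular tree, which is what makes its optimization harder.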
About the journal:
Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.