Optimization of inner and general rules

IF 6.8 | Zone 1, Computer Science | JCR category: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Sciences Pub Date : 2025-07-02 DOI:10.1016/j.ins.2025.122466

Beata Zielosko, Mikhail Moshkov, Evans Teiko Tetteh

Information Sciences, volume 719, Article 122466. https://www.sciencedirect.com/science/article/pii/S0020025525005985

Citations: 0

Abstract

The paper concerns the problem of deriving decision rules from distributed data. It examines the learning of general and inner decision rules from a set of decision trees. Inner rules correspond to paths within decision trees from the root to leaf nodes, while general rules are arbitrary rules built from attributes found in the set of decision trees. The paper shows that the optimization of general decision rules is an NP-hard problem, so the authors propose heuristics $H_{1}$ and $H_{2}$ for it. For the induction and optimization of inner decision rules, an algorithm $A$ is employed. Additionally, an approach based on global optimization relative to length, support, and sequential optimization is proposed. The presented algorithms were studied from two perspectives: (i) knowledge discovery from data and (ii) knowledge representation. In the first case, it is possible to discover patterns from the data and verify the induced model; in the second case, it is possible to represent knowledge in a comprehensible and explainable way. These elements are important in an era of heterogeneous, distributed data sources. Experiments were carried out on selected datasets from the UCI ML and Kaggle repositories. To create a distributed data structure, an approach based on reducts induced by a genetic algorithm was employed. The results show that there are cases where the global rule-based classifiers built in the framework of optimization of inner decision rules achieve higher accuracy than local models induced directly from subtables. For algorithms $H_{1}$ and $H_{2}$, the low complexity of models based on decision rules induced from a set of decision trees is noted.
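The notions of inner rule, length, and support used in the abstract can be illustrated with a small sketch. This is not the authors' algorithm $A$ or the heuristics $H_{1}$ and $H_{2}$, which the abstract does not specify. The tree encoding, the toy table, and all names (`Rule`, `inner_rules`, `support`, `best_rule_for`) are assumptions made only for illustration, and "sequential optimization" is read here as minimizing length first and then maximizing support.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding: a tree is a leaf (decision label) or a pair
# (attribute name, {attribute value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]
Row = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    conditions: Tuple[Tuple[str, str], ...]  # (attribute, value) pairs
    decision: str

    @property
    def length(self) -> int:
        return len(self.conditions)


def inner_rules(tree: Tree, path: Tuple[Tuple[str, str], ...] = ()) -> List[Rule]:
    """Return one rule per root-to-leaf path of the tree."""
    if isinstance(tree, str):
        return [Rule(path, tree)]
    attribute, branches = tree
    rules: List[Rule] = []
    for value, subtree in branches.items():
        rules.extend(inner_rules(subtree, path + ((attribute, value),)))
    return rules


def covers(rule: Rule, row: Row) -> bool:
    """True if the row satisfies every condition of the rule."""
    return all(row.get(a) == v for a, v in rule.conditions)


def support(rule: Rule, rows: List[Row], target: str) -> int:
    """Number of rows that the rule covers and labels correctly."""
    return sum(1 for r in rows if covers(rule, r) and r[target] == rule.decision)


def best_rule_for(row: Row, rules: List[Rule], rows: List[Row],
                  target: str) -> Optional[Rule]:
    """Among rules that cover the row with the right decision,
    keep the shortest ones, then pick the one with maximum support."""
    ok = [r for r in rules if covers(r, row) and r.decision == row[target]]
    if not ok:
        return None
    min_len = min(r.length for r in ok)
    shortest = [r for r in ok if r.length == min_len]
    return max(shortest, key=lambda r: support(r, rows, target))


# Toy data: two trees, imagined as induced from two different subtables.
tree_1: Tree = ("outlook", {
    "sunny": ("windy", {"yes": "stay", "no": "play"}),
    "rainy": "stay",
})
tree_2: Tree = ("windy", {"yes": "stay", "no": "play"})

table: List[Row] = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "stay"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
    {"outlook": "rainy", "windy": "no", "label": "stay"},
]

all_rules = inner_rules(tree_1) + inner_rules(tree_2)
chosen = best_rule_for(table[1], all_rules, table, "label")
print(chosen, support(chosen, table, "label"))
```

For the second row, both trees contain a covering rule; the sketch prefers the one-condition rule from `tree_2` over the two-condition path in `tree_1`, which is the kind of trade-off between length and support that the abstract refers to.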




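Source journal

Information Sciences (Engineering Technology: Computer Science, Information Systems)

CiteScore: 14.00
Self-citation rate: 17.30%
Publication volume: 1322
Review time: 10.4 months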

Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.