Optimization of inner and general rules

IF 6.8 · Zone 1, Computer Science · COMPUTER SCIENCE, INFORMATION SYSTEMS
Beata Zielosko, Mikhail Moshkov, Evans Teiko Tetteh
Information Sciences, volume 719, Article 122466. Journal Article, published 2025-07-02.
DOI: 10.1016/j.ins.2025.122466
https://www.sciencedirect.com/science/article/pii/S0020025525005985
Citations: 0

Abstract

The paper addresses the problem of deriving decision rules from distributed data. It examines learning general and inner decision rules from a set of decision trees. Inner rules correspond to paths within decision trees from the root to leaf nodes, while general rules are arbitrary rules built from attributes found in the set of decision trees. The paper shows that the optimization of general decision rules is an NP-hard problem, so the authors propose heuristics H1 and H2 for it. For the induction and optimization of inner decision rules, an algorithm A is employed. Additionally, an approach based on global optimization relative to length, support, and sequential optimization is proposed. The presented algorithms were studied from two perspectives: (i) knowledge discovery from data and (ii) knowledge representation. In the first case, it is possible to discover patterns from the data and verify the induced model; in the second, it is possible to represent knowledge in a comprehensible and explainable way. These elements are important in an era of heterogeneous, distributed data sources. Experiments were carried out on selected datasets from the UCI ML and Kaggle repositories. To create a distributed data structure, an approach based on reducts induced by a genetic algorithm was employed. The results show that there are cases where the global rule-based classifiers built in the framework of optimization of inner decision rules achieve higher accuracy than local models induced directly from subtables. For algorithms H1 and H2, the low complexity of models based on decision rules induced from a set of decision trees is noted.
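The notion of an inner rule can be made concrete with a small sketch. The code below is not the authors' algorithm A, nor heuristics H1 or H2; it only illustrates the definition given in the abstract, that each root-to-leaf path of a decision tree yields one rule. The tree encoding, the toy data, and the function names (`inner_rules`, `length`, `support`) are hypothetical. Length is taken as the number of conditions in a rule, and support is assumed to be the number of rows that satisfy all conditions and carry the rule's decision, which is a common convention but is not stated in the abstract.

```python
from typing import Dict, Tuple, Union

# Hypothetical encoding: a leaf is a decision label (str); an internal node
# is a pair (attribute_name, {attribute_value: subtree}).
Tree = Union[str, Tuple[str, Dict[str, "Tree"]]]


def inner_rules(tree, prefix=()):
    """Return one (conditions, decision) rule per root-to-leaf path."""
    if isinstance(tree, str):  # leaf: the path collected so far is a rule
        return [(prefix, tree)]
    attribute, branches = tree
    rules = []
    for value, subtree in branches.items():
        # Extend the path with the condition "attribute = value".
        rules.extend(inner_rules(subtree, prefix + ((attribute, value),)))
    return rules


def length(rule):
    """Number of conditions on the left-hand side of the rule."""
    return len(rule[0])


def support(rule, rows, decision_key="decision"):
    """Rows matching every condition AND the rule's decision (assumed definition)."""
    conditions, decision = rule
    return sum(
        1
        for row in rows
        if row[decision_key] == decision
        and all(row[attr] == val for attr, val in conditions)
    )


# Toy tree and toy table, invented for illustration only.
toy_tree = ("outlook", {
    "sunny": ("windy", {"yes": "stay", "no": "play"}),
    "rainy": "stay",
})
toy_rows = [
    {"outlook": "sunny", "windy": "no", "decision": "play"},
    {"outlook": "sunny", "windy": "yes", "decision": "stay"},
    {"outlook": "rainy", "windy": "no", "decision": "stay"},
    {"outlook": "rainy", "windy": "yes", "decision": "stay"},
]

rules = inner_rules(toy_tree)
for rule in rules:
    conditions, decision = rule
    lhs = " and ".join(f"{a}={v}" for a, v in conditions)
    print(f"{lhs} -> {decision} (length={length(rule)}, support={support(rule, toy_rows)})")
```

In this toy setting, the rule from the short path `outlook=rainy -> stay` is both shorter and better supported than the two rules from the deeper branch, which is the kind of trade-off that optimization relative to length and support considers.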
Source journal
Information Sciences (Engineering and Technology: Computer Science, Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science Intelligent Systems Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.