Harnessing coloured Petri nets to enhance machine learning: A simulation-based method for healthcare and beyond

IF 3.5 · Zone 2, Computer Science · Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Simulation Modelling Practice and Theory · Pub Date: 2025-02-07 · DOI: 10.1016/j.simpat.2025.103080

Andressa C.M. da Silveira, Álvaro Sobrinho, Leandro Dias da Silva, Danilo F.S. Santos, Muhammad Nauman, Angelo Perkusich

Volume 140, Article 103080. Full text: https://www.sciencedirect.com/science/article/pii/S1569190X25000152

Citations: 0


Harnessing coloured Petri nets to enhance machine learning: A simulation-based method for healthcare and beyond

Many industries use Machine Learning (ML) techniques to enhance systems’ performance. However, integrating ML into these systems poses challenges, often requiring improved explainability and accuracy. Using formal methods is a potential solution to address these challenges. This paper presents a simulation-based method using Coloured Petri Nets (CPN) to enhance the explainability and accuracy of Decision Tree (DT) and Random Forest (RF) models, which industries such as healthcare widely adopt. Our simulation-based method, named RuleXtract/CPN, provides procedures for the automatic extraction of decision rules from an implemented ML model, the generation of these decision rules into a CPN model, the analysis of the CPN model through simulations, and the adjustment of the CPN model to improve explainability and accuracy. Automating the transformation from DT/RF to a CPN model and the analysis procedures can reduce the time and effort needed for modeling tasks. We used web technologies and the Access/CPN framework to implement the procedures defined in our simulation-based method so that users would not need CPN expertise to generate and simulate models, running them in the background. An experiment with three datasets for COVID-19 and five for Influenza screening shows that applying our simulation-based method results in more explainable models. The experiment also shows improvement in accuracy measures for RF models. For instance, the accuracy of the RF model using the Influenza rapid test balanced dataset increased from 84.02% to 86.34%, and the unbalanced dataset from 84.78% to 87.53%. Our results underscore the importance of eliminating duplicated, poorly generalized, and incorrect rules to improve explainability and accuracy. These findings also emphasize the effectiveness of using CPN to improve the models, paving the way for future research.
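The rule-extraction and duplicate-elimination steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' RuleXtract/CPN implementation and it does not generate or simulate a CPN model. It assumes a hypothetical nested-dict encoding of a decision tree, and the feature names, thresholds, and labels are invented for illustration. It only shows the general idea of turning each root-to-leaf path into a decision rule and then dropping rules that repeat across the trees of a forest.

```python
from typing import Any, Dict, List, Tuple, Union

# Hypothetical tree encoding (not from the paper): an internal node is a dict
# with a feature name, a threshold, and two subtrees; a leaf is a class label.
Node = Union[str, Dict[str, Any]]
Condition = Tuple[str, str, float]          # (feature, operator, threshold)
Rule = Tuple[Tuple[Condition, ...], str]    # (conditions, predicted label)


def extract_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Return one rule per root-to-leaf path of a single tree."""
    if isinstance(node, str):               # leaf: emit the accumulated path
        return [(path, node)]
    feature = node["feature"]
    threshold = node["threshold"]
    left = extract_rules(node["left"], path + ((feature, "<=", threshold),))
    right = extract_rules(node["right"], path + ((feature, ">", threshold),))
    return left + right


def deduplicate(rules: List[Rule]) -> List[Rule]:
    """Drop rules whose condition set and label already occurred, keeping order."""
    seen = set()
    unique: List[Rule] = []
    for conditions, label in rules:
        key = (frozenset(conditions), label)   # condition order is irrelevant
        if key not in seen:
            seen.add(key)
            unique.append((conditions, label))
    return unique


# Two toy trees standing in for a small forest (invented features and labels).
tree_a: Node = {
    "feature": "fever", "threshold": 0.5,
    "left": "negative",
    "right": {"feature": "cough", "threshold": 0.5,
              "left": "negative", "right": "positive"},
}
tree_b: Node = {
    "feature": "fever", "threshold": 0.5,
    "left": "negative",
    "right": "positive",
}

all_rules = extract_rules(tree_a) + extract_rules(tree_b)
unique_rules = deduplicate(all_rules)
```

Here the two toy trees yield five rules in total, and the rule "fever <= 0.5 implies negative" appears in both, so four rules remain after deduplication. Detecting poorly generalized or incorrect rules, which the abstract also mentions, would require evaluating rules against data and is outside this sketch.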

Source journal

Simulation Modelling Practice and Theory · Engineering & Technology, Computer Science: Interdisciplinary Applications

CiteScore

9.80

Self-citation rate

4.80%

Articles published

142

Review time

21 days

About the journal: The journal Simulation Modelling Practice and Theory provides a forum for original, high-quality papers dealing with any aspect of systems simulation and modelling. The journal aims at being a reference and a powerful tool to all those professionally active and/or interested in the methods and applications of simulation. Submitted papers will be peer reviewed and must significantly contribute to modelling and simulation in general or use modelling and simulation in application areas. Paper submission is solicited on:
• theoretical aspects of modelling and simulation, including formal modelling, model-checking, random number generators, sensitivity analysis, variance reduction techniques, experimental design, meta-modelling, methods and algorithms for validation and verification, selection and comparison procedures, etc.;
• methodology and application of modelling and simulation in any area, including computer systems, networks, real-time and embedded systems, mobile and intelligent agents, manufacturing and transportation systems, management, engineering, biomedical engineering, economics, ecology and environment, education, transaction handling, etc.;
• simulation languages and environments, including those specific to distributed computing, grid computing, high performance computers or computer networks, etc.;
• distributed and real-time simulation, simulation interoperability;
• tools for high performance computing simulation, including dedicated architectures and parallel computing.