Data & Knowledge Engineering: Latest Articles

Advancing credit risk assessment in the retail banking industry: A hybrid approach using time series and supervised learning models
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-23 | DOI: 10.1016/j.datak.2025.102490
Sebastian H. Goldmann, Marcos R. Machado, Joerg R. Osterrieder
Volume 160, Article 102490
Abstract: Credit risk assessment remains a central challenge in retail banking, with conventional models often falling short in predictive accuracy and adaptability to granular customer behavior. This study explores the potential of Time Series Classification (TSC) algorithms to enhance credit risk modeling by analyzing customers' historical end-of-day balance data. We compare traditional Machine Learning (ML) models, including Logistic Regression and XGBoost, with advanced TSC methods such as Shapelets, Long Short-Term Memory (LSTM) networks, and Canonical Interval Forests (CIF). Our results show that TSC algorithms, particularly CIF and Shapelet-based methods, significantly outperform traditional approaches. When CIF-derived Probability of Default (PD) estimates were used as additional features in an XGBoost model, predictive performance improved notably: the combined model achieved an Area Under the Curve (AUC) of 0.81, compared with 0.79 for CIF alone and 0.77 for XGBoost without the CIF input. These findings underscore the value of integrating temporal features into credit risk assessment frameworks. Moreover, the complementary strengths of the TSC and XGBoost models across different regions of the Receiver Operating Characteristic (ROC) curve demonstrate the practical benefits of model stacking. However, performance dropped when aggregated monthly data were used, highlighting the importance of preserving high-frequency behavioral signals. This research contributes to more accurate, interpretable, and robust credit risk models and offers a pathway for banks to leverage time series data for improved risk forecasting.
Citations: 0
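A minimal Python sketch of the two quantities this abstract turns on, under stated assumptions: `auc` computes ROC AUC as the pairwise probability that a defaulter is scored above a non-defaulter (ties count one half), and `monthly_means` is a hypothetical helper, assuming fixed 30-day months, that shows how two made-up balance histories collapse to the same monthly figure. It does not implement CIF, Shapelets, or the authors' stacking pipeline.

```python
"""Sketch: ROC AUC and monthly aggregation of daily balances (hypothetical data)."""


def auc(scores, labels):
    """ROC AUC as P(score of a random defaulter > score of a random non-defaulter).

    Labels: 1 = default, 0 = no default. Ties contribute one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def monthly_means(daily_balances, days_per_month=30):
    """Collapse end-of-day balances into one mean per fixed-length 'month' (assumed)."""
    return [
        sum(chunk) / len(chunk)
        for chunk in (
            daily_balances[i:i + days_per_month]
            for i in range(0, len(daily_balances), days_per_month)
        )
    ]


# Two hypothetical customers with very different intra-month behaviour...
drains_mid_month = [100.0] * 15 + [0.0] * 15
stable = [50.0] * 30
# ...become indistinguishable once aggregated to a monthly mean.
print(monthly_means(drains_mid_month), monthly_means(stable))  # [50.0] [50.0]
print(auc([0.9, 0.2, 0.6, 0.1], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic but mirrors the definition directly; a rank-based formula gives the same value faster on large portfolios.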
TEDA-driven adaptive stream clustering for concept drift detection
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-22 | DOI: 10.1016/j.datak.2025.102484
Zahra Rezaei, Hedieh Sajedi
Volume 160, Article 102484
Abstract: The rapid growth of data-driven applications has underlined the need for robust methods to analyze and cluster streaming data. Data stream clustering aims to uncover knowledge concealed within data streams, which typically arrive fast and evolve in structure and pattern. However, most current methods face significant challenges: detecting arbitrarily shaped clusters, handling outliers, adapting to concept drift, and reducing dependence on predefined parameters. To tackle these challenges, we propose a novel stream clustering algorithm with concept drift detection based on Typicality and Eccentricity Data Analysis (TEDA), which divides the clustering problem into two subproblems: micro-clustering and macro-clustering. Our methodology uses a TEDA-based concept drift detection approach to enhance data stream clustering. The method employs two models to monitor the data stream, retaining information about the previous concept while tracking the emergence of a new one. The two models are considered to represent distinct concepts when the overlap of their data samples, measured by the Jaccard index, is significantly low. TEDA-CDD is compared with known methods from the literature in experiments on synthetic and real-world datasets simulating real-world applications. By dynamically updating clusters through model reuse or creation, the algorithm adapts to real-time changes in data distributions. The proposed algorithm was comprehensively evaluated on the KDDCup-99 dataset, an intrusion detection benchmark, under diverse scenarios, including concept drift, evolving data distributions, varying cluster sizes, and outliers. Empirical results demonstrated the algorithm's superiority over baseline approaches such as DenStream, DStream, ClusTree, and DGStream, achieving perfect performance metrics. These findings emphasize the effectiveness of the algorithm in addressing real-world streaming data challenges, combining high sensitivity to concept drift with computational efficiency, adaptability, and robust clustering.
Citations: 0
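A minimal sketch, assuming a univariate stream and the recursive mean, variance, and eccentricity formulas of Angelov's TEDA with the usual m = 3 outlier threshold; the class name `Teda` and the toy stream are hypothetical. The `jaccard` helper shows the overlap measure the abstract uses to decide whether two models describe distinct concepts; the cut-off value is not given, so none is assumed. This is not the TEDA-CDD micro/macro-clustering algorithm itself.

```python
"""Sketch: recursive TEDA eccentricity for a univariate stream, plus Jaccard."""


class Teda:
    """Recursive mean/variance and eccentricity, after Angelov's TEDA."""

    def __init__(self, m=3.0):
        self.m = m        # threshold multiplier (assumed 3, i.e. a "3-sigma" rule)
        self.k = 0        # number of samples seen so far
        self.mean = 0.0
        self.var = 0.0    # population variance of the samples seen so far

    def update(self, x):
        """Absorb sample x and return True if it looks like an outlier."""
        self.k += 1
        k = self.k
        if k == 1:
            self.mean, self.var = x, 0.0
            return False
        self.mean = (k - 1) / k * self.mean + x / k
        self.var = (k - 1) / k * self.var + (x - self.mean) ** 2 / (k - 1)
        if self.var == 0.0:
            return False  # all samples identical so far
        eccentricity = 1 / k + (self.mean - x) ** 2 / (k * self.var)
        normalized = eccentricity / 2
        return normalized > (self.m ** 2 + 1) / (2 * k)


def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets of sample ids."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


detector = Teda()
stream = [1.0, 1.1, 0.9, 1.05, 0.95] * 4 + [10.0]
flags = [detector.update(x) for x in stream]
print(flags[-1], any(flags[:-1]))      # True False
print(jaccard({1, 2, 3}, {2, 3, 4}))   # 0.5
```

The threshold condition is equivalent to flagging a sample more than m standard deviations from the running mean, which is why a short stream needs about a dozen samples before an outlier can be flagged at all.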
Inference-based schema discovery for RDF data
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-19 | DOI: 10.1016/j.datak.2025.102491
Redouane Bouhamoum, Zoubida Kedad, Stéphane Lopes
Volume 160, Article 102491
Abstract: The Semantic Web is a huge information space in which an increasing number of datasets, described in RDF, are made available to users and applications. In this context, data are not constrained by a predefined schema: in RDF datasets, the schema may be incomplete or even missing. While this offers high flexibility in creating data sources, it also makes them difficult to use. Several works have addressed automatic schema discovery for RDF datasets, but existing approaches rely only on the explicit information provided by the data source, which may limit the quality of the results. Indeed, in an RDF data source, an entity is described not only by explicitly declared properties but also by implicit properties that can be derived using reasoning rules. Existing schema discovery approaches do not consider these implicit properties.
In this work, we propose a first contribution towards a hybrid schema discovery approach capable of exploiting all the semantics of a data source, represented not only by the explicitly declared triples but also by those that can be inferred through reasoning. Considering both explicit and implicit properties improves the quality of the generated schema. We provide a scalable design of our approach to enable the processing of large RDF data sources while improving the quality of the results. We present experiments that demonstrate the efficiency of our proposal and the quality of the discovered schema.
Citations: 0
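To illustrate why inferred triples change what a schema discovery step sees, here is a sketch that forward-chains three RDFS entailment rules (rdfs2, rdfs7, rdfs9) over plain Python tuples and then collects each entity's property set. The vocabulary (`:hasAuthor`, `:Work`, and so on) is hypothetical, the fixpoint loop is deliberately naive (quadratic per round), and property sets stand in for whatever grouping the authors actually use; a production system would rely on a proper reasoner.

```python
"""Sketch: RDFS-style saturation before computing entity property sets."""
from collections import defaultdict

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
SCHEMA_PREDICATES = {SUBCLASS, SUBPROP, DOMAIN}


def saturate(triples):
    """Apply three RDFS entailment rules until no new triple appears."""
    closure = set(triples)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                if p2 == SUBPROP and s2 == p:   # rdfs7: inherit super-property
                    new.add((s, o2, o))
                if p2 == DOMAIN and s2 == p:    # rdfs2: subject typed by domain
                    new.add((s, RDF_TYPE, o2))
                if p == RDF_TYPE and p2 == SUBCLASS and s2 == o:  # rdfs9
                    new.add((s, RDF_TYPE, o2))
        if new <= closure:
            return closure
        closure |= new


def property_sets(triples):
    """Map each instance to the non-schema, non-type properties describing it."""
    sets = defaultdict(set)
    for s, p, _ in triples:
        if p not in SCHEMA_PREDICATES and p != RDF_TYPE:
            sets[s].add(p)
    return dict(sets)


# Hypothetical schema and data.
schema = {
    (":hasAuthor", SUBPROP, ":hasContributor"),
    (":hasContributor", DOMAIN, ":Work"),
    (":Work", SUBCLASS, ":Artifact"),
}
data = {
    (":a1", ":hasAuthor", ":alice"), (":a1", ":title", "T1"),
    (":a2", ":hasContributor", ":bob"), (":a2", ":title", "T2"),
}
explicit = property_sets(data)
closure = saturate(schema | data)
inferred = property_sets(closure)
print(sorted(explicit[":a1"]))  # [':hasAuthor', ':title']
print(sorted(inferred[":a1"]))  # [':hasAuthor', ':hasContributor', ':title']
```

With explicit triples only, `:a1` and `:a2` share just `:title`; after saturation both carry `:hasContributor` and are typed `:Work`, so a grouping step is more likely to place them together.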
Detecting and repairing anomaly patterns in business process event logs
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-16 | DOI: 10.1016/j.datak.2025.102488
Jonghyeon Ko, Marco Comuzzi, Fabrizio Maria Maggi
Volume 160, Article 102488
Abstract: Event log anomaly detection and log repairing concern, respectively, the identification of anomalous traces in an event log and the reconstruction of the correct trace for each anomalous one. Trace-level anomalies in event logs often follow specific patterns, such as inserted, repeated, or skipped events. This paper proposes P-BEAR (Pattern-Based Event Log Anomaly Reconstruction), a semi-supervised, pattern-based anomaly detection and log repairing approach that exploits the pattern-based nature of trace-level anomalies in event logs. P-BEAR captures the behaviour of the clean traces in a log in a set of ad-hoc graphs and uses these to identify anomalous traces, determine the specific anomaly pattern that applies to each, and then reconstruct the correct trace. The approach is evaluated on artificial and real event logs against traditional trace alignment in conformance checking, an edit-distance-based alignment method, and an unsupervised deep learning method. Overall, the proposed method outperforms the alignment method in anomalous trace reconstruction while providing interpretability with respect to anomaly pattern classification. P-BEAR is also faster to execute, and its performance is more balanced across different types of anomaly patterns.
Citations: 0
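The sketch below is a simplified stand-in, not P-BEAR: it builds a directly-follows relation from clean traces (instead of the paper's ad-hoc graphs) and tries single-event deletions and insertions to label a trace as an inserted, repeated, or skipped anomaly and repair it. Activity names and traces are hypothetical.

```python
"""Sketch: classify and repair single-event anomalies against clean traces."""

START, END = "<start>", "<end>"


def directly_follows(traces):
    """Collect directly-follows pairs (with artificial start/end) from clean traces."""
    edges = set()
    for trace in traces:
        seq = [START, *trace, END]
        edges.update(zip(seq, seq[1:]))
    return edges


def fits(trace, edges):
    """True if every consecutive pair of the trace was seen in a clean trace."""
    seq = [START, *trace, END]
    return all(pair in edges for pair in zip(seq, seq[1:]))


def repair(trace, edges, activities):
    """Return (pattern, repaired_trace) using at most one edit."""
    trace = list(trace)
    if fits(trace, edges):
        return "normal", trace
    # Deleting one event undoes an inserted or a repeated event.
    for i, event in enumerate(trace):
        candidate = trace[:i] + trace[i + 1:]
        if fits(candidate, edges):
            repeated = (i > 0 and trace[i - 1] == event) or (
                i + 1 < len(trace) and trace[i + 1] == event)
            return ("repeat" if repeated else "insert"), candidate
    # Adding one known activity undoes a skipped event.
    for i in range(len(trace) + 1):
        for activity in sorted(activities):
            candidate = trace[:i] + [activity] + trace[i:]
            if fits(candidate, edges):
                return "skip", candidate
    return "unknown", trace


clean = [["A", "B", "C", "D"], ["A", "B", "C", "E", "D"]]
edges = directly_follows(clean)
acts = {a for t in clean for a in t}
print(repair(["A", "B", "B", "C", "D"], edges, acts))  # ('repeat', ['A', 'B', 'C', 'D'])
print(repair(["A", "B", "X", "C", "D"], edges, acts))  # ('insert', ['A', 'B', 'C', 'D'])
print(repair(["A", "C", "D"], edges, acts))            # ('skip', ['A', 'B', 'C', 'D'])
```

Trying deletions before insertions is an arbitrary design choice here; with richer clean behaviour, several single edits may fit, and a real approach needs a principled way to rank them.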
Behavior Driven Development for 3D games
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-09 | DOI: 10.1016/j.datak.2025.102486
Fernando Pastor Ricós, Beatriz Marín, I.S.W.B. Prasetya, Tanja E.J. Vos, Joseph Davidson, Karel Hovorka
Volume 160, Article 102486
Abstract: Computer 3D games are complex software environments that require novel testing processes to ensure high quality. The Intelligent Verification/Validation for Extended Reality Based Systems (iv4XR) framework addresses this need by enabling the implementation of autonomous agents that automate game testing scenarios. The framework facilitates the automation of regression test cases for complex 3D games such as Space Engineers. Nevertheless, the technical expertise required to define test scripts with iv4XR can hinder collaboration between developers and testers. This paper reports how integrating a Behavior-Driven Development (BDD) approach with the iv4XR framework allows the industrial company behind Space Engineers to automate regression testing. The success of this industrial collaboration inspired the iv4XR team to integrate the BDD approach to improve the automation of play-testing for the experimental 3D game LabRecruits. Furthermore, the iv4XR framework has been extended with tactical programming to enable the automation of long-play test scenarios in Space Engineers. These results underscore the versatility of the iv4XR framework in supporting diverse testing approaches and show how BDD empowers users to create, manage, and execute automated game tests using comprehensible, human-readable statements.
Citations: 0
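As an illustration of how BDD statements map onto executable test steps, here is a hypothetical Given/When/Then runner in Python. It is not the iv4XR API; the `StepRegistry` class, the step phrases, and the one-dimensional player position are all made up for the example.

```python
"""Sketch: a minimal Given/When/Then runner over a toy game state."""
import re


class StepRegistry:
    KEYWORDS = {"Given", "When", "Then", "And"}

    def __init__(self):
        self._steps = []  # (compiled regex, handler) pairs

    def step(self, pattern):
        """Decorator registering a handler for step text matching `pattern`."""
        def register(handler):
            self._steps.append((re.compile(pattern), handler))
            return handler
        return register

    def run(self, scenario, context):
        """Execute each non-empty scenario line with the first matching handler."""
        for line in filter(None, map(str.strip, scenario.splitlines())):
            keyword, _, text = line.partition(" ")
            if keyword not in self.KEYWORDS:
                raise ValueError(f"unknown keyword: {keyword!r}")
            for regex, handler in self._steps:
                match = regex.fullmatch(text)
                if match:
                    handler(context, *match.groups())
                    break
            else:
                raise LookupError(f"no step definition matches {text!r}")


steps = StepRegistry()


@steps.step(r"the player stands at x=(\d+)")
def player_at(ctx, x):
    ctx["x"] = int(x)


@steps.step(r"the player walks (\d+) blocks east")
def walk_east(ctx, n):
    ctx["x"] += int(n)


@steps.step(r"the player should be at x=(\d+)")
def check_x(ctx, x):
    if ctx["x"] != int(x):
        raise AssertionError(f"expected x={x}, got x={ctx['x']}")


SCENARIO = """
Given the player stands at x=2
When the player walks 3 blocks east
Then the player should be at x=5
"""
context = {}
steps.run(SCENARIO, context)
print(context)  # {'x': 5}
```

Testers edit only the scenario text; developers maintain the step handlers, which in a real setup would drive a game-testing agent instead of a dictionary.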
Conceptual modeling: Foundations, a historical perspective, and a vision for the future
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-07 | DOI: 10.1016/j.datak.2025.102483
John Mylopoulos, Giancarlo Guizzardi, Nicola Guarino
Volume 160, Article 102483
Abstract: We recount the foundations of Conceptual Modeling in Computer Science, Philosophy, and Cognitive Science, and their implications for what concepts, conceptualizations, and conceptual models are. We then review the history of the field, considering earlier work by the three co-authors, and highlight some of the contributions that made it what it is. Finally, we propose three research directions whose solutions could advance the field and that will hopefully be addressed in the future. Our study is intended to help circumscribe and characterize the field. It draws ideas from Philosophy, Cognitive Science, Engineering, and the Social Sciences, as well as from several areas within Computer Science, including Programming Languages, Artificial Intelligence, Databases, Software Engineering, and Information Systems Engineering.
Citations: 0
IDL-BiGRU: Integrated deep learning assisted smart scheduling of big data over cloud environment
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-06 | DOI: 10.1016/j.datak.2025.102489
Rama Satish K V, Vibha M B, Lovely Sasidharan
Volume 160, Article 102489
Abstract: The rapid expansion of Internet of Things (IoT) applications generates a continuous and massive flow of data, creating significant challenges for both data processing and storage management. Cloud computing offers scalable infrastructure for such data-intensive workloads, but optimal task scheduling remains critical to ensure performance and resource efficiency. Traditional scheduling algorithms often fall short because of limited adaptability and because they consider only a few system parameters. This paper proposes a novel integrated deep learning-assisted framework for scheduling big data in a cloud environment. The framework integrates deep reinforcement learning with a bidirectional gated recurrent unit (IDL-BiGRU) to schedule tasks intelligently based on real-time system states. The IDL-BiGRU model combines the decision-making strength of deep Q-learning with the BiGRU's ability to capture bidirectional temporal dependencies in task and resource usage patterns. RAM, CPU, network bandwidth utilization, and disk storage are considered for scheduling. The method aims to shorten the makespan and increase resource utilization. A Java tool is used for the experimental verification, and the framework's performance is analyzed and compared with existing methods. For 1000 tasks, the proposed method attains a degree of imbalance of 0.90, a downtime of 291.17 ms, a throughput of 1050 ms, and a makespan of 721.58. The performance analysis demonstrates that the proposed strategy outperforms previous methods.
Citations: 0
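The abstract reports makespan and degree of imbalance without defining them. A minimal sketch, assuming the common definitions (makespan as the latest VM finish time, degree of imbalance as (T_max - T_min) / T_avg over VM busy times) and hypothetical task lengths and VM speeds. `greedy_earliest_finish` is a simple non-learning baseline, not the IDL-BiGRU scheduler, which would replace its decision rule with a learned policy.

```python
"""Sketch: makespan and degree of imbalance for a task-to-VM assignment."""


def vm_finish_times(assignment, task_lengths, vm_speeds):
    """Per-VM busy time when each VM runs its assigned tasks back to back."""
    busy = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        busy[vm] += task_lengths[task] / vm_speeds[vm]
    return busy


def makespan(busy):
    return max(busy)


def degree_of_imbalance(busy):
    """(T_max - T_min) / T_avg, a commonly used definition (assumed here)."""
    avg = sum(busy) / len(busy)
    return (max(busy) - min(busy)) / avg


def greedy_earliest_finish(task_lengths, vm_speeds):
    """Non-learning baseline: put each task on the VM that would finish it first."""
    busy = [0.0] * len(vm_speeds)
    assignment = []
    for length in task_lengths:
        vm = min(range(len(vm_speeds)), key=lambda v: busy[v] + length / vm_speeds[v])
        busy[vm] += length / vm_speeds[vm]
        assignment.append(vm)
    return assignment


tasks, speeds = [4.0, 2.0, 2.0], [1.0, 1.0]   # hypothetical workload
plan = greedy_earliest_finish(tasks, speeds)
busy = vm_finish_times(plan, tasks, speeds)
print(plan, makespan(busy), degree_of_imbalance(busy))  # [0, 1, 1] 4.0 0.0
```

Putting all three tasks on VM 0 instead gives busy times of 8.0 and 0.0, so the makespan doubles and the degree of imbalance rises to 2.0.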
Dirigo: A method to extract event logs for object-centric processes
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-07-05 | DOI: 10.1016/j.datak.2025.102485
Jia Wei, Chun Ouyang, Ying Wang, Lei Huang
Volume 160, Article 102485
Abstract: Real-world processes involve multiple object types with intricate interrelationships. Traditional event logs (in XES format), which record process execution around the case notion, are restricted to a single-object perspective, making it difficult to capture the behaviour of multiple objects and their interactions. To address this limitation, object-centric event logs (OCEL) have been introduced to capture both the objects involved in a process and their interactions with events. The object-centric event data (OCED) metamodel extends the OCEL format by further capturing dynamic object attributes and object-to-object relations. Recently, OCEL 2.0 has been proposed based on the OCED metamodel. Current research on generating OCEL logs requires specific input data sources, and the resulting log data often fail to fully conform to OCEL 2.0. Moreover, the generated OCEL logs vary across representational formats, and their quality remains unevaluated. To address these challenges, a set of quality criteria for evaluating OCEL log representations is established. Guided by these criteria, Dirigo is proposed: a method for extracting event logs that not only conform to OCEL 2.0 but also extend it by capturing the temporal aspect of dynamic object-to-object relations. Object-role Modelling (ORM), a conceptual data modelling technique, is used to describe the artifact produced at each step of Dirigo. To validate its applicability, Dirigo is applied to a real-life use case. The quality of the extracted event log's representation is compared with that of existing OCEL logs using the established quality criteria.
Citations: 0
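A hypothetical sketch of the object-centric ideas in the abstract: events linked to several objects (event-to-object links), object-to-object relations carrying a validity interval (the temporal aspect Dirigo adds), and a `flatten` helper that derives a case-centric view per object. Class and field names are assumptions; they follow neither the OCEL 2.0 serialization formats nor Dirigo's ORM-based steps.

```python
"""Sketch: a tiny object-centric log with timed object-to-object relations."""
from dataclasses import dataclass, field


@dataclass
class Event:
    eid: str
    activity: str
    time: int
    objects: set = field(default_factory=set)  # event-to-object links


@dataclass
class Relation:
    source: str
    target: str
    qualifier: str
    valid_from: int
    valid_to: float = float("inf")  # open-ended until the relation ends


def flatten(events, object_ids):
    """Case-centric view: one trace per chosen object, events sorted by time."""
    ordered = sorted(events, key=lambda e: e.time)
    return {o: [e.activity for e in ordered if o in e.objects] for o in object_ids}


def related_at(relations, source, time):
    """Targets linked to `source` at `time` (half-open validity interval)."""
    return {r.target for r in relations
            if r.source == source and r.valid_from <= time < r.valid_to}


# Hypothetical order o1 with items i1 and i2; i2 leaves the order at t=4.
events = [
    Event("e1", "place order", 1, {"o1", "i1", "i2"}),
    Event("e2", "pick item", 2, {"i1"}),
    Event("e3", "pick item", 3, {"i2"}),
    Event("e4", "ship", 4, {"o1", "i1"}),
]
relations = [
    Relation("o1", "i1", "contains", 1),
    Relation("o1", "i2", "contains", 1, valid_to=4),
]
print(flatten(events, ["o1", "i1", "i2"]))
print(sorted(related_at(relations, "o1", 2)), sorted(related_at(relations, "o1", 4)))  # ['i1', 'i2'] ['i1']
```

Flattening by item duplicates the shared "place order" event into two traces, which is exactly the information a single-case XES view blurs and an object-centric log keeps explicit.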
PSA-GAT: Integrating position-syntax and cross-aspect graph attention networks for aspect-based sentiment analysis
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-06-26 | DOI: 10.1016/j.datak.2025.102477
Ning Zhou, Linfu Sun, Min Han, Songlin He
Volume 160, Article 102477
Abstract: Aspect-based sentiment analysis (ABSA) is widely applied to user review data on web platforms to identify the sentiment polarity toward specific aspects of a review. However, individual reviews often contain multiple conditions and coordinating or conflicting elements and relationships, which significantly increases the complexity of this task. In recent years, graph neural networks that exploit semantic and syntactic information have been widely used for such tasks. However, these methods overlook the influence of word position, and the word associations mined from syntax trees and semantic composition trees may introduce irrelevant or even interfering noise into ABSA. To reduce the effect of noisy information and strengthen the context for multi-aspect representation in ABSA, we propose a new framework, PSA-GAT, that mines information on positional importance, syntactic–semantic dependencies, and cross-aspect correlations. The structural features of the multi-aspect sentiment set are learned using several variants of graph neural networks. Experimental results on four real-world datasets demonstrate the effectiveness of PSA-GAT compared with state-of-the-art methods. The code is available at https://github.com/zhouning6000/PSA_GAT.
Citations: 0
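A dependency-free sketch of two ingredients the abstract names, under stated assumptions: `position_weights` follows the position-weighting scheme used in earlier graph-based ABSA work (tokens closer to the aspect span weigh more, aspect tokens get 0), and `masked_attention` is a single dot-product attention step restricted to graph neighbours, a simplification of a graph attention layer with no learned parameters. The sentence, vectors, and dependency edges are hypothetical; this is not the PSA-GAT architecture.

```python
"""Sketch: position weights around an aspect + adjacency-masked attention."""
import math


def position_weights(n, start, end):
    """Weights decaying linearly with distance from the aspect span [start, end)."""
    weights = []
    for i in range(n):
        if i < start:
            weights.append(1 - (start - i) / n)
        elif i < end:
            weights.append(0.0)          # aspect tokens themselves
        else:
            weights.append(1 - (i - end + 1) / n)
    return weights


def masked_attention(h, adj):
    """One attention step where token i attends only to j with adj[i][j] == 1.

    `adj` must include self-loops. Scores are plain dot products (a
    simplification; a GAT layer learns its scoring function).
    """
    out = []
    for i, hi in enumerate(h):
        nbrs = [j for j, a in enumerate(adj[i]) if a]
        if not nbrs:
            raise ValueError(f"token {i} has no neighbours; add self-loops")
        scores = [sum(x * y for x, y in zip(hi, h[j])) for j in nbrs]
        top = max(scores)
        exp = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exp)
        alpha = [e / total for e in exp]
        out.append([sum(a * h[j][d] for a, j in zip(alpha, nbrs))
                    for d in range(len(hi))])
    return out


tokens = ["the", "battery", "life", "is", "great"]
q = position_weights(len(tokens), 1, 3)        # aspect span: "battery life"
print([round(w, 2) for w in q])                # [0.8, 0.0, 0.0, 0.8, 0.6]

# Toy 2-d vectors scaled by position weight; self-loops plus hypothetical
# dependency edges life-great and is-great.
h = [[w * v for v in vec] for w, vec in zip(q, [[1, 0], [0, 1], [1, 1], [0, 1], [1, 0]])]
adj = [[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 0],
       [0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1],
       [0, 0, 1, 1, 1]]
print(masked_attention(h, adj)[0])             # [0.8, 0.0]: token 0 sees only itself
```

Masking by the dependency graph is what limits which words can influence each other; the position weights then damp contributions from words far from the aspect.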
Domain knowledge in artificial intelligence: Using conceptual modeling to increase machine learning accuracy and explainability
IF 2.7 | Zone 3, Computer Science
Data & Knowledge Engineering | Pub Date: 2025-06-23 | DOI: 10.1016/j.datak.2025.102482
Veda C. Storey, Jeffrey Parsons, Arturo Castellanos Bueso, Monica Chiarini Tremblay, Roman Lukyanenko, Alfred Castillo, Wolfgang Maaß
Volume 160, Article 102482
Abstract: Machine learning enables the extraction of useful information from large, diverse datasets. However, despite many successful applications, machine learning continues to suffer from performance and transparency issues. These challenges can be partially attributed to the limited use of domain knowledge by machine learning models. This research proposes using the domain knowledge represented in conceptual models to improve the preparation of the data used to train machine learning models. We develop and demonstrate a method, called Conceptual Modeling for Machine Learning (CMML), which comprises guidelines for data preparation in machine learning based on conceptual modeling constructs and principles. To assess CMML, we first applied it to two real-world problems to evaluate its impact on model performance. We then solicited an assessment of the method's applicability from data scientists. These results demonstrate the value of CMML for improving machine learning outcomes.
Citations: 0
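One way domain knowledge from a conceptual model can feed data preparation, sketched under assumptions: an is-a hierarchy (here a hypothetical product taxonomy) is turned into 0/1 class-membership columns so a learner can exploit generalizations that a raw category field hides. This illustrates the general idea only and does not reproduce the CMML guidelines.

```python
"""Sketch: derive class-membership features from an is-a hierarchy."""

# Hypothetical fragment of a conceptual model: child class -> parent class.
IS_A = {
    "Laptop": "Computer",
    "Desktop": "Computer",
    "Computer": "Electronics",
    "Phone": "Electronics",
    "Electronics": None,
}


def ancestors(cls, is_a):
    """Walk up the generalization hierarchy, guarding against cycles."""
    chain = []
    parent = is_a.get(cls)
    while parent is not None:
        if parent in chain:
            raise ValueError(f"cycle in hierarchy at {parent!r}")
        chain.append(parent)
        parent = is_a.get(parent)
    return chain


def add_class_features(records, is_a, field="category"):
    """Add one 0/1 column per modeled class so a learner sees generalizations."""
    classes = sorted(set(is_a) | {p for p in is_a.values() if p is not None})
    rows = []
    for record in records:
        members = {record[field], *ancestors(record[field], is_a)}
        row = dict(record)
        row.update({f"is_{c}": int(c in members) for c in classes})
        rows.append(row)
    return rows


rows = add_class_features([{"category": "Laptop", "price": 900}], IS_A)
print(rows[0]["is_Computer"], rows[0]["is_Electronics"], rows[0]["is_Phone"])  # 1 1 0
```

A model trained on these columns can share evidence between laptops and desktops through `is_Computer`, even if one of the leaf categories is rare in the training data.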