Data & Knowledge Engineering最新文献_第9页

A survey on big data classification 大数据分类调查

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2025-01-11 DOI: 10.1016/j.datak.2025.102408

Keerthana G , Sherly Puspha Annabel L

{"title":"A survey on big data classification","authors":"Keerthana G , Sherly Puspha Annabel L","doi":"10.1016/j.datak.2025.102408","DOIUrl":"10.1016/j.datak.2025.102408","url":null,"abstract":"<div><div>Big data refers to vast volumes of structured and unstructured data that are too large or complex for traditional data-processing methods to handle efficiently. The importance of big data lies in its ability to provide actionable insights and drive decision-making across various industries, such as healthcare, finance, marketing, and government, by enabling more accurate predictions, and personalized services. Moreover, traditional big data classification approaches, often struggle with big data's complexity. They failed to manage high-dimensionality, deal with non-linearity, or process data in real time. For effective big data classification, robust computing infrastructure, scalable storage solutions, and advanced algorithms are required. This survey provides a thorough assessment of 50 research papers based on big data classification, by identifying the struggle faced by current big data classification techniques to process and classify data efficiently without substantial computational resources. The analysis is enabled on a variety of scenarios and key points. In this case, this survey will enable the classification of the techniques utilized for big data classification that is made based on the rule-based, deep learning-based, optimization-based, machine learning-based techniques and so on. Furthermore, the classification of techniques, tools used, published year, used software tool, and performance metrics are contemplated for the analysis in big data classification. At last, the research gaps and technical problems of the techniques in a way that makes the motivations for creating an efficient model of enabling big data classification optimal.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"156 ","pages":"Article 102408"},"PeriodicalIF":2.7,"publicationDate":"2025-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Textual data augmentation using generative approaches - Impact on named entity recognition tasks 使用生成方法的文本数据增强。对命名实体识别任务的影响

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2025-01-10 DOI: 10.1016/j.datak.2024.102403

Danrun Cao , Nicolas Béchet , Pierre-François Marteau , Oussama Ahmia

引用次数: 0

Automated mapping between SDG indicators and open data: An LLM-augmented knowledge graph approach 可持续发展目标指标和开放数据之间的自动映射：法学硕士增强的知识图谱方法

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2025-01-03 DOI: 10.1016/j.datak.2024.102405

Wissal Benjira , Faten Atigui , Bénédicte Bucher , Malika Grim-Yefsah , Nicolas Travers

{"title":"Automated mapping between SDG indicators and open data: An LLM-augmented knowledge graph approach","authors":"Wissal Benjira , Faten Atigui , Bénédicte Bucher , Malika Grim-Yefsah , Nicolas Travers","doi":"10.1016/j.datak.2024.102405","DOIUrl":"10.1016/j.datak.2024.102405","url":null,"abstract":"<div><div>Meeting the Sustainable Development Goals (SDGs) presents a large-scale challenge for all countries. SDGs established by the United Nations provide a comprehensive framework for addressing global issues. To monitor progress towards these goals, we need to develop key performance indicators and integrate and analyze heterogeneous datasets. The definition of these indicators requires the use of existing data and metadata. However, the diversity of data sources and formats raises major issues in terms of structuring and integration. Despite the abundance of open data and metadata, its exploitation remains limited, leaving untapped potential for guiding urban policies towards sustainability. Thus, this paper introduces a novel approach for SDG indicator computation, leveraging the capabilities of Large Language Models (LLMs) and Knowledge Graphs (KGs). We propose a method that combines rule-based filtering with LLM-powered schema mapping to establish semantic correspondences between diverse data sources and SDG indicators, including disaggregation. Our approach integrates these mappings into a KG, which enables indicator computation by querying graph’s topology. We evaluate our method through a case study focusing on the SDG Indicator 11.7.1 about accessibility of public open spaces. Our experimental results show significant improvements in accuracy, precision, recall, and F1-score compared to traditional schema mapping techniques.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"156 ","pages":"Article 102405"},"PeriodicalIF":2.7,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Augmenting post-hoc explanations for predictive process monitoring with uncertainty quantification via conformalized Monte Carlo dropout 通过符合化蒙特卡罗dropout对不确定性量化的预测过程监测进行事后解释

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-28 DOI: 10.1016/j.datak.2024.102402

Nijat Mehdiyev, Maxim Majlatow, Peter Fettke

引用次数: 0

Turning Conceptual Modeling Institutional – The prescriptive role of conceptual models in transforming institutional reality 将概念模型转变为制度模型-概念模型在转变制度现实中的规范性作用

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-25 DOI: 10.1016/j.datak.2024.102404

Owen Eriksson , Paul Johannesson , Maria Bergholtz , Pär Ågerfalk

{"title":"Turning Conceptual Modeling Institutional – The prescriptive role of conceptual models in transforming institutional reality","authors":"Owen Eriksson , Paul Johannesson , Maria Bergholtz , Pär Ågerfalk","doi":"10.1016/j.datak.2024.102404","DOIUrl":"10.1016/j.datak.2024.102404","url":null,"abstract":"<div><div>It has traditionally been assumed that information systems describe physical reality. However, this assumption is becoming obsolete as digital infrastructures are increasingly part of real-world experiences. Digital infrastructures (ubiquitous and scalable information systems) no longer merely map physical reality representations onto digital objects but increasingly assume an active role in creating, shaping, and governing physical reality. We currently witness an “ontological reversal”, where conceptual models and digital infrastructures change physical reality. Still, the fundamental assumption remains that physical reality is the only real world. However, to fully embrace the implications of the ontological reversal, conceptual modeling needs an “institutional turn” that abandons the idea that physical reality always takes priority. Institutional reality, which includes, for example, institutional entities such as organizations, contracts, and payment transactions, is not simply part of physical reality detached from digital infrastructures. Digital infrastructures are part of institutional reality. Accordingly, the research question we address is: What are the fundamental constructs in the design of digital infrastructures that constitute and transform institutional reality? In answering this question, we develop a foundation for conceptual modeling, which we illustrate by modeling the institution of open banking and its associated digital infrastructure. In the article, we identify digital institutional entities, digital agents regulated by software, and digital institutional actions as critical constructs for modeling digital infrastructures in institutional contexts. In so doing, we show how conceptual modeling can improve our understanding of the digital transformation of institutional reality and the prescriptive role of conceptual modeling. We also generate theoretical insights about the need for legitimacy and liability that advance the study and practice of digital infrastructure design and its consequences.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"156 ","pages":"Article 102404"},"PeriodicalIF":2.7,"publicationDate":"2024-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Enhancing the intelligibility of decision trees with concise and reliable probabilistic explanations 用简洁可靠的概率解释增强决策树的可理解性

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-21 DOI: 10.1016/j.datak.2024.102394

Louenas Bounia, Insaf Setitra

{"title":"Enhancing the intelligibility of decision trees with concise and reliable probabilistic explanations","authors":"Louenas Bounia, Insaf Setitra","doi":"10.1016/j.datak.2024.102394","DOIUrl":"10.1016/j.datak.2024.102394","url":null,"abstract":"<div><div>This work deals with explainable artificial intelligence (XAI), specifically focusing on improving the intelligibility of decision trees through reliable and concise probabilistic explanations. Decision trees are popular because they are considered highly interpretable. Due to cognitive limitations, abductive explanations can be too large to be interpretable by human users. When this happens, decision trees are far from being easily interpretable. In this context, our goal is to enhance the intelligibility of decision trees by using probabilistic explanations. Drawing inspiration from previous work on approximating probabilistic explanations, we propose a greedy algorithm that enables us to derive concise and reliable probabilistic explanations for decision trees. We provide a detailed description of this algorithm and compare it to the state-of-the-art SAT encoding. In the order to highlight the gains in intelligibility while emphasizing its empirical effectiveness, we will conduct in-depth experiments on binary decision trees as well as on cases of multi-class classification. We expect significant gains in intelligibility. Finally, to demonstrate the usefulness of such an approach in a practical context, we chose to carry out additional experiments focused on text classification, in particular the detection of emotions in tweets. Our objective is to determine the set of words explaining the emotion predicted by the decision tree.</div></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"156 ","pages":"Article 102394"},"PeriodicalIF":2.7,"publicationDate":"2024-12-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143133640","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Coupling MDL and Markov chain Monte Carlo to sample diverse pattern sets 耦合MDL和马尔可夫链蒙特卡罗采样不同的模式集

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-20 DOI: 10.1016/j.datak.2024.102393

François Camelin , Samir Loudni , Gilles Pesant , Charlotte Truchet

引用次数: 0

HiBenchLLM: Historical Inquiry Benchmarking for Large Language Models HiBenchLLM：大型语言模型的历史查询基准测试

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-15 DOI: 10.1016/j.datak.2024.102383

Mathieu Chartier , Nabil Dakkoune , Guillaume Bourgeois , Stéphane Jean

引用次数: 0

Clustering of timed sequences – Application to the analysis of care pathways 时间序列的聚类。护理路径分析中的应用

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-10 DOI: 10.1016/j.datak.2024.102401

Thomas Guyet , Pierre Pinson , Enoal Gesny

引用次数: 0

Semantic vs. LLM-based approach: A case study of KOnPoTe vs. Claude for ontology population from French advertisements 语义vs.基于法学硕士的方法：KOnPoTe vs. Claude对法国广告本体人口的案例研究

IF 2.7 3区计算机科学

Data & Knowledge Engineering Pub Date : 2024-12-04 DOI: 10.1016/j.datak.2024.102392

Aya Sahbi , Céline Alec , Pierre Beust

引用次数: 0