{"title":"Managerial risk data analytics applications using grey influence analysis (GINA)","authors":"R. Rajesh","doi":"10.1016/j.datak.2024.102312","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102312","url":null,"abstract":"<div><p>We observe and analyze the causal relations among risk factors in a system, considering manufacturing supply chains. Seven major categories of risks are identified and scrutinized, and a detailed analysis of causal relations using the grey influence analysis (GINA) methodology is outlined. Using an expert-response-based survey, we conduct an initial analysis of the risks using risk matrix analysis (RMA) and identify the high-priority risks. GINA is then implemented to understand the causal relations among the various categories of risks, which is particularly useful in group decision-making environments. The results from RMA indicate that <em>capacity risks (CR)</em> and <em>delays (DL)</em> fall in the category of very high priority risks. The GINA results ratify the conclusions from RMA and indicate that managers need to control and manage <em>capacity risks (CR)</em> and <em>delays (DL)</em> with high priority. Additionally, from the results of GINA, the causal factors <em>disruptions (DS)</em> and <em>forecast risks (FR)</em> appear to be of primary importance and, if unattended, can lead to the initiation of several other risks in supply chains. 
Managers are recommended to identify disruptions at an early stage in supply chains and reduce the forecast errors to avoid bullwhips in supply chains.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102312"},"PeriodicalIF":2.5,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140879377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A graph based named entity disambiguation using clique partitioning and semantic relatedness","authors":"Ramla Belalta , Mouhoub Belazzoug , Farid Meziane","doi":"10.1016/j.datak.2024.102308","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102308","url":null,"abstract":"<div><p>Disambiguating name mentions in texts is a crucial task in Natural Language Processing, especially in entity linking. The credibility and efficiency of such systems depend largely on this task. For a given named entity mention in a text, the knowledge base contains many candidate entities to which it may refer. It is therefore very difficult to select the correct candidate from the whole set of candidate entities of this mention. To solve this problem, collective entity disambiguation is a prominent approach. In this paper, we present a novel algorithm called CPSR for collective entity disambiguation, which is based on a graph approach and semantic relatedness. A clique partitioning algorithm is used to find the best clique that contains a set of candidate entities. These candidate entities provide the answers to the corresponding mentions in the disambiguation process. To evaluate our algorithm, we carried out a series of experiments on seven well-known datasets, namely, AIDA/CoNLL2003-TestB, IITB, MSNBC, AQUAINT, ACE2004, Cweb, and Wiki. The Kensho Derived Wikimedia Dataset (KDWD) is used as the knowledge base for our system. 
From the experimental results, our CPSR algorithm outperforms both the baselines and other well-known state-of-the-art approaches.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"152 ","pages":"Article 102308"},"PeriodicalIF":2.5,"publicationDate":"2024-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140901817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CQuAE: A new Contextualized QUestion Answering corpus on Education domain","authors":"Thomas Gerald , Louis Tamames , Sofiane Ettayeb , Ha-Quang Le , Patrick Paroubek , Anne Vilnat","doi":"10.1016/j.datak.2024.102305","DOIUrl":"10.1016/j.datak.2024.102305","url":null,"abstract":"<div><p>Generating education-related questions and answers remains an open issue while being useful for students, teachers, and teaching aids. Given textual course material, we are interested in generating non-factual questions that require an elaborate answer (relying on analysis or reasoning). Despite the availability of annotated corpora of questions and answers, the effort to develop a generator using deep learning faces two main challenges. Firstly, freely accessible and qualitative data are insufficient to train generative approaches. Secondly, for a stand-alone application, we do not have explicit support to guide the generation toward complex questions. To tackle the first issue, we propose a new corpus based on education documents. For the second point, we propose to study several retargetable language algorithms to produce answers by extracting text spans from contextual documents to help the generation of questions. We particularly study the contribution of deep neural syntactic parsing and transformer-based semantic representation, taking into account the question type (according to our specific question typology) and the contextual support text span. Additionally, recent advances in generation models have proven the efficiency of the instruction-based approach for natural language generation. 
Consequently, we propose a first investigation of very large language models to generate questions related to the education domain.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102305"},"PeriodicalIF":2.5,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140768347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence in digital twins—A systematic literature review","authors":"Tim Kreuzer, Panagiotis Papapetrou, Jelena Zdravkovic","doi":"10.1016/j.datak.2024.102304","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102304","url":null,"abstract":"<div><p>Artificial intelligence and digital twins have become more popular in recent years and have seen usage across different application domains for various scenarios. This study reviews the literature at the intersection of the two fields, where digital twins integrate an artificial intelligence component. We follow a systematic literature review approach, analyzing a total of 149 related studies. In the assessed literature, a variety of problems are approached with an artificial intelligence-integrated digital twin, demonstrating its applicability across different fields. Our findings indicate that there is a lack of in-depth modeling approaches regarding the digital twin, while many articles focus on the implementation and testing of the artificial intelligence component. The majority of publications do not demonstrate a virtual-to-physical connection between the digital twin and the real-world system. 
Further, only a small portion of studies base their digital twin on real-time data from a physical system, implementing a physical-to-virtual connection.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102304"},"PeriodicalIF":2.5,"publicationDate":"2024-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000284/pdfft?md5=7bf249b030dadbb8c82308b54aef035d&pid=1-s2.0-S0169023X24000284-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140549919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging an Isolation Forest to Anomaly Detection and Data Clustering","authors":"Véronne Yepmo , Grégory Smits , Marie-Jeanne Lesot , Olivier Pivert","doi":"10.1016/j.datak.2024.102302","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102302","url":null,"abstract":"<div><p>Understanding why some points in a data set are considered anomalies requires taking into account the structure of the regular points. Whereas many machine learning methods are dedicated either to the identification of anomalies or to the identification of the inner structure of the data, a solution is introduced that addresses both tasks using the same data model, a variant of an isolation forest. The initial algorithm for constructing an isolation forest is revisited to preserve the inner structure of the data without affecting the efficiency of outlier detection. Experiments conducted both on synthetic and real-world data sets show that, in addition to improving the detection of abnormal data points, the proposed variant of isolation forest allows for a reconstruction of the subspaces of high density. Therefore, the former can serve as a basis for a unified approach to detect global and local anomalies, which is a necessary condition to then provide users with informative descriptions of the data.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102302"},"PeriodicalIF":2.5,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140345076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The unresolved need for dependable guarantees on security, sovereignty, and trust in data ecosystems","authors":"Johannes Lohmöller , Jan Pennekamp , Roman Matzutt , Carolin Victoria Schneider , Eduard Vlad , Christian Trautwein , Klaus Wehrle","doi":"10.1016/j.datak.2024.102301","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102301","url":null,"abstract":"<div><p>Data ecosystems emerged as a new paradigm to facilitate the automated and massive exchange of data from heterogeneous information sources between different stakeholders. However, the corresponding benefits come with unforeseen risks as sensitive information is potentially exposed, questioning data ecosystem reliability. Consequently, data security is of utmost importance and, thus, a central requirement for successfully realizing data ecosystems. Academia has recognized this requirement, and current initiatives foster sovereign participation via a federated infrastructure where participants retain local control over what data they offer to whom. However, recent proposals place significant trust in remote infrastructure by implementing organizational security measures such as certification processes before the admission of a participant. At the same time, the data sensitivity incentivizes participants to bypass the organizational security measures to maximize their benefit. This issue significantly weakens security, sovereignty, and trust guarantees and highlights that organizational security measures are insufficient in this context. In this paper, we argue that data ecosystems must be extended with technical means to (re)establish dependable guarantees. We underpin this need with three representative use cases for data ecosystems, which cover personal, economic, and governmental data, and systematically map the lack of dependable guarantees in related work. 
To this end, we identify three enablers of dependable guarantees, namely trusted remote policy enforcement, verifiable data tracking, and integration of resource-constrained participants. These enablers are critical for securely implementing data ecosystems in data-sensitive contexts.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102301"},"PeriodicalIF":2.5,"publicationDate":"2024-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000259/pdfft?md5=5d1fb135737fcc7ddf73713a94b46ce0&pid=1-s2.0-S0169023X24000259-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140192029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Insights into commonalities of a sample: A visualization framework to explore unusual subset-dataset relationships","authors":"Nikolas Stege , Michael H. Breitner","doi":"10.1016/j.datak.2024.102299","DOIUrl":"10.1016/j.datak.2024.102299","url":null,"abstract":"<div><p>Domain experts are driven by business needs, while data analysts develop and use various algorithms, methods, and tools, but often without domain knowledge. A major challenge for companies and organizations is to integrate data analytics into business processes and workflows. We deduce an interactive process and visualization framework to enable value-creating collaboration in inter- and cross-disciplinary teams. Domain experts and data analysts are both empowered to analyze and discuss results and come to well-founded insights and implications. Inspired by a typical auditing problem, we develop and apply a visualization framework to single out unusual data in general subsets for potential further investigation. Our framework is applicable to unusual data detected either manually by domain experts or by algorithms applied by data analysts. Application examples show typical interaction, collaboration, visualization, and decision support.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102299"},"PeriodicalIF":2.5,"publicationDate":"2024-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000235/pdfft?md5=5865a6d1aaccbc08965569d170abf88f&pid=1-s2.0-S0169023X24000235-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140151811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Time-aware structure matching for temporal knowledge graph alignment","authors":"Wei Jia , Ruizhe Ma , Li Yan , Weinan Niu , Zongmin Ma","doi":"10.1016/j.datak.2024.102300","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102300","url":null,"abstract":"<div><p>Entity alignment, aiming at identifying equivalent entity pairs across multiple knowledge graphs (KGs), serves as a vital step for knowledge fusion. As the majority of KGs undergo continuous evolution, existing solutions utilize graph neural networks (GNNs) to tackle entity alignment within temporal knowledge graphs (TKGs). However, this prevailing method often overlooks the consequential impact of relation embedding generation on entity embeddings through inherent structures. In this paper, we propose a novel model named Time-aware Structure Matching based on GNNs (TSM-GNN) that encompasses the learning of both topological and inherent structures. Our key innovation lies in a unique method for generating relation embeddings, which can enhance entity embeddings via inherent structure. Specifically, we utilize the translation property of knowledge graphs to obtain the entity embedding that is mapped into a time-aware vector space. Subsequently, we employ GNNs to learn global entity representation. To better capture the useful information from neighboring relations and entities, we introduce a time-aware attention mechanism that assigns different importance weights to different time-aware inherent structures. 
Experimental results on three real-world datasets demonstrate that TSM-GNN outperforms several state-of-the-art approaches for entity alignment between TKGs.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102300"},"PeriodicalIF":2.5,"publicationDate":"2024-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140138228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A knowledge-sharing platform for space resources","authors":"Marcos Da Silveira, Louis Deladiennee, Emmanuel Scolan, Cedric Pruski","doi":"10.1016/j.datak.2024.102286","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102286","url":null,"abstract":"<div><p>The ever-increasing interest of academia, industry, and government institutions in space resource information highlights the difficulty of finding, accessing, integrating, and reusing this information. Although information is regularly published on the internet, it is disseminated on many different websites and in different formats, including scientific publications, patents, news, and reports. We are currently developing a knowledge management and sharing platform for space resources. This tool, which relies on the combined use of knowledge graphs and ontologies, formalises the domain knowledge contained in the above-mentioned documents and makes it more readily available to the community. In this article, we describe the concepts and techniques of knowledge extraction and management adopted during the design and implementation of the platform.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102286"},"PeriodicalIF":2.5,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140042746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge graph-based image classification","authors":"Franck Anaël Mbiaya , Christel Vrain , Frédéric Ros , Thi-Bich-Hanh Dao , Yves Lucas","doi":"10.1016/j.datak.2024.102285","DOIUrl":"https://doi.org/10.1016/j.datak.2024.102285","url":null,"abstract":"<div><p>This paper introduces a deep learning method for image classification that leverages knowledge formalized as a graph created from information represented as attribute/value pairs. The proposed method investigates a loss function that adaptively combines the classical cross-entropy commonly used in deep learning with a novel penalty function. The novel loss function is derived from the representation of nodes after embedding the knowledge graph and incorporates the proximity between class and image nodes. Its formulation enables the model to focus on identifying the boundary between the classes that are most challenging to distinguish. Experimental results on several image databases demonstrate improved performance compared to state-of-the-art methods, including classical deep learning algorithms and recent algorithms that incorporate knowledge represented by a graph.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"151 ","pages":"Article 102285"},"PeriodicalIF":2.5,"publicationDate":"2024-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0169023X24000090/pdfft?md5=197a1155c2e53ecde4dd061f7a501a91&pid=1-s2.0-S0169023X24000090-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140113524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}