{"title":"Effective healthcare service recommendation with network representation learning: A recursive neural network approach","authors":"Mouhamed Gaith Ayadi , Haithem Mezni , Rana Alnashwan , Hela Elmannai","doi":"10.1016/j.datak.2023.102233","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102233","url":null,"abstract":"<div><p>Recently, recommender systems have been combined with healthcare systems to recommend needed healthcare items for both patients and medical staff. By monitoring the patients’ states, healthcare services and their consumed smart medical objects can be recommended to a medical team according to the patient’s critical situation and requirements. However, a common drawback of the few existing solutions lies in the limited modeling of the healthcare information network. In addition, current solutions do not consider the typed nature of healthcare items. Moreover, existing healthcare recommender systems lack flexibility, and none of them offers re-configurable healthcare workflows to medical staff. In this paper, we take advantage of collaborative filtering and representation learning principles, by proposing a method for the recommendation of healthcare services. These latter follow a predefined execution pattern, i.e. treatment/medication workflow, that is determined by our framework depending on the patient’s state. To achieve this goal, we model the healthcare information network as a <em>knowledge graph</em>. This latter, based on an <em>incremental learning</em> method, is then transformed into a cuboid space to facilitate its processing. That is by learning latent representations of its content (e.g., smart objects, healthcare services, patients symptoms, etc.). Finally, a <em>collaborative recommendation</em> method is defined to select the high-quality healthcare services that will be composed and executed according to a determined workflow model. Experimental results have proven the efficiency of our solution in terms of recommended services’ quality.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102233"},"PeriodicalIF":2.5,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749644","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel hybrid approach for text encoding: Cognitive Attention To Syntax model to detect online misinformation","authors":"Géraud Faye , Wassila Ouerdane , Guillaume Gadek , Souhir Gahbiche , Sylvain Gatepaille","doi":"10.1016/j.datak.2023.102230","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102230","url":null,"abstract":"<div><p>Most approaches for text encoding rely on the attention mechanism, at the core of the transformers architecture and large language models. The understanding of this mechanism is still limited and present inconvenients such as lack of interpretability, large requirements of data and low generalization. Based on current understanding of the attention mechanism, we propose CATS (Cognitive Attention To Syntax), a neurosymbolic attention encoding approach based on the syntactic understanding of texts. This approach has on-par to better performance compared to classical attention and displays expected advantages of neurosymbolic AI such as better functioning with little data and better explainability. This layer has been tested on the task of misinformation detection but is general and could be used in any task involving natural language processing.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102230"},"PeriodicalIF":2.5,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel ensemble framework driven by diversity and cooperativity for non-stationary data stream classification","authors":"Kuangyan Zhang, Tuyi Zhang, Sanmin Liu","doi":"10.1016/j.datak.2023.102232","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102232","url":null,"abstract":"<div><p>Data stream classification is of great significance to numerous real-world scenarios. Nevertheless, the prevalent data stream classification techniques are influenced by concept drift and demonstrate unreliability in non-stationary environments. Ensemble models are typically successful when they increase diversity among their members. Several ensembles that enhance diversity have been proposed in literatures. Regrettably, there is no established method to verify that cooperativity indeed improves performance. In response to this knowledge gap, we have developed an innovative ensemble learning framework driven by diversity and cooperativity, termed EDDC, to address the issue. EDDC first dynamically maintains multiple groups of classifiers, with primary classifier in each group chosen to enhance diversity. Next, cooperativity is employed to update groups and replace outdated members. Finally, when environment changes, EDDC adaptively selects either diversity or cooperativity as the strategy for predicting labeling of new instances, while also establishing an excellent performance guarantee. Through simulation experiments, we assessed the performance of EDDC and the benefits of cooperativity for enhancing prediction. The results demonstrated that EDDC is efficient and robust in most scenarios, particularly when dealing with gradual drift. Furthermore, EDDC maintains a competitive edge in terms of classification accuracy and other metrics.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102232"},"PeriodicalIF":2.5,"publicationDate":"2023-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749646","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robotic process automation using process mining — A systematic literature review","authors":"Najah Mary El-Gharib, Daniel Amyot","doi":"10.1016/j.datak.2023.102229","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102229","url":null,"abstract":"<div><p>Process mining (PM) aims to construct, from event logs, process maps that can help discover, automate, improve, and monitor organizational processes. Robotic process automation (RPA) uses software robots to perform some tasks usually executed by humans. It is usually difficult to determine what processes and steps to automate, especially with RPA. PM is seen as one way to address such difficulty. This paper aims to assess the applicability of process mining in accelerating and improving the implementation of RPA, along with the challenges encountered throughout project lifecycle.</p><p>A systematic literature review was conducted to examine the approaches where PM techniques were used to understand the as-is processes that can be automated with software robots. Seven databases were used to identify papers on this topic. A total of 32 papers, all published since 2018, were selected from 605 unique candidate papers and then analyzed.</p><p>There is a steady increase in the number of publications in this domain, especially during the year 2022, which suggests a raising interest in the combined use of PM with RPA. The literature mainly focuses on the methods to record the events that occur at the level of user interactions with the application, and on the preprocessing methods that are needed to discover routines with the steps that can be automated. Important challenges are faced with preprocessing such event logs, and many lifecycle steps of automation projects are weakly supported by existing approaches suggesting corresponding research areas in need of further attention.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102229"},"PeriodicalIF":2.5,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LBP feature and hash function based dual watermarking algorithm for database","authors":"De Li, Chi Ma, Haoyang Gao, Xun Jin","doi":"10.1016/j.datak.2023.102228","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102228","url":null,"abstract":"<div><p>In this paper, we propose a local binary pattern (LBP) feature and hash function based dual watermarking algorithm for database. Attribute feature columns are selected to generate zero watermarks using the Pearson correlation method. The zero watermarks are generated by the LBP. The attribute values of the selected feature columns are divided into two parts for embedding and extracting the watermark. Zero watermark feature code is embedded into the lowest significant bit of the selected attribute column by using the database watermarking algorithm with hash function for copyright authentication. The proposed method uses two-layer hashing method to improve the robustness. In the watermark extraction, a method of combining majority voting and Hamming error correction code is used to control the watermark error to ensure the correct extraction rate of the watermark. Experimental results show that the algorithm not only provides good availability of the database after embedding the watermark, but also ensures the correct extraction of the watermark after various attacks.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102228"},"PeriodicalIF":2.5,"publicationDate":"2023-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mixed emotion extraction analysis and visualisation of social media text","authors":"Yuming Li, Johnny Chan, Gabrielle Peko, David Sundaram","doi":"10.1016/j.datak.2023.102220","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102220","url":null,"abstract":"<div><p>With the widespread use of social media and accelerated development of artificial intelligence, sentiment analysis is regarded as an important way to help enterprises understand user needs and conduct brand monitoring. It can also assist businesses in making data-driven decisions about product development, marketing strategies, and customer service. However, as social media information continues to grow exponentially, and industry demands increase, sentiment analysis should no longer be limited to fundamental polarity classification of positive, neutral, and negative. Instead, it should move to more precise classification of emotions. Therefore, in this paper, we expand sentiment analysis to analysis of eight different emotions based on Plutchik's wheel of emotions, and define it as a multi-label classification task to identify complex and mixed emotions in text. We achieved an overall precision of 0.7958 for the eight emotions multi-label classification based on the attention-based bidirectional long short-term memory with convolution layer (AC-BiLSTM) model on the SemEval-2018 dataset. In addition, we proposed the introduction of the NRC emotion lexicon and emotion correlation constraints to optimise the emotion classification results. This ultimately increased the overall precision to 0.8228 demonstrating the effectiveness of our approach. Finally, we store and visualise the emotion analysis results in a graph structure, in order to achieve deductibility and traceability of emotions.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102220"},"PeriodicalIF":2.5,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49759198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Legal powers, subjections, disabilities, and immunities: Ontological analysis and modeling patterns","authors":"Cristine Griffo , João Paulo A. Almeida , João A.O. Lima , Tiago Prince Sales , Giancarlo Guizzardi","doi":"10.1016/j.datak.2023.102219","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102219","url":null,"abstract":"<div><p>The development of dependable information systems in legal contexts requires a precise understanding of the subtleties of the underlying legal phenomena. According to a modern understanding in the philosophy of law, much of these phenomena are relational in nature. In this paper, we employ a theoretically well-grounded legal core ontology (UFO-L) to conduct an ontological analysis focused on fundamental legal relations, namely, the power–subjection and the disability–immunity relations. We show that in certain cases, power–subjection relations are primitive in the sense that by means of institutional acts other legal relations can be generated from them. Examples include relations of rights and duties, permissions and no-rights, liberties, secondary power–subjection, etc. We further show that legal disabilities (and their correlative immunities) are key in constraining the reach of legal powers; together with powers, they form a comprehensive framework for representing the grounds of valid legal acts and to account for the life-cycle of the legal positions that powers create, alter, and possibly extinguish. As a contribution to the practice of conceptual modeling, and leveraging the result of our analysis, we propose conceptual modeling patterns for legal relations, which are then applied to model a real-world case in tax law.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"148 ","pages":"Article 102219"},"PeriodicalIF":2.5,"publicationDate":"2023-09-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49749917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BT-CKBQA: An efficient approach for Chinese knowledge base question answering","authors":"Erhe Yang , Fei Hao , Jiaxing Shang , Xiaoliang Chen , Doo-Soon Park","doi":"10.1016/j.datak.2023.102204","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102204","url":null,"abstract":"<div><p>Knowledge Base Question Answering (KBQA), as an increasingly essential application, can provide accurate responses to user queries. ensuring that users obtain relevant information and make decisions promptly. The deep learning-based approaches have achieved satisfactory QA results by leveraging the neural network models. However, these approaches require numerous parameters, which increases the workload of tuning model parameters. To address this problem, we propose BT-CKBQA, a practical and highly efficient approach incorporating <u><strong>B</strong></u>M25 and <u><strong>T</strong></u>emplate-based predicate mapping for <u><strong>CKBQA</strong></u>. Besides, a concept lattice based approach is proposed for summarizing the knowledge base, which can largely improve the execution efficiency of QA with little loss of performance. Concretely, BT-CKBQA leverages the BM25 algorithm and custom dictionary to detect the subject of a question sentence. A template-based predicate generation approach is then proposed to generate candidate predicates. Finally, a ranking approach is provided with the joint consideration of character similarity and semantic similarity for predicate mapping. Extensive experiments are conducted over the NLPCC-ICCPOL 2016 and 2018 KBQA datasets, and the experimental results demonstrate the superiority of the proposed approach over the compared baselines. Particularly, the averaged F1-score result of BT-CKBQA for mention detection is up to 98.25%, which outperforms the best method currently available in the literature. For question answering, the proposed approach achieves superior results than most baselines with the F1-score value of 82.68%. Compared to state-of-the-art baselines, the execution efficiency and performance of QA per unit time can be improved with up to 56.39% and 44.06% gains, respectively. The experimental results for the diversification of questions indicate that the proposed approach performs better for diversified questions than domain-specific questions. The case study over a constructed COVID-19 knowledge base illustrates the effectiveness and practicability of BT-CKBQA.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"147 ","pages":"Article 102204"},"PeriodicalIF":2.5,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49752824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An empiric validation of linguistic features in machine learning models for fake news detection","authors":"Eduardo Puraivan , René Venegas , Fabián Riquelme","doi":"10.1016/j.datak.2023.102207","DOIUrl":"https://doi.org/10.1016/j.datak.2023.102207","url":null,"abstract":"<div><p>The diffusion of fake news is a growing problem with a high and negative social impact. There are several approaches to address the detection of fake news. This work focuses on a hybrid approach based on functional linguistic features and machine learning. There are several recent works with this approach. However, there are no clear guidelines on which linguistic features are most appropriate nor how to justify their use. Furthermore, many classification results are modest compared to recent advances in natural language processing. Our proposal considers 88 features organized in surface information, part of speech, discursive characteristics, and readability indices. On a 42 677 news database, we show that the classification results outperform previous work, even outperforming state-of-the-art techniques such as BERT, reaching 99.99% accuracy. A proper selection of linguistic features is crucial for interpretability as well as the performance of the models. In this sense, our proposal contributes to the intentional selection of linguistic features, overcoming current technical issues. We identified 32 features that show differences between the type of news. The results are highly competitive in the classification and simple to implement and interpret.</p></div>","PeriodicalId":55184,"journal":{"name":"Data & Knowledge Engineering","volume":"147 ","pages":"Article 102207"},"PeriodicalIF":2.5,"publicationDate":"2023-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49753017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}