{"title":"A motif-based probabilistic approach for community detection in complex networks","authors":"Hossein Hajibabaei, Vahid Seydi, Abbas Koochari","doi":"10.1007/s10844-024-00850-3","DOIUrl":"https://doi.org/10.1007/s10844-024-00850-3","url":null,"abstract":"<p>Community detection in complex networks is an important task for discovering hidden information in network analysis. Neighborhood density between nodes is one of the fundamental indicators of community presence in the network. A community with a high edge density will have correlations between nodes that extend beyond their immediate neighbors, denoted by motifs. Motifs are repetitive patterns of edges observed with high frequency in the network. We proposed the PCDMS method (Probabilistic Community Detection with Motif Structure) that detects communities by estimating the triangular motif in the network. This study employs structural density between nodes, a key concept in graph analysis. The proposed model has the advantage of using a probabilistic generative model that calculates the latent parameters of the probabilistic model and determines the community based on the likelihood of triangular motifs. The relationship between observing two pairs of nodes in multiple communities leads to an increasing likelihood estimation of the existence of a motif structure between them. The output of the proposed model is the intensity of each node in the communities. 
The efficiency and validity of the proposed method are evaluated through experiments on both synthetic and real-world networks; the findings show that the communities identified by the proposed method are more accurate and denser than those found by other algorithms, as measured by modularity, NMI, and F1-score.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"16 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early detection of fake news on emerging topics through weak supervision","authors":"Serhat Hakki Akdag, Nihan Kesim Cicekli","doi":"10.1007/s10844-024-00852-1","DOIUrl":"https://doi.org/10.1007/s10844-024-00852-1","url":null,"abstract":"<p>In this paper, we present a methodology for the early detection of fake news on emerging topics through the innovative application of weak supervision. Traditional techniques for fake news detection often rely on fact-checkers or supervised learning with labeled data, which is not readily available for emerging topics. To address this, we introduce the Weakly Supervised Text Classification framework (WeSTeC), an end-to-end solution designed to programmatically label large-scale text datasets within specific domains and train supervised text classifiers using the assigned labels. The proposed framework automatically generates labeling functions through multiple weak labeling strategies and eliminates underperforming ones. Labels assigned through the generated labeling functions are then used to fine-tune a pre-trained RoBERTa classifier for fake news detection. By using a weakly labeled dataset, which contains fake news related to the emerging topic, the trained fake news detection model becomes specialized for the topic under consideration. We explore both semi-supervision and domain adaptation setups, utilizing small amounts of labeled data and labeled data from other domains, respectively. The fake news classification model generated by the proposed framework excels when compared with all baselines in both setups. 
In addition, when compared to its fully supervised counterpart, our fake news detection model trained through weak labels achieves accuracy within 1%, emphasizing the robustness of the proposed framework’s weak labeling capabilities.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"186 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hybrid recognition framework of crucial seed spreaders in complex networks with neighborhood overlap","authors":"Tianchi Tong, Min Wang, Wenying Yuan, Qian Dong, Jinsheng Sun, Yuan Jiang","doi":"10.1007/s10844-024-00849-w","DOIUrl":"https://doi.org/10.1007/s10844-024-00849-w","url":null,"abstract":"<p>Recognizing crucial seed spreaders of complex networks is an open issue that studies the dynamic spreading process and analyzes the performance of networks. However, most of the findings design the hierarchical model based on nodes’ degree such as Kshell decomposition for obtaining global information, and identifying effects brought by the weight value of each layer is coarse. In addition, local structural information fails to be effectively captured when neighborhood nodes are sometimes unconnected in the hierarchical structure. To solve these issues, in this paper, we design a novel hierarchical structure based on the shortest path distance by using the interpretative structure model and determine influence weights of each layer. Furthermore, we also design the local neighborhood overlap coefficient and the local index based on the overlap (LIO) by considering two conditions of connected and unconnected neighborhood nodes in the hierarchical structure. For reaching a comprehensive recognition and finding crucial seed spreaders precisely, we introduce influence weights vector, local evaluation index matrix after normalization and the weight vector of local indexes into a new hybrid recognition framework. The proposed method adopts a series of indicators, including the monotonicity relation, Susceptible-Infected-Susceptible model, complementary cumulative distribution function, Kendall’s coefficient, spreading scale ratio and average shortest path length, to execute corresponding experiments and evaluate the diffusion ability in different datasets. 
Results demonstrate that our method outperforms the compared algorithms in recognition effect and spreading capability.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"9 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140149906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble of temporal Transformers for financial time series","authors":"Kenniy Olorunnimbe, Herna Viktor","doi":"10.1007/s10844-024-00851-2","DOIUrl":"https://doi.org/10.1007/s10844-024-00851-2","url":null,"abstract":"<p>The accuracy of price forecasts is important for financial market trading strategies and portfolio management. Compared to traditional models such as ARIMA and other state-of-the-art deep learning techniques, temporal Transformers with similarity embedding perform better for multi-horizon forecasts in financial time series, as they account for the conditional heteroscedasticity inherent in financial data. Despite this, the methods employed in generating these forecasts must be optimized to achieve the highest possible level of precision. One approach that has been shown to improve the accuracy of machine learning models is ensemble techniques. To this end, we present an ensemble approach that efficiently utilizes the available data over an extended timeframe. Our ensemble combines multiple temporal Transformer models learned within sliding windows, thereby making optimal use of the data. As combination methods, along with an averaging approach, we also introduced a stacking meta-learner that leverages a quantile estimator to determine the optimal weights for combining the base models of smaller windows. By decomposing the constituent time series of an extended timeframe, we optimize the utilization of the series for financial deep learning. This simplifies the training process of a temporal Transformer model over an extended time series while achieving better performance, particularly when accounting for the non-constant variance of financial time series. 
Our experiments, conducted across volatile and non-volatile extrapolation periods, using 20 companies from the Dow Jones Industrial Average show more than 40% and 60% improvement in predictive performance compared to the baseline temporal Transformer.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"1 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140017636","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing sentiment and emotion translation of review text through MLM knowledge integration in NMT","authors":"Divya Kumari, Asif Ekbal","doi":"10.1007/s10844-024-00843-2","DOIUrl":"https://doi.org/10.1007/s10844-024-00843-2","url":null,"abstract":"<p>Producing a high-quality review translation is a multifaceted process. It goes beyond successful semantic transfer and requires conveying the original message’s tone and style in a way that resonates with the target audience, whether they are human readers or Natural Language Processing (NLP) applications. Capturing these subtle nuances of the review text demands a deeper understanding and better encoding of the source message. In order to achieve this goal, we explore the use of self-supervised masked language modeling (MLM) and a variant called polarity masked language modeling (p-MLM) as auxiliary tasks in a multi-learning setup. MLM is widely recognized for its ability to capture rich linguistic representations of the input and has been shown to achieve state-of-the-art accuracy in various language understanding tasks. Motivated by its effectiveness, in this paper we adopt joint learning, combining the neural machine translation (NMT) task with source polarity-masked language modeling within a shared embedding space to induce a deeper understanding of the emotional nuances of the text. We analyze the results and observe that our multi-task model indeed exhibits a better understanding of linguistic concepts like sentiment and emotion. Intriguingly, this is achieved even without explicit training on sentiment-annotated or domain-specific sentiment corpora. 
Our multi-task NMT model consistently improves the translation quality of affect sentences from diverse domains in three language pairs.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"135 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140010850","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMC-MMR: multi-modal recommendation model with cross-modal correction","authors":"","doi":"10.1007/s10844-024-00848-x","DOIUrl":"https://doi.org/10.1007/s10844-024-00848-x","url":null,"abstract":"<h3>Abstract</h3> <p>Multi-modal recommendation using multi-modal features (e.g., image and text features) has received significant attention and has been shown to yield more effective recommendations. However, multi-modal recommendation currently faces the following problems: (1) Multi-modal recommendation models often handle individual modes’ raw data directly, so noise degrades the model’s effectiveness and interconnections between modes go unexplored; (2) Different users have different preferences, so it is impractical to treat all modalities equally, as this could interfere with the model’s ability to make recommendations. To address the above problems, this paper proposes a <span>M</span>ulti-<span>m</span>odal <span>r</span>ecommendation model with <span>c</span>ross-<span>m</span>odal <span>c</span>orrection (CMC-MMR). Firstly, in order to reduce the effect of noise in the raw data and to take full advantage of the relationships between modes, we designed a cross-modal correction module to denoise and correct the modes using a cross-modal correction mechanism. Secondly, the similarity between the same modalities of each item is used as a benchmark to build item-item graphs for each modality, and user-item graphs with degree-sensitive pruning strategies are also built to mine higher-order information. Finally, we designed a self-supervised task to adaptively mine user preferences for modality. We conducted comparative experiments with eleven baseline models on four real-world datasets. 
The experimental results show that CMC-MMR improves by 6.202%, 4.975%, 6.054% and 11.368% on average on the four datasets, respectively, demonstrating the effectiveness of CMC-MMR.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"4 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139918440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Querying knowledge graphs through positive and negative examples and feedback","authors":"Akritas Akritidis, Yannis Tzitzikas","doi":"10.1007/s10844-024-00846-z","DOIUrl":"https://doi.org/10.1007/s10844-024-00846-z","url":null,"abstract":"<p>The formulation of structured queries over Knowledge Graphs is not an easy task. To alleviate this problem, we propose a novel interactive method for SPARQL query formulation that enables users (plain and advanced) to gradually formulate queries by providing examples and various kinds of positive and negative feedback, in a manner that does not pre-suppose knowledge of the query language or the contents of the Knowledge Graph. In comparison to other example-based query approaches, the distinctive features of our approach are the support for negative examples and the positive/negative feedback on the generated constraints. We detail the algorithmic aspects and present an interactive user interface that implements the approach. The application of the model on real datasets from DBpedia (Movies, Actors) and other datasets (scientific papers) showcases the feasibility and the effectiveness of the approach. A task-based evaluation with users who are not familiar with SPARQL provided positive evidence that the interaction is easy to grasp and enabled most users to formulate the desired queries.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"74 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139752017","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic-enhanced reasoning question answering over temporal knowledge graphs","authors":"Chenyang Du, Xiaoge Li, Zhongyang Li","doi":"10.1007/s10844-024-00840-5","DOIUrl":"https://doi.org/10.1007/s10844-024-00840-5","url":null,"abstract":"<p>Question Answering Over Temporal Knowledge Graphs (TKGQA) is an important topic in question answering. TKGQA focuses on accurately understanding questions involving temporal constraints and retrieving accurate answers from knowledge graphs. In previous research, the hierarchical structure of question contexts and the constraints imposed by temporal information on different sentence components have been overlooked. In this paper, we propose a framework called “Semantic-Enhanced Reasoning Question Answering” (SERQA) to tackle this problem. First, we adopt a pretrained language model (LM) to obtain the question relation representation vector. Then, we leverage syntactic information from the constituent tree and dependency tree, in combination with Masked Self-Attention (MSA), to enhance temporal constraint features. Finally, we integrate the temporal constraint features into the question relation representation using an information fusion function for answer prediction. Experimental results demonstrate that SERQA achieves better performance on the CRONQUESTIONS and ImConstrainedQuestions datasets. In comparison with existing temporal KGQA methods, our model exhibits outstanding performance in comprehending temporal constraint questions. 
The ablation experiments verified the effectiveness of combining the constituent tree and the dependency tree with MSA in question answering.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"40 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139669190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"KIMedQA: towards building knowledge-enhanced medical QA models","authors":"Aizan Zafar, Sovan Kumar Sahoo, Deeksha Varshney, Amitava Das, Asif Ekbal","doi":"10.1007/s10844-024-00844-1","DOIUrl":"https://doi.org/10.1007/s10844-024-00844-1","url":null,"abstract":"<p>Medical question-answering systems require the ability to extract accurate, concise, and comprehensive answers. They will better comprehend the complex text and produce helpful answers if they can reason on the explicit constraints described in the question’s textual context and the implicit, pertinent knowledge of the medical world. Integrating Knowledge Graphs (KG) with Language Models (LMs) is a common approach to incorporating structured information sources. However, effectively combining and reasoning over KG representations and language context remains an open question. To address this, we propose the Knowledge Infused Medical Question Answering system <b>(KIMedQA)</b>, which employs two techniques <i>viz.</i> relevant knowledge graph selection and pruning of the large-scale graph to handle Vector Space Inconsistent <i>(VSI)</i> and Excessive Knowledge Information <i>(EKI)</i>. The representation of the query and context are then combined with the pruned knowledge network using a pre-trained language model to generate an informed answer. Finally, we demonstrate through in-depth empirical evaluation that our suggested strategy provides cutting-edge outcomes on two benchmark datasets, namely MASH-QA and COVID-QA. 
We also compared our results to ChatGPT, a robust and very powerful generative model, and discovered that our model outperforms ChatGPT according to the F1 Score and human evaluation metrics such as adequacy.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"67 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139551670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data- & compute-efficient deviance mining via active learning and fast ensembles","authors":"","doi":"10.1007/s10844-024-00841-4","DOIUrl":"https://doi.org/10.1007/s10844-024-00841-4","url":null,"abstract":"<h3>Abstract</h3> <p>Detecting deviant traces in business process logs is crucial for modern organizations, given the harmful impact of deviant behaviours (e.g., attacks or faults). However, training a Deviance Prediction Model (DPM) by solely using supervised learning methods is impractical in scenarios where only few examples are labelled. To address this challenge, we propose an Active-Learning-based approach that leverages multiple DPMs and a temporal ensembling method that can train and merge them in a few training epochs. Our method needs expert supervision only for a few unlabelled traces exhibiting high prediction uncertainty. Tests on real data (of either complete or ongoing process instances) confirm the effectiveness of the proposed approach.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"10 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139551676","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}