{"title":"Learning recommendations from educational event data in higher education","authors":"Gyunam Park, Lukas Liss, Wil M. P. van der Aalst","doi":"10.1007/s10844-024-00873-w","DOIUrl":"https://doi.org/10.1007/s10844-024-00873-w","url":null,"abstract":"<p>This paper presents a novel approach for generating actionable recommendations from educational event data collected by Campus Management Systems (CMS) to enhance study planning in higher education. The approach unfolds in three phases: feature identification tailored to the educational context, predictive modeling employing the RuleFit algorithm, and extracting actionable recommendations. We utilize diverse features, encompassing academic histories and course sequences, to capture the multi-dimensional nature of student academic behaviors. The effectiveness of our approach is empirically validated using data from the computer science bachelor’s program at RWTH Aachen University, with the goal of predicting overall GPA and formulating recommendations to enhance academic performance. Our contributions lie in the novel adaptation of behavioral features for the educational domain and the strategic use of the RuleFit algorithm for both predictive modeling and the generation of practical recommendations, offering a data-driven foundation for informed study planning and academic decision-making.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"16 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal knowledge completion enhanced self-supervised entity alignment","authors":"Teng Fu, Gang Zhou","doi":"10.1007/s10844-024-00878-5","DOIUrl":"https://doi.org/10.1007/s10844-024-00878-5","url":null,"abstract":"<p>Temporal graph entity alignment aims at finding the equivalent entity pairs across different temporal knowledge graphs (TKGs). Existing methods mainly utilize a time-aware and relationship-aware approach to embed and align. However, the existence of long-tail entities in TKGs still restricts the accuracy of alignment, as their limited neighborhood information may restrict the ability to obtain high-quality embeddings, and hence would impact the efficiency of entity alignment in representation space. Moreover, most previous research is supervised, with heavy dependence on seed labels for alignment, restricting its applicability in scenarios with limited resources. To tackle these challenges, we propose a Temporal Knowledge Completion enhanced Self-supervised Entity Alignment (TSEA). We argue that, with high-quality embeddings, the entities would be aligned in a self-supervised manner. To this end, TSEA consists of two modules: a graph completion module that predicts the missing links for the long-tail entities, and a self-supervised entity alignment module that achieves unsupervised alignment on the improved graph. Experimental results on widely adopted benchmarks demonstrate improved performance compared to several recent baseline methods. Additional ablation experiments further corroborate the efficacy of the proposed modules.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"58 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved machine learning technique for feature reduction and its application in spam email detection","authors":"Ahmed A. Ewees, Marwa A. Gaheen, Mohammed M. Alshahrani, Ahmed M. Anter, Fatma H. Ismail","doi":"10.1007/s10844-024-00870-z","DOIUrl":"https://doi.org/10.1007/s10844-024-00870-z","url":null,"abstract":"<p>This paper introduces MPAG, a new feature selection method aimed at overcoming the limitations of the conventional Marine Predators Algorithm (MPA). The MPA may experience stagnation and become trapped in local optima during optimization. To address this challenge, we propose a refined version of the MPA, termed MPAG, which incorporates the Local Escape Operator (LEO) from the gradient-based optimizer (GBO). By leveraging the LEO operator, MPAG enhances the exploration ability of the MPA, particularly during the initial one-third of iterations. This enhancement injects more diversity into populations, thereby improving the process of search space discovery and mitigating the risk of premature convergence. The performance of MPAG is evaluated on 14 feature selection benchmark datasets, employing seven performance measures including fitness value, classification accuracy, and selected features. Our findings indicate that MPAG outperforms other algorithms in 86% of the datasets, underscoring its capability to select the most relevant features across various datasets while maintaining stability. Additionally, MPAG is evaluated using two cybersecurity applications, specifically spam detection datasets, where it demonstrates superior performance across most performance measures compared to other methods.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"77 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint entity and relation extraction with fusion of multi-feature semantics","authors":"Ting Wang, Wenjie Yang, Tao Wu, Chuan Yang, Jiaying Liang, Hongyang Wang, Jia Li, Dong Xiang, Zheng Zhou","doi":"10.1007/s10844-024-00871-y","DOIUrl":"https://doi.org/10.1007/s10844-024-00871-y","url":null,"abstract":"<p>Entity relation extraction is a key technology for extracting structured information from unstructured text and serves as the foundation for building large-scale knowledge graphs. Current joint entity relation extraction methods primarily focus on improving the recognition of overlapping triplets to enhance the overall performance of the model. However, the model still faces numerous challenges in managing intra-triplet and inter-triplet interactions, expanding the breadth of semantic encoding, and reducing information redundancy during the extraction process. These issues make it challenging for the model to achieve satisfactory performance in both normal and overlapping triple extraction. To address these challenges, this study proposes a comprehensive prediction network that includes multi-feature semantic fusion. We have developed a semantic fusion module that integrates entity mask embedding sequences, which enhance connections between entities, and context embedding sequences that provide richer semantic information, to enhance inter-triplet interactions and expand semantic encoding. Subsequently, a parallel decoder simultaneously generates a set of triplets, improving the interaction between them. Additionally, we utilize an entity mask sequence to finely prune these triplets, optimizing the final set of triplets. Experimental results on the publicly available datasets NYT and WebNLG demonstrate that, with BERT as the encoder, our model outperforms the baseline model in terms of accuracy and F1 score.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"34 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SESAME - self-supervised framework for extractive question answering over document collections","authors":"Vitor A. Batista, Diogo S. M. Gomes, Alexandre Evsukoff","doi":"10.1007/s10844-024-00869-6","DOIUrl":"https://doi.org/10.1007/s10844-024-00869-6","url":null,"abstract":"<p>Question Answering is one of the most relevant areas in the field of Natural Language Processing, rapidly evolving with promising results due to the increasing availability of suitable datasets and the advent of new technologies, such as Generative Models. This article introduces SESAME, a Self-supervised framework for Extractive queStion Answering over docuMent collEctions. SESAME aims to enhance open-domain question answering systems (ODQA) by leveraging domain adaptation with synthetic datasets, enabling efficient question answering over private document collections with low resource usage. The framework incorporates recent advances with large language models, and an efficient hybrid method for context retrieval. We conducted several sets of experiments with the Machine Reading for Question Answering (MRQA) 2019 Shared Task datasets, FAQuAD - a Brazilian Portuguese reading comprehension dataset, Wikipedia, and Retrieval-Augmented Generation Benchmark, to demonstrate SESAME’s effectiveness. The results indicate that SESAME’s domain adaptation using synthetic data significantly improves QA performance, generalizes across different domains and languages, and competes with or surpasses state-of-the-art systems in ODQA. Finally, SESAME is an open-source tool, and all code, datasets and experimental data are available for public use in our repository.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"15 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing data preparation: insights from a time series case study","authors":"Camilla Sancricca, Giovanni Siracusa, Cinzia Cappiello","doi":"10.1007/s10844-024-00867-8","DOIUrl":"https://doi.org/10.1007/s10844-024-00867-8","url":null,"abstract":"<p>Data play a key role in AI systems that support decision-making processes. Data-centric AI highlights the importance of having high-quality input data to obtain reliable results. However, preparing data well for machine learning is becoming difficult due to the variety of data quality issues and available data preparation tasks. For this reason, approaches that help users perform this demanding phase are needed. This work proposes DIANA, a framework for data-centric AI to support data exploration and preparation, suggesting suitable cleaning tasks to obtain valuable analysis results. We design an adaptive self-service environment that can handle the analysis and preparation of different types of sources, i.e., tabular and streaming data. The central component of our framework is a knowledge base that collects evidence related to the effectiveness of the data preparation actions along with the type of input data and the considered machine learning model. In this paper, we first describe the framework, the knowledge base model, and its enrichment process. Then, we show the experiments conducted to enrich the knowledge base in a particular case study: time series data streams.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"78 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141777860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heuristic approaches for non-exhaustive pattern-based change detection in dynamic networks","authors":"Corrado Loglisci, Angelo Impedovo, Toon Calders, Michelangelo Ceci","doi":"10.1007/s10844-024-00866-9","DOIUrl":"https://doi.org/10.1007/s10844-024-00866-9","url":null,"abstract":"<p>Dynamic networks are ubiquitous in many domains for modelling evolving graph-structured data, and detecting changes allows us to understand the dynamics of the represented domain. A category of computational solutions is represented by the pattern-based change detectors (PBCDs), which are non-parametric unsupervised change detection methods based on observed changes in sets of frequent patterns over time. Patterns have the ability to depict the structural information of the sub-graphs, becoming a useful tool in the interpretation of the changes. Existing PBCDs often rely on exhaustive mining, which has worst-case exponential time complexity, making this category of algorithms inefficient in practice. In fact, in such a case, the pattern mining process is even more time-consuming and inefficient due to the combinatorial explosion of the sub-graph pattern space caused by the inherent complexity of the graph structure. Non-exhaustive search strategies can represent a possible approach to this problem, also because not all the possible frequent patterns contribute to changes in the time-evolving data. In this paper, we investigate the viability of different heuristic approaches which prevent the complete exploration of the search space, by returning a concise set of sub-graph patterns (compared to the exhaustive case). The heuristics differ in the criterion used to select representative patterns. The results obtained on real-world and synthetic dynamic networks show that these solutions are effective when mining patterns, and even more accurate when detecting changes.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"18 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141508975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving search and rescue planning and resource allocation through case-based and concept-based retrieval","authors":"Wajeeha Nasar, Ricardo da Silva Torres, Odd Erik Gundersen, Anniken Susanne Thoresen Karlsen","doi":"10.1007/s10844-024-00861-0","DOIUrl":"https://doi.org/10.1007/s10844-024-00861-0","url":null,"abstract":"<p>The need for effective and efficient search and rescue operations is more important than ever as the frequency and severity of disasters increase due to the escalating effects of climate change. Recognizing the value of personal knowledge and past experiences of experts, in this paper, we present findings of an investigation of how past knowledge and experts’ experiences can be effectively integrated with current search and rescue practices to improve rescue planning and resource allocation. A special focus is on investigating and demonstrating the potential associated with integrating knowledge graphs and case-based reasoning as a viable approach for search and rescue decision support. As part of our investigation, we have implemented a demonstrator system using a Norwegian search and rescue dataset and case-based and concept-based similarity retrieval. The main contribution of the paper is insight into how case-based and concept-based retrieval services can be designed to improve the effectiveness of search and rescue planning. To evaluate the validity of ranked cases in terms of how they align with the existing knowledge and insights of search and rescue experts, we use evaluation measures such as precision and recall. In our evaluation, we observed that attributes, such as the rescue operation type, have high precision, while the precision associated with the objects involved is relatively low. Central findings from our evaluation process are that knowledge-based creation, as well as case- and concept-based similarity retrieval services, can be beneficial in optimizing search and rescue planning time and allocating appropriate resources according to search and rescue incident descriptions.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"17 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141194227","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative adversarial meta-learning knowledge graph completion for large-scale complex knowledge graphs","authors":"Weiming Tong, Xu Chu, Zhongwei Li, Liguo Tan, Jinxiao Zhao, Feng Pan","doi":"10.1007/s10844-024-00860-1","DOIUrl":"https://doi.org/10.1007/s10844-024-00860-1","url":null,"abstract":"<p>In the study of large-scale complex knowledge graphs, due to the incompleteness of knowledge and the existence of low-frequency knowledge samples, existing knowledge graph completion methods are often limited by the amount of data and ignore the complex semantic information. To solve this problem, this paper proposes a knowledge graph completion method CGAML based on the combination of Conditional Generative Adversarial Network and Meta-Learning, which utilizes the hierarchical background knowledge as the basis and introduces conditional variables in the Generative Adversarial Network to represent the required semantic information to constrain the semantic attributes of the generated knowledge. In addition, we design a meta-learning multi-task framework to embed Conditional Generative Adversarial Networks into the meta-learning process and propose local constraints and global gradient optimization strategies to quickly adapt to new tasks and improve computational efficiency. Empirically, our method demonstrates superior performance in realizing few-shot link prediction when compared to existing representative methods.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"28 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-05-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141169639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving the clarity of questions in Community Question Answering networks","authors":"Alireza Khabbazan, Ahmad Ali Abin, Viet-Vu Vu","doi":"10.1007/s10844-024-00847-y","DOIUrl":"https://doi.org/10.1007/s10844-024-00847-y","url":null,"abstract":"<p>Every day, thousands of questions are asked on the Community Question Answering network, making these questions and answers extremely valuable for information seekers around the world. However, a significant proportion of these questions do not elicit proper answers. There are several reasons for this, with the lack of clarity in questions being one of the most crucial factors. In this study, our primary focus is on enhancing the clarity of unclear questions in Community Question Answering networks. In the first step, DistilBERT, which uses Siamese and triplet network structures for meaningful sentence embeddings, is combined with HDBSCAN, effective in diverse noise datasets and less sensitive to density variations, to extract unique features from each question. Questions were then categorized as clear or unclear using an Extremely Randomized Trees ensemble model, known for its robust resistance to class imbalance, with more than 90% accuracy. Next, efforts were made to extract information that could enhance the clarity of unclear questions by comparing them with similar, clearer questions using Dynamic Time Warping, a versatile technique suitable for time series analyses in information systems and applicable across various domains. Finally, the extracted information was incorporated into the feature vector of unclear questions based on histogram-coverage methods to enhance their clarity. When a question is made clearer, the missing information and its importance are shown to the questioner. This enables the questioner to be aware of the missing information and facilitates them in clarifying the question.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"30 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140888363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}