{"title":"Machine learning approaches to predict the execution time of the meteorological simulation software COSMO","authors":"Allegra De Filippo, Emanuele Di Giacomo, Andrea Borghesi","doi":"10.1007/s10844-024-00880-x","DOIUrl":"https://doi.org/10.1007/s10844-024-00880-x","url":null,"abstract":"<p>Predicting the execution time of weather forecast models is a complex task, since these models usually run on High Performance Computing systems that require large computing capabilities. A reliable prediction can bring several benefits, allowing for improved planning of the model execution, better allocation of available resources, and identification of possible anomalies. However, making such predictions is usually hard, since there are few datasets that benchmark the existing meteorological simulation models. In this work, we focus on predicting the runtime of the COSMO (COnsortium for SMall-scale MOdeling) weather forecasting model used at the Hydro-Meteo-Climate Structure of the Regional Agency for the Environment and Energy Prevention Emilia-Romagna. We show that a range of Machine Learning (ML) approaches can obtain accurate runtime predictions for this complex model, by designing a new well-defined benchmark for this application task. Our contribution is twofold: 1) the creation of a large public dataset reporting the runtime of COSMO runs under a variety of configurations; 2) a comparative study of ML models, which greatly outperform the current state of practice used by domain experts. This data collection represents an essential initial benchmark for this application field, and a useful resource for analyzing the model performance: more accurate runtime predictions could help facility owners improve job scheduling and resource allocation across the entire system, while for end users, a posteriori analysis could help identify anomalous runs.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"75 1 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Span-based semantic syntactic dual enhancement for aspect sentiment triplet extraction","authors":"Shuxia Ren, Zewei Guo, Xiaohan Li, Ruikun Zhong","doi":"10.1007/s10844-024-00881-w","DOIUrl":"https://doi.org/10.1007/s10844-024-00881-w","url":null,"abstract":"<p>Aspect Sentiment Triplet Extraction (ASTE), a critical sub-task of Aspect-Based Sentiment Analysis (ABSA), has received extensive attention in recent years. ASTE aims to extract structured sentiment triplets from texts, with most existing studies focusing on designing new strategic frameworks. Nonetheless, these methods often overlook the complex characteristics of linguistic expression and deeper semantic nuances, leading to deficiencies in extracting the semantic representations of triplets and in effectively utilizing syntactic relationships in texts. To address these challenges, this paper introduces a span-based semantic and syntactic Dual-Enhanced model that deeply integrates rich syntactic information, such as part-of-speech tags, constituency syntax, and dependency syntax structures. Specifically, we design a semantic encoder and a syntactic encoder to capture the semantic-syntactic information closely related to the sentence’s underlying intent. Through a Feature Interaction Module, we effectively integrate information across different dimensions and promote a more comprehensive understanding of the relationships between aspects and opinions. We also adopt a span-based tagging scheme that extracts aspect sentiment triplets more precisely by exploring cross-level information and constraints. Experimental results on benchmark datasets derived from the SemEval challenge show that our model significantly outperforms existing baselines.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"26 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tiramisù: making sense of multi-faceted process information through time and space","authors":"Anti Alman, Alessio Arleo, Iris Beerepoot, Andrea Burattin, Claudio Di Ciccio, Manuel Resinas","doi":"10.1007/s10844-024-00875-8","DOIUrl":"https://doi.org/10.1007/s10844-024-00875-8","url":null,"abstract":"<p>Knowledge-intensive processes represent a particularly challenging scenario for process mining. The flexibility that such processes allow constitutes a hurdle, as they are hard to capture in a single model. To tackle this problem, multiple visual representations of the same processes could be beneficial, each addressing different information dimensions according to the specific needs and background knowledge of the concrete process workers and stakeholders. In this paper, we propose, describe, and evaluate a framework, named Tiramisù, that leverages visual analytics for the interactive visualization of multi-faceted process information, aimed at supporting the investigation and insight generation of users in their process analysis tasks. Tiramisù is based on a multi-layer visualization methodology that includes a visual backdrop that provides context and an arbitrary number of superimposed, on-demand dimension layers. This arrangement allows our framework to display process information from different perspectives and to project this information onto a domain-friendly representation of the context in which the process unfolds. We provide an in-depth description of the approach’s founding principles, deeply rooted in visualization research, that justify our design choices for the whole framework. We demonstrate the feasibility of the framework through its application in two use-case scenarios in the context of healthcare and personal information management. In addition, we conducted qualitative evaluations with potential end users of both scenarios, gathering valuable insights about the efficacy and applicability of our framework to various application domains.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"72 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217577","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning recommendations from educational event data in higher education","authors":"Gyunam Park, Lukas Liss, Wil M. P. van der Aalst","doi":"10.1007/s10844-024-00873-w","DOIUrl":"https://doi.org/10.1007/s10844-024-00873-w","url":null,"abstract":"<p>This paper presents a novel approach for generating actionable recommendations from educational event data collected by Campus Management Systems (CMS) to enhance study planning in higher education. The approach unfolds in three phases: feature identification tailored to the educational context, predictive modeling employing the RuleFit algorithm, and extraction of actionable recommendations. We utilize diverse features, encompassing academic histories and course sequences, to capture the multi-dimensional nature of student academic behaviors. The effectiveness of our approach is empirically validated using data from the computer science bachelor’s program at RWTH Aachen University, with the goal of predicting overall GPA and formulating recommendations to enhance academic performance. Our contributions lie in the novel adaptation of behavioral features for the educational domain and the strategic use of the RuleFit algorithm for both predictive modeling and the generation of practical recommendations, offering a data-driven foundation for informed study planning and academic decision-making.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"16 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Temporal knowledge completion enhanced self-supervised entity alignment","authors":"Teng Fu, Gang Zhou","doi":"10.1007/s10844-024-00878-5","DOIUrl":"https://doi.org/10.1007/s10844-024-00878-5","url":null,"abstract":"<p>Temporal graph entity alignment aims at finding equivalent entity pairs across different temporal knowledge graphs (TKGs). Existing methods mainly use time-aware and relationship-aware approaches to embed and align entities. However, long-tail entities in TKGs still limit alignment accuracy: their sparse neighborhoods provide too little information to obtain high-quality embeddings, which in turn impairs entity alignment in the representation space. Moreover, most previous studies are supervised and depend heavily on seed labels for alignment, which restricts their applicability in low-resource scenarios. To tackle these challenges, we propose Temporal Knowledge Completion enhanced Self-supervised Entity Alignment (TSEA). We argue that, with high-quality embeddings, entities can be aligned in a self-supervised manner. To this end, TSEA consists of two modules: a graph completion module that predicts the missing links of long-tail entities and, operating on the completed graph, a self-supervised entity alignment module that achieves unsupervised alignment. Experimental results on widely adopted benchmarks demonstrate improved performance compared to several recent baseline methods. Additional ablation experiments further corroborate the efficacy of the proposed modules.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"58 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142217578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improved machine learning technique for feature reduction and its application in spam email detection","authors":"Ahmed A. Ewees, Marwa A. Gaheen, Mohammed M. Alshahrani, Ahmed M. Anter, Fatma H. Ismail","doi":"10.1007/s10844-024-00870-z","DOIUrl":"https://doi.org/10.1007/s10844-024-00870-z","url":null,"abstract":"<p>This paper introduces MPAG, a new feature selection method aimed at overcoming the limitations of the conventional Marine Predators Algorithm (MPA). The MPA may stagnate and become trapped in local optima during optimization. To address this challenge, we propose a refined version of the MPA, termed MPAG, which incorporates the Local Escape Operator (LEO) from the gradient-based optimizer (GBO). By leveraging the LEO operator, MPAG enhances the exploration ability of the MPA, particularly during the first third of the iterations. This enhancement injects more diversity into the population, thereby improving search space exploration and mitigating the risk of premature convergence. The performance of MPAG is evaluated on 14 feature selection benchmark datasets, using seven performance measures, including fitness value, classification accuracy, and the number of selected features. Our findings indicate that MPAG outperforms the other algorithms on 86% of the datasets, underscoring its capability to select the most relevant features across various datasets while maintaining stability. Additionally, MPAG is evaluated on two cybersecurity applications, specifically spam detection datasets, where it outperforms the other methods on most performance measures.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"77 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141932616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint entity and relation extraction with fusion of multi-feature semantics","authors":"Ting Wang, Wenjie Yang, Tao Wu, Chuan Yang, Jiaying Liang, Hongyang Wang, Jia Li, Dong Xiang, Zheng Zhou","doi":"10.1007/s10844-024-00871-y","DOIUrl":"https://doi.org/10.1007/s10844-024-00871-y","url":null,"abstract":"<p>Entity relation extraction is a key technology for extracting structured information from unstructured text and serves as the foundation for building large-scale knowledge graphs. Current joint entity relation extraction methods primarily focus on improving the recognition of overlapping triplets to enhance overall model performance. However, these models still face numerous challenges in managing intra-triplet and inter-triplet interactions, expanding the breadth of semantic encoding, and reducing information redundancy during the extraction process. These issues make it difficult to achieve satisfactory performance on both normal and overlapping triplet extraction. To address these challenges, this study proposes a comprehensive prediction network that includes multi-feature semantic fusion. We have developed a semantic fusion module that integrates entity mask embedding sequences, which enhance connections between entities, with context embedding sequences, which provide richer semantic information, to strengthen inter-triplet interactions and expand semantic encoding. Subsequently, a parallel decoder generates a set of triplets simultaneously, improving the interaction between them. Additionally, we use an entity mask sequence to finely prune these triplets, optimizing the final set of triplets. Experimental results on the publicly available datasets NYT and WebNLG demonstrate that, with BERT as the encoder, our model outperforms the baseline model in terms of accuracy and F1 score.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"34 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SESAME - self-supervised framework for extractive question answering over document collections","authors":"Vitor A. Batista, Diogo S. M. Gomes, Alexandre Evsukoff","doi":"10.1007/s10844-024-00869-6","DOIUrl":"https://doi.org/10.1007/s10844-024-00869-6","url":null,"abstract":"<p>Question Answering is one of the most relevant areas in the field of Natural Language Processing, rapidly evolving with promising results due to the increasing availability of suitable datasets and the advent of new technologies, such as Generative Models. This article introduces SESAME, a Self-supervised framework for Extractive queStion Answering over docuMent collEctions. SESAME aims to enhance open-domain question answering (ODQA) systems by leveraging domain adaptation with synthetic datasets, enabling efficient question answering over private document collections with low resource usage. The framework incorporates recent advances in large language models and an efficient hybrid method for context retrieval. We conducted several sets of experiments with the Machine Reading for Question Answering (MRQA) 2019 Shared Task datasets, FAQuAD (a Brazilian Portuguese reading comprehension dataset), Wikipedia, and the Retrieval-Augmented Generation Benchmark to demonstrate SESAME’s effectiveness. The results indicate that SESAME’s domain adaptation using synthetic data significantly improves QA performance, generalizes across different domains and languages, and competes with or surpasses state-of-the-art systems in ODQA. Finally, SESAME is an open-source tool, and all code, datasets and experimental data are available for public use in our repository.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"15 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870711","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing data preparation: insights from a time series case study","authors":"Camilla Sancricca, Giovanni Siracusa, Cinzia Cappiello","doi":"10.1007/s10844-024-00867-8","DOIUrl":"https://doi.org/10.1007/s10844-024-00867-8","url":null,"abstract":"<p>Data play a key role in AI systems that support decision-making processes. Data-centric AI highlights the importance of high-quality input data for obtaining reliable results. However, preparing data well for machine learning is becoming difficult due to the variety of data quality issues and available data preparation tasks. For this reason, approaches that help users perform this demanding phase are needed. This work proposes DIANA, a data-centric AI framework that supports data exploration and preparation by suggesting suitable cleaning tasks to obtain valuable analysis results. We design an adaptive self-service environment that can handle the analysis and preparation of different types of sources, i.e., tabular and streaming data. The central component of our framework is a knowledge base that collects evidence on the effectiveness of data preparation actions, along with the type of input data and the considered machine learning model. In this paper, we first describe the framework, the knowledge base model, and its enrichment process. Then, we present the experiments conducted to enrich the knowledge base in a particular case study: time series data streams.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"78 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141777860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-task learning and mutual information maximization with crossmodal transformer for multimodal sentiment analysis","authors":"Yang Shi, Jinglang Cai, Lei Liao","doi":"10.1007/s10844-024-00858-9","DOIUrl":"https://doi.org/10.1007/s10844-024-00858-9","url":null,"abstract":"<p>The effectiveness of multimodal sentiment analysis (MSA) hinges on the seamless integration of information from diverse modalities, where the quality of modality fusion directly influences sentiment analysis accuracy. Prior methods often rely on intricate fusion strategies, which raise computational costs and can yield inaccurate multimodal representations due to distribution gaps and information redundancy across heterogeneous modalities. This paper centers on the backpropagation of loss and introduces a Transformer-based model called Multi-Task Learning and Mutual Information Maximization with Crossmodal Transformer (MMMT). To address the issue of inaccurate multimodal representations in MSA, MMMT combines mutual information maximization with a crossmodal Transformer to convey more modality-invariant information to the multimodal representation, fully exploring modal commonalities. Notably, it uses multimodal labels for unimodal training, presenting a fresh perspective on multi-task learning in MSA. Comparative experiments on the CMU-MOSI and CMU-MOSEI datasets demonstrate that MMMT improves model accuracy while reducing computational burden, making it suitable for resource-constrained and real-time application scenarios. Additionally, ablation experiments validate the efficacy of multi-task learning and probe the specific impact of combining mutual information maximization with the Transformer in MSA.</p>","PeriodicalId":56119,"journal":{"name":"Journal of Intelligent Information Systems","volume":"16 1","pages":""},"PeriodicalIF":3.4,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}