{"title":"RATS: A resource allocator for optimizing the execution of tumor simulations over HPC infrastructures","authors":"Errikos Streviniotis , Nikos Giatrakos , Yannis Kotidis , Thaleia Ntiniakou , Miguel Ponce de Leon","doi":"10.1016/j.is.2025.102538","DOIUrl":"10.1016/j.is.2025.102538","url":null,"abstract":"<div><div>In this work, we introduce RATS (<u>R</u>esource <u>A</u>llocator for <u>T</u>umor <u>S</u>imulations), the first optimizer for the execution of tumor simulations over HPC infrastructures. Given a set of drug therapies under in-silico study, the optimization framework of RATS can: <em>(i)</em> devise the optimal number of cores and prescribe the required number of core hours; and <em>(ii)</em> under core capacity constraints, schedule the execution of simulations so as to minimize the overall number of core hours while prioritizing expectedly promising in-silico trials over unpromising ones. RATS is deployed by life scientists at the Barcelona Supercomputing Center to remove the burden of blindly guessing the core hours that need to be reserved from HPC admins to study various tumor treatment methodologies, as well as to rapidly distinguish effective drug combinations, thus potentially cutting time to market for new cancer therapies. The latter is further enhanced by the RATS+ extension we plug into the initial framework. RATS+ employs a Transfer Learning approach to leverage optimization models and decisions from prior in-silico studies, thereby reducing the optimization effort required for new studies in this domain.</div><div>Our experimental evaluation, on real-world data derived from the execution of more than 2500 tumor simulations on the MareNostrum4 supercomputer, confirms the effectiveness of both RATS and RATS+ across the aforementioned performance dimensions.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"132 ","pages":"Article 102538"},"PeriodicalIF":3.0,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143579985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A hierarchical transformer-based network for multivariate time series classification","authors":"Yingxia Tang , Yanxuan Wei , Teng Li , Xiangwei Zheng , Cun Ji","doi":"10.1016/j.is.2025.102536","DOIUrl":"10.1016/j.is.2025.102536","url":null,"abstract":"<div><div>In recent years, the Transformer has demonstrated considerable potential in multivariate time series classification due to its exceptional strength in capturing global dependencies. However, as a generalized approach, it still faces challenges in processing time series data, such as insufficient temporal sensitivity and an inadequate ability to capture local features. In this paper, a hierarchical Transformer-based network (Hformer) is proposed to address these problems. Hformer handles time series progressively at various stages to aggregate multi-scale representations. At the start of each stage, Hformer segments the input sequence and extracts features independently on each temporal slice. Furthermore, to better accommodate multivariate time series data, more efficient absolute and relative position encodings are employed by Hformer. Experimental results on 30 multivariate time series datasets of the UEA archive demonstrate that the proposed method outperforms most state-of-the-art methods.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"132 ","pages":"Article 102536"},"PeriodicalIF":3.0,"publicationDate":"2025-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143562707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Special issue: BPM 2022 Selected papers in Foundations and Engineering","authors":"Claudio Di Ciccio, Remco Dijkman, Adela del Río Ortega, Stefanie Rinderle-Ma, Manfred Reichert","doi":"10.1016/j.is.2025.102535","DOIUrl":"10.1016/j.is.2025.102535","url":null,"abstract":"","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102535"},"PeriodicalIF":3.0,"publicationDate":"2025-02-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143510294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing EHR analysis: Predictive medication modeling using LLMs","authors":"Hanan Alghamdi , Abeer Mostafa","doi":"10.1016/j.is.2025.102528","DOIUrl":"10.1016/j.is.2025.102528","url":null,"abstract":"<div><div>In modern healthcare systems, the analysis of Electronic Health Records (EHR) is fundamental for uncovering patient health trends and enhancing clinical practices. This study aims to advance EHR analysis by developing predictive models for prescribed medication prediction using the MIMIC-IV dataset. We address data preparation challenges through comprehensive cleaning and feature selection, transforming structured patient health data into coherent sentences suitable for natural language processing (NLP). We fine-tuned several state-of-the-art large language models (LLMs), including Llama2, Llama3, Gemma, GPT-3.5 Turbo, Meditron, Claude 3.5-Sonnet, DeepSeek-R1, Falcon, and Mistral, using tailored prompts derived from EHR data. The models were rigorously evaluated on cosine similarity, recall, precision, and F1-score, using BERTScore to address the limitations of traditional n-gram-based metrics. BERTScore utilizes contextualized token embeddings for semantic similarity, providing a more accurate and human-aligned evaluation. Our findings demonstrate that the integration of advanced NLP techniques with detailed EHR data significantly improves medication management predictions. This research highlights the potential of LLMs in clinical settings and underscores the importance of seamless integration with EHR systems to improve patient safety and healthcare delivery.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102528"},"PeriodicalIF":3.0,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395833","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Process-driven design of cloud data platforms","authors":"Matteo Francia, Matteo Golfarelli, Manuele Pasini","doi":"10.1016/j.is.2025.102527","DOIUrl":"10.1016/j.is.2025.102527","url":null,"abstract":"<div><div>Data platforms are state-of-the-art solutions for implementing data-driven applications and analytics. They facilitate the ingestion, storage, management, and exploitation of big data. Data platforms are built on top of complex ecosystems of services answering different data needs and requirements; such ecosystems are offered by different providers (e.g., Amazon AWS and Microsoft Azure). However, when it comes to engineering data platforms, no unifying strategy or methodology is available yet, and the design is mainly left to the expertise of practitioners in the field. Service providers simply expose a long list of interoperable and alternative engines, making it hard to select the optimal subset without a deep knowledge of the ecosystem. A more effective design approach starts with knowledge of the data transformation and exploitation processes that the platform should support. In this paper, we sketch a computer-aided design methodology and then focus on the selection of the optimal services needed to implement such processes. We show that our approach lightens the design of data platforms and enables an unbiased selection and comparison of solutions even across different service ecosystems.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102527"},"PeriodicalIF":3.0,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143196974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing cross-lingual text classification through linguistic and interpretability-guided attack strategies","authors":"Abdelmounaim Kerkri , Mohamed Amine Madani , Aya Qeraouch , Kaoutar Zouin","doi":"10.1016/j.is.2025.102526","DOIUrl":"10.1016/j.is.2025.102526","url":null,"abstract":"<div><div>While adversarial attacks on natural language processing systems have been extensively studied in English, their impact on morphologically complex languages remains poorly understood. We investigate how text classification systems respond to adversarial attacks across Arabic, English, and French — languages chosen for their distinct linguistic properties. Building on the DeepWordBug framework, we develop multilingual attack strategies that combine random perturbations with targeted modifications guided by model interpretability. We also introduce novel attack methods that exploit language-specific features such as orthographic variations and syntactic patterns. Testing these approaches on a diverse dataset of news articles (9,030 Arabic, 14,501 English) and movie reviews (200,000 French), we find that interpretability-guided attacks are particularly effective, achieving misclassification rates of 58%–62% across languages. Language-specific perturbations also proved potent, degrading model performance to F1-scores between 0.38 and 0.63. However, incorporating adversarial examples during training markedly improved model robustness, with F1-scores recovering to above 0.82 across all test conditions. Beyond the immediate findings, this work reveals how adversarial vulnerability manifests differently across languages with varying morphological complexity, offering key insights for building more resilient multilingual NLP systems.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102526"},"PeriodicalIF":3.0,"publicationDate":"2025-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143196971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated conformance checking","authors":"Majid Rafiei, Mahsa Pourbafrani, Wil M.P. van der Aalst","doi":"10.1016/j.is.2025.102525","DOIUrl":"10.1016/j.is.2025.102525","url":null,"abstract":"<div><div>Conformance checking is a crucial aspect of process mining, where the main objective is to compare the actual execution of a process, as recorded in an event log, with a reference process model, e.g., in the form of a Petri net or a BPMN model. Conformance checking enables identifying deviations, anomalies, or non-compliance instances. It offers different perspectives on problems in processes, bottlenecks, or process instances that are not compliant with the model. Performing conformance checking in federated (inter-organizational) settings allows organizations to gain insights into the overall process execution and to identify compliance issues across organizational boundaries, which facilitates process improvement efforts among collaborating entities. In this paper, we propose <em>a privacy-aware federated conformance-checking approach</em> that allows for evaluating the correctness of overall cross-organizational process models, identifying miscommunications, and quantifying their costs. For evaluation, we design and simulate a supply chain process with three organizations engaged in purchase-to-pay, order-to-cash, and shipment processes. We generate synthetic event logs for each organization as well as for the complete process, and we apply our approach to identify and evaluate the cost of pre-injected miscommunications.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102525"},"PeriodicalIF":3.0,"publicationDate":"2025-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143196972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable and accurate online multivariate anomaly detection","authors":"Rebecca Salles , Benoit Lange , Reza Akbarinia , Florent Masseglia , Eduardo Ogasawara , Esther Pacitti","doi":"10.1016/j.is.2025.102524","DOIUrl":"10.1016/j.is.2025.102524","url":null,"abstract":"<div><div>The continuous monitoring of dynamic processes generates vast amounts of streaming multivariate time series data. Detecting anomalies within them is crucial for real-time identification of significant events, such as environmental phenomena, security breaches, or system failures, which can critically impact sensitive applications. Despite significant advances in univariate time series anomaly detection, scalable and efficient solutions for online detection in multivariate streams remain underexplored. This challenge becomes increasingly prominent with the growing volume and complexity of multivariate time series data in streaming scenarios. In this paper, we provide the first structured survey primarily focused on scalable and online anomaly detection techniques for multivariate time series, offering a comprehensive taxonomy. Additionally, we introduce the Online Distributed Outlier Detection (2OD) methodology, a novel, well-defined, and repeatable process designed to benchmark the online and distributed execution of anomaly detection methods. Experimental results with both synthetic and real-world datasets, covering up to hundreds of millions of observations, demonstrate that a distributed approach can enable centralized algorithms to achieve significant computational efficiency gains, with speedups averaging in the tens and reaching into the hundreds, without compromising detection accuracy.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"131 ","pages":"Article 102524"},"PeriodicalIF":3.0,"publicationDate":"2025-01-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143196973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the use of trajectory data for tackling data scarcity","authors":"Gerard Pons , Besim Bilalli , Alberto Abelló , Santiago Blanco Sánchez","doi":"10.1016/j.is.2025.102523","DOIUrl":"10.1016/j.is.2025.102523","url":null,"abstract":"<div><div>In recent years, the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies has enabled the ubiquitous capturing of the location of moving objects. As a result, trajectory data are abundantly available and there is an increasing trend in analyzing them in the context of mobility data science. However, this abundance also makes trajectory data compelling for other tasks. In this paper, we propose the use of these data to tackle the data scarcity problem in data analysis by appropriately transforming them to extract relevant knowledge. The challenge lies not just in leveraging these abundant trajectory data, but in accurately deriving information from them that closely approximates the target variable of interest. Such knowledge can be used to generate or supplement the scarcely available datasets in a data analytics problem, thereby enhancing model learning. We showcase the feasibility of our approach in the domain of fishing, where there is an abundance of trajectory data but a scarcity of detailed catch information. By using environmental data as explanatory variables, we build and compare models to predict fishing productivity using the actual catches from fishing reports and/or the knowledge inferred from the vessels’ trajectories. The results show that, mainly because trajectory data are larger in volume than fishing data, models trained with the former achieve 7.9% higher precision, despite the simplicity of the applied transformations.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"130 ","pages":"Article 102523"},"PeriodicalIF":3.0,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143311830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting explicit recommendation with DC-GCN: Divide-and-Conquer Graph Convolution Network","authors":"Furong Peng , Fujin Liao , Xuan Lu , Jianxing Zheng , Ru Li","doi":"10.1016/j.is.2024.102513","DOIUrl":"10.1016/j.is.2024.102513","url":null,"abstract":"<div><div>In recent years, Graph Convolutional Networks (GCNs) have primarily been applied to implicit feedback recommendation, with limited exploration in explicit scenarios. Although explicit recommendation can yield promising results, the conflict between the sparsity of the data and the data starvation of deep learning hinders its development. Unlike implicit scenarios, explicit recommendation provides less evidence for predictions and requires distinguishing the weights of edges (ratings) in the user-item graph.</div><div>To exploit high-order relations by GCN in explicit scenarios, we propose dividing the explicit rating graph into sub-graphs, each containing only one type of rating. We then employ a GCN to capture user and item representations within each sub-graph, allowing the model to focus on rating-related user-item relations, and aggregate the representations of all sub-graphs by MLP for the final recommendation. This approach, named Divide-and-Conquer Graph Convolution Network (DC-GCN), simplifies each model’s task and highlights the strengths of the individual modules. Considering that creating a GCN for each sub-graph may result in over-fitting and face more serious data sparsity, we propose sharing node embeddings across all GCNs to reduce the number of parameters, and creating a rating-aware embedding for each sub-graph to model rating-related relations. Moreover, to alleviate over-smoothing, we utilize a random column mask to randomly select columns of node features to update in GCN layers. This technique prevents node representations from becoming homogeneous in deep GCN networks. DC-GCN is evaluated on four public datasets and achieves state-of-the-art results experimentally. Furthermore, DC-GCN is analyzed in cold-start and popularity bias scenarios, exhibiting competitive performance in various scenarios.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"130 ","pages":"Article 102513"},"PeriodicalIF":3.0,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143311864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}