{"title":"Enhancing transparency in public procurement: A data-driven analytics approach","authors":"Heriberto Felizzola , Camilo Gomez , Nicolas Arrieta , Vianey Jerez , Yilber Erazo , Geraldine Camacho","doi":"10.1016/j.is.2024.102430","DOIUrl":"10.1016/j.is.2024.102430","url":null,"abstract":"<div><p>Open data is a strategy used by governments to promote transparency and accountability in public procurement processes. To reap the benefits of open data, exploring and analyzing the data is necessary to gain meaningful insights into procurement practices. However, accessing, processing, and analyzing open data can be challenging for non-data-savvy users with domain expertise, creating a barrier to leveraging open procurement data. To address this issue, we present the design, development, and implementation of a visual analytics tool. This tool automates data extraction from multiple sources, performs data cleansing, standardization, and database processing, and generates meaningful visualizations to streamline public procurement analysis. In addition, the tool estimates and visualizes corruption risk indicators at different levels (e.g., regions or public entities), providing valuable insights into the integrity of the procurement process. 
Key contributions of this work include: (1) providing a comprehensive guide to the development of an open procurement data visualization tool; (2) proposing a data pipeline to support processing, corruption risk estimator and data visualization; (3) demonstrating through a case study how visual analytics can effectively use open data to generate insights that promote and enhance transparency.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102430"},"PeriodicalIF":3.0,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141849323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey of sequential recommendation systems: Techniques, evaluation, and future directions","authors":"Tesfaye Fenta Boka, Zhendong Niu, Rama Bastola Neupane","doi":"10.1016/j.is.2024.102427","DOIUrl":"10.1016/j.is.2024.102427","url":null,"abstract":"<div><p>Recommender systems are powerful tools that successfully apply data mining and machine learning techniques. Traditionally, these systems focused on predicting a single interaction, such as a rating between a user and an item. However, this approach overlooks the complexity of user interactions, which often involve multiple interactions over time, such as browsing, adding items to a cart, and more. Recent research has shifted towards leveraging this richer data to build more detailed user profiles and uncover complex user behavior patterns. Sequential recommendation systems have gained significant attention recently due to their ability to model users’ evolving preferences over time. This survey explores how these systems utilize interaction history to make more accurate and personalized recommendations. We provide an overview of the techniques employed in sequential recommendation systems, discuss evaluation methodologies, and highlight future research directions. We categorize existing approaches based on their underlying principles and evaluate their effectiveness in various application domains. 
Additionally, we outline the challenges and opportunities in sequential recommendation systems.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102427"},"PeriodicalIF":3.0,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141705170","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Making cyber-human systems smarter","authors":"Steven Alter","doi":"10.1016/j.is.2024.102428","DOIUrl":"10.1016/j.is.2024.102428","url":null,"abstract":"<div><div>The term smart is often used carelessly in relation to systems, devices, and other entities such as cities that capture or otherwise process or use information. This conceptual paper treats the idea of smartness in a way that suggests directions for making cyber-human systems smarter. Cyber-human systems can be viewed as work systems. This paper defines work system, cyber-human system, algorithmic agent, and smartness of systems and devices. It links those ideas to challenges that can be addressed by applying ideas that managers and IS designers discuss rarely, if at all, such as dimensions of smartness for devices and systems, facets of work, roles and responsibilities of algorithmic agents, different types of engagement and patterns of interaction between people and algorithmic agents, explicit use of various types of knowledge objects, and performance criteria that are often deemphasized. In combination, those ideas reveal many opportunities for IS analysis and design practice to make cyber-human systems smarter.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"127 ","pages":"Article 102428"},"PeriodicalIF":3.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141704999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compressing generalized trajectories of molecular motion for efficient detection of chemical interactions","authors":"Md Hasan Anowar , Abdullah Shamail , Xiaoyu Wang , Goce Trajcevski , Sohail Murad , Cynthia J. Jameson , Ashfaq Khokhar","doi":"10.1016/j.is.2024.102426","DOIUrl":"10.1016/j.is.2024.102426","url":null,"abstract":"<div><p>Molecular Dynamics (MD) simulation is often used to study properties of various chemical interactions in domains such as drug discovery and development, particularly when executing real experimental studies is costly and/or unsafe. Studying the motion of trajectories of molecules/atoms generated from MD simulations provides a detailed atomic level spatial location of every atom for every time frame in the experiment. The analysis of this data leads to an atomic and molecular level understanding of interactions among the constituents of the system of interest. However, the data is extremely large and poses storage and processing challenges in the querying and analysis of associated atom level motion trajectories. We take a first step towards applying domain-specific <em>generalization</em> techniques for the data representation, subsequently used for applying trajectory compression algorithms towards reducing the storage requirements and speeding up the processing of within-distance queries over MD simulation data. 
We demonstrate that this <em>generalization-aware</em> compression, when applied to the dataset used in this case study, yields significant improvements in terms of data reduction and processing time without sacrificing the effectiveness of <em>within-distance</em> queries for threshold-based detection of molecular events of interest, such as the formation of <em>Hydrogen Bonds</em> (H-Bonds).</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102426"},"PeriodicalIF":3.0,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141700660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOML: A new modeling approach to Infrastructure-as-Code","authors":"Michele Chiari , Bin Xiang , Sergio Canzoneri , Galia Novakova Nedeltcheva , Elisabetta Di Nitto , Lorenzo Blasi , Debora Benedetto , Laurentiu Niculut , Igor Škof","doi":"10.1016/j.is.2024.102422","DOIUrl":"https://doi.org/10.1016/j.is.2024.102422","url":null,"abstract":"<div><p>One of the main DevOps practices is the automation of resource provisioning and deployment of complex software. This automation is enabled by the explicit definition of <em>Infrastructure-as-Code</em> (IaC), i.e., a set of scripts, often written in different modeling languages, which defines the infrastructure to be provisioned and applications to be deployed.</p><p>We introduce the DevOps Modeling Language (DOML), a new Cloud modeling language for infrastructure deployments. DOML is a modeling approach that can be mapped into multiple IaC languages, addressing infrastructure provisioning, application deployment and configuration.</p><p>The idea behind DOML is to use a single modeling paradigm which can help to reduce the need of deep technical expertise in using different specialized IaC languages.</p><p>We present the DOML’s principles and discuss the related work on IaC languages. 
Furthermore, the advantages of the DOML for the end-user are demonstrated in comparison with some state-of-the-art IaC languages such as Ansible, Terraform, and Cloudify, and an evaluation of its effectiveness through several examples and a case study is provided.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102422"},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000802/pdfft?md5=c405d21d1f83737d4493eb269ebc2006&pid=1-s2.0-S0306437924000802-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141606273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Stream-aware indexing for distributed inequality join processing","authors":"Adeel Aslam, Giovanni Simonini, Luca Gagliardelli, Luca Zecchini, Sonia Bergamaschi","doi":"10.1016/j.is.2024.102425","DOIUrl":"https://doi.org/10.1016/j.is.2024.102425","url":null,"abstract":"<div><p>Inequality join is an operator to join data on inequality conditions and it is a fundamental building block for applications. While methods and optimizations exist for efficient inequality join in batch processing, little attention has been given to its streaming version, particularly to large-scale data-intensive applications that run on <em>Distributed Stream Processing Systems</em> (DSPSs). Designing an inequality join in streaming and distributed settings is not an easy task: <em>(i)</em> indexes have to be employed to efficiently support inequality-based comparisons, but the continuous stream of data imposes continuous insertions, updates, and deletions of elements in the indexes—hence a huge overhead for the DSPSs; <em>(ii)</em> oftentimes real data is skewed, which makes indexing even more challenging.</p><p>To address these challenges, we propose the <em>Stream-Aware inequality join</em> (STA), an indexing method that can reduce redundancy and index update overhead. STA builds a separate in-memory index structure for hotkeys, i.e., the most frequently used keys, which are automatically identified with an efficient data sketch. On the other hand, the cold keys are treated using a linked set of index structures. In this way, STA avoids many superfluous index updates for frequent items. Finally, we implement four state-of-the-art inequality join solutions for a widely employed DSPS (Apache Storm) and compare their performance with STA on four real-world data sets and a synthetic one. 
The results of our experimental evaluation reveal that our stream-aware approach outperforms existing solutions.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102425"},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141605957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the cognitive and behavioral effects of abstraction and fragmentation in modularized process models","authors":"Clemens Schreiber , Amine Abbad-Andaloussi , Barbara Weber","doi":"10.1016/j.is.2024.102424","DOIUrl":"10.1016/j.is.2024.102424","url":null,"abstract":"<div><p>Process model comprehension is essential for a variety of technical and managerial tasks. To facilitate comprehension, process models are often divided into subprocesses when they reach a certain size. However, depending on the task type this can either support or impede comprehension. To investigate this hypothesis, we conduct a comprehensive eye-tracking study, where we test two different types of comprehension tasks. These are local tasks focusing on a single subprocess, thereby benefiting from abstraction (i.e., irrelevant information is hidden), and global tasks comprising multiple subprocesses, thereby also benefiting from abstraction but impeded by fragmentation (i.e., relevant information is distributed across multiple fragments). Our subsequent analysis at task (coarse-grained) and phase (fine-grained) levels confirms the opposing effects of abstraction and fragmentation. For global tasks, we observe lower task comprehension, higher cognitive load, as well as more complex search and inference behaviors, when compared to local ones. An additional qualitative analysis of search and inference phases, based on process maps and time series, provides additional insights into the evolution of information processing and confirms the differences between the two task types. The fine-grained analysis at the phase level is based on a novel research method, allowing to clearly separate information search from information inference. We provide an extensive validation of this research method. 
The outcome of this work provides a more thorough understanding of the effects of fragmentation, in the context of modularized process models, at a coarse-grained level as well as at a fine-grained level, allowing for the development of task- and user-centric support, and opening up future research opportunities to further investigate information processing during process comprehension.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102424"},"PeriodicalIF":3.0,"publicationDate":"2024-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000826/pdfft?md5=8812da61b4effd68d1674d39afc8cc27&pid=1-s2.0-S0306437924000826-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141707697","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SFTe: Temporal knowledge graphs embedding for future interaction prediction","authors":"Wei Jia , Ruizhe Ma , Weinan Niu , Li Yan , Zongmin Ma","doi":"10.1016/j.is.2024.102423","DOIUrl":"10.1016/j.is.2024.102423","url":null,"abstract":"<div><p>Interaction prediction is a crucial task in the Social Internet of Things (SIoT), serving diverse applications including social network analysis and recommendation systems. However, the dynamic nature of items, users, and their interactions over time poses challenges in effectively capturing and analyzing these changes. Existing interaction prediction models often overlook the temporal aspect and lack the ability to model multi-relational user-item interactions over time. To address these limitations, in this paper, we propose a <strong>S</strong>tructure, <strong>F</strong>acticity, and <strong>T</strong>emporal information preservation <strong>e</strong>mbedding model (SFTe) to predict future interaction. Our model leverages the advantages of Temporal Knowledge Graphs (TKGs) that can capture both the multi-relations and evolution. We begin by modeling user-item interactions over time by constructing a Temporal Interaction Knowledge Graph (TIKG). We then employ Structure Embedding (SE), Facticity Embedding (FE), and Temporal Embedding (TE) to capture topological structure, facticity consistency, and temporal dependence, respectively. In SE, we focus on preserving the first-order relationships to capture the topological structure of TIKG. In the FE component, given the distinct nature of SIoT, we introduce an attention mechanism to capture the effect of entities with the same additional information for generating subgraph embeddings. Lastly, TE utilizes recurrent neural networks to model the temporal dependencies among subgraphs and capture the evolving dynamics of the interactions over time. 
Experimental results on standard future interaction prediction demonstrate the superiority of the SFTe model compared with the state-of-the-art methods. Our model effectively addresses the challenges of time-aware interaction prediction, showcasing the potential of TKGs to enhance prediction performance.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102423"},"PeriodicalIF":3.0,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141567259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient approach for discovering Graph Entity Dependencies (GEDs)","authors":"Dehua Liu , Selasi Kwashie , Yidi Zhang , Guangtong Zhou , Michael Bewong , Xiaoying Wu , Xi Guo , Keqing He , Zaiwen Feng","doi":"10.1016/j.is.2024.102421","DOIUrl":"https://doi.org/10.1016/j.is.2024.102421","url":null,"abstract":"<div><p>Graph entity dependencies (GEDs) are novel graph constraints, unifying keys and functional dependencies, for property graphs. They have been found useful in many real-world data quality and data management tasks, including fact checking on social media networks and entity resolution. In this paper, we study the discovery problem of GEDs—finding a minimal cover of valid GEDs in a given graph data. We formalise the problem, and propose an effective and efficient approach to overcome major bottlenecks in GED discovery. In particular, we leverage existing graph partitioning algorithms to enable fast GED-scope discovery, and employ effective pruning strategies over the prohibitively large space of candidate dependencies. Furthermore, we define an interestingness measure for GEDs based on the minimum description length principle, to score and rank the mined cover set of GEDs. 
Finally, we demonstrate the scalability and effectiveness of our GED discovery approach through extensive experiments on real-world benchmark graph data sets; and present the usefulness of the discovered rules in different downstream data quality management applications.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102421"},"PeriodicalIF":3.0,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000796/pdfft?md5=8af2f9051185a5f57df5320cb4c1b7bd&pid=1-s2.0-S0306437924000796-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141583109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analyzing workload trends for boosting triple stores performance","authors":"Ahmed Al-Ghezi, Lena Wiese","doi":"10.1016/j.is.2024.102420","DOIUrl":"10.1016/j.is.2024.102420","url":null,"abstract":"<div><p>The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data emphasized performance challenges on the RDF-triple stores. Workload adaption is one important strategy to deal with those challenges on the storage level. Current workload-adaption approaches lack the necessary generalization of the problem and only optimize part of the storage layer with the workload (mostly the replication). This creates a big performance gap within other data structures (e.g. indexes and cache) that could heavily benefit from the same workload adaption strategy. Moreover, the workload statistics are built collectively in most of the current approaches. Thus, the analysis process is unaware of whether workloads’ items are old or recent. However, that does not simulate the temporal trends that exist naturally in user queries which causes the analysis process to lag behind the rapid workload development. We present a novel universal adaption approach to the storage management of a distributed RDF store. The system aims to find optimal data assignments to the different indexes, replications, and join cache within the limited storage space. We present a cost model based on the workload that often contains frequent patterns. The workload is dynamically and continuously analyzed to evaluate predefined rules considering the benefits and costs of all options of assigning data to the storage structures. The objective is to reduce query execution time by letting different data containers compete on the limited storage space. By modeling the workload statistics as time series, we can apply well-known smoothing techniques allowing the importance of the workload to decay over time. 
That allows the universal adaption to stay tuned with potential changes in the workload trends.</p></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"125 ","pages":"Article 102420"},"PeriodicalIF":3.0,"publicationDate":"2024-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0306437924000784/pdfft?md5=4a9d8f0acac2d10b05565ee129773c94&pid=1-s2.0-S0306437924000784-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141393476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}