Petri Puustinen, Maria Stratigi, Kostas Stefanidis
{"title":"STracker: A framework for identifying sentiment changes in customer feedbacks","authors":"Petri Puustinen, Maria Stratigi, Kostas Stefanidis","doi":"10.1016/j.is.2024.102491","DOIUrl":"10.1016/j.is.2024.102491","url":null,"abstract":"<div><div>Companies and organizations monitor customer satisfaction by collecting feedback through Likert scale questions and free-text responses. Freely expressed opinions, not bound to fixed questions, provide a detailed source of information that organizations can use to improve their daily operations. The organization’s quality assurance review processes require a timely follow-up on these customer opinions. However, solutions often address the analytics of textual information with topic discovery and sentiment analysis for a fixed time period. These frameworks also tend to focus on serving the purpose of a specific domain and terminology. In this study, we focus on a facilitation service to track discovered topics and their sentiments over time. This service is generic and can be applied to different domains. To evaluate the capabilities of the framework, we used two datasets with opposite types of wording. The study shows that the framework is capable of discovering similar topics over time and identifying their sentiment changes.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102491"},"PeriodicalIF":3.0,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142705038","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paolo Ferragina, Mariagiovanna Rotundo, Giorgio Vinciguerra
{"title":"Two-level massive string dictionaries","authors":"Paolo Ferragina, Mariagiovanna Rotundo, Giorgio Vinciguerra","doi":"10.1016/j.is.2024.102490","DOIUrl":"10.1016/j.is.2024.102490","url":null,"abstract":"<div><div>We study the problem of engineering space–time efficient data structures that support membership and rank queries on <em>very</em> large static dictionaries of strings.</div><div>Our solution is based on a very simple approach that decouples string storage and string indexing by means of a block-wise compression of the sorted dictionary strings (to be stored in external memory) and a succinct implementation of a Patricia trie (to be stored in internal memory) built on the first string of each block. On top of this, we design an in-memory cache that, given a sample of the query workload, augments the Patricia trie with additional information to reduce the number of I/Os of future queries.</div><div>Our experimental evaluation on two new datasets, which are at least one order of magnitude larger than the ones used in the literature, shows that (i) the state-of-the-art compressed string dictionaries, compared to Patricia tries, do not provide significant benefits when used in a large-scale indexing setting, and (ii) our two-level approach enables the indexing and storage of 3.5 billion strings taking 273 GB in just less than 200 MB of internal memory and 83 GB of compressed disk space, while still guaranteeing comparable or faster query performance than those offered by array-based solutions used in modern storage systems, such as RocksDB, thus possibly influencing their future design.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102490"},"PeriodicalIF":3.0,"publicationDate":"2024-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A generative and discriminative model for diversity-promoting recommendation","authors":"Yuli Liu","doi":"10.1016/j.is.2024.102488","DOIUrl":"10.1016/j.is.2024.102488","url":null,"abstract":"<div><div>Diversity-promoting recommender systems, which aim to recommend diverse and relevant results to users, have received significant attention. However, current studies often face a trade-off: they either recommend highly accurate but homogeneous items or boost diversity at the cost of relevance, making it challenging for users to find truly satisfying recommendations that meet both their obvious and potential needs. To overcome this competitive trade-off, we introduce a unified framework that simultaneously leverages a discriminative model and a generative model. This approach allows us to adjust the focus of learning dynamically. Specifically, our framework uses Variational Graph Auto-Encoders to enhance the diversity of recommendations, while Graph Convolution Networks are employed to ensure high accuracy in predicting user preferences. This dual focus enables our system to deliver recommendations that are both diverse and closely aligned with user interests. Inspired by the quality <em>vs.</em> diversity decomposition of the Determinantal Point Process (DPP) kernel, we design a DPP likelihood-based loss function as the joint modeling loss. Extensive experiments on three real-world datasets demonstrate that the unified framework goes beyond the quality-diversity trade-off, <em>i.e.</em>, instead of sacrificing accuracy for promoting diversity, the joint modeling actually boosts both metrics.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102488"},"PeriodicalIF":3.0,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142660690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soundness unknotted: An efficient soundness checking algorithm for arbitrary cyclic process models by loosening loops","authors":"Thomas M. Prinz, Yongsun Choi, N. Long Ha","doi":"10.1016/j.is.2024.102476","DOIUrl":"10.1016/j.is.2024.102476","url":null,"abstract":"<div><div>Although domain experts usually create business process models, these models can still contain errors. For this reason, research and practice establish criteria for process models to provide confidence in the correctness or correct behavior of processes. One widespread criterion is soundness, which guarantees the absence of deadlocks and lacks of synchronization. Checking soundness of process models is not trivial, and cyclic process models further increase the complexity of checking soundness. This paper presents a novel approach for verifying soundness that has an efficient cubic worst-case runtime behavior, even for arbitrary cyclic process models. This approach relies on three key techniques (loop conversion, loop reduction, and loop decomposition) to convert any cyclic process model into a set of acyclic process models. Using this approach, we have developed five straightforward rules to verify soundness, reusing existing approaches for checking soundness of acyclic models.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102476"},"PeriodicalIF":3.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142578535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pavol Jurik, Peter Schmidt, Martin Misut, Ivan Brezina, Marian Reiff
{"title":"The composition diagram of a complex process: Enhancing understanding of hierarchical business processes","authors":"Pavol Jurik, Peter Schmidt, Martin Misut, Ivan Brezina, Marian Reiff","doi":"10.1016/j.is.2024.102489","DOIUrl":"10.1016/j.is.2024.102489","url":null,"abstract":"<div><div>The article presents the Composition Diagram of a Complex Process (CDCP), a new diagramming method for modelling business processes with complex vertical structures. This method addresses the limitations of traditional modelling techniques such as BPMN, Activity Diagrams (AD), and Event-Driven Process Chains (EPC).</div><div>The experiment was carried out with 277 students from different study programs and grades to determine the effectiveness of the methods. The main objective was to evaluate the usability and effectiveness of CDCP compared to established methods, focusing on two primary tasks: interpretation and diagram creation. The participants' performance was evaluated based on the objective results of the tasks and the subjective feedback from the questionnaire. The results indicate that CDCP was an effective method for the reading and drawing tasks, outperforming BPMN and EPC in terms of understanding and ease of use. Statistical analysis of variance showed that while the year of study did not significantly affect performance, the study program and the method used had a significant effect. These findings highlight the potential of CDCP as a more accessible and intuitive business process modelling tool, even for users with minimal prior experience.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102489"},"PeriodicalIF":3.0,"publicationDate":"2024-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142593300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Baocheng Yang, Bing Zhang, Kevin Cutsforth, Shanfu Yu, Xiaowen Yu
{"title":"Emerging industry classification based on BERT model","authors":"Baocheng Yang, Bing Zhang, Kevin Cutsforth, Shanfu Yu, Xiaowen Yu","doi":"10.1016/j.is.2024.102484","DOIUrl":"10.1016/j.is.2024.102484","url":null,"abstract":"<div><div>Accurate industry classification is central to economic analysis and policy making. Current classification systems, while foundational, exhibit limitations in the face of the exponential growth of big data. These limitations include subjectivity, leading to inconsistencies and misclassifications. To overcome these shortcomings, this paper focuses on utilizing the BERT model for classifying emerging industries through the identification of salient attributes within business descriptions. The proposed method identifies clusters of firms within distinct industries, thereby transcending the restrictions inherent in existing classification systems. The model exhibits an impressive degree of precision in categorizing business descriptions, achieving accuracy rates spanning from 84.11% to 99.66% across all 16 industry classifications. This research enriches the field of industry classification literature through a practical examination of the efficacy of machine learning techniques. Our experiments achieved strong performance, highlighting the effectiveness of the BERT model in accurately classifying and identifying emerging industries, providing valuable insights for industry analysts and policymakers.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102484"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142529245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ExamGuard: Smart contracts for secure online test","authors":"Mayuri Diwakar Kulkarni, Ashish Awate, Makarand Shahade, Bhushan Nandwalkar","doi":"10.1016/j.is.2024.102485","DOIUrl":"10.1016/j.is.2024.102485","url":null,"abstract":"<div><div>The education sector is currently experiencing profound changes, primarily driven by the widespread adoption of online platforms for conducting examinations. This paper delves into the utilization of smart contracts as a means to revolutionize the monitoring and execution of online examinations, thereby guaranteeing the traceability of evaluation data and examinee activities. In this context, the integration of advanced technologies such as the PoseNet algorithm, derived from the TensorFlow Model, emerges as a pivotal component. By leveraging PoseNet, the system adeptly identifies both single and multiple faces of examinees, thereby ensuring the authenticity and integrity of examination sessions. Moreover, the incorporation of the COCO dataset facilitates the recognition of objects within examination environments, further bolstering the system's capabilities in monitoring examinee activities. Of paramount importance is the secure storage of evidence collected during examinations, a task efficiently accomplished through the implementation of blockchain technology. This platform not only ensures the immutability of data but also safeguards against potential instances of tampering, thereby upholding the credibility of examination results. Through the utilization of smart contracts, the proposed framework not only streamlines the examination process but also instills transparency and integrity, thereby addressing inherent challenges encountered in traditional examination methods. One of the key advantages of this technological integration lies in its ability to modernize examination procedures while concurrently reinforcing trust and accountability within the educational assessment ecosystem. By harnessing the power of smart contracts, educational institutions can mitigate concerns pertaining to data manipulation and malpractice, thereby fostering a more secure and reliable examination environment. Furthermore, the transparency afforded by blockchain technology ensures that examination outcomes are verifiable and auditable, instilling confidence among stakeholders and enhancing the overall credibility of the assessment process. In conclusion, the adoption of smart contracts represents a paradigm shift in the realm of educational assessment, offering a comprehensive solution to the challenges posed by traditional examination methods. By embracing advanced technologies such as PoseNet and blockchain, educational institutions can not only streamline examination procedures but also uphold the highest standards of integrity and accountability. As such, the integration of smart contracts holds immense potential in shaping the future of online examinations, paving the way for a more efficient, transparent, and trustworthy assessment ecosystem.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102485"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142537887","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Explaining results of path queries on graphs: Single-path results for context-free path queries","authors":"Jelle Hellings","doi":"10.1016/j.is.2024.102475","DOIUrl":"10.1016/j.is.2024.102475","url":null,"abstract":"<div><div>Many graph query languages use, at their core, <em>path queries</em> that yield node pairs <span><math><mrow><mo>(</mo><mi>m</mi><mo>,</mo><mi>n</mi><mo>)</mo></mrow></math></span> that are connected by a path of interest. For the end-user, such node pairs only give limited insight as to <em>why</em> this result is obtained, as the pair does not directly identify the underlying path of interest.</div><div>In this paper, we propose the <em>single-path semantics</em> to address this limitation of path queries. Under single-path semantics, path queries evaluate to a single path that connects nodes <span><math><mi>m</mi></math></span> and <span><math><mi>n</mi></math></span> and satisfies the conditions of the query. To put our proposal in practice, we provide an efficient algorithm for evaluating <em>context-free path queries</em> using the single-path semantics. Additionally, we perform a short evaluation of our techniques that shows that the single-path semantics is practically feasible, even when query results grow large.</div><div>In addition, we explore the formal relationship between the single-path semantics we propose and the problem of finding the <em>shortest string</em> in the intersection of a regular language (representing a graph) and a context-free language (representing a path query). As our formal results show, there is a distinction between the complexity of the single-path semantics for queries that use a single edge label and queries that use multiple edge labels: for queries that use a single edge label, the length of the shortest path is <em>linearly upper bounded</em> by the number of nodes in the graph; whereas for queries that use multiple edge labels, the length of the shortest path has a worst-case <em>quadratic lower bound</em>.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102475"},"PeriodicalIF":3.0,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142529774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hands-on analysis of using large language models for the auto evaluation of programming assignments","authors":"Kareem Mohamed, Mina Yousef, Walaa Medhat, Ensaf Hussein Mohamed, Ghada Khoriba, Tamer Arafa","doi":"10.1016/j.is.2024.102473","DOIUrl":"10.1016/j.is.2024.102473","url":null,"abstract":"<div><div>The increasing adoption of programming education necessitates efficient and accurate methods for evaluating students’ coding assignments. Traditional manual grading is time-consuming, often inconsistent, and prone to subjective biases. This paper explores the application of large language models (LLMs) for the automated evaluation of programming assignments. LLMs can use advanced natural language processing capabilities to assess code quality, functionality, and adherence to best practices, providing detailed feedback and grades. We demonstrate the effectiveness of LLMs through experiments comparing their performance with human evaluators across various programming tasks. Our study evaluates the performance of several LLMs for automated grading. Gemini 1.5 Pro achieves an exact match accuracy of 86% and a <span><math><mrow><mo>±</mo><mn>1</mn></mrow></math></span> accuracy of 98%. GPT-4o also demonstrates strong performance, with exact match and <span><math><mrow><mo>±</mo><mn>1</mn></mrow></math></span> accuracies of 69% and 97%, respectively. Both models correlate highly with human evaluations, indicating their potential for reliable automated grading. However, models such as Llama 3 70B and Mixtral 8 <span><math><mo>×</mo></math></span> 7B exhibit low accuracy and alignment with human grading, particularly in problem-solving tasks. These findings suggest that advanced LLMs are instrumental in scalable, automated educational assessment. Additionally, LLMs enhance the learning experience by offering personalized, instant feedback, fostering an iterative learning process. The findings suggest that LLMs could play a pivotal role in the future of programming education, ensuring scalability and consistency in evaluation.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"128 ","pages":"Article 102473"},"PeriodicalIF":3.0,"publicationDate":"2024-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142529772","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Influence maximization based on discrete particle swarm optimization on multilayer network","authors":"Saiwei Wang, Wei Liu, Ling Chen, Shijie Zong","doi":"10.1016/j.is.2024.102466","DOIUrl":"10.1016/j.is.2024.102466","url":null,"abstract":"<div><div>Influence maximization (IM) aims to strategically select influential users to maximize information propagation in social networks. Most of the existing studies focus on IM in single-layer networks. However, we have observed that individuals often engage in multiple social platforms to fulfill various social needs. To make better use of this observation, we consider an extended problem of how to maximize influence spread in multilayer networks. The Multilayer Influence Maximization (MLIM) problem is different from the IM problem because information propagation behaves differently in multilayer networks compared to single-layer networks: users influenced on one layer may trigger the propagation of information on another layer. Our work models the information propagation process in multilayer networks as a Multilayer Independent Cascade model. Based on the characteristics of this model, we introduce an approximation function called Multilayer Expected Diffusion Value (MLEDV) for it. However, the NP-hardness of the MLIM problem has posed significant challenges to our work. To tackle the issue, we devise a novel algorithm based on Discrete Particle Swarm Optimization. Our algorithm consists of two stages: 1) the candidate node selection, where we devise a novel centrality metric called Random connectivity Centrality to select candidate nodes, which assesses the importance of nodes from a connectivity perspective; 2) the seed selection, where we employ a discrete particle swarm algorithm to find seed nodes from the candidate nodes. We use MLEDV as a fitness function to measure the spreading power of candidate solutions in our algorithm. Additionally, we introduce a Neighborhood Optimization strategy to increase the convergence of the algorithm. We conduct experiments on four real-world networks and two self-built networks and demonstrate that our algorithms are effective for the MLIM problem.</div></div>","PeriodicalId":50363,"journal":{"name":"Information Systems","volume":"127 ","pages":"Article 102466"},"PeriodicalIF":3.0,"publicationDate":"2024-10-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142420321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}