Information Systems: Latest Articles

Effective data exploration through clustering of local attributive explanations
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-28 · DOI: 10.1016/j.is.2024.102464
Abstract: Machine Learning (ML) has become an essential tool for modeling complex phenomena, offering robust predictions and comprehensive data analysis. Nevertheless, the lack of interpretability in these predictions often results in a closed-box effect, which the field of eXplainable Machine Learning (XML) aims to address. Local attributive XML methods, in particular, provide explanations by quantifying the contribution of each attribute to individual predictions, referred to as influences. This type of explanation is the most fine-grained, as it focuses on each instance of the dataset and allows the detection of individual differences. Additionally, aggregating local explanations allows for a deeper analysis of the underlying data. In this context, influences can be considered a new data space in which to reveal and understand complex data patterns. We hypothesize that these influences, derived from ML explanations, are more informative than the original raw data, especially for identifying homogeneous groups within the data. To identify such groups effectively, we use a clustering approach. We compare clusters formed using raw data against those formed using influences computed by various local attributive XML methods. Our findings reveal that clusters based on influences consistently outperform those based on raw data, even when using models with low accuracy.
Citations: 0
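A minimal sketch of clustering in the influence space described above, assuming a linear model (for which the local attribution of feature j is w_j * (x_j - mean_j), the SHAP value of a linear model with independent features) and plain k-means. The paper compares several local attributive XML methods; the function names and the toy data here are hypothetical.

```python
import random
from typing import List, Sequence

Vector = List[float]


def linear_influences(X: Sequence[Vector], weights: Vector) -> List[Vector]:
    """Per-instance influences for a linear model f(x) = w . x + b.

    Assumption: for a linear model with independent features, the SHAP value
    of feature j is w_j * (x_j - E[x_j]); the sample mean stands in for E[x_j].
    """
    n, d = len(X), len(weights)
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[weights[j] * (row[j] - means[j]) for j in range(d)] for row in X]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data: feature 0 drives the model; feature 1 is large-scale noise the model ignores.
X = [[0.0, 100.0], [0.1, -100.0], [1.0, 100.0], [1.1, -100.0]]
influences = linear_influences(X, weights=[1.0, 0.0])
print(kmeans(influences, k=2))
```

Because the ignored feature receives zero influence, it cannot dominate the Euclidean distances the way it does on the raw data, so the clusters follow the feature the model actually uses. This only illustrates the intuition behind the hypothesis on toy data; it is not the paper's experimental setup.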
Data Lakehouse: A survey and experimental study
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-26 · DOI: 10.1016/j.is.2024.102460
Abstract: Efficient big data management is a dire necessity for managing the exponential growth in data generated by digital information systems and producing usable knowledge. Structured databases, data lakes, and warehouses have each provided a solution with varying degrees of success. However, a new and superior solution, the data lakehouse, has emerged to extract actionable insights from unstructured data ingested from distributed sources. By combining the strengths of data warehouses and data lakes, the data lakehouse can process and merge data quickly while ingesting and storing high-speed unstructured data with post-storage transformation and analytics capabilities. The lakehouse architecture offers the necessary features for optimal functionality and has gained significant attention in the big data management research community. In this paper, we compare data lake, warehouse, and lakehouse systems, highlight their strengths and shortcomings, identify the desired features for handling the evolving challenges in big data management and analysis, and propose an advanced data lakehouse architecture. We also demonstrate, through an experimental study, the performance of three state-of-the-art data management systems, namely an HDFS data lake, a Hive data warehouse, and a Delta lakehouse, in managing data for analytical query responses.
Citations: 0
Temporal graph processing in modern memory hierarchies
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-21 · DOI: 10.1016/j.is.2024.102462
Abstract: Updates in a graph DBMS lead to structural changes in the graph over time, producing different intermediate states. Capturing these changes and when they occur is one of the main purposes of a temporal DBMS. Most DBMSs build their temporal features on top of their non-temporal processing and storage without considering the memory hierarchy of the underlying system. This leads to slower temporal processing and poor storage utilization. In this paper, we propose a storage and processing strategy for (bi-)temporal graphs that uses temporal materialized views (TMVs) while exploiting the memory hierarchy of a modern system. Further, we present a solution to the query containment problem for certain types of temporal graph queries. Finally, we evaluate the overhead and performance of the presented approach. The results show that using TMVs reduces the runtime of temporal graph queries while using less memory.
Citations: 0
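The abstract above does not give the containment rules, so the following is only a minimal sketch of the general idea under simplifying assumptions: valid time only (no transaction time), views defined by a single edge label and a half-open interval, and a range query answered from a view whenever its label matches and its interval lies inside the view's. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str
    valid_from: int  # inclusive
    valid_to: int  # exclusive


@dataclass
class TemporalView:
    """Hypothetical TMV: edges with one label that are valid somewhere in [start, end)."""
    label: str
    start: int
    end: int
    edges: List[Edge]


def overlaps(e: Edge, start: int, end: int) -> bool:
    return e.valid_from < end and e.valid_to > start


def materialize(edges: List[Edge], label: str, start: int, end: int) -> TemporalView:
    kept = [e for e in edges if e.label == label and overlaps(e, start, end)]
    return TemporalView(label, start, end, kept)


def contained(view: TemporalView, label: str, start: int, end: int) -> bool:
    # Same label and a sub-interval: every edge the query needs is already in the view.
    return view.label == label and view.start <= start and end <= view.end


def range_query(edges: List[Edge], views: List[TemporalView], label: str,
                start: int, end: int) -> Tuple[List[Edge], bool]:
    """Return edges with `label` valid in [start, end) and whether a view answered the query."""
    source, used_view = edges, False
    for view in views:
        if contained(view, label, start, end):
            source, used_view = view.edges, True  # scan the smaller view instead
            break
    return [e for e in source if e.label == label and overlaps(e, start, end)], used_view


edges = [Edge("a", "b", "knows", 0, 10), Edge("b", "c", "knows", 20, 30), Edge("a", "c", "likes", 0, 30)]
view = materialize(edges, "knows", 0, 15)
print(range_query(edges, [view], "knows", 2, 5))    # served from the view
print(range_query(edges, [view], "knows", 10, 25))  # not contained: full scan
```

Containment is what makes the reuse safe: if the query interval lies inside the view interval, any edge overlapping the query also overlaps the view, so nothing is missed.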
Bridging reading and mapping: The role of reading annotations in facilitating feedback while concept mapping
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-06 · DOI: 10.1016/j.is.2024.102458
Abstract: Concept maps are visual tools for organizing knowledge, commonly used in education and design. The process often involves reading and developing conceptual models, where feedback is crucial. Learners (e.g., students, designers) often refer to reading materials and receive feedback from instructors (e.g., teachers, stakeholders) based on the maps they create. However, annotations made by learners, such as highlights, are usually not visible to instructors, which limits tailored feedback. We propose incorporating annotation practices into concept mapping: learners highlight text and link these highlights to existing or newly created concepts in their concept map, so that instructors can access both the concept map and the relevant readings to give better feedback. This vision is realized through Concept&Go, a plug-in for the CmapCloud editor that targets the interplay between mapping, reading, and feedback during concept mapping. The effectiveness of this approach is demonstrated through a focus group (n=5) and a UTAUT evaluation (n=12). Concept&Go is publicly available.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924001169/pdfft?md5=f1df1b7c90dae26d25484ea7d7b77c25&pid=1-s2.0-S0306437924001169-main.pdf
Citations: 0
A universal approach for simplified redundancy-aware cross-model querying
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-04 · DOI: 10.1016/j.is.2024.102456
Abstract: Numerous challenges and open problems have appeared with the dawn of multi-model data. In most cases, single-model solutions cannot be straightforwardly extended, and new, efficient approaches must be found. In addition, since there are no standards for combining and managing multiple models, the situation is even more complicated and confusing for users.
This paper deals with the most important aspect of data management: querying. To enable the user to grasp all the popular models, we base our solution on the abstract categorical representation of multi-model data, which can be viewed as a graph. To unify the querying of multi-model data, we let the user query the categorical graph using a SPARQL-based, model-agnostic query language called MMQL. The query is then decomposed and translated into the languages of the underlying systems. The intermediate results are then combined into the final categorical result, which can be expressed in any selected format. Support for cross-model redundancy makes it possible to create distinct query plans and choose the optimal one. We also introduce a proof-of-concept implementation of our solution called MM-quecat.
Citations: 0
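The abstract says that cross-model redundancy lets MM-quecat build alternative query plans and pick the best one, but it does not give the cost model. A minimal sketch of redundancy-aware plan choice, assuming a hypothetical per-(kind, system) access cost and a flat penalty for each extra database the plan must join across; the system names and costs are illustrative only.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Plan = Dict[str, str]


def choose_plan(
    kinds: List[str],
    replicas: Dict[str, List[str]],
    access_cost: Dict[Tuple[str, str], float],
    join_penalty: float,
) -> Tuple[Plan, float]:
    """Pick one storing system per queried kind, minimizing a toy cost.

    Cost = sum of per-(kind, system) access costs
         + join_penalty for every additional system the plan touches
           (a stand-in for cross-database joins done by the middleware).
    Exhaustive search: exponential in len(kinds), fine only for small queries.
    """
    best: Optional[Tuple[Plan, float]] = None
    for choice in product(*(replicas[k] for k in kinds)):
        plan = dict(zip(kinds, choice))
        cost = sum(access_cost[(k, s)] for k, s in plan.items())
        cost += join_penalty * max(len(set(choice)) - 1, 0)
        if best is None or cost < best[1]:
            best = (plan, cost)
    assert best is not None, "every kind needs at least one replica"
    return best


# "Customer" is stored redundantly in two systems; "Order" only in one.
replicas = {"Customer": ["postgresql", "mongodb"], "Order": ["mongodb"]}
access_cost = {
    ("Customer", "postgresql"): 1.0,
    ("Customer", "mongodb"): 2.0,
    ("Order", "mongodb"): 1.0,
}
print(choose_plan(["Customer", "Order"], replicas, access_cost, join_penalty=5.0))
# ({'Customer': 'mongodb', 'Order': 'mongodb'}, 3.0)
```

With an expensive cross-system join, reading the redundant copy of Customer from the same store as Order wins even though it is the slower replica to access; with a cheap join, the faster replica wins instead.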
Tri-AL: An open source platform for visualization and analysis of clinical trials
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-04 · DOI: 10.1016/j.is.2024.102459
Abstract: ClinicalTrials.gov hosts an online database of over 440,000 medical studies (as of 2023) evaluating drugs, supplements, medical devices, and behavioral treatments. Target users include scientists, medical researchers, pharmaceutical companies, and other public and private institutions. Although ClinicalTrials.gov offers some filtering, it does not provide visualization tools, reporting tools, or historical data; only the most recent state of each trial is visible to users. To fill these functionality gaps, we present Tri-AL: an open-source data platform for clinical trial visualization, information extraction, historical analysis, and reporting. This paper describes the design and functionality of Tri-AL, including a programmable module that incorporates machine learning models to extract disease-specific data from unstructured trial reports, which we demonstrate using Alzheimer's disease reporting as a case study. We also highlight the use of Tri-AL for analyzing trial participation in terms of sex, gender, race, and ethnicity. The source code is publicly available at https://github.com/pouyan9675/Tri-AL.
Citations: 0
Electricity behaviors anomaly detection based on multi-feature fusion and contrastive learning
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-09-02 · DOI: 10.1016/j.is.2024.102457
Abstract: Abnormal electricity usage detection is the process of discovering and diagnosing abnormal electricity usage behavior by monitoring and analyzing electricity usage in the power system. Improving the accuracy of anomaly detection is an active research topic. Most studies use neural networks for anomaly detection but ignore the effect of missing electricity data on detection performance. Missing-value completion is an important way to improve the quality of electricity data and to optimize anomaly detection performance. Moreover, most studies model only the temporal features of electricity data and ignore potential correlations among spatial features. Therefore, this paper proposes an electricity anomaly detection model based on multi-feature fusion and contrastive learning, which integrates temporal and spatial features to jointly perform electricity anomaly detection. For temporal feature representation learning, an improved bidirectional LSTM is designed to complete missing values in electricity data and is combined with a CNN to capture electricity consumption patterns in the temporal data. For spatial feature representation learning, a GCN and a Transformer are used to fully explore the complex correlations among the data. In addition, to improve anomaly detection performance, the paper designs a gated fusion module and incorporates contrastive learning to strengthen the representation of electricity data. Finally, experiments demonstrate that the proposed method effectively improves the performance of electricity behavior anomaly detection.
Citations: 0
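The abstract does not specify the gated fusion module. A common formulation, shown here only as a hedged sketch, computes an element-wise gate from both embeddings and mixes them convexly. The weights, shapes, and names below are assumptions; in the described model, h_t and h_s would come from the BiLSTM/CNN and GCN/Transformer branches.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def gated_fusion(h_t: Vector, h_s: Vector, W_t: Matrix, W_s: Matrix, b: Vector) -> Vector:
    """Fuse a temporal and a spatial embedding of equal size.

    g = sigmoid(W_t h_t + W_s h_s + b)   (element-wise gate in (0, 1))
    h = g * h_t + (1 - g) * h_s         (per-dimension convex mix)
    """
    z = [a + c + bias for a, c, bias in zip(matvec(W_t, h_t), matvec(W_s, h_s), b)]
    g = [sigmoid(v) for v in z]
    return [gi * ti + (1.0 - gi) * si for gi, ti, si in zip(g, h_t, h_s)]


zeros = [[0.0, 0.0], [0.0, 0.0]]
h_t, h_s = [1.0, 2.0], [3.0, 4.0]
print(gated_fusion(h_t, h_s, zeros, zeros, [0.0, 0.0]))  # gate = 0.5 -> [2.0, 3.0]
```

In a trained model the gate weights are learned, so each dimension decides how much to trust the temporal versus the spatial view of a consumer's behavior.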
A framework for measuring the quality of business process simulation models
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-08-22 · DOI: 10.1016/j.is.2024.102447
Abstract: Business Process Simulation (BPS) is an approach to analyzing the performance of business processes under different scenarios. For example, BPS allows us to estimate the impact of adding one or more resources on the cycle time of a process. The starting point of BPS is a process model annotated with simulation parameters (a BPS model). BPS models may be manually designed, based on information collected from stakeholders and from empirical observations, or automatically discovered from historical execution data. Regardless of its provenance, a key question when using a BPS model is how to assess its quality. In particular, in a setting where we are able to produce multiple alternative BPS models of the same process, this question becomes: how do we determine which model is better, to what extent, and in what respect? In this context, this article studies how to measure the quality of a BPS model with respect to its ability to accurately replicate the observed behavior of a process. Rather than pursuing a one-size-fits-all approach, the article recognizes that a process covers multiple perspectives. Accordingly, the article outlines a framework that can be instantiated in different ways to yield quality measures that tackle different process perspectives. The article defines a number of concrete quality measures and evaluates them with respect to their ability to discern the impact of controlled perturbations on a BPS model, and their ability to uncover the relative strengths and weaknesses of two approaches for the automated discovery of BPS models. The evaluation shows that the proposed measures not only capture how close a BPS model is to the observed behavior, but also help identify the sources of discrepancies.
Open-access PDF: https://www.sciencedirect.com/science/article/pii/S0306437924001054/pdfft?md5=7958dc6fdab5faf4469760f9d839425a&pid=1-s2.0-S0306437924001054-main.pdf
Citations: 0
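The abstract does not list the concrete measures, so the following is a hedged sketch of one measure of the kind described: a temporal-perspective distance that compares the cycle-time distribution of the observed log with that of a simulated log using the 1-D Wasserstein (earth mover's) distance, where lower is better. The log format (one (start, end) pair per case) and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def wasserstein_1d(a: Sequence[float], b: Sequence[float]) -> float:
    """1-D Wasserstein (earth mover's) distance between two non-empty samples.

    Computed as the integral of |F_a(x) - F_b(x)| over x, where F_a and F_b are
    the empirical CDFs; both are step functions, so the integral is a sum over
    the gaps between consecutive distinct sample values.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    sa, sb = sorted(a), sorted(b)
    xs = sorted(set(sa) | set(sb))
    ia = ib = 0
    total = 0.0
    for left, right in zip(xs, xs[1:]):
        # Count samples <= left: that is the CDF value on [left, right).
        while ia < len(sa) and sa[ia] <= left:
            ia += 1
        while ib < len(sb) and sb[ib] <= left:
            ib += 1
        total += abs(ia / len(sa) - ib / len(sb)) * (right - left)
    return total


def cycle_time_distance(observed: List[Tuple[float, float]],
                        simulated: List[Tuple[float, float]]) -> float:
    """Compare per-case cycle times (end - start) of an observed and a simulated log."""
    return wasserstein_1d([end - start for start, end in observed],
                          [end - start for start, end in simulated])


observed = [(0, 10), (0, 20)]  # (case start, case end), e.g. in hours
simulated = [(5, 15), (0, 30)]
print(cycle_time_distance(observed, simulated))  # 5.0
```

The distance is expressed in the same unit as the timestamps, which makes it easy to read: here the simulated model is, on average, 5 hours off in how long cases take.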
PathEL: A novel collective entity linking method based on relationship paths in heterogeneous information networks
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-08-13 · DOI: 10.1016/j.is.2024.102433
Abstract: Collective entity linking always outperforms independent entity linking because it considers the interdependencies among entities. However, existing collective entity linking methods often have high time complexity, do not fully utilize the relationship information in heterogeneous information networks (HINs), and are largely dependent on features specific to Wikipedia. To address these problems, this paper proposes PathEL, a novel collective entity linking method based on relationship paths in heterogeneous information networks. PathEL classifies the complex relationships in an HIN into 1-hop paths and three types of 2-hop paths, measures entity correlation using the path information among entities, and finally combines this with textual semantic information to perform collective entity linking. In addition, to address the high complexity of collective entity linking, the paper combines a variable sliding-window data processing method with a two-step pruning strategy: the sliding window limits the number of entity mentions in each window, and the pruning strategy reduces the number of candidate entities. Finally, experimental results on three benchmark datasets show that the proposed model outperforms the baseline models in entity linking. On the AIDA CoNLL dataset, compared with the second-ranked model, it improves P, R, and F1 scores by 1.61%, 1.54%, and 1.57%, respectively.
Citations: 0
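The abstract describes measuring entity correlation through 1-hop paths and three types of 2-hop paths in an HIN, but it does not define the path types or their weights. A minimal sketch, assuming an undirected HIN, 2-hop paths typed by the type of their middle node, and hand-set weights (all hypothetical), scores candidate entities by how strongly they connect to an already-linked entity.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Adjacency = Dict[str, Set[str]]


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Adjacency:
    """Undirected adjacency sets for a small HIN given as (u, v) pairs."""
    adj: Adjacency = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_features(adj: Adjacency, node_type: Dict[str, str],
                  e1: str, e2: str) -> Tuple[int, Counter]:
    """1-hop indicator and 2-hop path counts grouped by the middle node's type."""
    n1, n2 = adj.get(e1, set()), adj.get(e2, set())
    one_hop = 1 if e2 in n1 else 0
    # Every common neighbor m gives one 2-hop path e1 - m - e2.
    two_hop = Counter(node_type[m] for m in n1 & n2 if m not in (e1, e2))
    return one_hop, two_hop


def relatedness(adj: Adjacency, node_type: Dict[str, str], e1: str, e2: str,
                w_one: float, w_two: Dict[str, float]) -> float:
    one_hop, two_hop = path_features(adj, node_type, e1, e2)
    return w_one * one_hop + sum(w_two.get(t, 0.0) * c for t, c in two_hop.items())


edges = [("Paris", "France"), ("Lyon", "France"), ("Paris", "Lyon"),
         ("Paris_Hilton", "Hilton_Hotels")]
node_type = {"Paris": "city", "Lyon": "city", "France": "country",
             "Paris_Hilton": "person", "Hilton_Hotels": "company"}
adj = build_adjacency(edges)

# Mention "Paris" has two candidates; "Lyon" is already linked in the same window.
candidates = ["Paris", "Paris_Hilton"]
best = max(candidates, key=lambda c: relatedness(adj, node_type, c, "Lyon", 1.0, {"country": 0.5}))
print(best)  # Paris
```

In a full collective linker these path scores would be combined with textual similarity and computed only within each sliding window, over the candidates that survive pruning.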
An incremental algorithm for repairing denial constraint violations
IF 3 · Zone 2 · Computer Science
Information Systems · Pub Date: 2024-08-05 · DOI: 10.1016/j.is.2024.102435
Abstract: Data repairing algorithms have been studied extensively for improving data quality. Denial constraints (DCs) are commonly employed to state the quality specifications that data should satisfy, and hence to facilitate data repairing, since DCs are general enough to subsume many other dependencies. In practice, data are frequently updated, which motivates the quest for efficient incremental repairing techniques that respond to data updates. In this paper, we present the first incremental algorithm for repairing DC violations. Specifically, given a relational instance $I$ consistent with a set $\Sigma$ of DCs, and a set $\Delta I$ of tuple insertions to $I$, our aim is to find a set $\Delta I'$ of tuple insertions such that $\Sigma$ is satisfied on $I + \Delta I'$. We first formalize the problem of incremental data repairing with DCs and prove its complexity. We then present techniques that combine auxiliary indexing structures to efficiently identify the DC violations incurred by $\Delta I$ w.r.t. $\Sigma$, and further develop an efficient repairing algorithm that computes $\Delta I'$ by resolving DC violations. Finally, using both real-life and synthetic datasets, we conduct extensive experiments to demonstrate the effectiveness and efficiency of our approach.
Citations: 0
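A minimal sketch of the incremental setting for a single DC only: the functional dependency zip → city written as the denial constraint ¬(t1.zip = t2.zip ∧ t1.city ≠ t2.city). A hash index over the consistent instance I finds the violations caused by each tuple in ΔI with one lookup, and a deliberately simple rule (copy the indexed city into the new tuple) produces ΔI′. The paper handles general DCs with its own repair algorithm; the attribute names, data, and repair rule here are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def incremental_repair(instance: List[Row], delta: List[Row],
                       key: str = "zip", dep: str = "city") -> List[Row]:
    """Repair inserted tuples against the DC not(t1.key = t2.key and t1.dep != t2.dep).

    Assumes `instance` already satisfies the DC, so the index key -> dep value
    is well defined. Each inserted tuple is checked with one index lookup
    instead of being compared against every existing tuple.
    """
    index: Dict[str, str] = {t[key]: t[dep] for t in instance}
    repaired: List[Row] = []
    for t in delta:
        fixed = dict(t)  # never mutate the caller's tuples
        expected = index.get(fixed[key])
        if expected is None:
            index[fixed[key]] = fixed[dep]  # first tuple with this key fixes its value
        elif expected != fixed[dep]:
            fixed[dep] = expected  # resolve the violation by modifying the new tuple only
        repaired.append(fixed)
    return repaired


def violates(rows: List[Row], key: str = "zip", dep: str = "city") -> bool:
    """Naive quadratic check of the same DC, used only to verify the result."""
    return any(a[key] == b[key] and a[dep] != b[dep]
               for i, a in enumerate(rows) for b in rows[i + 1:])


I = [{"zip": "10001", "city": "New York"}]
delta_I = [{"zip": "10001", "city": "NYC"},
           {"zip": "60601", "city": "Chicago"},
           {"zip": "60601", "city": "Chicgo"}]
delta_I_prime = incremental_repair(I, delta_I)
print([t["city"] for t in delta_I_prime])  # ['New York', 'Chicago', 'Chicago']
print(violates(I + delta_I), violates(I + delta_I_prime))  # True False
```

Only ΔI is touched, never I, which matches the problem statement above: the output is a replacement set of insertions whose union with I satisfies the constraint.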