Proceedings of the 35th International Conference on Scientific and Statistical Database Management: Latest Papers

Towards Efficient Discovery of Spatially Interesting Patterns in Geo-referenced Sequential Databases
Shota Suzuki, Uday Kiran Rage
Abstract: A geo-referenced time series is a crucial form of spatiotemporal data. Useful information that can empower users to achieve economic development is hidden in this series. When confronted with this problem, researchers modeled this series as a transactional database and discovered various user interest-based patterns. Since transactional databases disregard the items’ sequential ordering information, existing studies are inadequate for finding interesting patterns in the data of applications where the items’ sequential ordering needs to be considered. With this motivation, this paper first presents a new data transformation technique that converts geo-referenced time series data into a geo-referenced sequential database that preserves the items’ sequential occurrence information. Second, this paper presents a novel model of geo-referenced frequent sequential patterns that may exist in a database. Third, it presents a novel neighborhood-aware exploration technique to effectively reduce the search space and the computational cost of finding the desired patterns. Finally, we present an efficient algorithm to find all desired patterns in a database. Experimental results demonstrate that the proposed algorithm is efficient. We demonstrate the usefulness of our patterns with a case study, which involves finding congestion patterns in road network data.
DOI: https://doi.org/10.1145/3603719.3603743
Published: 2023-07-10
Citations: 0
Federated Learning on Personal Data Management Systems: Decentralized and Reliable Secure Aggregation Protocols
Julien Mirval, Luc Bouganim, Iulian Sandu Popa
Abstract: The development and adoption of personal data management systems (PDMS) has been fueled by legal and technical means such as smart disclosure, data portability and data altruism. By using a PDMS, individuals can effortlessly gather and share data, generated directly by their devices or as a result of their interactions with companies or institutions. In this context, federated learning appears to be a very promising technology, but it requires secure, reliable, and scalable aggregation protocols to preserve user privacy and account for potential PDMS dropouts. Despite recent significant progress in secure aggregation for federated learning, we still lack a solution suitable for the fully decentralized PDMS context. This paper proposes a family of fully decentralized protocols that are scalable and reliable with respect to dropouts. We focus in particular on the reliability property, which is key in a peer-to-peer system wherein aggregators are system nodes and are subject to dropouts in the same way as contributor nodes. We show that in a decentralized setting, reliability raises a tension between the potential completeness of the result and the aggregation cost. We then propose a set of strategies that deal with dropouts and offer different trade-offs between completeness and cost. We extensively evaluate the proposed protocols and show that they cover the design space, making it possible to favor completeness or cost in all settings.
DOI: https://doi.org/10.1145/3603719.3603730
Published: 2023-07-10
Citations: 0
InfoMoD: Information-theoretic Model Diagnostics
Armin Esmaeilzadeh, Lukasz Golab, K. Taghva
Abstract: Validating and debugging machine learning models is done by testing them on unseen data. Analyzing model performance on various subsets of the data is critical for fairness, trust, bias detection and explainability. In this paper, we describe a new way to do this. Our solution, called InfoMoD, applies recent work in information-theoretic data summarization to the problem of model diagnostics. Using real-life datasets, we show how InfoMoD concisely describes how a model performs across different subsets of the data and produces expected performance indicators for individual test instances.
DOI: https://doi.org/10.1145/3603719.3603725
Published: 2023-07-10
Citations: 0
MSLS: Meta-graph Search with Learnable Supernet for Heterogeneous Graph Neural Networks
Yili Wang, Jiamin Chen, Qiutong Li, Changlong He, Jianliang Gao
Abstract: In recent years, heterogeneous graph neural networks (HGNNs) have achieved excellent performance. Efficient HGNNs consist of meta-graphs and aggregation operations. Since manually designing meta-graphs is an expert-dependent and time-consuming process, the performance of HGNNs is limited. To address this challenge, differentiable meta-graph search has been proposed to obtain promising meta-graphs automatically. However, previous differentiable meta-graph search constructs the supernet without learnable aggregation operations, which limits the semantics-extracting ability of HGNNs with automatically designed meta-graphs for downstream tasks. To solve this problem, we propose the Meta-graph Search with Learnable Supernet for Heterogeneous Graph Neural Networks (MSLS). Specifically, to obtain better-performing HGNNs, MSLS constructs a supernet with learnable aggregation operations based on the meta-graphs. MSLS adopts decoupling training to train the learnable supernet and obtains the optimal meta-graph with learnable aggregation operations using a constrained evolution strategy. Extensive experiments show that our method (MSLS) achieves the best performance in different tasks.
DOI: https://doi.org/10.1145/3603719.3603727
Published: 2023-07-10
Citations: 0
Four Factors Affecting Missing Data Imputation
A. Hackl, Jürgen Zeindl, Lisa Ehrlinger
Abstract: Missing data is a common problem in datasets and impacts the reliability of data analysis. Numerous methods to impute (i.e., predict and replace) missing values have been proposed. The quality of these imputed values depends on factors like correlation, percentage of missingness, or the mechanism behind the missing value. Despite comparative studies on imputation methods, conditions for their effectiveness and safe application lack dedicated investigation. This research aims to systematically investigate the impact of four factors on imputation quality. We specifically investigate the extent to which (1) missing data mechanism, (2) variable distribution, (3) correlation, and (4) percentage of missingness affect the imputation quality of eight different machine-learning-based imputation methods. The evaluation will be done on both a synthetic dataset and a real-world dataset from voestalpine Stahl GmbH.
DOI: https://doi.org/10.1145/3603719.3604285
Published: 2023-07-10
Citations: 0
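To make the "percentage of missingness" factor from the abstract above concrete, here is a minimal sketch of how one might measure imputation error at several missingness rates. It is not the authors' experimental setup: the missing-completely-at-random (MCAR) masking, the mean imputer, the RMSE metric, the synthetic Gaussian data, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def make_mcar(values, missing_rate, rng):
    """Return a copy of values with a fixed share of entries set to None.

    Positions are chosen uniformly at random, i.e. an MCAR mechanism
    (an assumption; the paper compares several mechanisms).
    """
    n_missing = round(len(values) * missing_rate)
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def mean_impute(values):
    """Replace every None with the mean of the observed entries.

    Mean imputation is a simple stand-in, not one of the paper's methods.
    """
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def rmse_on_missing(truth, masked, imputed):
    """RMSE between true and imputed values, only where data was masked."""
    errors = [(t - p) ** 2 for t, m, p in zip(truth, masked, imputed) if m is None]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


# Hypothetical synthetic variable; the real study uses other data.
rng = random.Random(0)
truth = [rng.gauss(50.0, 10.0) for _ in range(1000)]
for rate in (0.1, 0.3, 0.5):
    masked = make_mcar(truth, rate, rng)
    imputed = mean_impute(masked)
    print(rate, round(rmse_on_missing(truth, masked, imputed), 2))
```

Swapping `make_mcar` for a masking function that depends on another column or on the value itself would be one way to vary the missing data mechanism while keeping the rest of the loop unchanged.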
ST-CopulaGNN: A Multi-View Spatio-Temporal Graph Neural Network for Traffic Forecasting
Pitikorn Khlaisamniang, S. Phoomvuthisarn
Abstract: Modern cities heavily rely on complex transportation, making accurate traffic speed prediction crucial for traffic management authorities. Classical methods, including statistical techniques and traditional machine learning techniques, fail to capture complex relationships, while deep learning approaches may have weaknesses such as error accumulation, difficulty in handling long sequences, and overlooking spatial correlations. Graph neural networks (GNNs) have shown promise in extracting spatial features from non-Euclidean graph structures, but they usually initialize the adjacency matrix based on distance and may fail to detect hidden statistical correlations. The choice of correlation measure can have a significant impact on the resulting adjacency matrix and the effectiveness of graph-based models. This paper proposes a novel approach for accurately forecasting traffic patterns by utilizing a multi-view spatio-temporal graph neural network that captures data from both realistic and statistical domains. Unlike traditional correlation measures such as Pearson correlation, copula models are utilized to extract hidden statistical correlations and construct multivariate distribution functions to obtain the correlation relationship among traffic nodes. A two-step approach is adopted. The first step involves selecting and testing different types of bivariate copulas to identify the ones that best fit the traffic data, and utilizing these copulas to create multi-weight adjacency matrices. The second step involves utilizing a graph convolutional network to extract spatial information and capturing temporal trends using dilated causal convolutions. The proposed ST-CopulaGNN model outperforms spatio-temporal traffic forecasting models that rely solely on distance-based adjacency matrices, such as DCRNN and Graph WaveNet. It also achieves the lowest MAE for 30 and 60 minutes ahead and the lowest MAPE for 15 minutes ahead on the PEMS-BAY dataset. The model incorporates copulas, and the study explores copula function selection and the impact of using paired time-series with a time lag. The findings suggest that using copula-based adjacency matrix configurations, particularly those including Clayton and Gumbel copulas, can enhance traffic forecasting accuracy.
DOI: https://doi.org/10.1145/3603719.3603740
Published: 2023-07-10
Citations: 0
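The abstract above names Clayton and Gumbel copulas as sources of adjacency weights but does not say how the weights are computed. As a rough illustration only, the sketch below uses the standard textbook relations between Kendall's tau and the parameters of these two copulas, and their tail-dependence coefficients, to turn two sensor series into a candidate edge weight. Using tail dependence as the weight, the tie-free tau estimator, and every function name here are assumptions, not the paper's procedure.

```python
def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length series (ties count as neither)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def clayton_theta(tau):
    """Clayton parameter from tau: theta = 2*tau / (1 - tau), for 0 < tau < 1."""
    if not 0.0 < tau < 1.0:
        raise ValueError("this sketch only handles 0 < tau < 1")
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta(tau):
    """Gumbel parameter from tau: theta = 1 / (1 - tau), for 0 <= tau < 1."""
    if not 0.0 <= tau < 1.0:
        raise ValueError("this sketch only handles 0 <= tau < 1")
    return 1.0 / (1.0 - tau)


def clayton_lower_tail(theta):
    """Lower tail dependence of a Clayton copula: 2 ** (-1 / theta)."""
    return 2.0 ** (-1.0 / theta)


def gumbel_upper_tail(theta):
    """Upper tail dependence of a Gumbel copula: 2 - 2 ** (1 / theta)."""
    return 2.0 - 2.0 ** (1.0 / theta)


# Hypothetical speed readings from two sensors (illustrative values).
sensor_a = [1, 2, 3, 4, 5]
sensor_b = [1, 3, 2, 5, 4]
tau = kendall_tau(sensor_a, sensor_b)
print("tau:", tau)
print("Clayton edge weight:", clayton_lower_tail(clayton_theta(tau)))
print("Gumbel edge weight:", gumbel_upper_tail(gumbel_theta(tau)))
```

Computing one such weight per sensor pair and per copula family would yield several adjacency matrices over the same nodes, which is one plausible reading of "multi-weight adjacency matrices".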
Data Driven Dimensionality Reduction to Improve Modeling Performance
Joshua Chung, Marcos M. López de Prado, Horst Simon, Kesheng Wu
Abstract: In a number of applications, data may be anonymized, obfuscated, or highly noisy. In such cases, it is difficult to use domain knowledge or low-dimensional visualizations to engineer the features for tasks such as machine learning. Instead, we explore dimensionality reduction (DR) as a data-driven approach for engineering these low-dimensional representations. Through a careful examination of available feature selection and feature extraction techniques, we propose a new class named feature clustering. These new methods could utilize different forms of clustering to help evaluate the relative importance of features and take on properties different from the well-known DR algorithms. To evaluate these algorithms, we develop a parallel computing framework that optimizes their hyperparameters on a sample of application datasets. This framework harnesses parallel computing power to examine a large number of parameter combinations and enables hyperparameter tuning and model tuning purely based on observed performance. This optimization framework provides a mechanism for users to control computational cost and is able to examine many parameter choices in seconds. On a set of building energy data where the key features are known based on domain knowledge, the optimized DR algorithms indeed identify the expected main drivers of building electricity usage: outdoor temperature and solar radiance. This shows the automated optimization procedure is able to find known features. In terms of modeling accuracy, a distance correlation-based feature clustering method outperforms other DR algorithms including the well-known KPCA, LLE, and UMAP on two different tests.
DOI: https://doi.org/10.1145/3603719.3603744
Published: 2023-07-10
Citations: 0
Multi-representations Space Separation based Graph-level Anomaly-aware Detection
Fu Lin, Haonan Gong, Mingkang Li, Zitong Wang, Yue Zhang, Xuexiong Luo
Abstract: Graph structures have recently been widely used to model data from different areas. How to detect anomalous graph information in such graph data has become a popular research problem. This research focuses on the particular issue of how to detect abnormal graphs within a graph set. Previous works have observed that abnormal graphs mainly show node-level and graph-level anomalies, but these methods treat the two forms of anomaly equally in the evaluation of abnormal graphs, which is contrary to the fact that different types of abnormal graph data have different degrees of node-level and graph-level anomalies. Furthermore, abnormal graphs that have subtle differences from normal graphs easily escape detection by the existing methods. Thus, we propose a multi-representations space separation based graph-level anomaly-aware detection framework in this paper. To consider the different importance of node-level and graph-level anomalies, we design an anomaly-aware module to learn the specific weight between them in the abnormal graph evaluation process. In addition, we learn strictly separate normal and abnormal graph representation spaces by setting four types of weighted graph representations against each other: anchor normal graphs, anchor abnormal graphs, training normal graphs, and training abnormal graphs. Based on the distance error between the graph representations of the test graph and both normal and abnormal graph representation spaces, we can accurately determine whether the test graph is anomalous. Our approach has been extensively evaluated against baseline methods using ten public graph datasets, and the results demonstrate its effectiveness. The code for our method is publicly available at https://github.com/whb605/MssGAD.git
DOI: https://doi.org/10.1145/3603719.3603739
Published: 2023-07-10
Citations: 0
SciDG: Benchmarking Scientific Dynamic Graph Queries
Chenglin Zeng, Chuan Hu, Huajin Wang, Zhihong Shen
Abstract: Dynamic graphs are increasingly being utilized in domain knowledge modeling and large-scale scientific data management. Managing dynamic graph data requires a graph database system that can handle constantly changing volumes and data versions, while maintaining an acceptable query latency related to versioning. To understand how the design of storage structures affects database performance and assist scientific application developers in finding the optimal storage structure for their dynamic graph application scenarios, we have designed an easy-to-use benchmark framework called SciDG. We also conducted a study on the latencies of five fundamental version-related queries for various scientific application scenarios using SciDG. We evaluated the performance of databases based on three distinct storage principles: Sp-DB, Dp-DB, and Tp-DB. The experimental results indicate that SciDG is a valuable tool for assessing the strengths and weaknesses of different storage structures for dynamic graphs in various scenarios. Additionally, it assists scientists in selecting the most suitable dynamic graph database system for their work.
DOI: https://doi.org/10.1145/3603719.3603724
Published: 2023-07-10
Citations: 0
Fast Algorithm for Embedded Order Dependency Validation
Daichi Amagata, Alejandro Ramos, Ryo Shirai, Takahiro Hara
Abstract: Order Dependencies (ODs) have many applications, such as query optimization, data integration, and data cleaning. Although many works have addressed the problem of discovering ODs (and their variants), they do not consider datasets with missing values, which are common in real-world datasets. This paper introduces the novel notion of Embedded ODs to deal with missing values, and we propose an efficient algorithm for validating embedded ODs. We conduct experiments on real-world datasets, and the results confirm the efficiency of our algorithm.
DOI: https://doi.org/10.1145/3603719.3603720
Published: 2023-07-10
Citations: 0
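For readers unfamiliar with order dependencies, the sketch below checks a single-attribute OD in the usual sense: whenever one row is less than or equal to another on the left-hand attribute, it must also be less than or equal on the right-hand attribute. The abstract does not define embedded ODs, so the handling of missing values here (simply skipping rows where either attribute is `None`) is an assumption, as are the function name and the sort-based check; it is a baseline illustration, not the paper's algorithm.

```python
def holds_order_dependency(rows, lhs, rhs):
    """Check whether ordering by `lhs` also orders `rhs`.

    Rows missing either attribute are ignored (an assumed treatment of
    missing values, not necessarily the paper's definition).
    """
    complete = [
        (row[lhs], row[rhs])
        for row in rows
        if row.get(lhs) is not None and row.get(rhs) is not None
    ]
    complete.sort()  # sort by lhs, then rhs
    for (x1, y1), (x2, y2) in zip(complete, complete[1:]):
        if x1 == x2:
            # Equal lhs values must agree on rhs (the OD implies an FD).
            if y1 != y2:
                return False
        elif y1 > y2:
            # lhs strictly increases, so rhs must not decrease.
            return False
    return True


# Hypothetical table: month should order quarter.
table = [
    {"month": 1, "quarter": 1},
    {"month": 4, "quarter": 2},
    {"month": None, "quarter": 1},  # skipped: month is missing
    {"month": 7, "quarter": 3},
]
print(holds_order_dependency(table, "month", "quarter"))
```

Checking only adjacent pairs after sorting is enough because both conditions are transitive along the sorted order, which keeps the check at the cost of one sort.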