32nd International Conference on Scientific and Statistical Database Management: Latest Papers

The Vantage Index: Executing Distance Queries at Scale
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400933
Giannis Evagorou, M. Lavalle, T. Heinis
Abstract: Due to the proliferation of GPS-enabled devices, vast amounts of trajectory datasets are being collected every day. Analyzing this data efficiently and at scale is a major challenge. Several different types of spatio-temporal queries are used to analyze these datasets. One important query is the distance query on trajectory data which, given a query distance D, a point P and a time span T, finds all trajectories within D of P during T. This query is frequently used in traffic analysis and numerous other applications. In this paper we develop the means to efficiently and scalably analyze large amounts of trajectory data with the distance query. To this end we develop the means to distribute the trajectory data in a distributed infrastructure (Spark) as well as the index needed on the nodes to answer the query locally. As our experiments show, our approach is more efficient when compared to a baseline method.
Citations: 0
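To make the distance query in the entry above concrete, here is a minimal brute-force sketch in Python. It is not the Vantage Index and involves no Spark distribution; it simply scans every sampled point. The `Sample` record, the dictionary layout, and the choice to test only sampled points (no interpolation between them) are assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Sample:
    """One timestamped position of a trajectory (hypothetical layout)."""
    t: float
    x: float
    y: float


def distance_query(trajectories, p, d, t_start, t_end):
    """Return ids of trajectories with a sample within d of p during [t_start, t_end]."""
    px, py = p
    hits = []
    for traj_id, samples in trajectories.items():
        # A trajectory qualifies as soon as one sample is close enough in space and time.
        if any(
            t_start <= s.t <= t_end and hypot(s.x - px, s.y - py) <= d
            for s in samples
        ):
            hits.append(traj_id)
    return hits


trajectories = {
    "a": [Sample(0, 0.0, 0.0), Sample(1, 1.0, 1.0)],
    "b": [Sample(0, 5.0, 5.0), Sample(1, 6.0, 6.0)],
    "c": [Sample(5, 0.5, 0.5)],
}
result = distance_query(trajectories, (0.0, 0.0), 1.0, 0, 2)
print(result)  # ['a']
```

A linear scan like this is the kind of baseline an index is meant to beat: its cost grows with the total number of samples, regardless of how selective D and T are.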
Combining Two Worlds: MonetDB with Multi-Dimensional Index Structure Support to Efficiently Query Scientific Data
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3401691
Paul Blockhaus, David Broneske, Martin Schäler, V. Köppen, G. Saake
Abstract: Reproducibility and generalizability are important criteria for today’s data management society. Hence, stand-alone solutions that work well in isolation, but cannot convince at the system level, lead to a frustrating user experience. As a consequence, in our demo, we take the step of accelerating queries on scientific data by integrating the multi-dimensional index structure Elf into the main-memory-optimized database management system MonetDB. The overall intention is to show that the stand-alone speedups of using Elf can also be observed when integrated into a holistic system storing scientific data sets. In our prototypical implementation, we demonstrate the performance of an Elf-backed MonetDB on the standard OLAP benchmark, TPC-H, and the genomic multi-dimensional range query benchmark from the scientific data community. Queries can be run live on both benchmarks by the audience, while they are able to create different indexes to accelerate selection performance.
Citations: 1
A Versatile Hypergraph Model for Document Collections
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400919
Andreas Spitz, Dennis Aumiller, Bálint Soproni, Michael Gertz
Abstract: Efficiently and effectively representing large collections of text is of central importance to information retrieval tasks such as summarization and search. Since models for these tasks frequently rely on an implicit graph structure of the documents or their contents, graph-based document representations are naturally appealing. For tasks that consider the joint occurrence of words or entities, however, existing document representations often fall short in capturing cooccurrences of higher order, higher multiplicity, or at varying proximity levels. Furthermore, while numerous applications benefit from structured knowledge sources, external data sources are rarely considered as integral parts of existing document models. To address these shortcomings, we introduce heterogeneous hypergraphs as a versatile model for representing annotated document collections. We integrate external metadata, document content, entity and term annotations, and document segmentation at different granularity levels in a joint model that bridges the gap between structured and unstructured data. We discuss selection and transformation operations on the set of hyperedges, which can be chained to support a wide range of query scenarios. To ensure compatibility with established information retrieval methods, we discuss projection operations that transform hyperedges to traditional dyadic cooccurrence graph representations. Using PostgreSQL and Neo4j, we investigate the suitability of existing database systems for implementing the hypergraph document model, and explore the impact of utilizing implicit and materialized hyperedge representations on storage space requirements and query performance.
Citations: 2
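The projection from hyperedges to a dyadic cooccurrence graph mentioned in the entry above can be illustrated with a small sketch. This assumes the simplest possible projection, a clique expansion in which every pair of nodes sharing a hyperedge gets an edge weighted by the number of shared hyperedges. The paper's heterogeneous, typed model is richer than this; the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def project_to_cooccurrence(hyperedges):
    """Clique-expand hyperedges into weighted pairwise cooccurrence counts."""
    weights = Counter()
    for edge in hyperedges:
        # Sorting gives each unordered pair one canonical (u, v) key.
        for u, v in combinations(sorted(set(edge)), 2):
            weights[(u, v)] += 1
    return weights


# Each hyperedge groups the annotations found in one sentence (made-up data).
hyperedges = [
    {"Berlin", "Merkel", "2019"},
    {"Berlin", "Merkel"},
    {"Paris"},
]
graph = project_to_cooccurrence(hyperedges)
print(graph[("Berlin", "Merkel")])  # 2
```

Note that the projection is lossy: the three-way cooccurrence in the first hyperedge cannot be recovered from the pairwise weights, which is the kind of higher-order information the abstract says dyadic representations fail to capture.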
Efficient Calculation of Empirical P-values for Association Testing of Binary Classifications
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400923
Konstantinos Zagganas, Thanasis Vergoulis, Spiros Skiadopoulos, Theodore Dalamagas
Abstract: Investigating whether two different classifications of a population are associated is an interesting problem in many scientific fields. For this reason, various statistical tests to reveal this type of association have been developed, with the most popular of them being Fisher’s exact test. However, it has lately been shown that in some cases this test fails to produce accurate results. An alternative approach, known as randomization tests, was introduced to alleviate this issue; however, such tests are computationally intensive. In this paper, we introduce two novel indexing approaches that exploit frequently occurring patterns in classifications to avoid performing redundant computations during the analysis. We conduct a comprehensive set of experiments using real datasets and application scenarios to show that our approaches always outperform the state-of-the-art, with one approach being faster by an order of magnitude.
Citations: 1
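For readers unfamiliar with randomization tests, the following Python sketch shows the naive procedure whose cost the entry above refers to. It is not one of the paper's indexing approaches. The test statistic (size of the overlap between the two classes), the one-sided comparison, and the `(count + 1) / (n_perm + 1)` estimate are common conventions chosen here as assumptions.

```python
import random


def empirical_p_value(a, b, n_perm=999, seed=0):
    """Estimate a one-sided empirical p-value for the association of two 0/1 vectors."""
    rng = random.Random(seed)
    # Statistic: number of population members that belong to both classes.
    observed = sum(x & y for x, y in zip(a, b))
    shuffled = list(b)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association while keeping class sizes
        if sum(x & y for x, y in zip(a, shuffled)) >= observed:
            at_least += 1
    # Counting the observed labelling itself keeps the estimate above zero.
    return (at_least + 1) / (n_perm + 1)


labels = [1] * 10 + [0] * 40
p_same = empirical_p_value(labels, labels)        # perfectly associated classes
p_trivial = empirical_p_value(labels, [1] * 50)   # second class covers everyone
```

Every permutation recomputes the statistic from scratch, so the cost is proportional to the population size times the number of permutations, and it multiplies again when many classification pairs are tested. That repeated work is what makes the plain approach computationally intensive.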
WALLeSMART: Cloud Platform for Smart Farming
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3401690
Amine Roukh, Fabrice Nolack Fote, S. Mahmoudi, S. Mahmoudi
Abstract: Today, agricultural practices are supported by bio-informatics and emerging technologies such as remote sensing, cloud computing and the Internet of Things (IoT), which leads to the concept of “Smart Farming”. Smart farming is a cycle of intelligent detection and monitoring, analysis and planning, as well as control of agricultural operations using a cloud-based event management system. In this paper, we propose WALLeSMART, a cloud-based framework built to capitalize on the efforts invested in building smart farming management systems, applied to the Wallonia region of Belgium. The framework proposes an architecture to address the challenges of acquisition, processing, and visualization of massive amounts of data, on both a batch and a real-time basis. An initial prototype has been developed and tested with various farms and shows prominent results.
Citations: 6
Node Classification and Link Prediction in Social Graphs using RLVECN
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400928
Bonaventure C. Molokwu, Shaon Bhatta Shuvo, N. Kar, Ziad Kobti
Abstract: Node classification and link prediction problems in Social Network Analysis (SNA) remain open research problems with respect to Artificial Intelligence (AI). Inherent representations about social network structures can be effectively harnessed for training AI models in a bid to detect clusters via classification of actors as well as predict ties with regard to a given social network. In this paper, we have proposed a unique hybrid model: Representation Learning via Knowledge-Graph Embeddings and ConvNet (RLVECN). Our proposition is designed for analyzing and extracting expressive feature representations from social network structures to aid in link prediction, node classification and community detection tasks. RLVECN utilizes an edge sampling technique for exploiting features of a given social network via learning the context of each actor with respect to its associate actors.
Citations: 8
Hurricane in Bipartite Graphs: The Lethal Nodes of Butterflies
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400916
Qiuyu Zhu, Jiahong Zheng, Han Yang, Chen Chen, Xiaoyang Wang, Ying Zhang
Abstract: Bipartite graphs are widely used when modeling the relationships between two different types of entities, such as purchase relationships. In a bipartite graph, the number of butterflies, i.e., 2 × 2 bicliques, is a fundamental metric for analyzing the structures and properties of bipartite graphs. Considering that the deletion of critical nodes may affect the stability of bipartite graphs, we propose the butterfly minimization problem, where the attacker aims to maximize the number of butterflies removed from the graph by deleting b nodes. We prove the problem is NP-hard, and the objective function is monotonic and submodular. We adopt a greedy algorithm to solve the problem with a 1 − 1/e approximation ratio. To scale for large graphs, novel methods are developed to reduce the searching space. Experiments over real-world bipartite graphs are conducted to demonstrate the advantages of the proposed techniques.
Citations: 12
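The greedy strategy named in the entry above can be sketched in a few lines of Python. This is the textbook greedy loop with naive butterfly recounting, without any of the paper's search-space reductions. The edge-list representation, the `(side, node)` tagging of left and right nodes, and the tie-breaking by sorted order are assumptions made for illustration.

```python
from itertools import combinations


def count_butterflies(edges):
    """Count 2 x 2 bicliques in a bipartite graph given as (left, right) edges."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
    total = 0
    for u1, u2 in combinations(nbrs, 2):
        common = len(nbrs[u1] & nbrs[u2])
        # Two left nodes with k common right neighbours form C(k, 2) butterflies.
        total += common * (common - 1) // 2
    return total


def greedy_remove(edges, b):
    """Delete up to b nodes, each time picking the node that removes most butterflies."""
    edges = set(edges)
    removed = []
    for _ in range(b):
        base = count_butterflies(edges)
        # Side 0 is the left partition, side 1 the right partition.
        candidates = sorted({(0, u) for u, _ in edges} | {(1, v) for _, v in edges})
        best_gain, best_node, best_edges = 0, None, edges
        for side, node in candidates:
            remaining = {e for e in edges if e[side] != node}
            gain = base - count_butterflies(remaining)
            if gain > best_gain:
                best_gain, best_node, best_edges = gain, (side, node), remaining
        if best_node is None:  # no butterflies left to remove
            break
        removed.append(best_node)
        edges = best_edges
    return removed, edges


edges = [("a", "x"), ("a", "y"), ("a", "z"),
         ("b", "x"), ("b", "y"), ("b", "z"),
         ("c", "x"), ("c", "y")]
before = count_butterflies(edges)      # 5
removed, rest = greedy_remove(edges, 1)
after = count_butterflies(rest)        # 1
```

Recounting all butterflies for every candidate is far too slow for large graphs, which is why the abstract mentions additional methods to shrink the search space.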
Deluceva: Delta-Based Neural Network Inference for Fast Video Analytics
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400930
Jingjing Wang, M. Balazinska
Abstract: Modern video analytics requires efficient machine learning model serving and evaluation. We present Deluceva, a system that optimizes video applications by applying incremental and approximate computation techniques. Experiments on three real models and six videos show that our prototype system can achieve significant performance gains of up to 79% with F1 errors below 0.1.
Citations: 2
Determining the provenance of land parcel polygons via machine learning
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400924
Vassilis Kaffes, G. Giannopoulos, Nontas Tsakonas, Spiros Skiadopoulos
Abstract: An important task in land registration processes is to be able to determine the prevalent data provenance for a finalized polygon that represents a cadastral parcel, since the finalized polygon is derived by the examination of a set of initial polygons, drawn from several individual registers (databases). These registers might contain different, partially similar or conflicting information regarding the ownership, usage and polygon geometry of a cadastral parcel. In such cases, the cadastration expert either selects one of the initial geometries or (in cases where none of the initial geometries accurately represents the finalized land parcel) creates a new geometry. Maintaining this provenance information is of high importance for further cadastration and validation/quality assessment processes; however, due to the gradual and long-lasting nature of cadastration procedures, this information is absent from large parts of cadastral databases. In this paper, we present an approach for effectively classifying such land parcel polygons with respect to their provenance information. We propose a method that can produce highly accurate provenance recommendations based only on attributes derived from the geometry of a land parcel. In particular, we implement a set of spatial training features, capturing polygon properties and relations. These features are fed into several classification algorithms and are evaluated on a proprietary dataset of a cadastration company.
Citations: 1
Improving geocoding quality via learning to integrate multiple geocoders
32nd International Conference on Scientific and Statistical Database Management Pub Date : 2020-07-07 DOI: 10.1145/3400903.3400918
Konstantinos Alexis, Vassilis Kaffes, Ilias Varkas, A. Syngros, Nontas Tsakonas, G. Giannopoulos
Abstract: In this paper, we introduce an approach for improving the quality of the geocoding process. Geocoding refers to the procedure of mapping an address of textual form to a pair of accurate spatial coordinates. While there is a variety of available geocoders, both open source and commercial, that curate this mapping in either a semi-automated or fully-automated way, there is no one-size-fits-all system. Depending on the underlying algorithm of each geocoder, its output may be very accurate for some addresses, districts or countries, while failing to properly locate some others. Given that, our setup can be thought of as a meta-geocoding pipeline, built on top of the available geocoders. We propose a machine learning approach which, given an address and a sequence of coordinate pairs suggested by standalone geocoders, is able to identify the most accurate one. In order to achieve this, we formulate the task as a multi-class classification problem and introduce a series of domain-specific training features, capturing essential information about each coordinate pair suggestion, as well as computing comparative metrics among different suggestions. These features are fed into several classification algorithms and are evaluated on a proprietary address dataset of a geo-marketing company. Furthermore, we present LGM-GC, a QGIS plugin, which provides the functionality of our approach through a user-friendly interface.
Citations: 1