Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management: latest articles

Multi-query scheduling for time-critical data stream applications
Yongluan Zhou, Ji Wu, A. K. Leghari
Abstract: Many data stream applications, such as network intrusion detection, on-line financial tickers and environmental monitoring, typically exhibit certain "real-time" traits. In such applications, people are interested in strategies that ensure on-time delivery of query results. In this paper, we point out that traditional operator-based query scheduling strategies are insufficient to handle this class of problem. Therefore we choose to approach the issue from a new angle by modeling multi-query scheduling as a job-scheduling problem, a classical problem in real-time computing. By taking advantage of the wisdom in the real-time computing community, we propose several new scheduling strategies and algorithms to enhance the overall data stream query scheduling performance. Through extensive experiments over both real and synthetic data, we identify the important factors for scheduling performance and verify the effectiveness of our approaches.
DOI: https://doi.org/10.1145/2484838.2484864
Published: 2013-07-29. Pages: 15:1-15:12.
Citations: 5
Towards a universal tracking database
Gereon Schüller, Andreas Behrend
Abstract: In moving object databases, authors usually assume that number and position of objects to be processed are always known in advance. Detecting an unknown moving object and pursuing its movement, however, is usually left to tracking algorithms resting outside the database. Trackers are complex software systems which process sensor data and application-specific context information in order to detect, classify, monitor and predict the course of moving objects. As there are no universal software tools for realizing a tracker, such systems are usually hand-coded from scratch for each tracking application. In this paper we present a way how to implement a framework for implementing universal trackers inside a database. As a use case, we consider the well-known probabilistic multiple hypothesis tracking approach (PMHT) and the interacting multiple model filter (IMM) for realizing typical tracking tasks. We show that incremental view maintenance techniques and Bregman Ball trees are well-suited for efficiently implementing state-of-the-art trackers for processing streams of radar data.
DOI: https://doi.org/10.1145/2484838.2484845
Published: 2013-07-29. Pages: 10:1-10:12.
Citations: 1
Parameter-free and domain-independent similarity search with diversity
Lúcio F. D. Santos, Willian D. Oliveira, Mônica Ribeiro Porto Ferreira, A. Traina, C. Traina
Abstract: New operators to execute similarity-based queries over multimedia data stored in Database Management Systems are increasingly demanded. However, searching in very large datasets, the basic operators often return elements too much similar both to the query center and to themselves, reducing the answer's utility. In this paper, we tackle the problem of providing diversity to similarity query results, and define techniques to assure that each element in the result set is different enough from the others. Existing techniques compel the user to define either a parameter to trade among similarity and diversity or a minimum similarity between result elements. Distinctly, our approach provides similarity queries with diversification using the influence concept, which automatically estimates the inherent diversity between the result set elements requiring no user-defined parameters. Furthermore, our technique can be applied over any data represented in a metric space, so it is both parameter and application-domain independent. The "Better Results with Influence Diversification" (BRID) technique is the basis to the k-Diverse Nearest Neighbor (BRIDk) and to the Range Diverse (BRIDr) algorithms, which execute k-nearest neighbor and range queries with diversification, showing that the technique can be applied to diversify any type of similarity queries. We also define a way to measure the diversification degree in a result set. Through a detailed experimental evaluation using our approach, we show that BRID outperforms the existing methods regarding both query diversification quality and execution times, being at least two orders of magnitude faster than the best existing approaches.
DOI: https://doi.org/10.1145/2484838.2484854
Published: 2013-07-29. Pages: 5:1-5:12.
Citations: 18
Research lattices: towards a scientific hypothesis data model
Bernardo Gonçalves, F. Porto
Abstract: As the problems of scientific interest raise in scale and complexity, scientists have to tacitly manage too many analytic elements. Hypotheses are worked out to drive research towards successful explanation and prediction, which characterizes science as a dynamic activity that is partially ordered towards progress. This paper motivates and introduces research lattices, carrying out a lattice-theoretic approach for hypothesis representation and management in large-scale science and engineering. The goal of this work is to equip scientists with tools to manipulate and query hypotheses while keeping track of research progress. We refer to SciDB's array data model and discuss how data and theories could be managed in a unified model management framework.
DOI: https://doi.org/10.1145/2484838.2484861
Published: 2013-07-29. Pages: 41:1-41:4.
Citations: 2
Publishing trajectories with differential privacy guarantees
Kaifeng Jiang, Dongxu Shao, S. Bressan, Thomas Kister, K. Tan
Abstract: The pervasiveness of location-acquisition technologies has made it possible to collect the movement data of individuals or vehicles. However, it has to be carefully managed to ensure that there is no privacy breach. In this paper, we investigate the problem of publishing trajectory data under the differential privacy model. A straightforward solution is to add noise to a trajectory - this can be done either by adding noise to each coordinate of the position, to each position of the trajectory, or to the whole trajectory. However, such naive approaches result in trajectories with zigzag shapes and many crossings, making the published trajectories of little practical use. We introduce a mechanism called SDD (Sampling Distance and Direction), which is ε-differentially private. SDD samples a suitable direction and distance at each position to publish the next possible position. Numerical experiments conducted on real ship trajectories demonstrate that our proposed mechanism can deliver ship trajectories that are of good practical utility.
DOI: https://doi.org/10.1145/2484838.2484846
Published: 2013-07-29. Pages: 12:1-12:12.
Citations: 90
Adaptive exploration for large-scale protein analysis in the molecular dynamics database
Sarana Nutanong, N. Carey, Yanif Ahmad, A. Szalay, T. Woolf
Abstract: Molecular dynamics (MD) simulations generate detailed time-series data of all-atom motions. These simulations are leading users of the world's most powerful supercomputers, and are standard-bearers for a wide range of high-performance computing (HPC) methods. However, MD data exploration and analysis is in its infancy in terms of scalability, ease-of-use, and ultimately its ability to answer 'grand challenge' science questions. This demonstration introduces the Molecular Dynamics Database (MDDB) project at Johns Hopkins, to study the co-design of database methods for deep on-the-fly exploratory MD analyses with HPC simulations. Data exploration in MD suffers from a "human bottleneck", where the laborious administration of simulations leaves little room for domain experts to focus on tackling science questions. MDDB exploits the data-rich nature of MD simulations to provide adaptive control of the exploration process with machine learning techniques, specifically reinforcement learning (RL). We present MDDB's data and queries, architecture, and its use of RL methods. Our audience will co-operate with our steering algorithm and science partners, and witness MDDB's abilities to significantly reduce exploration times and direct computation resources to where they best address science questions.
DOI: https://doi.org/10.1145/2484838.2484872
Published: 2013-07-29. Pages: 45:1-45:4.
Citations: 4
Education and career paths for data scientists
M. Balazinska, S. Davidson, Bill Howe, Alexandros Labrinidis
Abstract: MOTIVATION: As industry and science are increasingly data-driven, the need for skilled data scientists is exceeding what our universities are producing. According to a Mckinsey report: "By 2018, the United States alone could face a shortage of 140,000 to 190,000 people with deep analytical skills". Similarly, the ability to extract knowledge from scientific data is accelerating discovery and we need the next generation of domain scientists to be experts not only in their domain but also in data management. At the same time, however, researchers in academia who focus on building instruments or data management tools are often less recognized for their contributions than researchers focusing purely on the actual science.
OVERVIEW: The goal of this panel will be to discuss all these challenges. We will discuss various aspects of how we should be educating both the emerging "data science" experts and the next generation of database and domain science experts. The panel will also discuss career paths for researchers who choose to specialize in developing new methods and tools for Big Data management in domain sciences, with recommendations for how we should better support these less traditional career paths.
DOI: https://doi.org/10.1145/2484838.2484886
Published: 2013-07-29. Pages: 3:1-3:2.
Citations: 1
Parallel online aggregation in action
Chengjie Qin, Florin Rusu
Abstract: Online aggregation provides continuous estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or can let the processing terminate and obtain the exact result. In this demonstration, we introduce a general framework for parallel online aggregation in which estimation does not incur overhead on top of the actual processing. We define a generic interface to express any estimation model that abstracts completely the execution details. We design multiple sampling-based estimators suited for parallel online aggregation and implement them inside the framework. Demonstration participants are shown how estimates to general SQL aggregation queries over terabytes of TPC-H data are generated during the entire processing. Due to parallel execution, the estimate converges to the correct result in a matter of seconds even for the most difficult queries. The behavior of the estimators is evaluated under different operating regimes of the distributed cluster used in the demonstration.
DOI: https://doi.org/10.1145/2484838.2484874
Published: 2013-07-29. Pages: 46:1-46:4.
Citations: 19
Bulk sorted access for efficient top-k retrieval
Dustin Lange, Felix Naumann
Abstract: Efficient top-k retrieval of records from a database has been an active research field for many years. We approach the problem from a real-world application point of view, in which the order of records according to some similarity function on an attribute is not unique: Many records have same values in several attributes and thus their ranking in those attributes is arbitrary. For instance, in large person databases many individuals have the same first name, the same date of birth, or live in the same city. Existing algorithms, such as the Threshold Algorithm (TA), are ill-equipped to handle such cases efficiently.
We introduce a variation of TA, the Bulk Sorted Access Algorithm (BSA), which retrieves larger chunks of records from the sorted lists using fixed thresholds, and which focusses its efforts on records that are ranked high in more than one ordering and are thus more promising candidates. We experimentally show that our method outperforms TA and another previous method for top-k retrieval in those very common cases.
DOI: https://doi.org/10.1145/2484838.2484852
Published: 2013-07-29. Pages: 39:1-39:4.
Citations: 0
Sharing confidential data for algorithm development by multiple imputation
S. Verwer, S. V. D. Braak, Sunil Choenni
Abstract: The availability of real-life data sets is of crucial importance for algorithm and application development, as these often require insight into the specific properties of the data. Often, however, such data are not released because of their proprietary and confidential nature. We propose to solve this problem using the statistical technique of multiple imputation, which is used as a powerful method for generating realistic synthetic data sets. Additionally, it is shown how the generated records can be combined into networked data using clustering techniques.
DOI: https://doi.org/10.1145/2484838.2484865
Published: 2013-07-29. Pages: 42:1-42:4.
Citations: 2