Proceedings of the 27th International Conference on Scientific and Statistical Database Management: Latest Publications

SciCSM
Gangyi Zhu, Yi Wang, G. Agrawal
DOI: 10.1145/2791347.2791361 (https://doi.org/10.1145/2791347.2791361)
Published: 2015-06-29
Abstract: Contrast set mining is a broadly applicable exploratory technique that identifies interesting differences across contrast groups. Existing algorithms primarily target relational datasets with categorical attributes. There is a clear need to apply this method to discover interesting patterns across scientific datasets, which feature arrays with numeric values. In this paper, we present a novel algorithm, SciCSM, for efficient contrast set mining over array-based datasets. We define how "interesting" contrast sets can be characterized for numeric and array data, handling the fact that subsets can involve value-based and/or dimension-based attributes. We make extensive use of bitmap indices to reduce computational complexity and to enable processing of larger-scale data. We demonstrate both the high efficiency and the effectiveness of our algorithm on multiple real-life datasets.
Citations: 20
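
The following Python sketch illustrates the bitmap-index idea the abstract mentions; it is not the SciCSM algorithm itself. One bitmap per (attribute, bin) condition lets group supports be computed with cheap popcounts; the binning scheme, threshold, and toy data are assumptions made for the example.

```python
# Illustrative sketch of bitmap-indexed contrast set scoring (not the SciCSM algorithm).
# One integer bitmap per (attribute, bin) lets support counts be computed with popcounts.

def build_bitmaps(rows, bounds, n_bins=4):
    """Map each (attribute index, bin index) to an int bitmap over the given rows."""
    bitmaps = {}
    for i, row in enumerate(rows):
        for a, v in enumerate(row):
            lo, hi = bounds[a]
            width = (hi - lo) / n_bins or 1.0
            b = min(int((v - lo) / width), n_bins - 1)
            bitmaps[(a, b)] = bitmaps.get((a, b), 0) | (1 << i)
    return bitmaps

def contrast_sets(group1, group2, min_diff=0.3, n_bins=4):
    """Report (attribute, bin) conditions whose relative support differs across groups."""
    both = group1 + group2
    bounds = [(min(col), max(col)) for col in zip(*both)]  # shared bin boundaries
    bm1 = build_bitmaps(group1, bounds, n_bins)
    bm2 = build_bitmaps(group2, bounds, n_bins)
    results = []
    for key in set(bm1) | set(bm2):
        s1 = bin(bm1.get(key, 0)).count("1") / len(group1)  # support in group 1
        s2 = bin(bm2.get(key, 0)).count("1") / len(group2)  # support in group 2
        if abs(s1 - s2) >= min_diff:
            results.append((key, round(s1, 2), round(s2, 2)))
    return results

# Toy example: two contrast groups of (temperature, humidity) cells.
hot = [(30.1, 0.2), (31.5, 0.3), (29.8, 0.1)]
cold = [(10.2, 0.8), (11.0, 0.9), (12.3, 0.7)]
print(contrast_sets(hot, cold))
```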
A compression-based framework for the efficient analysis of business process logs
Bettina Fazzinga, S. Flesca, F. Furfaro, E. Masciari, L. Pontieri
DOI: 10.1145/2791347.2791351 (https://doi.org/10.1145/2791347.2791351)
Published: 2015-06-29
Abstract: The increasing availability of large process log repositories calls for efficient solutions for their analysis. In this regard, a novel specialized compression technique for process logs is proposed that builds a synopsis supporting fast estimation of aggregate queries, which are of crucial importance in exploratory and high-level analysis tasks. The synopsis is constructed by progressively merging the original log tuples, which represent single activity executions within the process instances, into aggregate tuples summarizing sets of activity executions. The compression strategy is guided by a heuristic that aims to limit the loss of information caused by summarization, while guaranteeing that no information is lost about the set of activities performed within the process instances or the order of their executions. The selection conditions in an aggregate query are specified in terms of a graph pattern that allows precedence relationships over activity executions to be expressed, along with conditions on their starting times, durations, and executors. The efficacy of the compression technique, in terms of its capability to reduce the size of the log and the accuracy of the estimates retrieved from the synopsis, has been experimentally validated.
Citations: 4
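
As a rough illustration of the summarization idea (not the paper's merge heuristic or graph-pattern query language), the sketch below merges consecutive executions of the same activity within a process instance into aggregate tuples, so the set of activities and their order are preserved while the log shrinks.

```python
# Minimal sketch of lossy log summarization by merging adjacent activity executions.
from collections import namedtuple

LogTuple = namedtuple("LogTuple", "case activity start duration executor")
AggTuple = namedtuple("AggTuple", "case activity count first_start last_end executors")

def summarize(case_log):
    """Merge consecutive executions of the same activity within one process instance."""
    summary = []
    for t in sorted(case_log, key=lambda t: t.start):
        if summary and summary[-1].activity == t.activity:
            prev = summary[-1]
            summary[-1] = AggTuple(prev.case, prev.activity, prev.count + 1,
                                   prev.first_start, t.start + t.duration,
                                   prev.executors | {t.executor})
        else:
            summary.append(AggTuple(t.case, t.activity, 1,
                                    t.start, t.start + t.duration, {t.executor}))
    return summary

log = [
    LogTuple("c1", "review", 0, 5, "alice"),
    LogTuple("c1", "review", 6, 4, "bob"),     # merged with the previous execution
    LogTuple("c1", "approve", 12, 2, "carol"),
]
for agg in summarize(log):
    print(agg)
```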
TarMiner: automatic extraction of miRNA targets from literature
R. Tsoupidi, Ilias Kanellos, Thanasis Vergoulis, I. Vlachos, A. Hatzigeorgiou, Theodore Dalamagas
DOI: 10.1145/2791347.2791366 (https://doi.org/10.1145/2791347.2791366)
Published: 2015-06-29
Abstract: MicroRNAs (miRNAs) are small RNA molecules that target particular genes and inhibit their expression. Since many important diseases are related to the expression or non-expression of particular genes, knowing the miRNAs that affect these genes can help in finding possible treatments. In the last decade, a large number of experimental studies trying to reveal the targets of various miRNAs have been published. A handful of curated databases that collect miRNA targets from the literature have been developed to make this information more easily available. However, due to the large number of published articles, keeping these databases up to date is a tedious task that requires substantial resources. In this work we introduce TarMiner, a pipeline for the automatic extraction of miRNA targets that can facilitate the curation process of databases maintaining validated miRNA targets.
Citations: 2
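
A minimal, assumption-laden sketch of the kind of sentence-level extraction such a pipeline automates (this is not TarMiner's actual method): flag sentences that co-mention a miRNA identifier and a gene symbol as candidate target statements. The regular expression and the tiny gene dictionary are invented for the example.

```python
# Toy co-occurrence extractor for miRNA-target candidates; NOT the TarMiner pipeline,
# only an illustration of the kind of curation work such a tool automates.
import re

MIRNA_RE = re.compile(r"\b(?:hsa-)?miR-\d+[a-z]?(?:-[35]p)?\b", re.IGNORECASE)
GENES = {"PTEN", "TP53", "VEGFA", "BCL2"}  # assumed toy gene dictionary

def candidate_pairs(text):
    """Return (miRNA, gene, sentence) triples for sentences mentioning both."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        mirnas = MIRNA_RE.findall(sentence)
        genes = [g for g in GENES if re.search(rf"\b{g}\b", sentence)]
        pairs += [(m, g, sentence.strip()) for m in mirnas for g in genes]
    return pairs

abstract = ("We show that miR-21 directly targets PTEN in hepatocellular carcinoma. "
            "Overexpression of miR-155 did not affect BCL2 levels.")
for m, g, s in candidate_pairs(abstract):
    print(m, "->", g)   # candidates only; a curator (or classifier) must still verify
```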
GEN: a database interface generator for HPC programs
Quan Pham, T. Malik
DOI: 10.1145/2791347.2791363 (https://doi.org/10.1145/2791347.2791363)
Published: 2015-06-29
Abstract: In this paper, we present GEN, an interface generator that takes user-supplied C declarations and provides the interface needed to load and access data from common scientific array databases such as SciDB and Rasdaman. GEN can be used to store the output of parallel computations directly in the database, automating the previously used, inefficient ingestion process that requires the development of special database schemas for each computation. Further, GEN requires no modifications to existing C code and can build a working interface in minutes. We show how GEN can be used by cosmology analysis programs to output datasets to a database in real time for subsequent analysis. We show that GEN introduces modest overhead in program execution but is more efficient than writing to files and then loading. More importantly, it significantly reduces the programmatic overhead of learning new database languages.
Citations: 0
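
To make the idea concrete, here is a heavily simplified sketch in the spirit of GEN: parse a flat C struct declaration and emit an array-database schema for it. The DDL string is SciDB-style and the type mapping is an assumption; the real tool generates a full load/access interface rather than a single statement.

```python
# Much-simplified sketch in the spirit of GEN: derive an array-database schema from a
# flat C struct declaration. The emitted DDL is SciDB-style and may need adjusting for
# a concrete SciDB/Rasdaman version; the type map and struct parser are assumptions.
import re

C_TO_ARRAY_TYPE = {"double": "double", "float": "float",
                   "int": "int32", "long": "int64"}

def struct_to_create_array(c_decl, array_name, length):
    """Turn 'struct {double x; int y;}' into a CREATE ARRAY statement string."""
    fields = re.findall(r"(\w+)\s+(\w+)\s*;", c_decl)
    attrs = ", ".join(f"{name}:{C_TO_ARRAY_TYPE[ctype]}" for ctype, name in fields)
    return f"CREATE ARRAY {array_name} <{attrs}> [i=0:{length - 1},1000,0];"

decl = """
struct particle {
    double mass;
    double velocity;
    int    cell_id;
};
"""
print(struct_to_create_array(decl, "particles", 1000000))
```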
How to quantify the impact of lossy transformations on change detection
Pavel Efros, Erik Buchmann, Adrian Englhardt, Klemens Böhm
DOI: 10.1145/2791347.2791371 (https://doi.org/10.1145/2791347.2791371)
Published: 2015-06-29
Abstract: To ease the proliferation of big data, it is frequently transformed, be it by compression or by anonymization. Such transformations, however, modify characteristics of the data, such as the changes present in time series. These changes are important for subsequent analyses. The impact of such modifications depends on the application scenario, and quantifying it is far from trivial, because a transformation can shift or modify existing changes or introduce new ones. In this paper, we propose MILTON, a flexible and robust Measure for quantifying the Impact of Lossy Transformations on subsequent change detectiON. MILTON is applicable to any lossy transformation technique on time-series data and to any general-purpose change-detection approach. We have evaluated it with three real-world use cases. Our evaluation shows that MILTON makes it possible to quantify the impact of lossy transformations and to choose the best technique from a class of transformations for a given application scenario.
Citations: 3
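
The sketch below is not MILTON; it only illustrates the underlying question. A naive detector finds change points before and after a lossy transformation, and the score is the fraction of original change points that no longer have a nearby match; the detector, tolerance window, and data are assumptions.

```python
# Hedged sketch (not MILTON): quantify how a lossy transformation shifts or drops the
# change points found by a simple detector, by matching detections within a tolerance.

def change_points(series, threshold=2.0):
    """Naive detector: indices where the series jumps by more than `threshold`."""
    return [i for i in range(1, len(series))
            if abs(series[i] - series[i - 1]) > threshold]

def impact(original, transformed, tolerance=2, threshold=2.0):
    """Fraction of original change points with no nearby match after transformation."""
    before = change_points(original, threshold)
    after = change_points(transformed, threshold)
    missed = [i for i in before if not any(abs(i - j) <= tolerance for j in after)]
    return len(missed) / len(before) if before else 0.0

raw = [0, 0, 5, 5, 5, 0, 0, 5, 5]
lossy = [0, 1, 4, 5, 4, 1, 0, 2, 3]   # e.g. after heavy compression: last change is lost
print("impact score:", impact(raw, lossy))
```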
Compact distance histogram: a novel structure to boost k-nearest neighbor queries
M. Bedo, D. S. Kaster, A. Traina, C. Traina
DOI: 10.1145/2791347.2791359 (https://doi.org/10.1145/2791347.2791359)
Published: 2015-06-29
Abstract: The k-nearest neighbor query (k-NNq) is one of the most useful similarity queries. Elaborate k-NNq algorithms depend on an initial radius to prune regions of the search space that cannot contribute to the answer. Therefore, estimating a suitable starting radius is of major importance for accelerating k-NNq execution. This paper presents a new technique to estimate a tight initial radius. Our approach, named CDH-kNN, relies on Compact Distance Histograms (CDHs), which are pivot-based histograms defined as piecewise linear functions. Such structures approximate the distance distribution and are compressed according to a given constraint, which can be a desired number of buckets and/or a maximum allowed error. The covering radius of a k-NNq is estimated from the relationship between the query element and the CDHs' joint frequencies. The paper presents a complete specification of CDH-kNN, including CDH construction and radius estimation. Extensive experiments on both real and synthetic datasets highlighted the efficiency of our approach, showing that it was up to 72% faster than existing algorithms, outperforming every competitor in all the setups evaluated. In fact, the experiments showed that our proposal was just 20% slower than the theoretical lower bound.
Citations: 1
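
As a hedged illustration of pivot-based radius estimation (not the CDH construction, compression, or joint-frequency estimate from the paper), the sketch below keeps the distances of all data points to one pivot and uses the triangle inequality to guess a starting radius around a query that should enclose roughly k points; a real algorithm would refine or expand this estimate.

```python
# Sketch of the general idea behind pivot-based radius estimation for k-NN queries:
# use the distribution of distances to a pivot to guess a starting search radius.
# This is not the CDH structure from the paper, just an illustration of the principle.
import math
import random

def euclidean(p, q):
    return math.dist(p, q)

class PivotHistogram:
    def __init__(self, data, pivot):
        self.pivot = pivot
        self.pivot_dists = sorted(euclidean(pivot, x) for x in data)

    def estimate_radius(self, query, k):
        """Rough starting radius around `query` expected to enclose about k points,
        via the triangle inequality |d(q,pivot) - d(x,pivot)| <= d(q,x).
        A real k-NN algorithm would expand the radius if fewer than k results appear."""
        dq = euclidean(query, self.pivot)
        diffs = sorted(abs(dq - d) for d in self.pivot_dists)
        return diffs[min(k, len(diffs)) - 1]

random.seed(0)
points = [(random.random(), random.random()) for _ in range(1000)]
hist = PivotHistogram(points, pivot=(0.0, 0.0))
print("estimated starting radius:", round(hist.estimate_radius((0.5, 0.5), k=10), 3))
```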
Batch matching of conjunctive triple patterns over linked data streams in the internet of things
Yongrui Qin, Quan Z. Sheng, Nickolas J. G. Falkner, A. Shemshadi, E. Curry
DOI: 10.1145/2791347.2791364 (https://doi.org/10.1145/2791347.2791364)
Published: 2015-06-29
Abstract: The Internet of Things (IoT) envisions smart objects collecting and sharing data at a global scale via the Internet. One challenging issue is how to disseminate data to relevant consumers efficiently. This paper leverages semantic technologies such as Linked Data, which facilitate machine-to-machine (M2M) communication, to build an efficient information dissemination system for the semantic IoT. The system integrates Linked Data streams generated by various data collectors and disseminates matched data to relevant consumers based on conjunctive triple pattern queries registered in the system by those consumers. We also design a new data structure, CTP-automata, to meet the high performance needs of Linked Data dissemination. We evaluate our system using a real-world dataset generated from a Smart Building Project. With CTP-automata, the proposed system can disseminate Linked Data an order of magnitude faster than the existing approach with thousands of registered conjunctive queries.
Citations: 2
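
For intuition about the matching semantics (not the CTP-automata structure, which shares work across queries), the following naive baseline checks every incoming triple against every registered conjunctive triple pattern and reports queries whose patterns have all been matched; join consistency on shared variables is ignored for brevity.

```python
# Naive baseline for matching a stream of RDF triples against registered conjunctive
# triple-pattern queries (variables written as "?x"). Only illustrates the semantics.

def matches(pattern, triple):
    """A pattern term matches if it is a variable ('?...') or equal to the triple term."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))

class Dissemination:
    def __init__(self):
        self.queries = {}     # query id -> list of triple patterns
        self.satisfied = {}   # query id -> set of pattern indices already matched

    def register(self, qid, patterns):
        self.queries[qid] = patterns
        self.satisfied[qid] = set()

    def on_triple(self, triple):
        """Feed one incoming triple; return ids of queries whose every pattern matched.
        (Join consistency on shared variables is ignored in this sketch.)"""
        completed = []
        for qid, patterns in self.queries.items():
            for i, pat in enumerate(patterns):
                if matches(pat, triple):
                    self.satisfied[qid].add(i)
            if len(self.satisfied[qid]) == len(patterns):
                completed.append(qid)
        return completed

d = Dissemination()
d.register("q1", [("?s", "rdf:type", "Sensor"), ("?s", "hasReading", "?v")])
print(d.on_triple(("sensor42", "rdf:type", "Sensor")))   # []
print(d.on_triple(("sensor42", "hasReading", "21.5")))    # ['q1']
```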
Towards automated prediction of relationships among scientific datasets
Abdussalam Alawini, D. Maier, K. Tufte, Bill Howe, Rashmi Nandikur
DOI: 10.1145/2791347.2791358 (https://doi.org/10.1145/2791347.2791358)
Published: 2015-06-29
Abstract: Before scientists can analyze, publish, or share their data, they often need to determine how their datasets are related. Determining relationships helps scientists identify the most complete version of a dataset, detect versions of datasets that complement each other, and determine which datasets overlap. In previous work, we showed how observable relationships between two datasets help scientists recall their original derivation connection. While that work helped with identifying relationships between two datasets, it is infeasible for scientists to use it to find relationships between all possible pairs in a large collection of datasets. To deal with larger numbers of datasets, we are extending our methodology with a relationship-prediction system, ReDiscover: a tool that identifies the pairs in a collection of datasets that are most likely related, and the relationship between them. We report on the initial design of ReDiscover, which applies machine-learning methods such as Conditional Random Fields and Support Vector Machines to the relationship-discovery problem. Our preliminary evaluation shows that ReDiscover predicted relationships with an average accuracy of 87%.
Citations: 4
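
Below is a toy sketch of the general setup, not ReDiscover itself: compute a few hand-crafted features per dataset pair and train a Support Vector Machine to label the relationship. The features, labels, training data, and the use of scikit-learn are assumptions for the example.

```python
# Toy sketch of pairwise relationship prediction between datasets (not ReDiscover):
# hand-made features per dataset pair feed a Support Vector Machine classifier.
from sklearn.svm import SVC  # assumes scikit-learn is installed

def pair_features(a, b):
    """a, b: dicts with 'columns' (set of names) and 'rows' (set of tuples)."""
    col_overlap = len(a["columns"] & b["columns"]) / len(a["columns"] | b["columns"])
    row_containment = len(a["rows"] & b["rows"]) / max(len(a["rows"]), 1)
    size_ratio = min(len(a["rows"]), len(b["rows"])) / max(len(a["rows"]), len(b["rows"]), 1)
    return [col_overlap, row_containment, size_ratio]

# Tiny hand-labelled training set: 0 = unrelated, 1 = version-of, 2 = subset-of.
X = [[1.0, 1.0, 0.9], [1.0, 0.4, 0.5], [0.1, 0.0, 0.8], [1.0, 0.9, 1.0], [0.0, 0.0, 0.3]]
y = [1, 2, 0, 1, 0]
clf = SVC(kernel="rbf").fit(X, y)

d1 = {"columns": {"id", "temp"}, "rows": {(1, 20), (2, 21), (3, 22)}}
d2 = {"columns": {"id", "temp"}, "rows": {(1, 20), (2, 21)}}
print("predicted relationship class:", clf.predict([pair_features(d1, d2)])[0])
```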
RUBIK: efficient threshold queries on massive time series
Eleni Tzirita Zacharatou, F. Tauheed, T. Heinis, A. Ailamaki
DOI: 10.1145/2791347.2791372 (https://doi.org/10.1145/2791347.2791372)
Published: 2015-06-29
Abstract: An increasing number of applications from finance, meteorology, science, and other domains produce time series as output. The analysis of this vast amount of time series is key to understanding the phenomena studied, particularly in the simulation sciences, where the analysis of time series resulting from simulations allows scientists to refine the simulated model. Existing approaches to querying time series typically keep a compact representation in main memory, use it to answer queries approximately, and then access the exact time series data on disk to validate the result. The more precise the in-memory representation, the fewer disk accesses are needed to validate the result. With the massive sizes of today's datasets, however, current in-memory representations often no longer fit into main memory. To make them fit, their precision has to be reduced considerably, resulting in substantial disk access that impedes query execution today and limits scalability for even bigger datasets in the future. In this paper we develop RUBIK, a novel approach to compressing and indexing time series. RUBIK exploits the fact that time series in many applications, and particularly in the simulation sciences, are similar to each other. It compresses similar time series, i.e., observation values as well as time information, achieving better space efficiency and improved precision. RUBIK translates threshold queries into two-dimensional spatial queries and executes them efficiently on the compressed time series by exploiting the pruning power of a tree structure to find the result, thereby outperforming the state of the art by a factor of between 6 and 23. As our experiments further indicate, exploiting similarity within and between time series is crucial to making query execution scale and to ultimately decoupling query execution time from the growth of the data (size and number of time series).
Citations: 6
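
The sketch below shows only the threshold-query semantics and the value of pruning, using a precomputed per-series maximum; RUBIK's compressed two-dimensional index and tree-based pruning are far more involved. Series names and values are invented.

```python
# Threshold query over a collection of time series with a trivial pruning step:
# series whose precomputed maximum is below the threshold are skipped entirely.

class SeriesStore:
    def __init__(self, series_by_id):
        self.series = series_by_id
        self.max_value = {sid: max(vals) for sid, vals in series_by_id.items()}

    def threshold_query(self, threshold):
        """Return, per series, the time indices where the value exceeds the threshold."""
        hits = {}
        for sid, vals in self.series.items():
            if self.max_value[sid] < threshold:   # prune: cannot contain any hit
                continue
            hits[sid] = [t for t, v in enumerate(vals) if v > threshold]
        return hits

store = SeriesStore({
    "neuron_1": [0.1, 0.4, 0.9, 0.3],
    "neuron_2": [0.2, 0.2, 0.1, 0.2],   # pruned for thresholds above 0.2
})
print(store.threshold_query(0.5))
```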
Data mining for isotopic mapping of bioarchaeological finds in a central European Alpine passage
Markus Mauder, Eirini Ntoutsi, Peer Kröger, G. Grupe
DOI: 10.1145/2791347.2791357 (https://doi.org/10.1145/2791347.2791357)
Published: 2015-06-29
Abstract: Isotopic mapping has become an indispensable tool for assessing mobility and trade in the past. However, modeling and understanding spatio-temporal isotopic variation is complicated by the small number of available samples, the potential mobility of the investigated samples, sample preservation quality, measurement uncertainty, and so forth. In this work, we use data mining techniques to build an isotopic map (descriptive modeling) and to determine the spatial origin of new samples (predictive modeling). In particular, we propose a clustering-based isotope ratio model and a scoring function for predicting the origin of new samples. Our data were extracted from real animal finds from an Alpine passage that spans three countries (Germany, Austria, and Italy) and comprises a wide variety of isotopes and geological characteristics. Our results and evaluation by domain experts show that it is possible to derive a model of the area for both descriptive and predictive purposes.
Citations: 5
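
As a toy illustration of the descriptive/predictive split described above (not the paper's model), the sketch below clusters reference isotope ratios with k-means and scores a new find by its distance to each cluster centroid. The data values, the choice of two clusters, and the use of scikit-learn are assumptions; in practice the ratios would be standardized first.

```python
# Toy sketch: cluster reference isotope ratios (descriptive), then score a new sample
# by distance to each cluster centroid (predictive). Values and regions are invented.
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

# Columns: two isotope ratios for reference animal finds of known region.
reference = np.array([
    [0.7085, -9.1], [0.7088, -9.3], [0.7090, -9.0],    # roughly "region A"
    [0.7120, -12.4], [0.7118, -12.1], [0.7122, -12.6],  # roughly "region B"
])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(reference)

def origin_scores(sample):
    """Smaller distance to a centroid = more plausible origin in that cluster.
    (A real model would standardize the ratios before computing distances.)"""
    return np.linalg.norm(km.cluster_centers_ - sample, axis=1)

new_find = np.array([0.7087, -9.2])
scores = origin_scores(new_find)
print("cluster distances:", scores.round(4), "-> predicted cluster:", int(scores.argmin()))
```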