{"title":"A constrained frequent pattern mining system for handling aggregate constraints","authors":"C. Leung, Fan Jiang, Lijing Sun, Yan Wang","doi":"10.1145/2351476.2351479","DOIUrl":"https://doi.org/10.1145/2351476.2351479","url":null,"abstract":"Frequent pattern mining searches data for sets of items that frequently co-occur. Most algorithms find all the frequent patterns. However, there are many real-life situations in which users are interested in only some small portions of the entire collection of frequent patterns. To mine patterns that satisfy user aggregate constraints of the form agg(X.attr)θconst, properties of constraints are exploited. When agg is sum, the mining can be complicated. Existing mining systems or algorithms usually make assumptions about the value or range of X.attr and/or const. In this paper, we propose a frequent pattern mining system that avoids making these assumptions and that effectively handles the sum constraints as well as other aggregate constraints.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"73 1","pages":"14-23"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86106922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient graph management based on bitmap indices","authors":"Norbert Martínez-Bazan, Miquel Angel Aguila-Lorente, V. Muntés-Mulero, David Dominguez-Sal, S. Gómez-Villamor, J. Larriba-Pey","doi":"10.1145/2351476.2351489","DOIUrl":"https://doi.org/10.1145/2351476.2351489","url":null,"abstract":"The increasing amount of graph-like data from social networks, science and the web has fueled interest in analyzing the relationships between different entities. New specialized solutions in the form of graph databases, which are generic and able to adapt to any schema as an alternative to RDBMS, have appeared to manage attributed multigraphs efficiently. In this paper, we describe the internals of the DEX graph database, which is based on a representation of the graph and its attributes as maps and bitmap structures that can be loaded and unloaded efficiently from memory. We also present the internal operations used in DEX to manipulate these structures. We show that by using these structures, DEX scales to graphs with billions of vertices and edges with very limited memory requirements. Finally, we compare our graph-oriented approach to other approaches, showing that our system is better suited for typical out-of-core graph-like operations.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"101 1","pages":"110-119"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80154413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sample-based forecasting exploiting hierarchical time series","authors":"Ulrike Fischer, Frank Rosenthal, Wolfgang Lehner","doi":"10.1145/2351476.2351490","DOIUrl":"https://doi.org/10.1145/2351476.2351490","url":null,"abstract":"Time series forecasting is challenging because sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted at higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility might lead to higher accuracy, but it is usually too expensive due to the high number of base time series. However, we do not actually need all base models to achieve high accuracy; a sample of base models is enough. With this approach, we still achieve better accuracy than an aggregate model, very similar to using all models, but we need fewer models to create and maintain in the database. We further improve this approach if new actual values of the base time series arrive at different points in time. With each new actual value we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation using several real-world data sets shows a high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and increasing number of actual values, respectively.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"34 1","pages":"120-129"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82453641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A mediator-based system for distributed semantic provenance management systems","authors":"Mohamed Amin Sakka, Bruno Defude","doi":"10.1145/2351476.2351499","DOIUrl":"https://doi.org/10.1145/2351476.2351499","url":null,"abstract":"Today, most applications that exchange and process documents on the web or in clouds are becoming provenance-aware and provide heterogeneous, decentralized and non-interoperable provenance data. Provenance is becoming a key form of metadata for assessing the trustworthiness of electronic documents and should be considered first-class data. Most provenance management systems are dedicated either to a specific application (workflow, database) or to a specific data type. Moreover, in modern infrastructures such as the cloud, applications can be deployed and executed on several providers' infrastructures. Hence, there is a need to track the provenance of applications across different provenance providers. For these reasons, modeling, collecting and querying provenance across heterogeneous distributed sources is a challenging task. In this paper, we introduce a framework based on semantic web models that supports syntactic and semantic heterogeneity of provenance sources and correlations between multiple sources. This framework is implemented as a provenance management system (PMS). We focus on the design of a mediator-based system that federates distributed PMSs, and we present optimization issues related to distributed query processing.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"56 1","pages":"193-198"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85513613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"UBIQUEST, for rapid prototyping of networking applications","authors":"A. Ahmad-Kassem, Christophe Bobineau, C. Collet, Etienne Dublé, S. Grumbach, Fuda Ma, L. Martínez, S. Ubéda","doi":"10.1145/2351476.2351498","DOIUrl":"https://doi.org/10.1145/2351476.2351498","url":null,"abstract":"A UBIQUEST system provides a high-level programming abstraction for rapid prototyping of heterogeneous and distributed applications in a dynamic environment. Such a system is perceived as a distributed database, and the applications interact through declarative queries, including declarative networking programs (e.g. routing) and/or specific data-oriented distributed algorithms (e.g. distributed join). Case-Based Reasoning is used to optimize distributed queries, as there is no prior knowledge of data (sources) in networking applications, and certainly no related metadata such as data statistics.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"4 1","pages":"187-192"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77963028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining GPS traces to recommend common meeting points","authors":"Sonia Khetarpaul, S. K. Gupta, L. V. Subramaniam, Ullas Nambiar","doi":"10.1145/2351476.2351497","DOIUrl":"https://doi.org/10.1145/2351476.2351497","url":null,"abstract":"Scheduling a meeting is a difficult task for people who have overbooked calendars and many constraints. The complexity increases when the meeting is to be scheduled between parties who are situated in geographically distant locations of a city and have varying travel patterns. In this paper, we present a solution that identifies a common meeting point for a group of users who have temporal and spatial locality constraints that vary over time. The problem entails answering an Optimal Meeting Point (OMP) query in spatial databases. In Euclidean space, answering an OMP query reduces to determining the geometric median of a set of points, a problem for which no exact solution exists. The OMP problem does not consider any constraints on user availability, whereas that is a key constraint in our setting. We therefore focus on finding a solution that uses daily movement information obtained from each user's GPS traces to compute stay points during various times of the day. We then determine interesting locations by analyzing the stay points across multiple users. The novelty of our solution is that the computations are done within the database by using various relational algebra operations in combination with statistical operations on the GPS trajectory data. This makes our solution scalable to larger groups of users and to multiple such requests. Once the lists of stay points and interesting locations are obtained, we show that this data can be used to construct spatio-temporal graphs for the users that allow us to efficiently decide on a meeting place. We perform experiments on a real-world dataset and show that our method is effective in finding an optimal meeting point between two users.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"30 1","pages":"181-186"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80318878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining probabilistic datasets vertically","authors":"C. Leung, S. Tanbeer, Bhavek P. Budhia, Lauren C. Zacharias","doi":"10.1145/2351476.2351500","DOIUrl":"https://doi.org/10.1145/2351476.2351500","url":null,"abstract":"As frequent pattern mining plays an important role in various real-life applications, it has been the subject of numerous studies. Most of these studies mine transactional datasets of precise data. However, there are situations in which data are uncertain. Over the past few years, Apriori-based, tree-based, and hyperlinked array structure based mining algorithms have been proposed to mine frequent patterns from these probabilistic datasets of uncertain data. These algorithms view the datasets \"horizontally\" as collections of transactions, each of which records the set of items contained in that transaction. In this paper, we consider an alternative representation in which probabilistic datasets of uncertain data are viewed \"vertically\" as collections of vectors. The vector for each item indicates which transactions contain that item. We also propose an algorithm called U-VIPER to mine these probabilistic datasets \"vertically\" for frequent patterns.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"65 1","pages":"199-204"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81453627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differential evolution versus genetic algorithms: towards symbolic aggregate approximation of non-normalized time series","authors":"Muhammad Marwan Muhammad Fuad","doi":"10.1145/2351476.2351501","DOIUrl":"https://doi.org/10.1145/2351476.2351501","url":null,"abstract":"Differential evolution (DE) is a powerful search method for solving many optimization problems. In this paper we present a new scheme (DESAX) based on differential evolution to localize the breakpoints used by the symbolic aggregate approximation method, one of the most important symbolic representation techniques for time series data. We compare the new scheme with a previous one (GASAX), which is based on genetic algorithms, and we show how the new scheme outperforms the previous one. We also show how DESAX can be used for the symbolic aggregate approximation of non-normalized time series.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"18 1","pages":"205-210"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88427190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partitioning XML documents for iterative queries","authors":"N. Bidoit, Dario Colazzo, Noor Malla, C. Sartiani","doi":"10.1145/2351476.2351483","DOIUrl":"https://doi.org/10.1145/2351476.2351483","url":null,"abstract":"This paper presents an XML partitioning technique that allows main-memory query engines to process a class of XQuery queries, that we dub iterative queries, on arbitrarily large input documents. We provide a static analysis technique to recognize these queries. The static analysis is based on paths extracted from queries and does not need additional schema information. We then provide an algorithm using path information for partitioning the input documents of iterative queries. This algorithm admits a streaming implementation, whose effectiveness is experimentally validated.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"29 1","pages":"51-60"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81078960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient MD5 hash reversing using D.E.A. framework for sharing computational resources","authors":"Nunzio Cassavia, E. Masciari","doi":"10.1145/2351476.2351502","DOIUrl":"https://doi.org/10.1145/2351476.2351502","url":null,"abstract":"Recent advances in computing technology have led to the availability of a huge number of computational resources that can be easily connected through network infrastructures. Yet only a small fraction of the available computing power is fully exploited to perform effective computation of user tasks. Conversely, several research projects require a lot of computing power to reach their goals, but they usually lack adequate resources, which makes the project activities quite hard to complete. In this paper we describe D.E.A. (Distributed Execution Agent), a framework for sharing computational resources. We exploit the D.E.A. framework to tackle the computationally demanding problem of MD5 hash reversing. We performed several experiments that confirmed the validity of our approach.","PeriodicalId":93615,"journal":{"name":"Proceedings. International Database Engineering and Applications Symposium","volume":"19 1","pages":"211-215"},"PeriodicalIF":0.0,"publicationDate":"2012-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81156052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}