{"title":"Mapping abstract queries to big data web resources for on-the-fly data integration and information retrieval","authors":"H. Jamil","doi":"10.1109/ICDEW.2014.6818304","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818304","url":null,"abstract":"Although the emergence of technologies such as XML, web services and cloud computing has helped, the proliferation of databases and their diversity pose serious barriers to meaningful information extraction from these “big databases”. Research in intention recognition has also progressed substantially, yet very little has been done to recognize query intents to search, select, map and extract responses from such enormous pools of candidate databases. Query mapping becomes truly complicated particularly in scientific databases where tools and functions are needed to interpret the database contents, the semantics of which are usually hidden inside the functions. In this paper, we present a declarative meta-language, called BioVis, using which biologists potentially are able to express their “intentional queries” with the expectation that a mapping function μ is able to accurately understand the meaning of the queries and map them to the underlying resources appropriately. We show that such a function is technically feasible if we can design a schema mapping function that can tailor itself according to a knowledgebase and recognize entities in schema graphs. We offer this idea as a possible research problem for the community to address.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126798963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In schema matching, even experts are human: Towards expert sourcing in schema matching","authors":"Tomer Sagi, A. Gal","doi":"10.1109/ICDEW.2014.6818301","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818301","url":null,"abstract":"Schema matching problems have been historically defined as a semi-automated task in which correspondences are generated by matching algorithms and subsequently validated by a single human expert. Emerging alternative models are based upon piecemeal human validation of algorithmic results and the usage of crowd based validation. We propose an alternative model in which human and algorithmic matchers are given more symmetric roles. Under this model, better insight into the respective strengths and weaknesses of human and algorithmic matchers is required. We present initial insights from a pilot study conducted and outline future work in this area.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"52 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113970900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Interactive data exploration based on user relevance feedback","authors":"Kyriaki Dimitriadou, Olga Papaemmanouil, Y. Diao","doi":"10.1109/ICDEW.2014.6818343","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818343","url":null,"abstract":"Interactive Data Exploration (IDE) applications typically involve users that aim to discover interesting objects by iteratively executing numerous ad-hoc exploration queries. Therefore, IDE can easily become an extremely labor and resource intensive process. To support these applications, we introduce a framework that assists users by automatically navigating them through the data set and allows them to identify relevant objects without formulating data retrieval queries. Our approach relies on user relevance feedback on data samples to model user interests and strategically collects more samples to refine the model while minimizing the user effort. The system leverages decision tree classifiers to generate an effective user model that balances the trade-off between identifying all relevant objects and reducing the size of final returned (relevant and irrelevant) objects. Our preliminary experimental results demonstrate that we can predict linear patterns of user interests (i.e., range queries) with high accuracy while achieving interactive performance.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115407712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Aggregation of similarity measures in schema matching based on generalized mean","authors":"Faten A. Elshwimy, Alsayed Algergawy, A. Sarhan, E. Sallam","doi":"10.1109/ICDEW.2014.6818306","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818306","url":null,"abstract":"Schema matching represents a critical step to integrate heterogeneous e-Business and shared-data applications. Most existing schema matching approaches rely heavily on similarity-based techniques, which attempt to discover correspondences based on various element similarity measures, each computed by an individual base matcher. It has been accepted that aggregating results of multiple base matchers is a promising technique to obtain more accurate matching correspondences. A number of current matching systems use experimental weights for aggregation of similarities among different element matchers while others use machine learning approaches to find optimal weights that should be assigned to different matchers. However, both approaches have their own deficiencies. To overcome the limitations of existing aggregation strategies and to achieve better performance, in this paper, we propose a new aggregation strategy, called the AHGM strategy, which aggregates multiple element matchers based on the concept of generalized mean. In particular, we first develop a practical way to obtain optimal weights that will be assigned to each associated matcher for the given aggregation task. We then use these weights in our aggregation method to improve the performance of matcher combining. To validate the performance of the proposed strategy, we conducted a set of experiments, and the obtained results are encouraging.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132637514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging in-memory technology for interactive analyses of point-of-sales data","authors":"David Schwalb, Martin Faust, Jens Krüger, H. Plattner","doi":"10.1109/ICDEW.2014.6818311","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818311","url":null,"abstract":"Retailers face not only the challenge of consolidating all the data generated by electronic point-of-sale (POS) terminals, but also to leverage the data to derive business value. Especially when the data is stored at its finest granularity recording the actual transactions with all their items, processing becomes a challenge. In this work, we describe how in-memory technology can help to analyze POS data and how it enables new types of enterprise applications. We show that it is possible to interactively explore the transactional data set without precomputing analytical summaries while providing users with full flexibility. As an example, we present a prototypical application for interactive analyses and exploration of 8 billion records of real data from a large retail company with sub-second response times.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134194971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive query processing on moving objects","authors":"Abdeltawab M. Hendawi","doi":"10.1109/ICDEW.2014.6818352","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818352","url":null,"abstract":"A fundamental category of location based services relies on predictive queries which consider the anticipated future locations of users. Predictive queries attracted the researchers' attention as they are widely used in several applications including traffic management, routing, location-based advertising, and ride sharing. This paper aims to present a generic and scalable system for predictive query processing on moving objects, e.g., vehicles. Inside the proposed system, two frameworks are provided to work in two different environments, (1) Panda framework for Euclidean space, and (2) iRoad framework for road network. Unlike previous work in supporting predictive queries, the target of the proposed system is to: (a) support long-term query prediction as well as short-term prediction, (b) scale up to a large number of moving objects, and (c) efficiently support different types of predictive queries, e.g., predictive range, KNN, and aggregate queries.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127377388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2B or not 2B and everything in between — novel evaluation methods for matching problems","authors":"Tomer Sagi","doi":"10.1109/ICDEW.2014.6818349","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818349","url":null,"abstract":"Solving matching problems in computer science entails generating alignments between structured data. Well known examples are schema matching, process model matching, ontology alignment, and Web service composition. Design of software systems aimed at solving these problems, and refinement of interim results, are aided by solution quality evaluation measures. Historically, measures have been based upon binary set-theory, required an expert generated exact-match and assumed a single expert review following the algorithmic effort. Motivated by new applications for data integration, the dissertation both extends commonly used measures and proposes new measures to support evaluation in a variety of scenarios. We review the measures proposed to date and present an outlook towards future work.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127168780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data stream partitioning re-optimization based on runtime dependency mining","authors":"Emeric Viel, Haruyasu Ueda","doi":"10.1109/ICDEW.2014.6818327","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818327","url":null,"abstract":"In distributed data stream processing, a program made of multiple queries can be parallelized by partitioning input streams according to the values of specific attributes, or partitioning keys. Applying different partitioning keys to different queries requires re-partitioning intermediary streams, causing extra communication and reduced throughput. Re-partitionings can be avoided by detecting dependencies between the partitioning keys applicable to each query. Existing partitioning optimization methods analyze query syntax at compile-time to detect inter-key dependencies and avoid re-partitionings. This paper extends those compile-time methods by adding a runtime re-optimization step based on the mining of temporal approximate dependencies (TADs) between partitioning keys. A TAD is defined in this paper as a type of dependency that can be approximately valid over a moving time window. Our evaluation, based on a simulation of the Linear Road Benchmark, showed a 94.5% reduction of the extra communication cost.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120958863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Curracurrong cloud: Stream processing in the cloud","authors":"Vasvi Kakkad, Akon Dey, A. Fekete, Bernhard Scholz","doi":"10.1109/ICDEW.2014.6818328","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818328","url":null,"abstract":"The dominant model for computing with large-scale data in cloud environments has been founded on batch processing including the Map-Reduce model. Important use-cases such as monitoring and alerting in the cloud require instead the incremental and continual handling of new data. Thus recent systems such as Storm, Samza and S4 have adopted ideas from stream processing to the cloud environment. We describe a novel system, Curracurrong Cloud, that, for the first time, allows the computation and data origins to share a cloud-hosted cluster, offers a lightweight algebraic-style description of the processing pipeline, and supports automated placement of computation among compute resources.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132889996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RQ-RDF-3X: Going beyond triplestores","authors":"Jyoti Leeka, Srikanta J. Bedathur","doi":"10.1109/ICDEW.2014.6818337","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818337","url":null,"abstract":"Efficient storage and querying of large repositories of RDF content is important due to the widespread growth of Semantic Web and Linked Open Data initiatives. Many novel database systems that store RDF in its native form or within traditional relational storage have demonstrated their ability to scale to large volumes of RDF content. However, it is increasingly becoming obvious that the simple dyadic relationship captured through traditional triples alone is not sufficient for modelling multi-entity relationships, provenance of facts, etc. Such richer models are supported in RDF through two techniques - first, called reification which retains the triple nature of RDF and the second, a non-standard extension called N-Quads. In this paper, we explore the challenges of supporting such richer semantic data by extending the state-of-the-art RDF-3X system. We describe our implementation of RQ-RDF-3X, a reification and quad enhanced RDF-3X, which involved a significant re-engineering ranging from the set of indexes and their compression schemes to the query processing pipeline for queries over reified content. Using large RDF repositories such as YAGO2S and DBpedia, and a set of SPARQL queries that utilize reification model, we demonstrate that RQ-RDF-3X is significantly faster than RDF-3X.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127799882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}