Latest publications from the 2014 IEEE 30th International Conference on Data Engineering Workshops

Mapping abstract queries to big data web resources for on-the-fly data integration and information retrieval
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-05-19 DOI: 10.1109/ICDEW.2014.6818304
H. Jamil
{"title":"Mapping abstract queries to big data web resources for on-the-fly data integration and information retrieval","authors":"H. Jamil","doi":"10.1109/ICDEW.2014.6818304","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818304","url":null,"abstract":"The emergence of technologies such as XML, web services and cloud computing have helped, the proliferation of databases and their diversity pose serious barriers to meaningful information extraction from these “big databases”. Research in intention recognition has also progressed substantially, yet very little has been done to recognize query intents to search, select, map and extract responses from such enormous pools of candidate databases. Query mapping becomes truly complicated particularly in scientific databases where tools and functions are needed to interpret the database contents, semantics of which are usually hidden inside the functions. In this paper, we present a declarative meta-language, called BioVis, using which biologists potentially are able to express their “intentional queries” with the expectation that a mapping function μ is able to accurately understand the meaning of the queries and map them to the underlying resources appropriately. We show that such a function is technically feasible if we can design a schema mapping function that can tailor itself according to a knowledgebase and recognize entities in schema graphs. We offer this idea as a possible research problem for the community to address.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126798963","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
In schema matching, even experts are human: Towards expert sourcing in schema matching
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-05-19 DOI: 10.1109/ICDEW.2014.6818301
Tomer Sagi, A. Gal
{"title":"In schema matching, even experts are human: Towards expert sourcing in schema matching","authors":"Tomer Sagi, A. Gal","doi":"10.1109/ICDEW.2014.6818301","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818301","url":null,"abstract":"Schema matching problems have been historically defined as a semi-automated task in which correspondences are generated by matching algorithms and subsequently validated by a single human expert. Emerging alternative models are based upon piecemeal human validation of algorithmic results and the usage of crowd based validation. We propose an alternative model in which human and algorithmic matchers are given more symmetric roles. Under this model, better insight into the respective strengths and weaknesses of human and algorithmic matchers is required. We present initial insights from a pilot study conducted and outline future work in this area.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"52 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113970900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 8
Interactive data exploration based on user relevance feedback
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818343
Kyriaki Dimitriadou, Olga Papaemmanouil, Y. Diao
{"title":"Interactive data exploration based on user relevance feedback","authors":"Kyriaki Dimitriadou, Olga Papaemmanouil, Y. Diao","doi":"10.1109/ICDEW.2014.6818343","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818343","url":null,"abstract":"Interactive Data Exploration (IDE) applications typically involve users that aim to discover interesting objects by it-eratively executing numerous ad-hoc exploration queries. Therefore, IDE can easily become an extremely labor and resource intensive process. To support these applications, we introduce a framework that assists users by automatically navigating them through the data set and allows them to identify relevant objects without formulating data retrieval queries. Our approach relies on user relevance feedback on data samples to model user interests and strategically collects more samples to refine the model while minimizing the user effort. The system leverages decision tree classifiers to generate an effective user model that balances the trade-off between identifying all relevant objects and reducing the size of final returned (relevant and irrelevant) objects. Our preliminary experimental results demonstrate that we can predict linear patterns of user interests (i.e., range queries) with high accuracy while achieving interactive performance.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115407712","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 3
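To make the explore-by-feedback loop described in this abstract concrete, here is a minimal sketch assuming a hypothetical 2-D numeric data set and a simulated user whose hidden interest is a simple range predicate. It only illustrates the general pattern (label a few samples, fit a decision tree on the feedback, ask about the most uncertain objects); it is not the authors' framework, and all parameters are invented.

```python
# Illustrative sketch of relevance-feedback-driven exploration; the data set,
# the hidden interest predicate, and all parameters below are made up.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
data = rng.uniform(0, 100, size=(10_000, 2))        # the unexplored 2-D data set

def user_feedback(points):
    # simulated user: "relevant" means the first attribute lies in [20, 40]
    return ((points[:, 0] >= 20) & (points[:, 0] <= 40)).astype(int)

# start with a small random labeled sample
labeled = list(rng.choice(len(data), size=50, replace=False))
labels = list(user_feedback(data[labeled]))

for _ in range(5):                                   # a few feedback rounds
    tree = DecisionTreeClassifier(max_depth=3, random_state=0)
    tree.fit(data[labeled], labels)
    proba = tree.predict_proba(data)[:, -1]          # estimated P(relevant)
    # ask the user about the objects the model is least certain about
    order = np.argsort(np.abs(proba - 0.5))
    seen = set(int(i) for i in labeled)
    new = [int(i) for i in order if int(i) not in seen][:20]
    labeled += new
    labels += list(user_feedback(data[new]))

predicted_relevant = np.flatnonzero(tree.predict(data) == 1)
print(f"{len(predicted_relevant)} objects predicted relevant after feedback")
```

The paper additionally balances returning all relevant objects against inflating the final returned set; that tuning is omitted from this sketch.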
Aggregation of similarity measures in schema matching based on generalized mean
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818306
Faten A. Elshwimy, Alsayed Algergawy, A. Sarhan, E. Sallam
{"title":"Aggregation of similarity measures in schema matching based on generalized mean","authors":"Faten A. Elshwimy, Alsayed Algergawy, A. Sarhan, E. Sallam","doi":"10.1109/ICDEW.2014.6818306","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818306","url":null,"abstract":"Schema matching represents a critical step to integrate heterogeneous e-Business and shared-data applications. Most existing schema matching approaches rely heavily on similarity-based techniques, which attempt to discover correspondences based on various element similarity measures, each computed by an individual base matcher. It has been accepted that aggregating results of multiple base matchers is a promising technique to obtain more accurate matching correspondences. A number of current matching systems use experimental weights for aggregation of similarities among different element matchers while others use machine learning approaches to find optimal weights that should be assigned to different matchers. However, both approaches have their own deficiencies. To overcome the limitations of existing aggregation strategies and to achieve better performance, in this paper, we propose a new aggregation strategy, called the AHGM strategy, which aggregates multiple element matchers based on the concept of generalized mean. In particular, we first develop a practical way to obtain optimal weights that will be assigned to each associated matcher for the given aggregation task. We then use these weights in our aggregation method to improve the performance of matcher combining. To validate the performance of the proposed strategy, we conducted a set of experiments, and the obtained results are encouraging.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132637514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 10
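As a reference point for the aggregation idea, the weighted generalized (power) mean of matcher scores s_1..s_n with weights w_i is (sum_i w_i * s_i^p)^(1/p). The sketch below computes it for a hypothetical element pair and a few values of p; the scores and weights are invented, and the AHGM weight-derivation step described in the abstract is not shown.

```python
# Weighted generalized mean of per-matcher similarity scores; the scores,
# weights, and choices of p below are illustrative only.
import numpy as np

def generalized_mean(scores, weights, p):
    s = np.asarray(scores, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                          # normalize weights to sum to 1
    if p == 0:                               # limiting case: weighted geometric mean
        return float(np.exp(np.sum(w * np.log(np.clip(s, 1e-12, None)))))
    return float(np.sum(w * s ** p) ** (1.0 / p))

# three hypothetical base matchers (e.g. name, data type, structure) scoring one pair
scores = [0.9, 0.6, 0.7]
weights = [0.5, 0.2, 0.3]
for p in (-2, 0, 1, 2):
    print(f"p={p:+d}: {generalized_mean(scores, weights, p):.3f}")
# p = 1 is the weighted average; larger p leans toward the optimistic matchers,
# large negative p toward the pessimistic ones.
```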
Leveraging in-memory technology for interactive analyses of point-of-sales data
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818311
David Schwalb, Martin Faust, Jens Krüger, H. Plattner
{"title":"Leveraging in-memory technology for interactive analyses of point-of-sales data","authors":"David Schwalb, Martin Faust, Jens Krüger, H. Plattner","doi":"10.1109/ICDEW.2014.6818311","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818311","url":null,"abstract":"Retailers face not only the challenge of consolidating all the data generated by electronic point-of-sale (POS) terminals, but also to leverage the data to derive business value. Especially when the data is stored at its finest granularity recording the actual transactions with all their items, processing becomes a challenge. In this work, we describe how in-memory technology can help to analyze POS data and how it enables new types of enterprise applications. We show that it is possible to interactively explore the transactional data set without precomputing analytical summaries while providing users with full flexibility. As an example, we present a prototypical application for interactive analyses and exploration of 8 billion records of real data from a large retail company with sub-second response times.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134194971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 9
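A minimal sketch of the "no precomputed summaries" idea: answer an ad-hoc question by scanning and aggregating the raw line items at query time. The toy table and column names below are invented; a columnar in-memory engine applies the same scan-and-aggregate pattern, only over billions of rows.

```python
# Aggregating raw point-of-sale line items at query time, without any
# precomputed summary tables; the data and column names are made up.
import pandas as pd

line_items = pd.DataFrame({
    "store":    ["S1", "S1", "S2", "S2", "S2"],
    "product":  ["milk", "bread", "milk", "milk", "bread"],
    "quantity": [2, 1, 3, 1, 4],
    "price":    [1.20, 2.50, 1.20, 1.20, 2.50],
})
line_items["revenue"] = line_items["quantity"] * line_items["price"]

# an ad-hoc, interactive question answered directly from the transactions
print(line_items.groupby(["store", "product"])["revenue"].sum())
```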
Predictive query processing on moving objects
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818352
Abdeltawab M. Hendawi
{"title":"Predictive query processing on moving objects","authors":"Abdeltawab M. Hendawi","doi":"10.1109/ICDEW.2014.6818352","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818352","url":null,"abstract":"A fundamental category of location based services relies on predictive queries which consider the anticipated future locations of users. Predictive queries attracted the researchers' attention as they are widely used in several applications including traffic management, routing, location-based advertising, and ride sharing. This paper aims to present a generic and scalable system for predictive query processing on moving objects, e.g, vehicles. Inside the proposed system, two frameworks are provided to work in two different environments, (1) Panda framework for euclidean space, and (2) iRoad framework for road network. Unlike previous work in supporting predictive queries, the target of the proposed system is to: (a) support long-term query prediction as well as short term prediction, (b) scale up to large number of moving objects, and (c) efficiently support different types of predictive queries, e.g., predictive range, KNN, and aggregate queries.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127377388","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
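As a toy illustration of what a predictive range query asks (not the Panda or iRoad algorithms), the sketch below dead-reckons each object along its last observed velocity and reports the objects expected inside a query rectangle after dt time units. All objects, the rectangle, and the straight-line motion assumption are hypothetical.

```python
# Toy predictive range query: which objects are expected inside a rectangle
# at a future time, assuming straight-line motion at the last observed velocity.
from dataclasses import dataclass

@dataclass
class MovingObject:
    oid: str
    x: float
    y: float
    vx: float   # velocity in x per time unit
    vy: float   # velocity in y per time unit

def predictive_range(objects, rect, dt):
    """rect = (xmin, ymin, xmax, ymax); dt = how far into the future to predict."""
    xmin, ymin, xmax, ymax = rect
    hits = []
    for o in objects:
        px, py = o.x + o.vx * dt, o.y + o.vy * dt     # dead-reckoned position
        if xmin <= px <= xmax and ymin <= py <= ymax:
            hits.append(o.oid)
    return hits

vehicles = [
    MovingObject("car-1", 0.0, 0.0, 1.0, 0.0),
    MovingObject("car-2", 5.0, 5.0, -1.0, 0.0),
]
print(predictive_range(vehicles, rect=(8, -1, 12, 1), dt=10))   # ['car-1']
```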
2B or not 2B and everything in between — novel evaluation methods for matching problems
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818349
Tomer Sagi
{"title":"2B or not 2B and everything in between — novel evaluation methods for matching problems","authors":"Tomer Sagi","doi":"10.1109/ICDEW.2014.6818349","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818349","url":null,"abstract":"Solving matching problems in computer science entails generating alignments between structured data. Well known examples are schema matching, process model matching, ontology alignment, and Web service composition. Design of software systems aimed at solving these problems, and refinement of interim results, are aided by solution quality evaluation measures. Historically, measures have been based upon binary set-theory, required an expert generated exact-match and assumed a single expert review following the algorithmic effort. Motivated by new applications for data integration, the dissertation both extends commonly used measures and proposes new measures to support evaluation in a variety of scenarios. We review the measures proposed to date and present an outlook towards future work.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127168780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 1
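For context, the binary set-theoretic evaluation this abstract refers to reduces to precision and recall of a proposed alignment against an expert-generated exact match. The sketch below computes them and adds a simple confidence-weighted precision as one possible non-binary extension; the weighting scheme is illustrative and is not one of the dissertation's proposed measures.

```python
# Classic binary evaluation of an alignment, plus a simple confidence-weighted
# variant; the correspondences and confidences below are made-up examples.
def precision_recall(proposed, reference):
    """proposed, reference: sets of (source_element, target_element) pairs."""
    tp = len(proposed & reference)
    precision = tp / len(proposed) if proposed else 1.0
    recall = tp / len(reference) if reference else 1.0
    return precision, recall

def weighted_precision(scored, reference):
    """scored: dict mapping a correspondence pair to a confidence in [0, 1]."""
    total = sum(scored.values())
    correct = sum(c for pair, c in scored.items() if pair in reference)
    return correct / total if total else 1.0

reference = {("name", "full_name"), ("addr", "address")}
proposed = {("name", "full_name"), ("addr", "city")}
print(precision_recall(proposed, reference))            # (0.5, 0.5)

scored = {("name", "full_name"): 0.9, ("addr", "city"): 0.3}
print(round(weighted_precision(scored, reference), 2))  # 0.75
```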
Data stream partitioning re-optimization based on runtime dependency mining
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818327
Emeric Viel, Haruyasu Ueda
{"title":"Data stream partitioning re-optimization based on runtime dependency mining","authors":"Emeric Viel, Haruyasu Ueda","doi":"10.1109/ICDEW.2014.6818327","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818327","url":null,"abstract":"In distributed data stream processing, a program made of multiple queries can be parallelized by partitioning input streams according to the values of specific attributes, or partitioning keys. Applying different partitioning keys to different queries requires re-partitioning intermediary streams, causing extra communication and reduced throughput. Re-partitionings can be avoided by detecting dependencies between the partitioning keys applicable to each query. Existing partitioning optimization methods analyze query syntax at compile-time to detect inter-key dependencies and avoid re-partitionings. This paper extends those compile-time methods by adding a runtime re-optimization step based on the mining of temporal approximate dependencies (TADs) between partitioning keys. A TAD is defined in this paper as a type of dependency that can be approximately valid over a moving time window. Our evaluation, based on a simulation of the Linear Road Benchmark, showed a 94.5% reduction of the extra communication cost.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120958863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 7
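A small illustrative sketch (not the paper's mining algorithm) of what a temporal approximate dependency between two partitioning keys could look like: within a moving time window, measure how often the value of key A determines a single value of key B; a re-optimizer could then skip a re-partitioning while that confidence stays above a threshold. The event format and the confidence definition here are assumptions.

```python
# Confidence of a temporal approximate dependency A -> B over a time window;
# the event tuples and key names below are invented for illustration.
from collections import defaultdict

def tad_confidence(events, window, now):
    """events: iterable of (timestamp, key_a_value, key_b_value) tuples."""
    recent = [(t, a, b) for t, a, b in events if now - window <= t <= now]
    if not recent:
        return 1.0
    counts = defaultdict(lambda: defaultdict(int))   # A value -> B value -> count
    for _, a, b in recent:
        counts[a][b] += 1
    # an event is consistent with A -> B if its B value is the majority B for its A value
    consistent = sum(max(b_counts.values()) for b_counts in counts.values())
    return consistent / len(recent)

events = [
    (0, "road-1", "seg-a"), (1, "road-1", "seg-a"),
    (2, "road-2", "seg-b"), (3, "road-1", "seg-c"),   # the one violating event
]
conf = tad_confidence(events, window=10, now=3)
print(f"A -> B holds for {conf:.0%} of events in the window")
```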
Curracurrong cloud: Stream processing in the cloud
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818328
Vasvi Kakkad, Akon Dey, A. Fekete, Bernhard Scholz
{"title":"Curracurrong cloud: Stream processing in the cloud","authors":"Vasvi Kakkad, Akon Dey, A. Fekete, Bernhard Scholz","doi":"10.1109/ICDEW.2014.6818328","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818328","url":null,"abstract":"The dominant model for computing with large-scale data in cloud environments has been founded on batch processing including the Map-Reduce model. Important use-cases such as monitoring and alerting in the cloud require instead the incremental and continual handling of new data. Thus recent systems such as Storm, Samza and S4 have adopted ideas from stream processing to the cloud environment. We describe a novel system, Curracurrong Cloud, that, for the first time, allows the computation and data origins to share a cloud-hosted cluster, offers a lightweight algebraic-style description of the processing pipeline, and supports automated placement of computation among compute resources.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132889996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
RQ-RDF-3X: Going beyond triplestores
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date: 2014-03-01 DOI: 10.1109/ICDEW.2014.6818337
Jyoti Leeka, Srikanta J. Bedathur
{"title":"RQ-RDF-3X: Going beyond triplestores","authors":"Jyoti Leeka, Srikanta J. Bedathur","doi":"10.1109/ICDEW.2014.6818337","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818337","url":null,"abstract":"Efficient storage and querying of large repositories of RDF content is important due to the widespread growth of Semantic Web and Linked Open Data initiatives. Many novel database systems that store RDF in its native form or within traditional relational storage have demonstrated their ability to scale to large volumes of RDF content. However, it is increasingly becoming obvious that the simple dyadic relationship captured through traditional triples alone is not sufficient for modelling multi-entity relationships, provenance of facts, etc. Such richer models are supported in RDF through two techniques - first, called reification which retains the triple nature of RDF and the second, a non-standard extension called N-Quads. In this paper, we explore the challenges of supporting such richer semantic data by extending the state-of-the-art RDF-3X system. We describe our implementation of RQ-RDF-3X, a reification and quad enhanced RDF-3X, which involved a significant re-engineering ranging from the set of indexes and their compression schemes to the query processing pipeline for queries over reified content. Using large RDF repositories such as YAGO2S and DBpedia, and a set of SPARQL queries that utilize reification model, we demonstrate that RQ-RDF-3X is significantly faster than RDF-3X.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127799882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 12
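To make the two modelling options concrete, the sketch below shows one annotated fact expressed via standard reification (extra triples about a statement identifier) and via quads (a fourth component per tuple). All identifiers are invented; this contrasts only the data shapes, not RQ-RDF-3X's actual storage layout.

```python
# One annotated fact in the two shapes discussed above; every identifier
# here is made up for illustration.
fact = ("ex:Einstein", "ex:wonAward", "ex:NobelPrize1921")

# Standard reification keeps the pure triple model: the statement gets an
# identifier, and the annotation (here a source) is attached to that identifier.
reified = [
    ("ex:stmt1", "rdf:type",      "rdf:Statement"),
    ("ex:stmt1", "rdf:subject",   fact[0]),
    ("ex:stmt1", "rdf:predicate", fact[1]),
    ("ex:stmt1", "rdf:object",    fact[2]),
    ("ex:stmt1", "ex:source",     "ex:Wikipedia"),
]

# The N-Quads extension instead widens every tuple with a fourth component,
# so the fact and its annotation stay compact.
quads = [
    (fact[0], fact[1], fact[2], "ex:stmt1"),
    ("ex:stmt1", "ex:source", "ex:Wikipedia", "ex:meta"),
]

print(len(reified), "reification triples vs", len(quads), "quads for one annotated fact")
```

Querying reified content in the pure triple form needs several self-joins just to reassemble a single statement, which is the overhead that quad-aware indexing in a system like RQ-RDF-3X is intended to avoid.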