2014 IEEE 30th International Conference on Data Engineering Workshops: Latest Articles

Automatic user steering for interactive data exploration
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818348
Kyriaki Dimitriadou
{"title":"Automatic user steering for interactive data exploration","authors":"Kyriaki Dimitriadou","doi":"10.1109/ICDEW.2014.6818348","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818348","url":null,"abstract":"The amount of data that have flooded databases during the last few years have created several new problems for the data management community to address. One of the most prominent is the discovery of new and interesting information that is hidden in the underlying big data sets. As of now, in order to explore these data sets users begin with a few general queries, study their output and iteratively issue more specific ones until they discover interesting information. This is an onerous process that requires time and effort. In this thesis we are developing an automatic data exploration framework that will make the discovery of new information in a vast data space both effective and efficient. Our system asks users for their relevance feedback on strategically collected samples in an interactive manner, steers them towards the interesting parts of the database and eventually formulates the query that retrieves their data of interest. Our preliminary results are encouraging and allow us to persevere in the development of such a system.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115456956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Scholarly big data information extraction and integration in the CiteSeerχ digital library
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818305
Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, C. Lee Giles
{"title":"Scholarly big data information extraction and integration in the CiteSeerχ digital library","authors":"Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, C. Lee Giles","doi":"10.1109/ICDEW.2014.6818305","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818305","url":null,"abstract":"CiteSeerχ is a digital library that contains approximately 3.5 million scholarly documents and receives between 2 and 4 million requests per day. In addition to making documents available via a public Website, the data is also used to facilitate research in areas like citation analysis, co-author network analysis, scalability evaluation and information extraction. The papers in CiteSeerχ are gathered from the Web by means of continuous automatic focused crawling and go through a series of automatic processing steps as part of the ingestion process. Given the size of the collection, the fact that it is constantly expanding, and the multiple ways in which it is used both by the public to access scholarly documents and for research, there are several big data challenges. In this paper, we provide a case study description of how we address these challenges when it comes to information extraction, data integration and entity linking in CiteSeerχ. \nWe describe how we: aggregate data from multiple sources on the Web; store and manage data; process data as part of an automatic ingestion pipeline that includes automatic metadata and information extraction; perform document and citation clustering; perform entity linking and name disambiguation; and make our data and source code available to enable research and collaboration.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126931866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 56
A tool for personal data extraction
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818307
Daniela Vianna, A. Yong, Chaolun Xia, A. Marian, Thu D. Nguyen
{"title":"A tool for personal data extraction","authors":"Daniela Vianna, A. Yong, Chaolun Xia, A. Marian, Thu D. Nguyen","doi":"10.1109/ICDEW.2014.6818307","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818307","url":null,"abstract":"Digital storage now acts as an archive of the memories of users worldwide, keeping record of data as well as the context in which the data was acquired. The massive amount of data available and the fact that it is fragmented across many services (e.g., Facebook) and devices (e.g., laptop) make it very difficult for users to find specific pieces of information that they remember having stored or accessed. Unifying this fragmented data into a single data set that includes contextual information would allow for much better indexing and searching of personal information. Thus, we have developed a personal data extraction tool as a first step toward this vision. In this paper, we present this extraction tool, along with some preliminary statistics about personal data gathered by the tool for several users. \nThe goal of the data analysis is to give a glimpse of what the digital life of a person may look like, and how it is currently partitioned across many different services; moreover, it reinforces the fact that it is not possible for users to manually retrieve, store and access their extensive digital data without the support of a personalized information management tool.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127661947","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Parallel join executions in RAMCloud
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818325
Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Böse, H. Plattner
{"title":"Parallel join executions in RAMCloud","authors":"Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Böse, H. Plattner","doi":"10.1109/ICDEW.2014.6818325","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818325","url":null,"abstract":"Modern large-scale storage systems provide not only storage capacity, but also processing power. When such a storage system serves as persistence for a database application, it is desirable to utilize its processing power for supporting query execution. In this paper, we evaluate the parallel execution of join operations in Stanford's RAMCloud which is a DRAM-based storage system connected via RDMA-enabled network adapters. We a) provide a system model to derive the execution costs for the Grace Join, the Distributed Block Nested Loop Join, and the Cyclo Join algorithm and their corresponding implementations in RAMCloud. We describe b) how the execution time for a single join operation depends on factors such as relation sizes, numbers of nodes used for a join, and the chosen algorithm. We finally introduce and evaluate c) a set of heuristics for parameterizing the execution of many join operations in parallel with the goal of maximizing the throughput.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127286406","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
YCSB+T: Benchmarking web-scale transactional databases
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818330
Akon Dey, A. Fekete, R. Nambiar, Uwe Röhm
{"title":"YCSB+T: Benchmarking web-scale transactional databases","authors":"Akon Dey, A. Fekete, R. Nambiar, Uwe Röhm","doi":"10.1109/ICDEW.2014.6818330","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818330","url":null,"abstract":"Database system benchmarks like TPC-C and TPC-E focus on emulating database applications to compare different DBMS implementations. These benchmarks use carefully constructed queries executed within the context of transactions to exercise specific RDBMS features, and measure the throughput achieved. Cloud services benchmark frameworks like YCSB, on the other hand, are designed for performance evaluation of distributed NoSQL key-value stores, early examples of which did not support transactions, and so the benchmarks use single operations that are not inside transactions. Recent implementations of web-scale distributed NoSQL systems like Spanner and Percolator, offer transaction features to cater to new web-scale applications. This has exposed a gap in standard benchmarks. We identify the issues that need to be addressed when evaluating transaction support in NoSQL databases. We describe YCSB+T, an extension of YCSB, that wraps database operations within transactions. In this framework, we include a validation stage to detect and quantify database anomalies resulting from any workload, and we gather metrics that measure transactional overhead. We have designed a specific workload called Closed Economy Workload (CEW), which can run within the YCSB+T framework. \nWe share our experience with using CEW to evaluate some NoSQL systems.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130063355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 84
PDS4: A model-driven planetary science data architecture for long-term preservation
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818317
J. Hughes, D. Crichton, S. Hardman, E. Law, R. Joyner, P. Ramirez
{"title":"PDS4: A model-driven planetary science data architecture for long-term preservation","authors":"J. Hughes, D. Crichton, S. Hardman, E. Law, R. Joyner, P. Ramirez","doi":"10.1109/ICDEW.2014.6818317","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818317","url":null,"abstract":"The goal of the Planetary Data System (PDS) is the digital preservation of scientific data for long-term use by the scientific research community. After two decades of successful operation, the PDS found itself in a new era of big data, international cooperation, distributed nodes, and multiple ways of analysing and interpreting data. A project was formed to develop a disciplined architectural approach that would drive the design and implementation of a scalable data system that could evolve to meet the demands of this new era. PDS4, the next generation system, uses an explicit model-driven architectural approach coupled with modern information technologies and standards to meet these challenges in order to ensure the planetary data assets can be mined for scientific knowledge for years to come.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"80 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130647411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
On reflection in Linked Data management
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818338
G. Fletcher
{"title":"On reflection in Linked Data management","authors":"G. Fletcher","doi":"10.1109/ICDEW.2014.6818338","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818338","url":null,"abstract":"The Linked Data (LD) initiative promotes the use of international standards for data sharing on the web. The flexible Resource Description Framework (RDF) graph data model is a central LD technology. Much of RDF's flexibility is due to its blurring of the traditional distinction between data and metadata. A particularly powerful form of metadata is query expressions. A query language is called reflective if it can interpret expressions of the language stored as data both as “active” data and as regular “static” data. Fundamental applications of reflection are found in diverse domains such as security, information integration, and data quality. Consequently, a variety of RDF representations of active data have been proposed. However, RDF query languages, while bridging the data-metadata barrier, continue to maintain a divide between active and static data. Hence, applications of RDF active data rely on ad-hoc special-purpose solutions, thereby limiting their broader use and impact. This paper argues that active data must be promoted as first class citizens in RDF querying, if we are to realize the full potential of LD. \nThe aim is to stimulate the study of general frameworks and solutions for reasoning about and applying reflection in the web of data, towards addressing this fundamental research challenge.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132960147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Typing query languages for data graphs
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818297
Dario Colazzo, C. Sartiani
{"title":"Typing query languages for data graphs","authors":"Dario Colazzo, C. Sartiani","doi":"10.1109/ICDEW.2014.6818297","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818297","url":null,"abstract":"Graph query languages are essentially untyped. The lack of type information greatly limits the optimization opportunities for query engines and makes application development more complex. In this paper we discuss a simple, yet expressive, schema language for edge-labelled data graphs. This schema language is, then, used to define a query type inference approach with good precision properties.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124982190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Method and components for creating scientific workflow
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818319
Yuan Lin, I. Mougenot, T. Libourel
{"title":"Method and components for creating scientific workflow","authors":"Yuan Lin, I. Mougenot, T. Libourel","doi":"10.1109/ICDEW.2014.6818319","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818319","url":null,"abstract":"Researchers in environmental areas (biology, geographical information, etc.) need to capitalize, communicate and validate their experiments. In this context, the scientific workflow is increasingly used, since it facilitates the resource reusability and makes the daily work of the experts more efficient. This article, after a short introduction, has the objective of introducing our global vision of a scientific workflow system. By considering the user point of view and the analysis of existing work, we propose to realize the system based on a 3-level architecture (statistical level/intermediate level/dynamic level). We will focus on two essential points: (1) building a resource management environment (“work desktop”) and (2) using this environment to instantiate, validate and correct different workflow models. Finally, we conclude the article and present some perspectives.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123296469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Graph databases for large-scale healthcare systems: A framework for efficient data management and data services
2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818295
Yubin Park, M. Shankar, Byung H. Park, Joydeep Ghosh
{"title":"Graph databases for large-scale healthcare systems: A framework for efficient data management and data services","authors":"Yubin Park, M. Shankar, Byung H. Park, Joydeep Ghosh","doi":"10.1109/ICDEW.2014.6818295","DOIUrl":"https://doi.org/10.1109/ICDEW.2014.6818295","url":null,"abstract":"Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcare RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed “3NF Equivalent Graph” (3EG) transformation. We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. \nBased on this finding, we propose an ensemble framework of databases for healthcare applications.","PeriodicalId":302600,"journal":{"name":"2014 IEEE 30th International Conference on Data Engineering Workshops","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-05-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124515309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 34