2014 IEEE 30th International Conference on Data Engineering Workshops: Latest Articles

Automatic user steering for interactive data exploration

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818348

Kyriaki Dimitriadou

Citations: 0

Scholarly big data information extraction and integration in the CiteSeerχ digital library

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818305

Kyle Williams, Jian Wu, Sagnik Ray Choudhury, Madian Khabsa, C. Lee Giles

Abstract: CiteSeerχ is a digital library that contains approximately 3.5 million scholarly documents and receives between 2 and 4 million requests per day. In addition to making documents available via a public Website, the data is also used to facilitate research in areas like citation analysis, co-author network analysis, scalability evaluation and information extraction. The papers in CiteSeerχ are gathered from the Web by means of continuous automatic focused crawling and go through a series of automatic processing steps as part of the ingestion process. Given the size of the collection, the fact that it is constantly expanding, and the multiple ways in which it is used both by the public to access scholarly documents and for research, there are several big data challenges. In this paper, we provide a case study description of how we address these challenges when it comes to information extraction, data integration and entity linking in CiteSeerχ. We describe how we: aggregate data from multiple sources on the Web; store and manage data; process data as part of an automatic ingestion pipeline that includes automatic metadata and information extraction; perform document and citation clustering; perform entity linking and name disambiguation; and make our data and source code available to enable research and collaboration.

Citations: 56

A tool for personal data extraction

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818307

Daniela Vianna, A. Yong, Chaolun Xia, A. Marian, Thu D. Nguyen

Abstract: Digital storage now acts as an archive of the memories of users worldwide, keeping record of data as well as the context in which the data was acquired. The massive amount of data available and the fact that it is fragmented across many services (e.g., Facebook) and devices (e.g., laptop) make it very difficult for users to find specific pieces of information that they remember having stored or accessed. Unifying this fragmented data into a single data set that includes contextual information would allow for much better indexing and searching of personal information. Thus, we have developed a personal data extraction tool as a first step toward this vision. In this paper, we present this extraction tool, along with some preliminary statistics about personal data gathered by the tool for several users. The goal of the data analysis is to give a glimpse of what the digital life of a person may look like, and how it is currently partitioned across many different services; moreover, it reinforces the fact that it is not possible for users to manually retrieve, store and access their extensive digital data without the support of a personalized information management tool.

Citations: 13

Parallel join executions in RAMCloud

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818325

Christian Tinnefeld, Donald Kossmann, Joos-Hendrik Böse, H. Plattner

Citations: 10

YCSB+T: Benchmarking web-scale transactional databases

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818330

Akon Dey, A. Fekete, R. Nambiar, Uwe Röhm

Abstract: Database system benchmarks like TPC-C and TPC-E focus on emulating database applications to compare different DBMS implementations. These benchmarks use carefully constructed queries executed within the context of transactions to exercise specific RDBMS features, and measure the throughput achieved. Cloud services benchmark frameworks like YCSB, on the other hand, are designed for performance evaluation of distributed NoSQL key-value stores, early examples of which did not support transactions, and so the benchmarks use single operations that are not inside transactions. Recent implementations of web-scale distributed NoSQL systems like Spanner and Percolator, offer transaction features to cater to new web-scale applications. This has exposed a gap in standard benchmarks. We identify the issues that need to be addressed when evaluating transaction support in NoSQL databases. We describe YCSB+T, an extension of YCSB, that wraps database operations within transactions. In this framework, we include a validation stage to detect and quantify database anomalies resulting from any workload, and we gather metrics that measure transactional overhead. We have designed a specific workload called Closed Economy Workload (CEW), which can run within the YCSB+T framework. We share our experience with using CEW to evaluate some NoSQL systems.
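The idea of a closed-economy workload with a validation stage can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the YCSB+T code base: the `ToyStore` class, the use of a lock as a stand-in for a transaction, and the anomaly score defined as the drift in total balance divided by the number of operations. The only point it shows is that money moved between accounts inside transactions should leave the total unchanged, so any drift found at validation time signals an anomaly.

```python
from dataclasses import dataclass, field
from threading import Lock


@dataclass
class ToyStore:
    """Hypothetical in-memory key-value store standing in for a NoSQL system."""
    balances: dict = field(default_factory=dict)
    lock: Lock = field(default_factory=Lock)

    def transfer(self, src, dst, amount):
        """Move money between two accounts as one atomic unit."""
        with self.lock:  # the lock plays the role of a transaction (assumption)
            if self.balances[src] < amount:
                return False  # abort: would overdraw, so nothing changes
            self.balances[src] -= amount
            self.balances[dst] += amount
            return True


def validate(store, expected_total, operations):
    """Validation stage: total drift per operation (assumed score definition)."""
    actual_total = sum(store.balances.values())
    drift = abs(expected_total - actual_total)
    return drift / operations if operations else 0.0


store = ToyStore({"a": 100, "b": 100})
initial_total = sum(store.balances.values())

# Two transfers run inside "transactions": the economy stays closed.
store.transfer("a", "b", 30)
store.transfer("b", "a", 50)
clean_score = validate(store, initial_total, operations=2)

# Simulate a lost update: a stale read of "a" is written back
# outside any transaction, overwriting the effect of a transfer.
stale_a = store.balances["a"]
store.transfer("a", "b", 20)
store.balances["a"] = stale_a
dirty_score = validate(store, initial_total, operations=3)
```

In this sketch `clean_score` is zero because both transfers preserve the total, while `dirty_score` is positive because the stale write creates money that was never deposited.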

Citations: 84

PDS4: A model-driven planetary science data architecture for long-term preservation

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818317

J. Hughes, D. Crichton, S. Hardman, E. Law, R. Joyner, P. Ramirez

Citations: 9

On reflection in Linked Data management

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818338

G. Fletcher

Abstract: The Linked Data (LD) initiative promotes the use of international standards for data sharing on the web. The flexible Resource Description Framework (RDF) graph data model is a central LD technology. Much of RDF's flexibility is due to its blurring of the traditional distinction between data and metadata. A particularly powerful form of metadata is query expressions. A query language is called reflective if it can interpret expressions of the language stored as data both as “active” data and as regular “static” data. Fundamental applications of reflection are found in diverse domains such as security, information integration, and data quality. Consequently, a variety of RDF representations of active data have been proposed. However, RDF query languages, while bridging the data-metadata barrier, continue to maintain a divide between active and static data. Hence, applications of RDF active data rely on ad-hoc special-purpose solutions, thereby limiting their broader use and impact. This paper argues that active data must be promoted as first class citizens in RDF querying, if we are to realize the full potential of LD. The aim is to stimulate the study of general frameworks and solutions for reasoning about and applying reflection in the web of data, towards addressing this fundamental research challenge.

Citations: 1

Typing query languages for data graphs

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818297

Dario Colazzo, C. Sartiani

Citations: 4

Method and components for creating scientific workflow

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818319

Yuan Lin, I. Mougenot, T. Libourel

Citations: 1

Graph databases for large-scale healthcare systems: A framework for efficient data management and data services

2014 IEEE 30th International Conference on Data Engineering Workshops Pub Date : 2014-05-19 DOI: 10.1109/ICDEW.2014.6818295

Yubin Park, M. Shankar, Byung H. Park, Joydeep Ghosh

Abstract: Designing a database system for both efficient data management and data services has been one of the enduring challenges in the healthcare domain. In many healthcare systems, data services and data management are often viewed as two orthogonal tasks; data services refer to retrieval and analytic queries such as search, joins, statistical data extraction, and simple data mining algorithms, while data management refers to building error-tolerant and non-redundant database systems. The gap between service and management has resulted in rigid database systems and schemas that do not support effective analytics. We compose a rich graph structure from an abstracted healthcare RDBMS to illustrate how we can fill this gap in practice. We show how a healthcare graph can be automatically constructed from a normalized relational database using the proposed “3NF Equivalent Graph” (3EG) transformation. We discuss a set of real world graph queries such as finding self-referrals, shared providers, and collaborative filtering, and evaluate their performance over a relational database and its 3EG-transformed graph. Experimental results show that the graph representation serves as multiple de-normalized tables, thus reducing complexity in a database and enhancing data accessibility of users. Based on this finding, we propose an ensemble framework of databases for healthcare applications.

Citations: 34