Data4U '14: Latest Publications

Taming Big Data: Integrating diverse public data sources for economic competitiveness analytics
Data4U '14 Pub Date : 2014-09-01 DOI: 10.1145/2658840.2658845
R. Neamtu, Ramoza Ahsan, J. Stokes, Armend Hoxha, Jialiang Bao, Stefan Gvozdenovic, Ted Meyer, Nilesh Patel, Raghu Rangan, Yumou Wang, Dongyun Zhang, Elke A. Rundensteiner
Abstract: In an era where Big Data can greatly impact a broad population, many novel opportunities arise, chief among them the ability to integrate data from diverse sources and "wrangle" it to extract novel insights. Conceived as a tool that can help both expert and non-expert users better understand public data, MATTERS was collaboratively developed by the Massachusetts High Tech Council, WPI and other institutions as an analytic platform offering dynamic modeling capabilities. MATTERS is an integrative data source on high-fidelity cost and talent competitiveness metrics. Its goal is to extract, integrate and model rich economic, financial, educational and technological information, all known to be critical factors influencing the economic competitiveness of states, from renowned heterogeneous web data sources ranging from the US Census Bureau and the Bureau of Labor Statistics to the Institute of Education Sciences. This demonstration of MATTERS illustrates how we tackle the challenges of data acquisition, cleaning, integration and wrangling into appropriate representations, visualization and story-telling with data in the context of state competitiveness in the high-tech sector.
Citations: 0
A Paradigm for Learning Queries on Big Data
Data4U '14 Pub Date : 2014-09-01 DOI: 10.1145/2658840.2658842
A. Bonifati, Radu Ciucanu, Aurélien Lemay, S. Staworko
Abstract: Specifying a database query using a formal query language is typically a challenging task for non-expert users. In the context of big data, this problem becomes even harder, as it requires the users to deal with database instances that are large and hence difficult to visualize. Such instances usually lack a schema to help the users specify their queries, or have an incomplete schema as they come from disparate data sources. In this paper, we propose a novel paradigm for interactive learning of queries on big data, without assuming any knowledge of the database schema. The paradigm can be applied to different database models and a class of queries adequate to the database model. In particular, in this paper we present two instantiations that validated the proposed paradigm for learning relational join queries and for learning path queries on graph databases. Finally, we discuss the challenges of employing the paradigm for further data models and for learning cross-model schema mappings.
Citations: 19
DiNoDB: Efficient Large-Scale Raw Data Analytics
Data4U '14 Pub Date : 2014-09-01 DOI: 10.1145/2658840.2658841
Yongchao Tian, Ioannis Alagiannis, Erietta Liarou, A. Ailamaki, P. Michiardi, M. Vukolic
Abstract: Modern big data workflows, found in, e.g., machine learning use cases, often involve iterations of cycles of batch analytics and interactive analytics on temporary data. Whereas batch analytics solutions for large volumes of raw data are well established (e.g., Hadoop, MapReduce), state-of-the-art interactive analytics solutions (e.g., distributed shared-nothing RDBMSs) require a data loading and/or transformation phase, which is inherently expensive for temporary data.
In this paper, we propose a novel scalable distributed solution for in-situ data analytics that offers both scalable batch and interactive data analytics on raw data, hence avoiding the loading phase bottleneck of RDBMSs. Our system combines a MapReduce-based platform with the recently proposed NoDB paradigm, which optimizes traditional centralized RDBMSs for in-situ queries of raw files. We revisit NoDB's centralized design and scale it out, supporting multiple clients and data processing nodes, to produce a new distributed data analytics system we call Distributed NoDB (DiNoDB). DiNoDB leverages MapReduce batch queries to produce critical pieces of metadata (e.g., distributed positional maps and vertical indices) to speed up interactive queries without the overheads of the data loading and data movement phases, allowing users to quickly and efficiently exploit their data.
Our experimental analysis demonstrates that DiNoDB significantly reduces the data-to-query latency with respect to comparable state-of-the-art distributed query engines, like Shark, Hive and HadoopDB.
Citations: 10
An Efficient Processing of k-Dominant Skyline Query in MapReduce
Data4U '14 Pub Date : 2014-09-01 DOI: 10.1145/2658840.2658846
Hao Tian, M. A. Siddique, Y. Morimoto
Abstract: Filtering uninteresting data is important to utilize "big data". The skyline query is one of the popular techniques for filtering uninteresting data: it selects, from a given large database, the set of points that are not dominated by any other point. However, a skyline query often retrieves too many points to analyze intensively, especially for high-dimensional datasets. In order to solve this problem, k-dominant skyline queries have been introduced, which can control the number of retrieved points. However, conventional algorithms for computing k-dominant skyline queries are not well suited for parallel and distributed environments, such as the MapReduce framework. In this paper we consider an efficient parallel algorithm to process k-dominant skyline queries in the MapReduce framework. Extensive experiments are conducted to evaluate the algorithm under different settings of data distribution, dimensionality, and cardinality.
Citations: 9
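The abstract above does not spell out the dominance test, so the following is only a minimal single-machine sketch under the commonly used definition: a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. It is a naive pairwise check, not the MapReduce algorithm of the paper. The function names and the "smaller is better" convention are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_dominates(p: Sequence[float], q: Sequence[float], k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller values are better)."""
    # Number of dimensions in which p is at least as good as q.
    not_worse = sum(a <= b for a, b in zip(p, q))
    # p must also be strictly better in at least one dimension.
    strictly_better = any(a < b for a, b in zip(p, q))
    return not_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) filter: keep the points that no other point k-dominates."""
    return [
        q
        for i, q in enumerate(points)
        if not any(k_dominates(p, q, k) for j, p in enumerate(points) if j != i)
    ]


points = [(1, 1, 5), (2, 2, 2), (5, 5, 1), (3, 1, 6)]
full_skyline = k_dominant_skyline(points, k=3)  # k = dimensionality: ordinary skyline
reduced = k_dominant_skyline(points, k=2)       # smaller k: stricter filter
```

With k equal to the number of dimensions the test reduces to ordinary dominance, so the first call returns the conventional skyline; lowering k makes points easier to dominate and shrinks the result, which is how the query controls the number of retrieved points.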
Affordable Analytics on Expensive Data
Data4U '14 Pub Date : 2014-09-01 DOI: 10.1145/2658840.2658844
P. Upadhyaya, Martina Unutzer, M. Balazinska, Dan Suciu, Hakan Hacıgümüş
Abstract: In this paper, we outline steps towards supporting "data analysis on a budget" when operating in a setting where data must be bought, possibly periodically. We model the problem, and explore the design choices for analytic applications as well as potentially fruitful algorithmic techniques to reduce the cost of acquiring data. Simulations suggest that order-of-magnitude improvements are possible.
Citations: 2