SIGMOD Rec. Latest Articles, Page 9

What's Really New with NewSQL?

SIGMOD Rec. Pub Date : 2016-09-28 DOI: 10.1145/3003665.3003674

Andrew Pavlo, Matthew Aslett

Abstract: A new class of database management systems (DBMSs) called NewSQL tout their ability to scale modern on-line transaction processing (OLTP) workloads in a way that is not possible with legacy systems. The term NewSQL was first used by one of the authors of this article in a 2011 business analysis report discussing the rise of new database systems as challengers to these established vendors (Oracle, IBM, Microsoft). The other author was working on what became one of the first examples of a NewSQL DBMS. Since then several companies and research projects have used this term (rightly and wrongly) to describe their systems.

Given that relational DBMSs have been around for over four decades, it is justifiable to ask whether the claim of NewSQL's superiority is actually true or whether it is simply marketing. If they are indeed able to get better performance, then the next question is whether there is anything scientifically new about them that enables them to achieve these gains or is it just that hardware has advanced so much that now the bottlenecks from earlier years are no longer a problem.

To do this, we first discuss the history of databases to understand how NewSQL systems came about. We then provide a detailed explanation of what the term NewSQL means and the different categories of systems that fall under this definition.

Pages: 45-55

Citations: 122

Factorized Databases

SIGMOD Rec. Pub Date : 2016-09-28 DOI: 10.1145/3003665.3003667

Dan Olteanu, Maximilian Schleich

Citations: 82

A Survey on Accessing Dataspaces

SIGMOD Rec. Pub Date : 2016-09-28 DOI: 10.1145/3003665.3003672

Yihan Wang, Shaoxu Song, Lei Chen

Citations: 4

H V Jagadish Speaks Out on PVLDB, CoRR and Data-driven Research

SIGMOD Rec. Pub Date : 2016-09-28 DOI: 10.1145/3003665.3003676

M. Winslett, V. Braganholo

Citations: 1

Technical Perspective: Toward Building Entity Matching Management Systems

SIGMOD Rec. Pub Date : 2016-08-01 DOI: 10.14778/2994509.2994535

Pradap Konda, Sanjib Das, Paul Suganthan G. C., A. Doan, A. Ardalan, Jeffrey R. Ballard, Han Li, Fatemah Panahi, Haojun Zhang, J. Naughton, Shishir Prasad, Ganesh Krishnan, Rohit Deep, V. Raghavendra

Abstract: Entity matching (EM) has been a long-standing challenge in data management. Most current EM works focus only on developing matching algorithms. We argue that far more efforts should be devoted to building EM systems. We discuss the limitations of current EM systems, then describe Magellan, a new kind of EM system. Magellan is novel in four important aspects. (1) It provides how-to guides that tell users what to do in each EM scenario, step by step. (2) It provides tools to help users execute these steps; the tools seek to cover the entire EM pipeline, not just blocking and matching as current EM systems do. (3) Tools are built into the Python open-source data science ecosystem, allowing Magellan to borrow a rich set of capabilities in data cleaning, IE, visualization, learning, etc. (4) Magellan provides a powerful scripting environment to facilitate interactive experimentation and quick "patching" of the system. We describe research challenges and present extensive experiments that show the promise of the Magellan approach.

Pages: 33-40
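The abstract names blocking and matching as the two steps that existing EM systems cover. As a toy illustration of those two terms only, a standard-library sketch might look like the following. This is not Magellan's API: the records, the first-letter blocking rule, the helper names, and the 0.8 similarity threshold are all invented for the example.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical input tables; real EM inputs would have many more attributes.
table_a = [{"id": 1, "name": "Dave Smith"}, {"id": 2, "name": "Joe Wilson"}]
table_b = [{"id": "x", "name": "David Smith"}, {"id": "y", "name": "Dan Brown"}]


def block_key(record):
    """Assumed blocking rule: records can only match if names share a first letter."""
    return record["name"][:1].lower()


def candidate_pairs(left, right):
    """Blocking: discard pairs that cannot match, so fewer pairs reach the matcher."""
    return [(a, b) for a, b in product(left, right) if block_key(a) == block_key(b)]


def is_match(a, b, threshold=0.8):
    """Matching: declare a match when string similarity reaches the threshold."""
    score = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return score >= threshold


pairs = candidate_pairs(table_a, table_b)
matches = [(a["id"], b["id"]) for a, b in pairs if is_match(a, b)]
print(len(pairs), "candidate pairs;", "matches:", matches)
```

Blocking here cuts the four possible pairs down to two before any similarity is computed, which is the reason the step exists: the matcher is usually the expensive part.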

Citations: 187

Resource Bricolage for Parallel DBMSs on Heterogeneous Clusters

SIGMOD Rec. Pub Date : 2016-06-02 DOI: 10.1145/2949741.2949752

Jiexing Li, J. Naughton, Rimma V. Nehme

Abstract: Running parallel database systems in an environment with heterogeneous resources has become increasingly common, due to cluster evolution and increasing interest in moving applications into public clouds or shared infrastructures. For database systems running in a heterogeneous cluster, the default uniform data partitioning strategy may overload some of the slow machines while at the same time it may underutilize the more powerful machines. Since the processing time of a parallel query is determined by the slowest machine, such an allocation strategy may result in a significant query performance degradation.

We take a first step to address this problem by introducing a technique we call resource bricolage that improves database performance in heterogeneous environments. Our approach quantifies the performance differences among machines with various resources as they process workloads with diverse resource requirements. We formalize the problem of minimizing workload execution time and view it as an optimization problem, and then we employ linear programming to obtain a recommended data partitioning scheme. We verify the effectiveness of our technique with an extensive experimental study on a commercial database system.

Pages: 42-49
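The abstract does not give the linear program itself. As a rough sketch only, with notation assumed here rather than taken from the paper: let $x_i$ be the fraction of the data placed on machine $i$, let $t_{ij}$ be a hypothetical estimate of the time machine $i$ would need to run stage $j$ over the whole data set, and let $T_j$ be the time at which the slowest machine finishes stage $j$. Assuming that processing time scales linearly with data volume and that stages are separated by barriers, minimizing the total execution time could be written as:

```latex
\begin{aligned}
\min_{x,\,T}\quad & \sum_{j=1}^{m} T_j \\
\text{s.t.}\quad  & t_{ij}\, x_i \le T_j, && i = 1,\dots,n,\; j = 1,\dots,m, \\
                  & \sum_{i=1}^{n} x_i = 1, \\
                  & x_i \ge 0, && i = 1,\dots,n.
\end{aligned}
```

With a single stage ($m = 1$) the optimum gives each machine a share inversely proportional to its $t_{i1}$. With several stages whose bottleneck resources differ, no single ratio balances every stage, and that is the case where a solver earns its keep.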

Citations: 0

Technical Perspective: Implicit Parallelism through Deep Language Embedding

SIGMOD Rec. Pub Date : 2016-06-02 DOI: 10.1145/2949741.2949753

Z. Ives

Citations: 0

Technical Perspective: Natural Language to SQL Translation by Iteratively Exploring a Middle Ground

SIGMOD Rec. Pub Date : 2016-06-02 DOI: 10.1145/2949741.2949743

J. Naughton

Abstract: A fundamental question in data management is how relational database management systems (RDBMSs) should be queried. Ideally, the query interface should be powerful enough to express arbitrary queries, yet simple enough to learn that users require virtually no training. Natural language is an obvious and appealing approach – presumably most users already know at least one natural language and use it to “query” other humans constantly. Unfortunately, employing natural language to query RDBMSs is highly nontrivial, and for the most part, not used. However, with the growing power and ubiquity of Natural Language Processing (NLP) systems, it makes sense to redouble efforts in applying NLP to database querying.

At the most basic level, relational database systems are queried using SQL. (For that matter, most “NoSQL” systems are also queried using SQL.) SQL is very powerful and precise, and, for novices, very hard to write. So SQL cannot be used as a user interface for anyone but power users. Nonetheless, as the most widely used RDBMS query language, SQL is the most natural language into which to translate natural language questions over relational data. This translation is the focus of the following paper, “Understanding Natural Language Queries over Relational Databases”, by Li and Jagadish.

The first important decision made by the authors of this paper is to reject a one-shot, one-way translation process from a natural language query to a corresponding SQL query. Instead, the authors advocate an iterative dialog between the person posing the query and the system building the relational query. This makes perfect sense – even in the much simpler world of keyword search systems, users iteratively refine their queries. Unfortunately, adopting this approach for RDBMS querying does not yield an easy problem – in fact, it uncovers a highly interesting and difficult challenge: how should the user and the system communicate in this iterative process?

Answering this question is difficult. Unlike the case for keyword search systems, the answer to the query may not help the user know if the executed query was what they really wanted. For example, consider the simple query “find the difference between sales this year and last year.” In general the RDBMS will return a number – and it is very hard to tell just from that number if the query was correct or not. It would be far more precise for the system to respond to the user by presenting the generated SQL query itself. But this would require the person posing the natural language query to be able to read and understand SQL, which contradicts a major motivation for the system in the first place.

Now we come to what is perhaps the heart of this paper: the decision to adopt an intermediate language the authors call “Query Tree,” a two-way domain-independent communication model allowing the user and system to understand one another. A query tree aids mapping a user query to its corresponding semantically correct SQL and ...

Pages: 5

Citations: 1

Technical Perspective: Taming Hardware Skew as Parallel DBMSs Scale Out

SIGMOD Rec. Pub Date : 2016-06-02 DOI: 10.1145/2949741.2949751

D. DeWitt

Abstract: For almost 40 years now, relational database management systems have successfully used data parallelism to speed up the evaluation of large queries. Here, by “data parallelism” we mean taking one operation (for example, a “join” or an “aggregation”) and spreading it over multiple machines, each operating on a part of the data. In general this approach works spectacularly well, yielding almost linear speedups over a wide variety of workloads. However, like any form of parallelism, data-parallel relational query processing is vulnerable to “skew.” The database literature is full of work dealing with the skew that arises when one node in a parallel system is allocated more work than the average.

The following paper, by Li, Naughton, and Nehme, is interesting in that it deals with another kind of skew, one that has received much less attention: “hardware skew,” that is, skew that arises because the processing units in a parallel system are not all of equal power. Such skew can arise in several ways – for example, a parallel system could be constructed “on the fly” by allocating available nodes in a cloud, or a company could upgrade an on-premises system with the addition of new nodes that are of a different generation and class of hardware than the existing ones. If the DBMS is oblivious to the fact that the underlying system is not uniform, the result will be the same as that achieved if the system were constructed entirely of the slowest nodes in the system.

If all the nodes in the system are equally “balanced” the solution is simple – if one node is 1/2 as fast as the average, give that node 1/2 the average work, and you are set. Unfortunately, in practice, things are not that simple. One node may have a faster CPU but the same I/O performance, or vice-versa; or nodes may have differing amounts of memory or network bandwidth. In such cases simple proportional allocation of work will be suboptimal. The situation is further complicated by the fact that different queries make different demands on the system with respect to CPU, memory, network, and disk; in fact, different stages of a single query can make very different demands.

This, finally, is the situation addressed by the paper, “Resource Bricolage for Parallel DBMSs on Heterogeneous Clusters.” The authors make use of techniques for cost estimation growing out of the query optimization and query running time prediction literature; they combine these techniques with a linear programming model that chooses an optimal allocation for a given query on a given system. They demonstrate through an analytic model as well as experiments with an implementation that their proposed solution dominates simpler alternatives. An interesting question this work raises is the duality between “on-demand” load balancing of the type employed by MapReduce-like systems and the predictive, up-front allocation of work advocated by this paper. My suspicion is that both approaches have their place, and the choice of which ...

Pages: 41
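The claim that a skew-oblivious DBMS performs as if every node were the slowest one, and the proportional fix for the "balanced" case, can be checked with a few lines of arithmetic. The sketch below assumes each node is described by a single relative speed and that time scales linearly with assigned work; the speed values and function names are hypothetical, and this is the simple case the abstract says breaks down once CPU, I/O, memory, and network differ independently.

```python
def makespan(speeds, shares, total_work=1.0):
    """Time until the slowest node finishes its share of the work."""
    return max(total_work * share / speed for speed, share in zip(speeds, shares))


def uniform_shares(speeds):
    """Skew-oblivious allocation: every node gets the same share."""
    return [1.0 / len(speeds)] * len(speeds)


def proportional_shares(speeds):
    """Allocate work in proportion to each node's relative speed."""
    total = sum(speeds)
    return [speed / total for speed in speeds]


# Hypothetical cluster: three nodes of speed 1.0 and one node half as fast.
speeds = [1.0, 1.0, 1.0, 0.5]

uniform_time = makespan(speeds, uniform_shares(speeds))
all_slow_time = makespan([min(speeds)] * len(speeds), uniform_shares(speeds))
proportional_time = makespan(speeds, proportional_shares(speeds))

print(f"uniform: {uniform_time:.3f}")
print(f"all nodes as slow as the slowest: {all_slow_time:.3f}")
print(f"proportional: {proportional_time:.3f}")
```

Under uniform allocation the mixed cluster finishes no sooner than a cluster built entirely of the slow node, while proportional allocation makes all four nodes finish at the same moment.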

Citations: 1

Understanding Natural Language Queries over Relational Databases

SIGMOD Rec. Pub Date : 2016-06-02 DOI: 10.1145/2949741.2949744

Fei Li, H. Jagadish

Citations: 54