Technical Perspective: Supporting Linear Algebra Operations in SQL

SIGMOD Rec. Pub Date: 2018-09-10 DOI: 10.1145/3277006.3277012

Y. Papakonstantinou



Abstract


Linear algebra operations are at the core of machine learning. Multiple specialized systems have emerged for the scalable, distributed execution of matrix and vector operations. The relationship of such computations to data management and databases, however, creates friction. It is well known that a great deal of human and machine time is spent nowadays on fetching data out of the database and performing the computation on a specialized system. One answer to the issue is that we truly need a new kind of non-SQL database that is tuned to these computations. The creators of SimSQL opted for a decidedly incremental approach: can we make a very small set of changes to the relational model and RDBMS software to render them suitable for executing linear algebra in the database? We have come across the "brand new system" versus "incremental to relational" question many times in the database field. For example, do we need brand new query languages and query processors for data cubes? Or do we need our query processors to pay attention to specific cases that are especially common in data analytics queries over star and snowflake schemas? Do semistructured query languages need to depart from SQL, or is it enough to be incremental to SQL? The same goes for query processors. Repeat the questions for graph data and RDF data. In many cases, new custom systems emerged, only for us to figure out later that we could or should have tackled the problem incrementally. That’s the trap that the authors of this paper avoid. This is not to say that radical changes and extensions should be forbidden. Rather, it says that we should closely scrutinize the necessity of the changes, make them when needed, and keep them minimal. The authors identify the right opportunities. Here is a non-exhaustive list:
