Authors: Xu Wen
Journal: BenchCouncil Transactions on Benchmarks, Standards and Evaluations
Publication date: 2024-03-01
Publication type: Journal Article
DOI: 10.1016/j.tbench.2024.100161
URL: https://www.sciencedirect.com/science/article/pii/S2772485924000139
Citations: 0
Abstract
TensorTable: Extending PyTorch for mixed relational and linear algebra pipelines
Mixed relational algebra (RA) and linear algebra (LA) pipelines have become increasingly common in recent years. However, widely used contemporary frameworks struggle to support both RA and LA operators effectively, and they fail to deliver optimal end-to-end performance because of the cost of LA operators and data conversion. This underscores the demand for a system that integrates RA and LA seamlessly while delivering robust end-to-end performance. This paper proposes TensorTable, a tensor system that extends PyTorch to enable mixed RA and LA pipelines. TensorTable serves as the unified data representation: it stores data in a tensor format to prioritize the performance of LA operators and reduce data conversion costs. Relational tables from RA, as well as vectors, matrices, and tensors from LA, can be seamlessly converted into TensorTables. Additionally, we provide TensorTable-based implementations of RA operators and build a system that supports mixed LA and RA pipelines. We implement TensorTable on top of PyTorch, achieving comparable performance for both RA and LA operators, particularly on small datasets. Compared with the state-of-the-art frameworks AIDA and RMA, TensorTable achieves a 1.15x-5.63x speedup for mixed pipelines.
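The abstract does not show TensorTable's interface, so the following is only a minimal sketch of the general idea it describes: keeping a relational table in tensor form so that RA operators and LA operators can run on the same data without conversion. All names here (`COLUMNS`, `select`, `project`, `equi_join`) are hypothetical and are not the paper's API; the sketch also assumes purely numeric columns and uses only standard PyTorch indexing.

```python
import torch

# Hypothetical illustration, not the paper's API: a relational table with
# numeric columns stored as a 2-D tensor (rows = tuples, columns = attributes).
COLUMNS = {"id": 0, "age": 1, "score": 2}

table = torch.tensor([
    [1.0, 25.0, 0.5],
    [2.0, 40.0, 0.9],
    [3.0, 31.0, 0.7],
])

def select(t, col, predicate):
    """Selection: keep rows whose value in column `col` satisfies `predicate`."""
    mask = predicate(t[:, col])  # boolean mask, one entry per row
    return t[mask]

def project(t, cols):
    """Projection: keep only the listed column indices."""
    return t[:, cols]

def equi_join(left, right, left_key, right_key):
    """Inner equi-join via a broadcasted key comparison (O(n*m) memory)."""
    matches = left[:, left_key].unsqueeze(1) == right[:, right_key].unsqueeze(0)
    li, ri = matches.nonzero(as_tuple=True)  # index pairs of matching rows
    return torch.cat([left[li], right[ri]], dim=1)

# RA step: rows with age > 30, projected to (age, score).
adults = select(table, COLUMNS["age"], lambda c: c > 30)
features = project(adults, [COLUMNS["age"], COLUMNS["score"]])

# LA step: the RA result is already a tensor, so it feeds a matrix product
# directly, with no table-to-matrix conversion in between.
gram = features.T @ features

# A second small table (id, bonus) joined to the first on the id column.
bonuses = torch.tensor([[1.0, 10.0], [3.0, 30.0]])
joined = equi_join(table, bonuses, COLUMNS["id"], 0)
```

The broadcasted join is chosen for brevity rather than speed; a practical system would likely use sorting or hashing for large inputs, and comparing floating-point keys for equality is only safe here because the keys are exact small integers.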