Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.

Sepanta Zeighami, Cyrus Shahabi
{"title":"Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability.","authors":"Sepanta Zeighami, Cyrus Shahabi","doi":"","DOIUrl":null,"url":null,"abstract":"<p><p>Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the <i>distribution learnability</i> framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.</p>","PeriodicalId":74504,"journal":{"name":"Proceedings of machine learning research","volume":"235 ","pages":"58283-58305"},"PeriodicalIF":0.0000,"publicationDate":"2024-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11534081/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of machine learning research","FirstCategoryId":"1085","ListUrlMain":"","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Use of machine learning to perform database operations, such as indexing, cardinality estimation, and sorting, is shown to provide substantial performance benefits. However, when datasets change and data distribution shifts, empirical results also show performance degradation for learned models, possibly to worse than non-learned alternatives. This, together with a lack of theoretical understanding of learned methods undermines their practical applicability, since there are no guarantees on how well the models will perform after deployment. In this paper, we present the first known theoretical characterization of the performance of learned models in dynamic datasets, for the aforementioned operations. Our results show novel theoretical characteristics achievable by learned models and provide bounds on the performance of the models that characterize their advantages over non-learned methods, showing why and when learned models can outperform the alternatives. Our analysis develops the distribution learnability framework and novel theoretical tools which build the foundation for the analysis of learned database operations in the future.

通过分布可学习性对分布转移下的可学习数据库操作进行理论分析》(Theoretical Analysis of Learned Database Operations under Distribution Shift through Distribution Learnability)。
使用机器学习执行数据库操作(如索引、卡片性估计和排序)可带来巨大的性能优势。然而,当数据集发生变化和数据分布发生变化时,经验结果也显示学习模型的性能下降,可能比非学习模型更差。由于无法保证模型部署后的性能,再加上缺乏对学习方法的理论理解,这就削弱了其实际应用性。在本文中,我们针对上述操作,首次从理论上描述了动态数据集中学习模型的性能。我们的结果表明了学习模型可实现的新理论特性,并提供了模型性能的界限,这些界限描述了模型相对于非学习方法的优势,说明了为什么以及什么时候学习模型可以优于其他方法。我们的分析建立了分布式可学习性框架和新颖的理论工具,为今后分析学习型数据库操作奠定了基础。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信