Online Model Management via Temporally Biased Sampling

Brian Hentschel, P. Haas, Yuanyuan Tian
{"title":"Online Model Management via Temporally Biased Sampling","authors":"Brian Hentschel, P. Haas, Yuanyuan Tian","doi":"10.1145/3371316.3371333","DOIUrl":null,"url":null,"abstract":"To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporallybiased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoirbased scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequalprobability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.","PeriodicalId":21740,"journal":{"name":"SIGMOD Rec.","volume":"43 1","pages":"69-76"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SIGMOD Rec.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3371316.3371333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

To maintain the accuracy of supervised learning models in the presence of evolving data streams, we provide temporallybiased sampling schemes that weight recent data most heavily, with inclusion probabilities for a given data item decaying exponentially over time. We then periodically retrain the models on the current sample. We provide and analyze both a simple sampling scheme (T-TBS) that probabilistically maintains a target sample size and a novel reservoirbased scheme (R-TBS) that is the first to provide both control over the decay rate and a guaranteed upper bound on the sample size. The R-TBS and T-TBS schemes are of independent interest, extending the known set of unequalprobability sampling schemes. We discuss distributed implementation strategies; experiments in Spark show that our approach can increase machine learning accuracy and robustness in the face of evolving data.
通过时间偏差抽样的在线模型管理
为了在不断发展的数据流中保持监督学习模型的准确性,我们提供了暂时有偏差的抽样方案,该方案对最近的数据进行了最重的加权,给定数据项的包含概率随着时间呈指数衰减。然后我们定期在当前样本上重新训练模型。我们提供并分析了一种简单的抽样方案(T-TBS)和一种新的基于水库的方案(R-TBS),该方案在概率上保持了目标样本量,而R-TBS是第一个提供对衰减率的控制和保证样本量上限的方案。R-TBS和T-TBS格式是独立的,扩展了已知的非等概率抽样格式集。我们讨论了分布式实现策略;Spark的实验表明,面对不断变化的数据,我们的方法可以提高机器学习的准确性和鲁棒性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信