Beyond myopic inference in big data pipelines

Karthik Raman, Adith Swaminathan, J. Gehrke, T. Joachims
{"title":"Beyond myopic inference in big data pipelines","authors":"Karthik Raman, Adith Swaminathan, J. Gehrke, T. Joachims","doi":"10.1145/2487575.2487588","DOIUrl":null,"url":null,"abstract":"Big Data Pipelines decompose complex analyses of large data sets into a series of simpler tasks, with independently tuned components for each task. This modular setup allows re-use of components across several different pipelines. However, the interaction of independently tuned pipeline components yields poor end-to-end performance as errors introduced by one component cascade through the whole pipeline, affecting overall accuracy. We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner. Our key idea is to view the interaction of components as dependencies on an underlying graphical model. Different message passing schemes on this graphical model provide various inference algorithms to trade-off end-to-end performance and computational cost. We instantiate our framework with an efficient beam search algorithm, and demonstrate its efficiency on two Big Data Pipelines: parsing and relation extraction.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":"1 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2487575.2487588","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

Abstract

Big Data Pipelines decompose complex analyses of large data sets into a series of simpler tasks, with independently tuned components for each task. This modular setup allows re-use of components across several different pipelines. However, the interaction of independently tuned pipeline components yields poor end-to-end performance as errors introduced by one component cascade through the whole pipeline, affecting overall accuracy. We propose a novel model for reasoning across components of Big Data Pipelines in a probabilistically well-founded manner. Our key idea is to view the interaction of components as dependencies on an underlying graphical model. Different message passing schemes on this graphical model provide various inference algorithms to trade-off end-to-end performance and computational cost. We instantiate our framework with an efficient beam search algorithm, and demonstrate its efficiency on two Big Data Pipelines: parsing and relation extraction.
超越大数据管道的短视推断
大数据管道将大型数据集的复杂分析分解为一系列更简单的任务,每个任务都有独立调优的组件。这种模块化设置允许跨多个不同管道重用组件。然而,独立调优的管道组件之间的交互会导致端到端性能差,因为一个组件引入的误差会级联整个管道,从而影响整体精度。我们提出了一种新颖的模型,用于以概率良好的方式跨大数据管道组件进行推理。我们的关键思想是将组件的交互视为对底层图形模型的依赖关系。该图形模型上的不同消息传递方案提供了不同的推理算法,以权衡端到端性能和计算成本。我们用一个高效的梁搜索算法实例化了我们的框架,并演示了它在两个大数据管道上的效率:解析和关系提取。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信