Migrate On-Premises Real-Time Data Analytics Jobs Into the Cloud
Huijun Wu, Xiaoyao Qian, Hulya Pamukcu Crowell, Tushar Singh, Aleks Shulman, Prashil Bhimani, Abhishek Maloo, Chunxu Tang, Yao Li, Lu Zhang, Chris Ulherr
2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), 2021-10-06
DOI: 10.1109/DSAA53316.2021.9564177 (https://doi.org/10.1109/DSAA53316.2021.9564177)
Citations: 4
Abstract
Twitter's data platform team serves a large number of real-time analytics jobs that power a wide range of data science use cases, from aggregations over time to spam detection. These jobs constitute a crucial step in Twitter's data science infrastructure. As a key part of Twitter's “partly cloudy” strategy, real-time data analytics jobs are being migrated from on-premises into the cloud. This paper shares our migration approach and findings. The jobs to be migrated vary but follow common patterns, including the “read-modify-write store” and “lambda architecture” patterns, both of which can be migrated to the Beam data model in a general way. Beyond job patterns, job I/O is handled by replicating or proxying between on-premises and the cloud. Testing is applied in two phases, through monitoring metrics and control tests. A case study demonstrates the business impact of migration. Finally, we discuss lessons learned.
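The abstract does not show how the “read-modify-write store” pattern maps onto the Beam data model, so the following is only an illustrative sketch under our own assumptions, not the authors' implementation. It uses plain Python rather than Beam itself, and the event shape and both function names are hypothetical. The idea it shows: a job that reads an aggregate from a key-value store, updates it, and writes it back can often be restated as "group by key, then combine", which is the general shape a Beam-style pipeline expresses. Windowing and streaming concerns are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape for illustration: (key, value).
Event = Tuple[str, int]


def read_modify_write(events: Iterable[Event], store: Dict[str, int]) -> Dict[str, int]:
    """Store-centric style: for each event, read the current aggregate,
    modify it, and write it back to a key-value store (a dict here)."""
    for key, value in events:
        current = store.get(key, 0)   # read
        store[key] = current + value  # modify, then write
    return store


def group_and_combine(events: Iterable[Event]) -> Dict[str, int]:
    """Dataflow style: group values by key, then apply an associative
    combine (sum) per key, with no store access during aggregation."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in events:
        grouped[key].append(value)
    return {key: sum(values) for key, values in grouped.items()}


if __name__ == "__main__":
    sample = [("user_a", 1), ("user_b", 2), ("user_a", 3)]
    # Both formulations produce the same per-key aggregates.
    assert read_modify_write(sample, {}) == group_and_combine(sample)
```

The equivalence holds here because summation is associative and commutative; an update that depends on arrival order would need more care than this sketch provides.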