Feng Zhu, Lijie Xu, Gang Ma, Shuping Ji, Jie Wang, Gang Wang, Hongyi Zhang, K. Wan, Ming-ming Wang, Xingchao Zhang, Yuming Wang, Jingpin Li
{"title":"eBay大数据SQL分析平台质量问题实证研究","authors":"Feng Zhu, Lijie Xu, Gang Ma, Shuping Ji, Jie Wang, Gang Wang, Hongyi Zhang, K. Wan, Ming-ming Wang, Xingchao Zhang, Yuming Wang, Jingpin Li","doi":"10.1145/3510457.3513034","DOIUrl":null,"url":null,"abstract":"Big data SQL analytics platform has evolved as the key infrastructure for business data analysis. Compared with traditional costly commercial RDBMS, scalable solutions with open-source projects, such as SQL-on-Hadoop, are more popular and attractive to enter-prises. In eBay, we build Carmel, a company-wide interactive SQL analytics platform based on Apache Spark. Carmel has been serving thousands of customers from hundreds of teams globally for more than 3 years. Meanwhile, despite the popularity of open-source based big data SQL analytics platforms, few empirical studies on service quality issues (e.g., job failure) were carried out for them. However, a deep understanding of service quality issues and taking right mitigation are significant to the ease of manual maintenance efforts. To fill this gap, we conduct a comprehensive empirical study on 1,884 real-word service quality issues from Carmel. We summa-rize the common symptoms and identify the root causes with typical cases. Stakeholders including system developers, researchers, and platform maintainers can benefit from our findings and implications. Furthermore, we also present lessons learned from critical cases in our daily practice, as well as insights to motivate automatic tool support and future research directions.","PeriodicalId":119790,"journal":{"name":"2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"An Empirical Study on Quality Issues of eBay's Big Data SQL Analytics Platform\",\"authors\":\"Feng Zhu, Lijie Xu, Gang Ma, Shuping Ji, Jie Wang, Gang Wang, Hongyi Zhang, K. Wan, Ming-ming Wang, Xingchao Zhang, Yuming Wang, Jingpin Li\",\"doi\":\"10.1145/3510457.3513034\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Big data SQL analytics platform has evolved as the key infrastructure for business data analysis. Compared with traditional costly commercial RDBMS, scalable solutions with open-source projects, such as SQL-on-Hadoop, are more popular and attractive to enter-prises. In eBay, we build Carmel, a company-wide interactive SQL analytics platform based on Apache Spark. Carmel has been serving thousands of customers from hundreds of teams globally for more than 3 years. Meanwhile, despite the popularity of open-source based big data SQL analytics platforms, few empirical studies on service quality issues (e.g., job failure) were carried out for them. However, a deep understanding of service quality issues and taking right mitigation are significant to the ease of manual maintenance efforts. To fill this gap, we conduct a comprehensive empirical study on 1,884 real-word service quality issues from Carmel. We summa-rize the common symptoms and identify the root causes with typical cases. Stakeholders including system developers, researchers, and platform maintainers can benefit from our findings and implications. Furthermore, we also present lessons learned from critical cases in our daily practice, as well as insights to motivate automatic tool support and future research directions.\",\"PeriodicalId\":119790,\"journal\":{\"name\":\"2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)\",\"volume\":\"43 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3510457.3513034\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 44th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3510457.3513034","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
An Empirical Study on Quality Issues of eBay's Big Data SQL Analytics Platform
Big data SQL analytics platform has evolved as the key infrastructure for business data analysis. Compared with traditional costly commercial RDBMS, scalable solutions with open-source projects, such as SQL-on-Hadoop, are more popular and attractive to enter-prises. In eBay, we build Carmel, a company-wide interactive SQL analytics platform based on Apache Spark. Carmel has been serving thousands of customers from hundreds of teams globally for more than 3 years. Meanwhile, despite the popularity of open-source based big data SQL analytics platforms, few empirical studies on service quality issues (e.g., job failure) were carried out for them. However, a deep understanding of service quality issues and taking right mitigation are significant to the ease of manual maintenance efforts. To fill this gap, we conduct a comprehensive empirical study on 1,884 real-word service quality issues from Carmel. We summa-rize the common symptoms and identify the root causes with typical cases. Stakeholders including system developers, researchers, and platform maintainers can benefit from our findings and implications. Furthermore, we also present lessons learned from critical cases in our daily practice, as well as insights to motivate automatic tool support and future research directions.