{"title":"RBAS: A Real-Time User Behavior Analysis System for Internet TV in Cloud Computing","authors":"C. Zhu, Guang Cheng, Xiaojun Guo, Yuxiang Wang","doi":"10.1145/2935663.2935664","DOIUrl":null,"url":null,"abstract":"The characteristic of Internet TV user behavior is quite essential for designers to optimize resource schedule and improve user experience. With the rapid development of Internet, both Internet TV users and STB (set top boxes) models are booming. This brings a large amount of behavior data which requires matching computing and storage resource to process. Therefore, scalable Internet TV user behavior analysis becomes more difficult. As a solution, cloud computing framework such as Hive is emerged. But limited by performance, it's not an appropriate choice for interactive analysis or real-time data exploration. In this paper, we present a real-time Internet TV user behavior analysis system with advantages of high concurrency, low latency and good transportability. Firstly, we design an event capture scheme, consisted of agents embedded in STBs and capture server clusters, to capture every manipulation performed by users. Secondly, we develop a SQL-on-Hadoop engine with distributed transactional management to decrease the response time. The engine has excellent query performance and ability to interactively query various data sources in different Hadoop formats. Lastly, we evaluate RBAS in a commercial Internet TV platform of 16 million registered users. The results show that, with a 32-node cluster, the system can effectively process 10.2 TB of behavior data every day, which is about 40x faster than original Hive-based system.","PeriodicalId":305382,"journal":{"name":"Proceedings of the 11th International Conference on Future Internet Technologies","volume":"26 12 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th International Conference on Future Internet Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2935663.2935664","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
The characteristic of Internet TV user behavior is quite essential for designers to optimize resource schedule and improve user experience. With the rapid development of Internet, both Internet TV users and STB (set top boxes) models are booming. This brings a large amount of behavior data which requires matching computing and storage resource to process. Therefore, scalable Internet TV user behavior analysis becomes more difficult. As a solution, cloud computing framework such as Hive is emerged. But limited by performance, it's not an appropriate choice for interactive analysis or real-time data exploration. In this paper, we present a real-time Internet TV user behavior analysis system with advantages of high concurrency, low latency and good transportability. Firstly, we design an event capture scheme, consisted of agents embedded in STBs and capture server clusters, to capture every manipulation performed by users. Secondly, we develop a SQL-on-Hadoop engine with distributed transactional management to decrease the response time. The engine has excellent query performance and ability to interactively query various data sources in different Hadoop formats. Lastly, we evaluate RBAS in a commercial Internet TV platform of 16 million registered users. The results show that, with a 32-node cluster, the system can effectively process 10.2 TB of behavior data every day, which is about 40x faster than original Hive-based system.