{"title":"Towards Constant-Time Cardinality Estimation for Large-Scale RFID Systems","authors":"Binbin Li, Yuan He, Wenyuan Liu","doi":"10.1109/ICPP.2015.90","DOIUrl":"https://doi.org/10.1109/ICPP.2015.90","url":null,"abstract":"Cardinality estimation is the process of estimating the number of tags in an RFID system. Generally, the cardinality is estimated by exchanging information between the reader(s) and tags. To ensure the time efficiency and accuracy of estimation, numerous probability-based approaches have been proposed, most of which follow a similar approach of minimizing the number of time slots required from tags to the reader. The overall execution time of the estimator, however, is not necessarily minimized. The estimation accuracy of those approaches also depends largely on repeated rounds, leading to a dilemma between efficiency and accuracy. In this paper, we propose BFCE, a Bloom Filter based Cardinality Estimator, which needs only a constant number of time slots to meet the desired estimation accuracy, regardless of the actual tag cardinality. The overall communication overhead is also significantly reduced, as the reader only needs to broadcast a constant number of messages for parameter setting. Results from extensive simulations under various tag ID distributions show that BFCE is accurate and highly efficient. In terms of overall execution time, BFCE is on average 30 times faster than ZOE and 2 times faster than SRC, the two state-of-the-art estimation approaches.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116204685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
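The abstract does not spell out BFCE's estimator, so the following is only a minimal sketch of the generic zero-bit estimator that Bloom-filter-based cardinality estimation builds on: tags hash their IDs into a shared bit array, and the reader infers the cardinality from the fraction of bits left at zero, n ~= -(m/k) * ln(z/m). The parameter values (m, k) and all function names are hypothetical, not taken from the paper.

```python
import hashlib
import math
import random

def bloom_positions(tag_id: str, m: int, k: int):
    """Map a tag ID to k bit positions via double hashing (illustrative choice)."""
    h = hashlib.sha256(tag_id.encode()).digest()
    h1 = int.from_bytes(h[:8], "big")
    h2 = int.from_bytes(h[8:16], "big")
    return [(h1 + i * h2) % m for i in range(k)]

def build_filter(tag_ids, m, k):
    """Each tag sets its k bits, as if replying in its hashed time slots."""
    bits = [0] * m
    for t in tag_ids:
        for p in bloom_positions(t, m, k):
            bits[p] = 1
    return bits

def estimate_cardinality(bits, k):
    """Classic zero-bit estimator: n ~= -(m/k) * ln(z/m), z = number of zero bits."""
    m = len(bits)
    z = bits.count(0)
    if z == 0:
        return float("inf")  # filter saturated; a larger m would be needed
    return -(m / k) * math.log(z / m)

if __name__ == "__main__":
    true_n = 10_000
    tags = [f"tag-{random.getrandbits(96):024x}" for _ in range(true_n)]
    m, k = 64_000, 3  # hypothetical parameters a reader would broadcast once
    bits = build_filter(tags, m, k)
    print(f"true={true_n}, estimated={estimate_cardinality(bits, k):.0f}")
```

Because m and k are fixed up front, the reader inspects a structure of constant size regardless of the tag population, which is the flavor of constant-slot estimation the abstract describes.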
{"title":"PCS: Predictive Component-Level Scheduling for Reducing Tail Latency in Cloud Online Services","authors":"Rui Han, Junwei Wang, Siguang Huang, Chenrong Shao, Shulin Zhan, Jianfeng Zhan, J. L. Vázquez-Poletti","doi":"10.1109/ICPP.2015.58","DOIUrl":"https://doi.org/10.1109/ICPP.2015.58","url":null,"abstract":"Modern latency-critical online services often rely on composing results from a large number of server components. Hence the tail latency (e.g., the 99th percentile of response time), rather than the average, of these components determines the overall service performance. When hosted in a cloud environment, the components of a service typically co-locate with short batch jobs to increase machine utilization, and share and contend for resources such as caches and I/O bandwidth with them. The highly dynamic nature of batch jobs, in terms of their workload types and input sizes, causes continuously changing performance interference to individual components, leading to latency variability and high tail latency. However, existing techniques either ignore such fine-grained component latency variability when managing service performance, or rely on executing redundant requests to reduce the tail latency, which deteriorates service performance as load gets heavier. In this paper, we propose PCS, a predictive, component-level scheduling framework to reduce tail latency for large-scale, parallel online services. It uses an analytical performance model to simultaneously predict component latency and overall service performance on different nodes. Based on the predicted performance, the scheduler identifies straggling components and conducts near-optimal component-node allocations to adapt to the changing performance interference from batch jobs. We demonstrate that, using realistic workloads, the proposed scheduler reduces the component tail latency by an average of 67.05% and the average overall service latency by 64.16%, compared with state-of-the-art techniques for reducing tail latency.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121210527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
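The abstract does not give PCS's analytical model or allocation algorithm, so the sketch below only illustrates the general shape of predictive component-level scheduling: score each (component, node) pair with a latency predictor that reflects batch-job interference and placement load, place the most latency-critical components first, and report the predicted service latency as that of the slowest component. The predictor formula, the 0.1 load factor, and all names are assumptions for illustration.

```python
import random

# Hypothetical stand-in for an analytical performance model: predicted latency
# grows with the node's batch-job interference and the load already placed on it.
def predict_latency_ms(component, node, interference, load):
    return component["base_ms"] * (1.0 + interference[node] + 0.1 * load[node])

def schedule(components, nodes, interference):
    """Greedy component-to-node allocation aimed at minimizing the slowest
    component, which dominates the (tail-sensitive) overall service latency."""
    placement, load = {}, {n: 0 for n in nodes}
    for comp in sorted(components, key=lambda c: -c["base_ms"]):  # critical first
        best = min(nodes, key=lambda n: predict_latency_ms(comp, n, interference, load))
        placement[comp["name"]] = best
        load[best] += 1
    service_latency = max(
        predict_latency_ms(c, placement[c["name"]], interference, load)
        for c in components
    )
    return placement, service_latency

if __name__ == "__main__":
    random.seed(1)
    comps = [{"name": f"c{i}", "base_ms": random.uniform(5, 40)} for i in range(8)]
    nodes = ["n0", "n1", "n2", "n3"]
    interference = {n: random.uniform(0.0, 0.8) for n in nodes}  # from co-located batch jobs
    plan, latency = schedule(comps, nodes, interference)
    print(plan)
    print(f"predicted service latency ~ {latency:.1f} ms")
```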
{"title":"CoRec: A Cooperative Reconstruction Pattern for Multiple Failures in Erasure-Coded Storage Clusters","authors":"Jianzhong Huang, Er-wei Dai, C. Xie, X. Qin","doi":"10.1109/ICPP.2015.56","DOIUrl":"https://doi.org/10.1109/ICPP.2015.56","url":null,"abstract":"It is indispensable to speed up the reconstruction process in erasure-coded storage clusters, because fast data recovery shortens the vulnerability window and improves storage system reliability. To address double- and multiple-node failures, this paper proposes a cooperative reconstruction pattern - CoRec - to minimize reconstruction traffic. CoRec not only enables all rebuilding nodes to collaboratively reconstruct failed blocks, but also limits each surviving block to being transferred over the network only once. Alongside two CoRec-based reconstruction schemes (i.e., CoRec-rn and CoRec-sn), we investigate two alternative reconstruction schemes (i.e., CRec and DRec) for comparison. We develop reconstruction-time models, validated using empirical data, to estimate the reconstruction performance of large-scale storage clusters and to pinpoint performance bottlenecks in the reconstruction process. We implement a proof-of-concept prototype in which the four reconstruction schemes are quantitatively evaluated. Experimental results show that CoRec-rn and CoRec-sn significantly reduce the reconstruction time of CRec and DRec. In a real-world 9-node storage cluster, CoRec-rn speeds up the double-node reconstruction of CRec and DRec by a factor of at least 1.72, and CoRec-sn accelerates the double-node reconstruction of CRec and DRec by a factor of at least 4.76.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"32 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122703631","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
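The traffic constraint the abstract highlights (each surviving block crosses the network only once) can be illustrated with a simple block-count model. The sketch below compares that cooperative pattern against independent per-block repair for a stripe with k data blocks and f failed blocks; the extra (f - 1) exchanges among rebuilding nodes are an assumption of this toy model, not CoRec's actual message pattern.

```python
def independent_repair_traffic(k: int, f: int) -> int:
    """Each failed block is rebuilt separately from any k surviving blocks,
    so the same surviving block may cross the network several times."""
    return f * k

def cooperative_repair_traffic(k: int, f: int) -> int:
    """Rough model of the cooperative constraint: every surviving block crosses
    the network at most once, plus (f - 1) assumed exchanges of intermediate
    results among the f rebuilding nodes."""
    return k + (f - 1)

if __name__ == "__main__":
    k = 6  # e.g. an RS(6, 3) stripe: 6 data + 3 parity blocks
    for f in (2, 3):
        ind = independent_repair_traffic(k, f)
        coop = cooperative_repair_traffic(k, f)
        print(f"{f} failures: independent={ind} blocks, cooperative={coop} blocks")
```

For a 6+3 stripe with two failures this model gives 12 versus 7 block transfers, which is the kind of saving a cooperative reconstruction pattern targets.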
{"title":"Exploring Hardware Profile-Guided Green Datacenter Scheduling","authors":"W. Tang, Yu Wang, Haopeng Liu, Zhang Tao, Chao Li, Xiaoyao Liang","doi":"10.1109/ICPP.2015.10","DOIUrl":"https://doi.org/10.1109/ICPP.2015.10","url":null,"abstract":"Recently, tapping into renewable energy sources has shown great promise in alleviating server energy poverty and reducing the IT carbon footprint. Due to the limited, time-varying green power generation, matching server power demand to the runtime power budget is often crucial in green datacenters. However, existing studies mainly focus on the temporal variability of power supply and demand, while largely ignoring the spatial variation in server hardware. With more complex computing units being integrated and technology scaling down, the performance/power variation among nodes and the conservative supply voltage margin of each core can greatly compromise the power-matching effectiveness that a green datacenter can achieve. This paper explores green datacenter design that takes into account non-uniform hardware power characteristics. We propose iScope, a novel power management framework that can (1) expose architecture variability to the datacenter facility-level scheduler for efficient power matching, and (2) balance the energy usage and lifetime of compute nodes in a highly dynamic green computing environment. Using realistic hardware profiling data and renewable energy data, we show that iScope can reduce the energy cost by up to 54%, while maintaining a fairly balanced processor utilization rate with negligible profiling overhead.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"144 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121935467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
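To make the power-matching idea concrete, here is a minimal sketch of profile-guided matching under a renewable budget: given per-node power/performance profiles that capture spatial hardware variation, a scheduler activates nodes in order of measured performance per watt until the time-varying green power budget is exhausted. The greedy rule, the profile numbers, and the names are illustrative assumptions, not the paper's scheduler.

```python
def match_power(nodes, budget_watts):
    """Greedy power matching: activate nodes in order of profiled performance per
    watt until the renewable power budget is exhausted."""
    chosen, used = [], 0
    for node in sorted(nodes, key=lambda n: n["perf"] / n["watts"], reverse=True):
        if used + node["watts"] <= budget_watts:
            chosen.append(node["name"])
            used += node["watts"]
    return chosen, used

if __name__ == "__main__":
    # Hypothetical per-node profiles capturing spatial hardware variation.
    profile = [
        {"name": "n0", "perf": 100, "watts": 180},
        {"name": "n1", "perf": 95,  "watts": 150},  # better bin: similar work, less power
        {"name": "n2", "perf": 88,  "watts": 165},
        {"name": "n3", "perf": 102, "watts": 210},
    ]
    for budget in (300, 500):  # time-varying green power budget, in watts
        active, used = match_power(profile, budget)
        print(f"budget={budget}W -> activate {active} ({used}W)")
```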
{"title":"GPSA: A Graph Processing System with Actors","authors":"Jianhua Sun, D. Zhou, Hao Chen, Cheng Chang, Zhiwen Chen, Wentao Li, Ligang He","doi":"10.1109/ICPP.2015.80","DOIUrl":"https://doi.org/10.1109/ICPP.2015.80","url":null,"abstract":"Due to the increasing need to process fast-growing graph-structured data (e.g., social networks and Web graphs), designing high-performance graph processing systems has become one of the most pressing problems facing systems researchers. In this paper, we introduce GPSA, a single-machine graph processing system based on an actor computation model inspired by the Bulk Synchronous Parallel (BSP) computation model. GPSA takes advantage of actors to improve concurrency on a single machine with limited resources. GPSA adapts the conventional BSP computation model to the actor programming paradigm by decoupling message dispatching from computation. Furthermore, we exploit memory mapping to avoid explicit data management and improve I/O performance. Experimental evaluation shows that our system outperforms existing systems by 2x-6x in processing large-scale graphs on a single machine.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"216 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124264125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
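The central idea the abstract names, decoupling message dispatching from computation in a BSP-style model, can be sketched in a few lines: a dispatcher buffers messages sent during superstep t and delivers them at the barrier for superstep t+1, while the vertex program only computes. The example runs label-propagation connected components; it is a single-threaded illustration under those assumptions, not GPSA's actor runtime or its memory-mapped I/O path.

```python
from collections import defaultdict

class Dispatcher:
    """Buffers messages sent in superstep t and delivers them at the barrier,
    keeping message dispatching separate from vertex computation (BSP-style)."""
    def __init__(self):
        self.outbox = defaultdict(list)

    def send(self, dst, value):
        self.outbox[dst].append(value)

    def barrier(self):
        inbox, self.outbox = self.outbox, defaultdict(list)
        return inbox

def connected_components(graph):
    """graph: {vertex: [neighbors]}; propagate the minimum label until stable."""
    label = {v: v for v in graph}
    dispatcher = Dispatcher()
    for v, nbrs in graph.items():
        for n in nbrs:
            dispatcher.send(n, label[v])
    while True:
        inbox = dispatcher.barrier()
        changed = False
        for v, msgs in inbox.items():
            best = min(msgs)
            if best < label[v]:
                label[v] = best
                changed = True
                for n in graph[v]:
                    dispatcher.send(n, best)
        if not changed:
            return label

if __name__ == "__main__":
    g = {0: [1], 1: [0, 2], 2: [1], 3: [4], 4: [3]}
    print(connected_components(g))  # two components: {0,1,2} and {3,4}
```

In an actor system, the dispatcher and each graph partition would be separate actors exchanging these messages concurrently; the sketch keeps only the superstep/barrier structure.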
{"title":"Fast FCoE: An Efficient and Scale-Up Multi-core Framework for FCoE-Based SAN Storage Systems","authors":"Yunxiang Wu, F. Wang, Yu Hua, D. Feng, Yuchong Hu, Jingning Liu, Wei Tong","doi":"10.1109/ICPP.2015.42","DOIUrl":"https://doi.org/10.1109/ICPP.2015.42","url":null,"abstract":"Due to the high complexity of the software hierarchy and the shared queue & lock mechanism for synchronized access, the existing I/O stack for remote target access in FCoE-based SAN storage becomes a performance bottleneck, leading to high I/O overhead and limited I/O scalability on multi-core servers. For scalable performance, existing works focus on improving the efficiency of the locking algorithm or reducing the number of synchronization points to decrease the synchronization overhead. However, the synchronization problem still exists and limits I/O scalability. In this paper, we propose Fast FCoE, a protocol stack framework for remote storage access in FCoE-based SAN storage. Fast FCoE uses private per-CPU structures and disables kernel preemption while processing I/Os. This method avoids the synchronization overhead. For further I/O efficiency, Fast FCoE directly maps requests from the block layer to FCoE frames. A salient feature of Fast FCoE is its use of standard interfaces, thus supporting all upper-layer software (such as existing file systems and applications) and offering flexible use in existing infrastructure (e.g., adapters, switches, storage devices). Our results demonstrate that Fast FCoE achieves efficient and scalable I/O throughput, obtaining 1107.3K/831.3K IOPS (5.43/4.88 times that of the Open-FCoE stack) for read/write requests.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126674938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
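The structural contrast the abstract describes, a shared queue guarded by a lock versus private per-CPU structures, is sketched below at a purely conceptual level. This is not the kernel-level FCoE stack; the classes and the worker-id indexing are assumptions used only to show why private per-worker queues remove the contended lock from the submit/drain path.

```python
import threading
from collections import deque

class SharedQueue:
    """Single queue guarded by one lock: every submitting core contends here."""
    def __init__(self):
        self.lock = threading.Lock()
        self.q = deque()

    def submit(self, req):
        with self.lock:
            self.q.append(req)

    def drain(self):
        with self.lock:
            items = list(self.q)
            self.q.clear()
        return items

class PerCpuQueues:
    """One private queue per worker ("per-CPU"): submit/drain need no lock as
    long as each queue is only touched by its owning worker."""
    def __init__(self, nworkers):
        self.queues = [deque() for _ in range(nworkers)]

    def submit(self, worker_id, req):
        self.queues[worker_id].append(req)

    def drain(self, worker_id):
        q = self.queues[worker_id]
        items = list(q)
        q.clear()
        return items

if __name__ == "__main__":
    shared = SharedQueue()
    per_cpu = PerCpuQueues(nworkers=4)
    for i in range(8):
        shared.submit(f"io-{i}")
        per_cpu.submit(worker_id=i % 4, req=f"io-{i}")
    print("shared:", shared.drain())
    for w in range(4):
        print(f"worker {w}:", per_cpu.drain(w))
```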
{"title":"Social VoD: A Social Feature-Based P2P System","authors":"Wei Chang, Jie Wu","doi":"10.1109/ICPP.2015.66","DOIUrl":"https://doi.org/10.1109/ICPP.2015.66","url":null,"abstract":"Video-on-demand (VoD) services have been growing explosively since their first appearance. To maintain an acceptable buffering delay, bandwidth costs have become a huge burden for service providers. Complementing the conventional client-server architecture with a peer-to-peer (P2P) system can significantly reduce the central server's bandwidth demands. However, previous works focus on establishing a P2P overlay for each video, imposing a high maintenance cost on users. Per-channel overlay construction was first introduced by Social Tube, which clusters the users subscribed to the same video channel into one P2P overlay. However, the current per-channel overlay structure is not suitable for users developing new watching preferences: a channel's subscribers tend to watch not only videos from that channel, but also videos from similar channels. In this paper, we propose a new overlay structure that exploits the existing social relations among users and the similarities between video channels. Our system creates a hierarchical overlay: subscribers of the same channel form a low-level overlay (also known as a group), and in the high-level overlay, different groups are connected based on their similarities. The new structure has the small-world property, which has been observed in most data-sharing patterns. Based on the new structure, we propose a routing algorithm for both subscribed and unsubscribed users of a channel. Extensive simulation results show the efficiency of our approach.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"67 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116792240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
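As a rough illustration of the two-level overlay and its routing rule, the sketch below links each channel group to its most similar groups and routes a request either inside the user's own group, to a linked neighbour group, or back to the central server. Jaccard similarity over subscriber sets, the top-k linking rule, and all names are illustrative assumptions, not the paper's similarity metric or algorithm.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_high_level_overlay(channels, top_k=1):
    """Connect each channel group to its top-k most similar groups.
    Similarity here is Jaccard over subscriber sets (an illustrative choice)."""
    links = {}
    for c in channels:
        others = sorted((o for o in channels if o != c),
                        key=lambda o: jaccard(channels[c], channels[o]),
                        reverse=True)
        links[c] = others[:top_k]
    return links

def route(request_channel, user_channels, links):
    """Subscribed users look inside their own group; unsubscribed users are
    forwarded to a linked neighbour group, else fall back to the server."""
    if request_channel in user_channels:
        return ("own-group", request_channel)
    for c in user_channels:
        if request_channel in links.get(c, []):
            return ("neighbour-group", c)
    return ("server", request_channel)

if __name__ == "__main__":
    channels = {            # channel -> subscriber ids (toy data)
        "news":   {1, 2, 3, 4},
        "sports": {3, 4, 5, 6},
        "music":  {7, 8, 9},
    }
    links = build_high_level_overlay(channels, top_k=1)
    print(links)
    print(route("sports", user_channels={"news"}, links=links))  # via neighbour group
    print(route("music",  user_channels={"news"}, links=links))  # falls back to server
```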
{"title":"Towards Redundancy-Aware Data Utility Maximization in Crowdsourced Sensing with Smartphones","authors":"Juan Li, Yanmin Zhu, Jiadi Yu, Qian Zhang, L. Ni","doi":"10.1109/ICPP.2015.99","DOIUrl":"https://doi.org/10.1109/ICPP.2015.99","url":null,"abstract":"This paper studies the critical problem of maximizing the aggregate data utility under a budget constraint in mobile crowdsourced sensing. This problem is particularly challenging given the redundancy in sensing data, self-interested and strategic user behaviors, and the private cost information of smartphones. Most existing approaches do not consider the important performance objective - maximizing the redundancy-aware data utility of sensing data collected from smartphones. Furthermore, they do not consider the practical constraint on budget. In this paper, we propose a combinatorial auction mechanism based on a reverse auction framework. It consists of an approximation algorithm for winning-bid determination and a critical payment scheme. The approximation algorithm guarantees a constant approximation ratio with polynomial-time complexity. The critical payment scheme guarantees truthful bidding. Rigorous theoretical analysis demonstrates that our mechanism achieves truthfulness, individual rationality, computational efficiency, and budget feasibility. Extensive simulations show that the proposed mechanism produces high redundancy-aware data utility.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130075899","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
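To give a feel for redundancy-aware winner determination in a budgeted reverse auction, the sketch below uses a standard greedy rule: repeatedly select the bid with the highest marginal coverage per unit cost that still fits in the budget, where utility counts only sensing cells not already covered (so redundant readings add nothing). This greedy is a common approximation pattern under these assumptions, not necessarily the paper's algorithm, and the critical payment scheme is omitted entirely; the bids are toy data.

```python
def marginal_utility(covered, cells):
    """Redundancy-aware utility: only sensing cells not already covered add value."""
    return len(set(cells) - covered)

def greedy_winners(bids, budget):
    """Greedy winner determination: pick the bid with the highest marginal
    utility per unit cost that still fits in the remaining budget."""
    covered, winners, spent = set(), [], 0.0
    remaining = dict(bids)  # user -> (claimed cost, sensing cells)
    while remaining:
        best, best_ratio = None, 0.0
        for user, (cost, cells) in remaining.items():
            gain = marginal_utility(covered, cells)
            if gain > 0 and spent + cost <= budget and gain / cost > best_ratio:
                best, best_ratio = user, gain / cost
        if best is None:
            break
        cost, cells = remaining.pop(best)
        winners.append(best)
        spent += cost
        covered |= set(cells)
    return winners, covered, spent

if __name__ == "__main__":
    bids = {
        "u1": (4.0, {"a", "b", "c"}),
        "u2": (3.0, {"b", "c"}),       # largely redundant with u1
        "u3": (5.0, {"c", "d", "e", "f"}),
        "u4": (2.0, {"a"}),
    }
    print(greedy_winners(bids, budget=9.0))
```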