F. Nadeem, S. A. Ostadzadeh, M. Nadeem, Stephan Wong, K. Bertels
{"title":"A Simulation Framework for Reconfigurable Processors in Large-Scale Distributed Systems","authors":"F. Nadeem, S. A. Ostadzadeh, M. Nadeem, Stephan Wong, K. Bertels","doi":"10.1109/ICPPW.2011.50","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.50","url":null,"abstract":"The inclusion of reconfigurable processors in distributed grid systems promises to offer increased performance without compromising flexibility. Consequently, these large-scale distributed grid systems (such as TeraGrid) are utilizing reconfigurable computing resources next to general-purpose processors (GPPs) in their computing nodes. The near-optimal utilization of resources in such distributed systems depends considerably on resource management and application task scheduling. Many state-of-the-art simulators for application scheduling in distributed computing systems have been proposed. However, there is no dedicated simulation framework to study the behavior of reconfigurable nodes in grids. Incorporating reconfigurable nodes in these systems requires taking into account reconfigurable hardware characteristics such as area utilization, performance increase, reconfiguration time, and the time to transfer configuration bit streams, execution code, and data. Many of these characteristics are not taken into account by traditional simulators. In this paper, we present a simulation framework for reconfigurable processors in large-scale distributed systems. It is capable of modeling reconfigurable nodes, processor configurations, and tasks in a distributed system. Furthermore, as part of the verification of the framework, we implemented a dynamic task scheduling algorithm with support for the scheduling of tasks on reconfigurable nodes. A number of experiments with various simulation parameters were conducted. The results show the expected trend. We also present a thorough discussion of the results.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115140416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bo Cheng, Rong Du, Bo Yang, Wenbin Yu, Cailian Chen, X. Guan
{"title":"An Accurate GPS-Based Localization in Wireless Sensor Networks: A GM-WLS Method","authors":"Bo Cheng, Rong Du, Bo Yang, Wenbin Yu, Cailian Chen, X. Guan","doi":"10.1109/ICPPW.2011.32","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.32","url":null,"abstract":"For wireless sensor networks, localization is crucial due to the dynamic nature of deployment. In absolute localization, a few nodes (called beacon nodes or anchors) need to know their absolute positions, and all the other nodes are absolutely localized in the coordinate system of the beacon nodes. Most GPS-based localization schemes belong to absolute localization: such systems enable nodes to fix their positions in a global coordinate system using a relatively small number of beacon nodes that know their position through external measurement (e.g., GPS). Because of inevitable errors from unreliable GPS observations, the localization result will not be accurate. In this paper, we propose a Weighted Least Squares (WLS) method for GPS-based localization to obtain more accurate positions for sensor nodes that are not equipped with GPS receivers. Following the description of the WLS algorithm as well as the localization system based on it, simulation analysis and real-world experiments demonstrate the effectiveness of the proposed approach.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114957818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Femtocell-Assisted Data Forwarding Protocol in Relay Enhanced LTE Networks","authors":"Yuh-Shyan Chen, Chaojun Li, Wen-Lin Chiang","doi":"10.1109/ICPPW.2011.49","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.49","url":null,"abstract":"Femtocells, which are small cellular base stations for home and small-business environments, are an attractive solution for operators to improve indoor coverage and network capability. In addition, relaying is one of the techniques proposed for future releases of UTRAN Long Term Evolution (LTE) networks, aiming to increase the coverage and capability of LTE networks. An LTE network that adopts relays is called a relay enhanced LTE network. A user can hand over not only between two relays, but also between relays and base stations, and between two base stations. It is important to provide a seamless handover solution in the relay enhanced LTE network. During mobility, packet loss occurs if some packets are sent to the previous base station (or relay) after a user equipment (UE) has already been handed over to the current base station (or relay). To solve this problem, a data forwarding procedure is performed to redirect these buffered packets from the previous base station (or relay) to the current base station (or relay). In this paper, we develop a new data forwarding protocol with the assistance of femtocells, called femtocell-assisted data forwarding, in relay enhanced LTE networks to provide seamless handover with a low packet loss rate and high throughput. Finally, the simulation results illustrate that our proposed protocol outperforms the existing data forwarding scheme.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122723049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guan Zhimin, Fu Yinxia, Zheng Ninghan, Zhang Jianxun, Cai Min, Huang Yan, Tang Jie
{"title":"Improving Performance of the Irregular Data Intensive Application with Small Computation Workload for CMPs","authors":"Guan Zhimin, Fu Yinxia, Zheng Ninghan, Zhang Jianxun, Cai Min, Huang Yan, Tang Jie","doi":"10.1109/ICPPW.2011.7","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.7","url":null,"abstract":"The data needs of scientific or commercial applications from a diverse range of fields have been increasing exponentially over recent years. Although traditional systems work well for computation that requires limited data handling, CMPs in cloud computing may perform poorly for computation that requires large amounts of intensive data. Conventional helper thread techniques try to address the high performance overheads, but they cannot improve the performance of irregular data intensive applications with small computation workloads. Our goal is to provide a novel solution to improve application performance in data intensive computing environments. By introducing three parameters to the helper thread, the prepush look-ahead size K, the prepush block size P, and the synchronization block size B, we expect to reduce the overheads introduced by the traditional helper thread and leave the computing resources to perform useful prefetch work. As a starting point, we design the KPB interleaved data prepush algorithm, and use Q6600 and IBM 5110 multi-core computers as our test platforms to study the behavior of benchmarks from the SPEC2006 suite and the Olden suite. We construct helper threads for mcf from SPEC2006, and mst and em3d from Olden, using our method; the average speedups are 1.23, 1.32, and 1.09, respectively, on the Q6600 machine, and 1.28, 1.35, and 1.23, respectively, on the other machine. Compared with the AP and PV methods, our KPB method has less negative impact and is also better in prefetching timeliness and control ability.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121989049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implications of Recovery Schemes for Virtualization Platform","authors":"Guanhua Tian, Dan Meng","doi":"10.1109/ICPPW.2011.9","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.9","url":null,"abstract":"In this paper, we propose a general simulation platform integrated with a general failure model framework. Based on our simulation platform, we investigate the implications of system failures for virtualization platforms. We also investigate the capability of the virtualization platform's recovery mechanisms, both reactive and proactive. We find that proactive recovery, in terms of both performance and resource debts, outperforms reactive recovery. It has the potential to recover most of the performance gap between a system with failures and one without. Furthermore, as proactive recovery depends on the results of the system status predictor, we investigate the impact of the predictor's attributes and find that the predictor's 'False Negative' attribute is the most important for exploiting the proactive recovery mechanism's capability.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114363094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kuei-Li Huang, C. Tseng, Jui-Tang Wang, Tsung-Hsi Yang
{"title":"A Controller-Assisted Distributed (CAD) Load Balancing Scheme for ZigBee Networks","authors":"Kuei-Li Huang, C. Tseng, Jui-Tang Wang, Tsung-Hsi Yang","doi":"10.1109/ICPPW.2011.20","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.20","url":null,"abstract":"In this paper, we propose a controller-assisted distributed (CAD) load-balancing mechanism for ZigBee networks containing multiple personal area networks (PANs). Shifting the enforcement part from the central controller to the PANs, each PAN in CAD maintains its own load status, whereas the central controller simply maintains the node counts of the PANs and a list of switch pairs, each formed from two nearby nodes in different PANs and denoting a possible load switch between the two PANs. Upon detecting that the network is unbalanced, the controller just provides a heavily loaded PAN with a switch pair and an offload threshold so that the PAN enforces the offload of a subtree onto a neighbor PAN. The number of nodes in the subtree may be exactly smaller than the threshold due to the maintenance situation in the PAN. Simulation results show that CAD achieves the same load balancing result as a centralized method that outperforms other methods, while costing fewer control messages.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121449495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Extensible Design of a Load-Aware Virtual Router Monitor in User Space","authors":"Harry F. W. Choi, P. Lee","doi":"10.1109/ICPPW.2011.16","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.16","url":null,"abstract":"Router virtualization enables multiple virtual routers to be hosted on a physical shared substrate, and hence facilitates network management and experimentation. One critical issue of router virtualization is resource allocation of virtual routers. We explore this issue in the user-space design in order to allow extensibility. We develop a user-space load-aware virtual router monitor (LVRM) atop a commodity multicore architecture, with a key feature that it can dynamically manage CPU core resources among virtual routers based on their traffic loads. Also, LVRM adopts an extensible design so that each component can support different variants of implementation. We implement a proof-of-concept prototype for LVRM and empirically evaluate its performance overhead. Our work provides insights into resource management in user space in the context of router virtualization.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"255 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115839815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Middleware for Concurrent Programming in MPI Applications","authors":"Tobias Berka, Helge Hagenauer, M. Vajtersic","doi":"10.1109/ICPPW.2011.39","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.39","url":null,"abstract":"A wide range of computationally intensive applications such as information retrieval, on-line analytical processing and data mining inherently require concurrency, because concurrent data maintenance, query processing and multi-user operation are functional requirements. Therefore, concurrent programming is a prerequisite for such systems. However, existing tools for parallel programming fail to meet these demands for concurrency and the adoption of parallel processing for these application domains is thus hindered. In this paper, we discuss the use of threads and concurrent programming constructs in the state of the art in parallel programming tools and environments. We find that the necessary functionality is available, but often in an inconvenient and unreliable manner. Due to the fact that the programmability and maintainability of parallel programs is a major concern, we consider the existing solutions inadequate or insufficient. We argue that an additional layer of middleware for threads and inter-thread communication and synchronization is necessary to support the effective development of persistently deployed parallel services for our targeted application domain and present the MPI Threads (MPIT) interface specification. We give several real-world examples to demonstrate its use and present performance benchmarks to illustrate the cost of the additional layer of indirection.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127430383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kangaroo: Reliable Execution of Scientific Applications with DAG Programming Model","authors":"Kai Zhang, Kang Chen, Wei Xue","doi":"10.1109/ICPPW.2011.28","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.28","url":null,"abstract":"As high performance computing (HPC) systems increase in scale, with a higher potential level of component failure, the need arises for developing fault tolerant systems. However, current fault tolerance mechanisms, including Reply, Checkpointing, and Redundant Execution, do not scale well in large-scale scientific computing. Kangaroo is a reliable execution engine for scientific applications. Parallel programs are modeled as directed acyclic graphs (DAGs) and executed on clusters with a graph theory based scheduling policy. Kangaroo provides effective execution of scalable parallel programs and transparently tolerates failures during runtime. In this paper, we describe the implementation of the Kangaroo system, discuss the designs of scheduling and fault tolerance, and evaluate the performance using a dense matrix inversion program. The results demonstrate that scheduling policies have a strong effect on program performance. They also demonstrate the feasibility and effectiveness of our approach to fault tolerance.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128406958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Te-Feng Su, Jia-Jhe Li, Chih-Hsueh Duan, Shu-Fan Wang, S. Lai
{"title":"Parallelized Face Based RMS System on a Multi-core Embedded Computing Platform","authors":"Te-Feng Su, Jia-Jhe Li, Chih-Hsueh Duan, Shu-Fan Wang, S. Lai","doi":"10.1109/ICPPW.2011.52","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.52","url":null,"abstract":"A new framework for the Recognition, Mining and Synthesis (RMS) system has been proposed to make meaningful use of the enormous amount of information. Based on the same concept, we propose a face RMS system, which consists of face detection, facial expression recognition, and facial expression exaggeration components, for generating exaggerated views of different expressions for an input face video. In this paper, parallel algorithms for the face RMS system were developed to reduce the execution time on a multi-core embedded system. The experimental results show the robustness and efficiency of the face RMS system under complex environments. The quantitative comparisons indicate that the proposed parallelization strategies yield a significant computational speedup compared to the single-processor implementation on a multi-core embedded platform.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127095793","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}