{"title":"End to End Security and Path Security in Network Mobility","authors":"Long-Sheng Li, Shr-Shiuan Tzeng, Rui-Chung Bai, Mengxia Li","doi":"10.1109/ICPPW.2011.35","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.35","url":null,"abstract":"In RFC 3776, the IP security protocol (IPsec) is applied to Mobile IP to secure IP datagrams at the IP layer. Previous research considered only the traffic between the mobile node (MN) and the home agent (HA); the traffic from the HA to the correspondent node (CN) was not considered. Network Mobility (NEMO) is based on Mobile IPv6 (MIPv6), so it inherits the same problem of protecting only the path between the mobile router (MR) and MR_HA. This paper aims to address this vulnerability by proposing a nested IPsec Encapsulating Security Payload (ESP) scheme capable of establishing nested IPsec ESP from MN to CN. The proposed scheme enhances security in NEMO by providing confidentiality and integrity.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128027171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recomposing an Irregular Algorithm Using a Novel Low-Level PGAS Model","authors":"M. Cason, P. Kogge","doi":"10.1109/ICPPW.2011.55","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.55","url":null,"abstract":"This paper presents analysis and simulation results for a toolkit of parallel graph traversal primitives which were built using a novel, low-level partitioned global address space (PGAS) programming model. Unlike high-level HEC PGAS languages (UPC, Chapel, Fortress), this mobile-subjective (MoS) model does not hide parallelization or communication overhead in the compiler or runtime. Unlike other low-level HEC languages (C/MPI) this model provides 1) facilities for fine-grain synchronization, 2) PGAS view of memory, and 3) object encapsulation. This paper shows how this programming model facilitated the transformation of the well-studied minimum spanning forest (MSF) algorithm into a new MSF algorithm which allowed for million way well-behaved parallelism on a novel multithreaded architecture. We provide analysis to show why naive formulations of MSF are not scalable for certain input graphs. We then provide analysis of the MoS reformulation to show how scalability is achieved by ensuring a good distribution of data and computation for arbitrary input graphs.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114178011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Efficient Approach of Power Reducing for Scratch-Pad Memory Based Embedded Systems","authors":"Yanqin Yang, Wenchao Xu, M. Guo, Z. Shao","doi":"10.1109/ICPPW.2011.11","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.11","url":null,"abstract":"Scratch-pad memory (SPM) is widely used in embedded systems. Reducing power consumption in SPM systems is a topical and crucial subject, since high power consumption can reduce system reliability and increase the cost and size of heat sinks. In this paper, we propose an effective power-reduction approach that scales down voltage and frequency as much as possible. First, we pipeline data transfer and processing. Second, we find the relative time slack between fast data processing and slow data transfer, and then provide both single and dynamic scaling to reduce power consumption. We evaluate our approach on the Trimaran simulator, and the experimental results show that it achieves significant power reduction while its run-time performance outperforms previous work.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123688539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Routing and Buffering Strategies in Delay-Tolerant Networks: Survey and Evaluation","authors":"Shou-Chih Lo, Min-Hua Chiang, Jhan-Hua Liou, Jhih-Siao Gao","doi":"10.1109/ICPPW.2011.19","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.19","url":null,"abstract":"Delay Tolerant Networks (DTNs) have attracted considerable attention in recent years. This kind of network works in communication environments subject to delays and disruptions. Traditional end-to-end routing fails in DTNs due to intermittent connections. A variety of routing strategies for DTNs have been proposed in the past. In this paper, we present a survey of these strategies and provide a classification. Moreover, we evaluate their use in social contact networks. Buffer management strategies are also needed to support routing operations. The technical issue is to design sorting policies that determine the transmission and drop order of messages in the buffer. We identify several sorting indexes and evaluate their performance.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122614486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trusted Dynamic Scheduling for Large-Scale Parallel Distributed Systems","authors":"Wei Wang, Guosun Zeng","doi":"10.1109/ICPPW.2011.8","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.8","url":null,"abstract":"The terms parallel distributed systems, grid computing, and cloud computing refer to slightly different things, but the underlying concept is the same: delivering computing resources through a large and often global network of computers. To meet the new requirements of a massive distributed computing paradigm such as the cloud, this paper presents a task scheduling model based on a trust mechanism. Following the trust relationship models of human society, trust relationships are built among computing nodes, and the trustworthiness of nodes is evaluated using the Bayesian cognitive method. Moreover, a benchmark is structured to span a range of parallelism and computing characteristics for evaluating the proposed method. Theoretical analysis and simulations show that the proposed algorithm can efficiently meet the trust requirements of large-scale workloads at a lower time cost, and can assure the secure execution of tasks in a parallel distributed computing environment.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"88 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121501797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"JETS: Language and System Support for Many-Parallel-Task Computing","authors":"J. Wozniak, M. Wilde","doi":"10.1109/ICPPW.2011.64","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.64","url":null,"abstract":"Many-task computing is a well-established paradigm for implementing loosely coupled applications on large-scale computing systems. However, few of the model's existing implementations provide efficient, low-latency support for the execution of tightly coupled applications as atomic tasks. Thus, a vast array of parallel applications cannot readily be used effectively within many-task workloads. In this work, we present JETS, a middleware component that provides high performance support for many-parallel-task computing (MPTC). JETS is based on a highly concurrent approach to parallel task dispatch and on new capabilities now available in the MPICH2 MPI implementation and the ZeptoOS Linux operating system. JETS represents an advancement over the few known examples of multi-level many-parallel-task scheduling systems by more efficiently scheduling many short-duration parallel application invocations, by overcoming the challenges of coupling the user processes of each application invocation via the messaging fabric, and by concurrently managing many application executions in various stages. We report here on the JETS architecture and its performance on both synthetic benchmarks and the NAMD molecular dynamics application.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128749369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"View-Oriented Transactional Memory","authors":"K. Leung, Zhiyi Huang","doi":"10.1109/ICPPW.2011.10","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.10","url":null,"abstract":"This paper proposes the View-Oriented Transactional Memory (VOTM) model to seamlessly integrate locking mechanism and transactional memory. The VOTM model allows programmers to partition the shared memory into \"views\", which are non-overlapping sets of shared data objects. The Restricted Admission Control (RAC) scheme can then control the number of processes accessing each view individually in order to reduce the number of aborts of transactions. The RAC scheme has the merits of both the locking mechanism and the transactional memory. Experimental results demonstrate that VOTM outperforms traditional transactional memory models such as TinySTM by up to 270%.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132003689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CellPilot: A Seamless Communication Solution for Hybrid Cell Clusters","authors":"N. Girard, W. B. Gardner, J. Carter, G. Grewal","doi":"10.1109/ICPPW.2011.61","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.61","url":null,"abstract":"The CellPilot library provides a comprehensive interprocess communication solution for parallel programming in C on clusters comprised of Cell BE and other computers. It extends the process/channel approach of the existing Pilot library to cover processes running on Cell PPEs and SPEs. The same simple API is used to read and write messages on channels defined between pairs of processes regardless of location, while hiding communication details from the user. CellPilot uses MPI for inter-node communication, and the Cell SDK within a Cell node.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"379 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121239851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"P2G: A Framework for Distributed Real-Time Processing of Multimedia Data","authors":"H. Espeland, P. Beskow, H. Stensland, Preben N. Olsen, S. Kristoffersen, C. Griwodz, P. Halvorsen","doi":"10.1109/ICPPW.2011.22","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.22","url":null,"abstract":"The computational demands of multimedia data processing are steadily increasing as consumers call for progressively more complex and intelligent multimedia services. New multi-core hardware architectures provide the required resources, but writing parallel, distributed applications remains a labor-intensive task compared to their sequential counterparts. For this reason, Google and Microsoft implemented their respective processing frameworks MapReduce and Dryad, as they allow the developer to think sequentially, yet benefit from parallel and distributed execution. An inherent limitation in the design of these batch processing frameworks is their inability to express arbitrarily complex workloads. The dependency graphs of the frameworks are often limited to directed acyclic graphs, or even pre-determined stages. This is particularly problematic for video encoding and other algorithms that depend on iterative execution. With the Nornir runtime system for parallel programs, which is a Kahn Process Network implementation, we addressed and solved several of these limitations. However, it is more difficult to use than other frameworks due to its complex programming model. In this paper, we build on the knowledge gained from Nornir and present a new framework, called P2G, designed specifically for developing and processing distributed real-time multimedia data. P2G supports arbitrarily complex dependency graphs with cycles, branches and deadlines, and provides both data- and task-parallelism. The framework is implemented to scale transparently with available (heterogeneous) resources, a concept familiar from the cloud computing paradigm. We have implemented an (interchangeable) P2G kernel language to ease development. In this paper, we present a proof of concept implementation of a P2G execution node and some experimental examples using complex workloads like Motion JPEG and K-means clustering. The results show that the P2G system is a feasible approach to multimedia processing.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127005106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Performance Monitoring for Embedded Multicore Systems","authors":"Chun-Yi Shih, Ming Li, Chao-Sheng Lin, Pao-Ann Hsiung, Chih-Hung Chang, W. Chu, Nien-Lin Hsueh, Chihhsiong Shih, Chao-Tung Yang, C. Koong","doi":"10.1109/ICPPW.2011.27","DOIUrl":"https://doi.org/10.1109/ICPPW.2011.27","url":null,"abstract":"With the advent of multicore processors, the performance of software has been elevated to new unforeseen heights via parallelization. However, this has not been achieved without new problems cropping up due to parallelization. One serious issue is the performance bottleneck due to cache misses or resource starvation, which is hard to detect in application software especially when the software has dynamically changing behavior. Performance monitors are usually employed for such purposes. Nevertheless, monitors have introduced their own computation and communication overheads, especially in embedded multicore systems. In this work, we try to estimate the effects of monitor overheads on different types of applications, such as CPU-bound and IO-bound tasks. Further, we give suggestions on the number and type of monitors to use for such embedded multicore applications. Besides trying to reduce monitor overheads, we also aim for the accuracy and the immediacy of the monitored information. Through a real-world example, namely a digital video recording system, we demonstrate how different monitoring periods affect the tradeoff between accuracy and immediacy of the monitored information.","PeriodicalId":173271,"journal":{"name":"2011 40th International Conference on Parallel Processing Workshops","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130323070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}