{"title":"Parallel Construction of Bidirected String Graphs for Genome Assembly","authors":"Benjamin G. Jackson, S. Aluru","doi":"10.1109/ICPP.2008.70","DOIUrl":"https://doi.org/10.1109/ICPP.2008.70","url":null,"abstract":"Graph theoretic models for genome assembly are continually being proposed and refined. At the same time, large scale assembly projects rely on the overlap-layout-consensus assembly paradigm, in which the best pairwise alignments serve as seeds for a greedy extension of contigs. These methods, which largely rely on local information, are used despite research that demonstrates the superiority of other graph models, largely because the memory requirement of such models is prohibitive on single processor architectures. In this paper, we present a parallel algorithm for constructing bidirected string graphs from whole genome shotgun sequencing data, for use in the assembly problem. Our algorithm uses O(n/p) local computation - where n is the total size of shotgun sequences and p is the number of processors - and a constant number of all-to-all communications. We demonstrate scalability of the algorithm on the Blue Gene/L, and show that graphs for large, complex genome sequencing projects with deep sequence coverage can be effectively handled using parallel computers.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130945080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Performance Counter Based Workload Characterization on Blue Gene/P","authors":"K. Ganesan, L. John, V. Salapura, J. Sexton","doi":"10.1109/ICPP.2008.57","DOIUrl":"https://doi.org/10.1109/ICPP.2008.57","url":null,"abstract":"IBM's Blue Gene/P, the second generation of the Blue Gene supercomputer is designed with a Universal Performance Counter (UPC) Unit at each node capable of monitoring 256 events concurrently, unlike many microprocessors that provide only a few performance counters. In this paper we demonstrate the efficacy of the interface library that we have developed, taking advantage of the UPC unit, enabling users to effortlessly instrument applications and get a profound insight into its execution on the Blue Gene/P system which could scale in thousands of nodes. The interface library allows the user to monitor about 512 performance related events out of a total of 1024 possible events and aggregate the data collected at different nodes and compute meaningful metrics through data mining. Using the developed interface, we instrumented the NAS parallel benchmarks and collected the performance counter data. We studied the MFLOPS, L3-DDR Traffic and the dynamic instruction mix based on the counters in the FPU and the cache hierarchy for different compiler optimizations, modes of operations of the system and different L3, L2 configurations for the NAS benchmarks. Our analysis identifies that compiler optimization O5 along with \"-qarch440d\", which uses the architectural information of the chip in optimization, is very effective in incorporating a lot of SIMD instructions and results in the most efficient execution of the benchmarks. The experiments on the L3 size indicate that an L3 size of 4MB is optimal for the NAS benchmarks and they do not benefit by increasing it further. Also, the virtual node mode of operation of the Blue Gene/P system is very effective and yields superior performance for the selected benchmarks taking advantage of the chip multiprocessor architecture of the quad-core HPC chip.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123869537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Content Pollution in Peer-to-Peer Live Streaming Systems: Analysis and Implications","authors":"Sirui Yang, Hai Jin, Bo Li, Xiaofei Liao, Hong Yao, Xuping Tu","doi":"10.1109/ICPP.2008.80","DOIUrl":"https://doi.org/10.1109/ICPP.2008.80","url":null,"abstract":"There has been significant progress in the development and deployment of peer-to-peer (P2P) live video streaming systems. However, there has been little study on the security aspect in such systems. Our prior experiences in Anysee exhibit that existing systems are largely vulnerable to intermediate attacks, in which the content pollution is a common attack that can significantly reduce the content availability, and consequently impair the playback quality. This paper carries out a formal analysis of content pollution and discusses its implications in P2P live video streaming systems. Specifically, we establish a probabilistic model to capture the progress of content pollution. We verify the model using a real implementation based on Anysee system; we evaluate the content pollution effect through extensive simulations. We demonstrate that (1) the number of polluted peers can grow exponentially, similar to random scanning worms. This is vital that with 1% polluters, the overall system can be compromised within minutes; (2) the effective bandwidth utilization can be sharply decreased due to the transmission of polluted packets; (3) Augmenting the number of polluters does not imply a faster progress of content pollution, in which the most influential factors are the peer degree and access bandwidth. We further examine several techniques and demonstrate that a hash-based signature scheme can be effective against the content pollution, in particular when being used during the initial phase.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"170 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133661168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A List-Based Strategy for Optimal Replica Placement in Data Grid Systems","authors":"Yi-Fang Lin, Jan-Jan Wu, Pangfeng Liu","doi":"10.1109/ICPP.2008.31","DOIUrl":"https://doi.org/10.1109/ICPP.2008.31","url":null,"abstract":"Data replication is a typical strategy for improving access performance and data availability in data grid systems. Current works on data replication in grid systems focus on the infrastructure for data replication and the mechanism of replicas creation and deletion. The important problem of choosing suitable locations for placing replicas in data grids has not been fully studied. This paper addresses replica placement problem in data grids when given a sequence of priority lists that specify the forwarding policies for data requests. We propose the concept of priority list to address two issues. First, a user may have limited authority in accessing the resources, and thus his/her data requests should be prohibited from accessing some of the sites. Second, a static policy may not satisfy a data request with special requirements (e.g. quality of service requirement). In this priority-list-based model we propose a placement algorithm that finds optimal locations for replicas so that the workload among the replicas is balanced. We also propose an algorithm that determines the minimum number of replicas when the maximum workload capacity of each replica is given.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"114 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116088829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementing and Exploiting Inevitability in Software Transactional Memory","authors":"Michael F. Spear, M. Silverman, Luke Dalessandro, Maged M. Michael, M. Scott","doi":"10.1109/ICPP.2008.55","DOIUrl":"https://doi.org/10.1109/ICPP.2008.55","url":null,"abstract":"Transactional Memory (TM) takes responsibility for concurrent, atomic execution of labeled regions of code, freeing the programmer from the need to manage locks. Typical implementations rely on speculation and rollback, but this creates problems for irreversible operations like interactive I/O. A widely assumed solution allows a transaction to operate in an inevitable mode that excludes all other transactions and is guaranteed to complete, but this approach does not scale. This paper explores a richer set of alternatives for software TM, and demonstrates that it is possible for an inevitable transaction to run in parallel with (non-conflicting) non-inevitable transactions, without introducing significant overhead in the non-inevitable case. We report experience with these alternatives in a graphical game application. We also consider the use of inevitability to accelerate certain common-case transactions.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124760236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling CPU-Intensive Grid Applications Using Partial Information","authors":"Nelson Nobrega, L. D. Assis, F. Brasileiro","doi":"10.1109/ICPP.2008.40","DOIUrl":"https://doi.org/10.1109/ICPP.2008.40","url":null,"abstract":"Scheduling parallel applications on computational grids is a difficult task. In order to map the parallel application's tasks onto resources in an efficient way, grid schedulers apply scheduling heuristics. The existing scheduling heuristics can be broadly classified in two approaches: i) bin-packing schedulers, and ii) replication schedulers. The first approach requires complete and accurate information about the applications and the grid environment. The second approach does not use any information but, instead, applies the principle of task replication to achieve good performance. Each of these approaches has drawbacks; attaining accurate and complete information about resources and applications is not always possible in a grid environment, while the redundancy of replication schedulers yield an extra consumption of resources. In this work, we investigate the trade-off between these two approaches. We propose scheduling heuristics that use any available information to perform efficient scheduling of bag-of-tasks applications, a subclass of parallel applications. Our results show that judicious use of whatever information is available leads to a reduction in resource consumption, without compromising the application's performance.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124818819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Resource Allocation for Distributed Streaming Applications","authors":"Qian Zhu, G. Agrawal","doi":"10.1109/ICPP.2008.49","DOIUrl":"https://doi.org/10.1109/ICPP.2008.49","url":null,"abstract":"We consider resource allocation for distributed streaming applications running in a grid environment, where continuously streaming data needs to be aggregated and processed to produce output streams. Because such an application comprises a pipeline of processing stages, both communication and computational requirements need to be taken into account while performing resource allocation. In this paper, we give a rigorous formulation of this resource allocation problem, based on the DAG representation of the application as well as the environment. We have shown how we can use the notion of subgraph isomorphism and developed an effective resource allocation algorithm. The main observations from the experiments we conducted to evaluate our algorithms were as follows: the overhead caused by our algorithm is comparable to an existing algorithm, Streamline, which is based on heuristics. At the same time, the application performance was improved by 30% on average. When compared to the allocation performed by the optimal algorithm, which enumerates all mappings, the application performance with our algorithm was within 4%. At the same time, unlike the optimal algorithm, our algorithm scaled well to large graphs.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"75 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121687782","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study","authors":"Jiexing Gu, Ziming Zheng, Z. Lan, John White, Eva Hocks, Byung H. Park","doi":"10.1109/ICPP.2008.17","DOIUrl":"https://doi.org/10.1109/ICPP.2008.17","url":null,"abstract":"Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fault management becomes crucial in high performance computing. The advance of fault management relies on effective failure prediction. Despite years of research on failure prediction, it remains an open problem, especially in large-scale systems. In this paper, we address the problem by presenting a dynamic meta-learning prediction engine. It extends our previous work by exploring dynamic training, testing and prediction. Here, the \"dynamic\" part is from two perspectives: one is to continuously increase the training set during the system operation; and the other is to dynamically modify the rules of failure patterns by tracing prediction accuracy at runtime. Our case study indicates that the proposed predictor is promising by being capable of capturing more than 70% of failures, with the false alarm rate less than 10%.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"54 63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130258946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimizing Issue Queue Reliability to Soft Errors on Simultaneous Multithreaded Architectures","authors":"Xin Fu, Wangyuan Zhang, Tao Li, J. Fortes","doi":"10.1109/ICPP.2008.23","DOIUrl":"https://doi.org/10.1109/ICPP.2008.23","url":null,"abstract":"The issue queue (IQ) is a key microarchitecture structure for exploiting instruction-level and thread-level parallelism in dynamically scheduled simultaneous multithreaded (SMT) processors. However, exploiting more parallelism yields high susceptibility to transient faults on a conventional IQ. With the rapidly increasing soft error rates, the IQ is likely to be a reliability hot-spot on SMT processors fabricated with advanced technology nodes using smaller and denser transistors with lower threshold voltages and tighter noise margins. In this paper, we explore microarchitecture techniques to optimize IQ reliability to soft error on SMT architectures. We propose to use off-line instruction vulnerability profiling to identify reliability critical instructions. The gathered information is then used to guide reliability-aware instruction scheduling and resource allocation in multithreaded execution environments. We evaluate the efficiency of the proposed schemes across various SMT workload mixes. Extensive simulation results show that, on average, our microarchitecture level soft error mitigation techniques can significantly reduce IQ vulnerability by 42% with 1% performance improvement. To maintain runtime IQ reliability for pre-defined thresholds, we propose dynamic vulnerability management (DVM) mechanisms. Experimental results show that our DVM techniques can effectively achieve desired reliability/performance tradeoffs.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"337 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116451352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thread-Sensitive Modulo Scheduling for Multicore Processors","authors":"Lin Gao, Q. Nguyen, Lian Li, Jingling Xue, Tin-fook Ngai","doi":"10.1109/ICPP.2008.46","DOIUrl":"https://doi.org/10.1109/ICPP.2008.46","url":null,"abstract":"This paper describes a generalisation of modulo scheduling to parallelize loops for SpMT processors that exploits simultaneously both instruction-level parallelism and thread-level parallelism while preserving the simplicity and effectiveness of modulo scheduling. Our generalisation is simple, drops easily into traditional modulo scheduling algorithms such as Swing in GCC 4.1.1 and produces good speedups for SPECfp2000 benchmarks, particularly in terms of its ability in parallelising DOACROSS loops.","PeriodicalId":388408,"journal":{"name":"2008 37th International Conference on Parallel Processing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125281708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}