Basil AsSadhan, Hyong S. Kim, José M. F. Moura, Xiaohui Wang
{"title":"Network traffic behavior analysis by decomposition into control and data planes","authors":"Basil AsSadhan, Hyong S. Kim, José M. F. Moura, Xiaohui Wang","doi":"10.1109/IPDPS.2008.4536559","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536559","url":null,"abstract":"In this paper, we analyze network traffic behavior by decomposing header traffic into control and data planes to study the relationship between the two planes. By computing the cross-correlation between the control and data traffics, we observe a general 'similar' behavior between the two planes during normal behavior, and that this similarity is affected during abnormal behaviors. This allows us to focus on abnormal changes in network traffic behavior. We test our approach on the Network Intrusion Dataset provided by the Information Exploration Shootout (IES) project and the 1999 DARPA Intrusion Detection Evaluation Dataset from the MIT Lincoln Lab. We find that TCP control and data traffic have high correlation levels during benign normal applications. This correlation is reduced when attacks that affect the aggregate traffic are present in the two datasets.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131733400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. S. Talebi, Fahimeh Jafari, A. Khonsari, M. Moghaddam
{"title":"Proportionally-fair best effort flow control in network-on-chip architectures","authors":"M. S. Talebi, Fahimeh Jafari, A. Khonsari, M. Moghaddam","doi":"10.1109/IPDPS.2008.4536499","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536499","url":null,"abstract":"The research community has recently witnessed the emergence of multi-processor system on chip (MPSoC) platforms consisting of a large set of embedded processors. Particularly, Interconnect networks methodology based on Network-on-Chip (NoC) in MPSoC design is imminent to achieve high performance potential. More importantly, many well established schemes of networking and distributed systems inspire NoC design methodologies. Employing end-to-end congestion control is becoming more imminent in the design process of NoCs. This paper presents a centralized congestion scheme in the presence of both elastic and streaming flow traffic mixture. In this paper, we model the desired Best Effort (BE) source rates as the solution to a utility maximization problem which is constrained with link capacities while preserving Guaranteed Service (GS) traffics services requirements at the desired level. We propose an iterative algorithm as the solution to the maximization problem which has the benefit of low complexity and fast convergence. The proposed algorithm may be implemented by a centralized controller with low computation and communication overhead.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"380 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131786532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Implementation of NAMD molecular dynamics non-bonded force-field on the cell broadband engine processor","authors":"Guochun Shi, V. Kindratenko","doi":"10.1109/IPDPS.2008.4536470","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536470","url":null,"abstract":"We present results of porting an important kernel of a production molecular dynamics simulation program, NAMD, to the Cell/B.E. processor. The non-bonded force-field kernel, as implemented in the NAMD SPEC 2006 CPU benchmark, has been implemented. Both single-precision and double-precision floating-point kernel variations are considered, and performance results obtained on the Cell/B.E., as well as several other platforms, are reported. Our results obtained on a 3.2 GHz Cell/B.E. blade show linear speedups when using multiple synergistic processing elements.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127577044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Higor A. V. Alves, Maocir A. Campos, Francisco J. A. Fernandes, Marcia N. S. Kondo, A. N. D. Mello, A. Hira, M. Zuffo, Paola R. G. Accioly, Luiz Guimarães, M. D. Novaes
{"title":"Oncogrid: A proposal of grid infrastructure for the establishment of a national health information system on childhood cancer","authors":"Higor A. V. Alves, Maocir A. Campos, Francisco J. A. Fernandes, Marcia N. S. Kondo, A. N. D. Mello, A. Hira, M. Zuffo, Paola R. G. Accioly, Luiz Guimarães, M. D. Novaes","doi":"10.1109/IPDPS.2008.4536213","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536213","url":null,"abstract":"In Brazil, a country of continental proportions, law requires that health institutions directly involved in the cancer treatment in the country have to register and consolidate cancer patients' data. This information is important for the management and evaluation of the cancer treatment, and for the definition of public policies. In this work we propose the use of a high-performance computational infrastructure, based on grid computing, to ease and accelerate the consolidation of the remotely distributed patients' records in these institutions. In particular, we present a case study of implementation of a tool for patients' survival estimation based on the Kaplan-Meier method. We also discuss the issues related to architecture, implementation and integration, as well as the benefits and prospects for the use of this infrastructure on the establishment of a national health information system.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130717371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using hardware multithreading to overcome broadcast/reduction latency in an associative SIMD processor","authors":"K. Schaffer, R. Walker","doi":"10.1142/S0129626408003533","DOIUrl":"https://doi.org/10.1142/S0129626408003533","url":null,"abstract":"The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensive use of global search operations. Previously, we developed a prototype associative SIMD processor that uses hardware multithreading to overcome the broadcast/reduction latency. In this paper we show, through simulations of the processor running an associative program, that hardware multithreading is able to improve performance by increasing system utilization, even for processors with hundreds or thousands of processing elements. However, the choice of thread scheduling policy used by the hardware is critical in determining the actual utilization achieved. We consider three thread scheduling policies and show that a thread scheduler that avoids issuing threads that will stall due to pipeline dependencies or thread synchronization operations is able to maintain system utilization independent of the number of threads.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131120983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational monitoring and steering using network-optimized visualization and Ajax web server","authors":"Mengxia Zhu, C. Wu, N. Rao","doi":"10.1109/IPDPS.2008.4536260","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536260","url":null,"abstract":"We describe a system for computational monitoring and steering of an on-going computation or visualization on a remote host such as workstation or supercomputer. Unlike the conventional \"launch-and-leave\" batch computations, this system enables: (i) continuous monitoring of variables of an on-going remote computation using visualization tools, and (ii) interactive specification of chosen computational parameters to steer the computation. The visualization and control streams are supported over wide-area networks using transport protocols based on stochastic approximation methods to provide stable throughput. Using performance models for transport channels and visualization modules, we develop a visualization pipeline configuration solution that minimizes end-to-end delay over wide-area connections. The user interface utilizes Asynchronous JavaScript and XML (Ajax) technologies to provide an interactive environment that can be accessed by multiple remote users using web browsers. We present experimental results on a geographically distributed deployment to illustrate the effectiveness of the proposed system.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132821344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supporting high performance bioinformatics flat-file data processing using indices","authors":"Xuan Zhang, G. Agrawal","doi":"10.1109/IPDPS.2008.4536176","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536176","url":null,"abstract":"As an essential part of in vitro analysis, biological database query has become more and more important in the research process. A few challenges that are specific to bioinformatics applications are data heterogeneity, large data volume and exponential data growth, constant appearance of new data types and data formats. We have developed an integration system that processes data in their flat file formats. Its advantages include the reduction of overhead and programming efforts. In the paper, we discuss the usage of indexing techniques on top of this flat file query system. Besides the advantage of processing flat files directly, the system also improves its performance and functionality by using indexes. Experiments based on real life queries are used to test the integration system.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132144821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NOISEMINER: An algorithm for scalable automatic computational noise and software interference detection","authors":"Isaac Dooley, Chao Mei, L. Kalé","doi":"10.1109/IPDPS.2008.4536186","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536186","url":null,"abstract":"This paper describes a new scalable stream mining algorithm called NOISEMINER that analyzes parallel application traces to detect computational noise, operating system interference, software interference, or other irregularities in a parallel application's performance. The algorithm detects these occurrences of noise during real application runs, whereas standard techniques for detecting noise use carefully crafted test programs to detect the problems. This paper concludes by showing the output of NOISEMINER for a real-world case in which 6 ms delays, caused by a bug in an MPI implementation, significantly limited the performance of a molecular dynamics code on a new supercomputer.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132626909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards the development of a decentralized market information system: Requirements and architecture","authors":"R. Brunner, Felix Freitag, L. Navarro","doi":"10.1109/IPDPS.2008.4536461","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536461","url":null,"abstract":"In a market, information about its specifications and the behavior of its participants is essential for sophisticated and efficient negotiation strategies. However, there is currently no completely researched system to provide and consult an overall knowledge of economic information in distributed markets. These markets are implemented for example by grid applications and gained importance over the last few years. This paper presents the economic information requirements and a high-level architecture overview for a decentralized market information system (DMIS). The proposed system acquires economic data in a distributed environment for providing it to individual traders or other participants in a decentralized manner. First, we outline the economic information requirements which the system needs to achieve. Therefore their properties and a privacy model have to be considered. Then we propose an architecture for the system which combines technologies of distributed information aggregation system and distributed publish-subscribe models, based on a structured overlay network. The architecture has been designed to meet both the economic information requirements and that of scalability and robustness of a large-scale distributed environment. Initial measurements confirm the proof-of-concept implementation of the existing prototype.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128896266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-speed string searching against large dictionaries on the Cell/B.E. Processor","authors":"D. Scarpazza, Oreste Villa, F. Petrini","doi":"10.1109/IPDPS.2008.4536300","DOIUrl":"https://doi.org/10.1109/IPDPS.2008.4536300","url":null,"abstract":"Our digital universe is growing, creating exploding amounts of data which need to be searched, filtered and protected. String searching is at the core of the tools we use to curb this explosion, such as search engines, network intrusion detection systems, spam filters, and anti-virus programs. But as communication speed grows, our capability to perform string searching in real-time seems to fall behind. Multi-core architectures promise enough computational power to cope with the incoming challenge, but it is still unclear which algorithms and programming models to use to unleash this power. We have parallelized a popular string searching algorithm, Aho-Corasick, on the IBM Cell/B.E. processor, with the goal of performing exact string matching against large dictionaries. In this article we propose a novel approach to fully exploit the DMA-based communication mechanisms of the Cell/B.E. to provide an unprecedented level of aggregate performance with irregular access patterns. We have discovered that memory congestion plays a crucial role in determining the performance of this algorithm. We discuss three aspects of congestion: memory pressure, layout issues and hot spots, and we present a collection of algorithmic solutions to alleviate these problems and achieve quasi-optimal performance. The implementation of our algorithm provides a worst-case throughput of 2.5 Gbps, and a typical throughput between 3.3 and 4.4 Gbps, measured on realistic scenarios with a two-processor Cell/B.E. system.","PeriodicalId":162608,"journal":{"name":"2008 IEEE International Symposium on Parallel and Distributed Processing","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127655550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}