{"title":"Run-Time Computation and Communication Aware Mapping Heuristic for NoC-Based Heterogeneous MPSoC Platforms","authors":"S. Kaushik, Ashutosh Kumar Singh, W. Jigang, T. Srikanthan","doi":"10.1109/PAAP.2011.32","DOIUrl":"https://doi.org/10.1109/PAAP.2011.32","url":null,"abstract":"The rapid increase in the complexity of real-life applications has led to a perpetual demand for refined architectural designs. Multiprocessor systems-on-chip (MPSoCs) emerge as one of the possible solutions for satisfying such enormous computational needs. These MPSoCs employ a Network-on-Chip (NoC) interconnect for power-efficient and scalable communication between processors. Mapping parallelized tasks of applications onto these MPSoCs is the next major problem, which can be done either at design-time or at run-time. Design-time strategies may sometimes provide a more optimal mapping, but they are restricted to a predefined set of applications and seem incapable of run-time resource management. In contrast, run-time mapping techniques overcome this limitation by determining the state of the platform and incorporating resource management before mapping. This paper describes a heuristic for run-time mapping of parallelized tasks of an application considering efficient computation, communication and resource utilization as the main parameters for optimization.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"42 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116943601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PCAR: Parallelism Based Cache Replacement Scheme to Exploit Inter-disks Parallelism and Intra-disk Spatial Locality in Parallel Disk Array","authors":"Xiaodong Shi, D. Feng","doi":"10.1109/PAAP.2011.69","DOIUrl":"https://doi.org/10.1109/PAAP.2011.69","url":null,"abstract":"For parallel disk array systems, the parallelism among disks is the key factor influencing the performance and the scale of systems. Unfortunately, the parallelism of cached blocks is largely ignored by cache management schemes that focus on reducing the number of cache misses. Therefore, the performance of parallel disks array systems for workloads with a skew access pattern can be seriously degraded. To solve this problem, we propose a Parallelism based Cache Replacement scheme (PCAR) for parallel disks array systems, which can exploit both of the inter-disks parallelism and the intra-disk spatial locality. We have implemented the prototype of PCAR algorithm in Linux 2.6.18. And, the experimental results show that PCAR outperforms DULO and LRU by up to 22.8% and 33.1% in terms of the average response time, and by up to 20% and 43.9% in terms of throughput.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"166 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125972514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Algorithm of Visualization of Reservoir Numerical Simulation Based on PEBI Grids","authors":"Lanfang Dong, D. Lu, Meng Li","doi":"10.1109/PAAP.2011.73","DOIUrl":"https://doi.org/10.1109/PAAP.2011.73","url":null,"abstract":"The speed of calculating, tracking and filling the isolines has a direct impact on the performance of user interaction. In this paper, we begin with the serial algorithm of visualization and implement its parallel algorithm. First, we divide the Delaunay grids generated from the PEBI grids into several regions. Calculation, tracking of isolines and calculation of saturation are implemented in each region respectively. Then the tracking results of each region are integrated for the entire work area. The parallel examples using OpenMP on computers with dual-core/quad-core are given at the end of this paper. The experimental results show that the parallel processing can greatly reduce the time required for data processing in visualization.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116607938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Performance Optimization of Top-k Queries on GPU","authors":"Tao Luo, Guangzhong Sun, Guoliang Chen","doi":"10.1109/PAAP.2011.11","DOIUrl":"https://doi.org/10.1109/PAAP.2011.11","url":null,"abstract":"With the development of web search engines, the real-time performance of Top-k queries has attracted more and more attention. The authors study the implementation of the classic No Random Access algorithm in order to optimize the performance of Top-k queries on GPU. We give a novel GPU algorithm that uses the features of CUDA's programming model. Experimental results show that an implementation of the algorithm on one GPU runs more than 7000 times faster than a single-core implementation on the latest CPU.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129933397","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GSM: An Efficient Code Generation Algorithm for Dynamic Binary Translator","authors":"Xuhao Chen, Zhong Zheng, Li Shen, Wei Chen, Zhiying Wang","doi":"10.1109/PAAP.2011.34","DOIUrl":"https://doi.org/10.1109/PAAP.2011.34","url":null,"abstract":"Dynamic binary translation is an effective way to address the binary compatibility problem. Embedded systems and other novel RISC ISAs are developing fast without consideration of binary compatibility with off-the-shelf x86 applications, making dynamic binary translators (DBTs) from CISC to RISC more important. However, dynamic code generation is still inefficient due to code expansion. Conventional code generators in DBTs use a one-to-many mapping scheme between source and target code, which cannot take full advantage of the target ISA. We propose a novel lightweight code generation algorithm, GSM (Greedy Subgraph Mapping), which can generate compact code with low overhead using many-to-one mapping. GSM is implemented and evaluated in a DBT prototype system called TransARM. Experimental results demonstrate that GSM generates higher quality target code compared to a conventional implementation, which brings the code expansion rate close to 1.3. Moreover, GSM causes only slight extra overhead and negligible slowdown of translation, and enables a 10% performance improvement for target code execution.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130617200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Multiphase Routing Scheme in Irregular Mesh-Based NoCs","authors":"Xinming Duan, Yuanyuan Li","doi":"10.1109/PAAP.2011.42","DOIUrl":"https://doi.org/10.1109/PAAP.2011.42","url":null,"abstract":"At present, typical application-specific NoC systems often integrate a number of heterogeneous components which have varied functions, sizes and communication requirements. Instead of regular topology networks, constructing irregular mesh topology networks on chip (NoCs) becomes an attractive approach to building future NoC systems with irregular structure. The deadlock-free routing control algorithm is a promising problem for irregular mesh topology. The available routing algorithms for regular meshes are not suitable for irregular mesh networks. So in this paper, we introduce a hybrid multiphase routing algorithm for irregular meshes integrating oversized rectangle modules. The basic idea of the scheme is borrowed from the area of fault-tolerant networks, where a network topology is rendered irregular due to fault regions. The proposed scheme employs only 2 virtual channels per physical channel, with fast routing decisions. In the case that the proposed two-phase routing scheme does not maintain connectivity between some pairs of nodes, certain healthy nodes are deactivated to guarantee its deadlock-freeness. A greedy method is presented to ensure that only the minimum number of nodes is deactivated.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123828129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on CUDA-Based SIFT Registration of SAR Image","authors":"Yang Huang, Jinshuo Liu, Meisheng Tu, Siyuan Li, Juan Deng","doi":"10.1109/PAAP.2011.21","DOIUrl":"https://doi.org/10.1109/PAAP.2011.21","url":null,"abstract":"This paper proposes an implementation and optimization of the SIFT algorithm for SAR (Synthetic Aperture Radar) images. It improves the efficiency of the SIFT algorithm using the graphics processing unit (GPU) architecture based on the Compute Unified Device Architecture (CUDA) framework, to meet the demands of real-time remote sensing image processing. Experiments on large-size SAR images demonstrate that the algorithms achieve a 57.8 times speedup.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123818699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Research of Printed Character Recognition Based on Neural Network","authors":"Yingqiao Shi, Wenbing Fan, Guodong Shi","doi":"10.1109/PAAP.2011.23","DOIUrl":"https://doi.org/10.1109/PAAP.2011.23","url":null,"abstract":"First, this paper introduces the application status of artificial neural network technology in printed character recognition, and then elaborates on the standard BP neural network. By formula derivation, we show that the standard BP neural network has some defects in application, and we improve the network by adding a momentum term, which increases the training speed. Second, we randomly select 200 printed number characters and 50 printed letter characters as samples for experiments with the improved BP neural network. The results show that the recognition rate for number characters is higher than that for alphabetic characters, and that convergence speed and recognition performance are better.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134315229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Block-Level Links Based Content Extraction","authors":"Shixing Shen, Hui Zhang","doi":"10.1109/PAAP.2011.49","DOIUrl":"https://doi.org/10.1109/PAAP.2011.49","url":null,"abstract":"We present block-level links based content extraction (BLCE), a method to extract content from web pages by using the link attributes of blocks, namely the number of links and the length of link text (anchor text). We describe how to divide one web page into blocks and how to merge similar blocks into one, then compute the number of links and the total length of anchor text. We find that extracting content only with the number of links and length of anchor text is not effective because the number of links and length of link text are proportional to the length of the page. Density of links is a good method to solve this. So we use the content links ratios and the content anchor text ratios to describe the link attributes of the blocks. BLCE performs better than other methods, especially on new web pages with DIV and CSS where traditional algorithms can't work well.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132839334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Implementation of Cluster-Based Traffic Simulation System","authors":"Ling Guo, Zhengji Zhou, Jing Xu","doi":"10.1109/PAAP.2011.60","DOIUrl":"https://doi.org/10.1109/PAAP.2011.60","url":null,"abstract":"Self-similarity is one of the most important characteristics of real network traffic. Constructing an accurate model can reflect the real network environment precisely. It is also an effective measure to predict the upcoming network traffic, thus ensuring the quality and reliability of network service. In this paper, we present a cluster-based distributed system, which is based on the FGN (Fractional Gaussian Noise) algorithm, with the purpose of generating large-scale traffic and simulating the real network environment. We offer a detailed description of this system in the following three aspects: algorithm, architecture and implementation. Experiments have shown that the cluster-based system can generate large-scale similar network traffic with a designated Hurst parameter.","PeriodicalId":213010,"journal":{"name":"2011 Fourth International Symposium on Parallel Architectures, Algorithms and Programming","volume":"153 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115384023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}