2010 First International Conference on Networking and Computing: Latest Publications

Matrix Multiply-Add in Min-plus Algebra on a Short-Vector SIMD Processor of Cell/B.E.

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.29

Kazuya Matsumoto, S. Sedukhin

Citations: 3
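The operation named in the title, matrix multiply-add in min-plus (tropical) algebra, replaces multiplication with addition and addition with minimum: $C \leftarrow C \oplus (A \otimes B)$, i.e. $c_{ij} \leftarrow \min\bigl(c_{ij}, \min_k (a_{ik} + b_{kj})\bigr)$. The sketch below is only a plain scalar reference for that definition, assuming lists of lists with `math.inf` as "no value"; it does not model the paper's short-vector SIMD kernel on Cell/B.E., and the function name is hypothetical.

```python
import math

def minplus_multiply_add(A, B, C):
    """Return C' with C'[i][j] = min(C[i][j], min_k (A[i][k] + B[k][j])).

    Min-plus algebra: "addition" is min and "multiplication" is +.
    Hypothetical reference version; math.inf is the identity for min.
    """
    n, m, p = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A) or len(C) != n or any(len(row) != p for row in C):
        raise ValueError("incompatible matrix shapes")
    result = [row[:] for row in C]  # leave the caller's C untouched
    for i in range(n):
        for j in range(p):
            best = result[i][j]
            for k in range(m):
                best = min(best, A[i][k] + B[k][j])
            result[i][j] = best
    return result

INF = math.inf
D = [[0, 3, INF],
     [INF, 0, 1],
     [2, INF, 0]]
# With A = B = C = D (a distance matrix), one step gives shortest paths of at most 2 hops.
print(minplus_multiply_add(D, D, D))  # prints [[0, 3, 4], [3, 0, 1], [2, 5, 0]]
```

Repeating this update on a distance matrix relaxes paths one hop at a time, which is why min-plus products appear in all-pairs shortest-path computations.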

Evaluation Framework for GPU Performance Based on OpenCL Standard

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.32

Martin Jurecko, J. Kocisová, J. Buša, T. Kasanický, M. Domiter, M. Zvada

Citations: 3

Proposition of Criteria for Aborting Transaction Based on Log Data Size in LogTM

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.51

Hiroki Asai, Tomoaki Tsumura, H. Matsuo

Abstract: Lock-based synchronization techniques are commonly used in parallel programming on multi-core processors. However, locks can cause deadlocks and poor scalability. Hence, LogTM has been proposed and studied for lock-free synchronization. LogTM is a kind of hardware transactional memory. In LogTM, transactions are executed speculatively to ensure serializability and atomicity. LogTM stores original values in a log before they are modified by a transaction. If a transaction accesses a shared datum that has been accessed by another transaction running in parallel, LogTM detects a conflict, restores all data from the associated log, and restarts the transaction. This is called aborting. On abort, the cost of restoring data from a log increases in proportion to the size of the data in the log. However, LogTM selects which transaction should be aborted by its start time. Hence, if conflicts occur frequently, performance may degrade. This paper proposes a criterion for selecting which transaction should be aborted that takes into account the data size in each log. In addition, another criterion that takes into account the degree of conflict is also proposed. Experiments with programs from the SPLASH-2 benchmark suite show that the proposed methods improve performance by up to 2.7%.

Citations: 0
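The contrast the abstract draws, choosing the abort victim by start time versus by log size, can be sketched as two pairwise policies. This is a hypothetical simplification: the `Transaction` fields, the unit cost model, and the assumption that the age-based rule aborts the younger transaction are illustrative choices, LogTM's real hardware conflict handling is more involved than a pairwise choice, and the paper's second (degree-of-conflict) criterion is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    tid: int
    start_time: int  # when the transaction began (smaller = older)
    log_size: int    # amount of data saved in its undo log so far

def abort_cost(tx: Transaction, cost_per_unit: int = 1) -> int:
    """Restoring data from the undo log costs time proportional to the log size."""
    return tx.log_size * cost_per_unit

def victim_by_start_time(a: Transaction, b: Transaction) -> Transaction:
    """Age-based choice (assumed baseline): abort the younger transaction."""
    return a if a.start_time > b.start_time else b

def victim_by_log_size(a: Transaction, b: Transaction) -> Transaction:
    """Log-size choice: abort the transaction that is cheaper to roll back,
    falling back to the age-based rule when both logs are the same size."""
    if a.log_size != b.log_size:
        return a if a.log_size < b.log_size else b
    return victim_by_start_time(a, b)

old = Transaction(tid=1, start_time=0, log_size=2)
young = Transaction(tid=2, start_time=5, log_size=50)
print(victim_by_start_time(old, young).tid, abort_cost(victim_by_start_time(old, young)))  # prints 2 50
print(victim_by_log_size(old, young).tid, abort_cost(victim_by_log_size(old, young)))      # prints 1 2
```

In this example the age-based rule rolls back 50 units of logged data, while the log-size rule rolls back only 2, which is the kind of saving the proposed criterion targets when conflicts are frequent.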

CODIE: Continuation-Based Overlapping Data-Transfers with Instruction Execution

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.26

T. Miyoshi, Kenji Kise, H. Irie, T. Yoshinaga

Abstract: In this paper, a runtime system termed CODIE is proposed to execute the sequential parts of programs efficiently on a many-core architecture. All independent processing elements in a many-core architecture share the on-chip network and off-chip memory. Therefore, contention for these resources substantially degrades system performance. In the CODIE system, when a cache miss occurs, the system first initiates a data transfer operation. Next, the system creates a continuation of the instructions that depend on the missing data. The continuation is stored in a buffer, and the instructions unrelated to the missing data are executed in the meantime. In other words, data transfers and instruction execution can be performed simultaneously. In this way, the overhead of updating cache entries (which is increased by memory-access contention) is tolerated. Evaluation results show that the proposed CODIE system achieves a 1.86x speedup of a sequential write/read program on the M-Core architecture with 36 cores and a 1.97x speedup of blackscholes (from the PARSEC benchmark suite) on the Cell/BE processor with 6 SPEs.

Citations: 3
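The miss-handling sequence described in the abstract (start the transfer, buffer a continuation of the dependent work, keep executing independent work) can be illustrated with a toy single-threaded model. This is not the CODIE runtime, whose implementation the abstract does not detail: ticks stand in for cycles, one task runs per tick, and every class, method, and parameter name here is hypothetical.

```python
from collections import deque

class ContinuationRuntime:
    """Toy model: on a cache miss, start the transfer, park the dependent
    work as a continuation, and keep running independent work meanwhile."""

    def __init__(self, memory, latency):
        self.memory = memory    # simulated off-chip memory: addr -> value
        self.latency = latency  # simulated transfer time, in ticks
        self.cache = {}
        self.now = 0
        self.in_flight = {}     # addr -> tick at which its transfer completes
        self.waiting = {}       # addr -> continuations blocked on that addr
        self.trace = []

    def load(self, addr, continuation):
        """Run continuation(value) immediately on a hit; otherwise defer it."""
        if addr in self.cache:
            continuation(self.cache[addr])
            return
        # Miss: initiate the transfer once, then buffer the continuation.
        self.in_flight.setdefault(addr, self.now + self.latency)
        self.waiting.setdefault(addr, []).append(continuation)

    def tick(self):
        """Advance time; fill completed lines and resume their continuations."""
        self.now += 1
        done = [a for a, t in self.in_flight.items() if t <= self.now]
        for addr in done:
            del self.in_flight[addr]
            self.cache[addr] = self.memory[addr]
            for resume in self.waiting.pop(addr):
                resume(self.cache[addr])

    def run(self, tasks):
        """Execute one task per tick while transfers proceed in the background."""
        queue = deque(tasks)
        while queue or self.in_flight:
            if queue:
                queue.popleft()(self)
            self.tick()
        return self.trace

rt = ContinuationRuntime(memory={0x40: 42}, latency=3)
trace = rt.run([
    lambda r: r.load(0x40, lambda v: r.trace.append(("dependent", v))),
    lambda r: r.trace.append(("independent", 1)),
    lambda r: r.trace.append(("independent", 2)),
])
print(trace)  # prints [('independent', 1), ('independent', 2), ('dependent', 42)]
```

The two independent tasks finish while the simulated transfer is in flight, and the dependent work resumes only once the data arrives, which is the overlap of data transfer and instruction execution the abstract describes.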

A Traffic Analysis Using Cardinalities and Header Information

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.36

Y. Shomura, K. Yoshida, Akira Sato, Satoshi Matsumoto, K. Itano

Citations: 3

IEEE802.11b/g Standard: Theoretical Maximum Throughput

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.40

J. Bordim, A. V. Barbosa, Marcos F. Caetano, P. S. Barreto

Citations: 13

A New Wireless TCP Issue in Cognitive Radio Networks

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.37

Yunlei Cheng, E. Wu, Gen-Huey Chen

Citations: 5

Acceleration of Hessenberg Reduction for Nonsymmetric Eigenvalue Problems Using GPU

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.52

Jun-ichi Muramatsu, Shaoliang Zhang, Yusaku Yamamoto

Citations: 1

Implementations of Parallel Computation of Euclidean Distance Map in Multicore Processors and GPUs

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.55

Duhu Man, K. Uda, Hironobu Ueyama, Yasuaki Ito, K. Nakano

Abstract: Given a 2-D binary image of size $n \times n$, the Euclidean Distance Map (EDM) is a 2-D array of the same size in which each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in $O(n^2)$ time, and thus this algorithm is optimal. Work-time optimal parallel algorithms for the shared memory model have also been presented. However, these algorithms are too complicated to implement on existing shared memory parallel machines. The main contribution of this paper is to develop a simple parallel algorithm for the EDM and implement it on two parallel platforms: multicore processors and a Graphics Processing Unit (GPU). More specifically, we have implemented our parallel algorithm on a Linux server with four Intel hexa-core processors (Intel Xeon X7460, 2.66 GHz). We have also implemented it on a modern GPU system, the Tesla C1060. The experimental results show that, for an input binary image of size $10000 \times 10000$, our implementation on the multicore system achieves a speedup factor of 18 over a sequential algorithm using a single processor of the same system. Meanwhile, for the same input image, our implementation on the GPU achieves a speedup factor of 5 over the sequential implementation.

Citations: 24
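The abstract does not spell out the authors' simple parallel algorithm. As an assumed stand-in, the sketch below uses a well-known separable exact method: a column pass that finds the vertical distance to the nearest black pixel, then a per-row lower envelope of parabolas (the 1-D distance transform of Felzenszwalb and Huttenlocher). It runs in $O(n^2)$ time for an $n \times n$ image, and because columns in the first pass and rows in the second pass are independent, each pass could be split across cores or GPU threads. Black pixels are assumed to be encoded as 1; the function names are hypothetical.

```python
import math

INF = math.inf

def _column_pass(image):
    """For every pixel, distance to the nearest black pixel in the same column."""
    rows, cols = len(image), len(image[0])
    g = [[INF] * cols for _ in range(rows)]
    for j in range(cols):                  # columns are independent (parallelizable)
        last = None
        for i in range(rows):              # top-down scan
            if image[i][j]:
                last = i
            if last is not None:
                g[i][j] = i - last
        last = None
        for i in reversed(range(rows)):    # bottom-up scan
            if image[i][j]:
                last = i
            if last is not None:
                g[i][j] = min(g[i][j], last - i)
    return g

def _row_pass(f):
    """d[q] = min_k ((q - k)^2 + f[k]) via the lower envelope of parabolas."""
    n = len(f)
    sites = [k for k in range(n) if f[k] < INF]
    if not sites:
        return [INF] * n                   # every column is empty: no black pixel at all
    v = [sites[0]]                         # vertices of the envelope parabolas
    z = [-INF, INF]                        # parabola v[k] is lowest on [z[k], z[k+1]]
    for q in sites[1:]:
        while True:
            p = v[-1]
            s = ((f[q] + q * q) - (f[p] + p * p)) / (2 * (q - p))
            if s > z[-2]:
                break
            v.pop()                        # parabola p is hidden by parabola q
            z.pop()
            z[-1] = INF
        v.append(q)
        z[-1] = s
        z.append(INF)
    d, k = [0.0] * n, 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d

def euclidean_distance_map(image):
    """EDM of a binary image (1 = black): distance to the nearest black pixel."""
    g = _column_pass(image)
    edm = []
    for row in g:                          # rows are independent (parallelizable)
        squared = _row_pass([x * x for x in row])
        edm.append([math.sqrt(x) for x in squared])
    return edm
```

For example, `euclidean_distance_map([[0, 0, 0], [0, 1, 0], [0, 0, 0]])` gives 0 at the center, 1 at the edge midpoints, and $\sqrt{2}$ at the corners.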

Efficient Canny Edge Detection Using a GPU

2010 First International Conference on Networking and Computing. Pub Date: 2010-11-17. DOI: 10.1109/IC-NC.2010.13

Kohei Ogawa, Yasuaki Ito, K. Nakano

Citations: 120