2016 Fourth International Symposium on Computing and Networking (CANDAR): Latest Papers (Page 8)

A Cost and Performance Analytical Model for Large-Scale On-Chip Interconnection Networks

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0083

Takanori Kurihara, Yamin Li

Abstract: As an interconnection topology, the two-dimensional mesh is widely used in network-on-chip (NoC) designs that integrate dozens of cores on a VLSI chip, because of its very simple structure and ease of on-chip implementation. However, with the progress of IC technology, it has become possible to integrate a large-scale system on a chip that contains more than one thousand processing elements or cores. In such a case, the mesh topology degrades performance because communication time among cores increases. This paper investigates topologies and IC layout schemes of mesh, torus, hypercube, and metacube for achieving good cost-performance tradeoffs. We propose an analytical model for evaluating the cost-performance ratio by considering the NoC's topology and layout. The model is parameterized by node degree, graph diameter, the number of routers, the router complexity, the bandwidth of the connection for the router, the number of processing cores, the total length of links, and the cost ratios of the link section and the router section. The model helps to find the optimal topology and layout for an NoC with a given network size. It was found that when the network size is small, mesh has better cost-performance than the others; as the network size increases, torus and hypercube outperform mesh; and metacube has the best cost-performance among them.
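
The abstract's point that a mesh slows down as the core count grows can be illustrated with two of the listed model parameters, node degree and graph diameter. The following is a minimal sketch using the standard textbook formulas for these topologies; it is not the paper's cost-performance model, which the abstract does not spell out. The function names are hypothetical, the mesh and torus are assumed to be square (k x k with k >= 3), and metacube is omitted because its parameters are not given here.

```python
from math import isqrt


def _square_side(num_nodes: int) -> int:
    """Return k such that k * k == num_nodes, or raise if no such k exists."""
    k = isqrt(num_nodes)
    if k * k != num_nodes or k < 3:
        raise ValueError("num_nodes must be a perfect square with side >= 3")
    return k


def mesh_2d(num_nodes: int) -> dict:
    """Maximum node degree and diameter of a k x k mesh (no wraparound links)."""
    k = _square_side(num_nodes)
    # Worst case is corner to opposite corner: (k - 1) hops per dimension.
    return {"degree": 4, "diameter": 2 * (k - 1)}


def torus_2d(num_nodes: int) -> dict:
    """Node degree and diameter of a k x k torus (mesh with wraparound links)."""
    k = _square_side(num_nodes)
    # Wraparound halves the worst-case distance in each dimension.
    return {"degree": 4, "diameter": 2 * (k // 2)}


def hypercube(num_nodes: int) -> dict:
    """Node degree and diameter of a hypercube with 2**d nodes."""
    d = num_nodes.bit_length() - 1
    if num_nodes < 2 or (1 << d) != num_nodes:
        raise ValueError("num_nodes must be a power of two")
    # Each node has one link per dimension; the farthest node differs in all d bits.
    return {"degree": d, "diameter": d}


if __name__ == "__main__":
    for n in (64, 1024):
        print(n, "mesh", mesh_2d(n), "torus", torus_2d(n), "hypercube", hypercube(n))
```

For 1024 nodes the mesh diameter is 62 hops, against 32 for the torus and 10 for the hypercube, while the hypercube's node degree rises to 10. That is the kind of link-versus-router cost tradeoff the abstract says the model weighs.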

Citations: 4

A Semantic Dataflow Logger Connecting Java Objects and Database Rows and Columns

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0027

Toshio Ito, Y. Kaneko

Citations: 0

Polling-Based P2P File Sharing with High Success Rate and Low Communication Cost

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0060

Kouhei Ootani, S. Fujita

Citations: 0

Topology-Aware Data Aggregation for High Performance Collective MPI-IO on a Multi-core Cluster System

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0022

Y. Tsujita, A. Hori, Toyohisa Kameyama, Y. Ishikawa

Abstract: Parallel I/O such as MPI-IO is one of the performance improvement solutions in parallel computing using MPI. ROMIO is a widely used MPI-IO implementation that aims to improve collective I/O performance through an optimization named two-phase I/O. File I/O tasks are given to a subset of, or all of, the MPI processes, which are called aggregators. Multiple CPUs or CPU cores offer a chance to increase computing power by deploying multiple MPI processes per compute node, but such deployment leads to poor I/O performance due to ROMIO's topology-unaware aggregator layout. In our previous work, an optimized aggregator layout suited to striping accesses on a Lustre file system improved I/O performance; however, its unbalanced communication load, caused by unawareness of the MPI rank layout among compute nodes, led to ineffective data aggregation. To minimize data aggregation time for further I/O performance improvements, we introduce a topology-aware data aggregation scheme that takes the MPI rank layout across compute nodes into account. The proposal arranges the data collection sequence of the aggregators in order to mitigate network contention. The optimization achieved up to 67% improvement in I/O performance compared with the original ROMIO in HPIO benchmark runs using 768 processes on 64 compute nodes of the TSUBAME2.5 supercomputer at the Tokyo Institute of Technology. Even when the number of aggregators was half or one third of the total number of processes, the optimization kept I/O performance comparable to the maximum performance.

Citations: 2

Last Path Caching: A Simple Way to Remove Redundant Memory Accesses of Path ORAM

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0068

Naoki Fujieda, Ryoichi Yamauchi, S. Ichikawa

Citations: 3

The Firing Squad Synchronization Problem on Higher-Dimensional CA with Multiple Updating Cycles

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0053

L. Manzoni, A. Porreca, H. Umeo

Citations: 5

Communication Link Switching Method Based on Destination IP Address for Power Savings

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0067

Masato Nishiguchi, S. Kimura

Citations: 2

CPRtree: A Tree-Based Checkpointing Architecture for Heterogeneous FPGA Computing

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0024

H. Vu, S. Kajkamhaeng, Shinya Takamaeda-Yamazaki, Y. Nakashima

Abstract: FPGAs provide reconfigurability and high performance for parallel applications. Modern FPGAs can be integrated into computing systems as accelerators so that they can work with the host CPU to execute offloaded applications. This integration puts more pressure on the fault tolerance of computing systems, and the question of how to improve dependability becomes crucial. As in CPU-based systems, checkpoint/restart techniques are expected to be developed and applied to FPGA-based computing systems. Two issues arise in this situation: how to checkpoint and restart an FPGA, and how this checkpoint/restart model can work well with the checkpoint/restart model of the whole computing system. In this paper, first, we propose a new checkpoint/restart architecture along with a checkpointing mechanism on FPGA. Second, we propose "fine-grain" management for checkpointing to reduce performance degradation. Third, we propose a technique to capture consistent snapshots of the FPGA and the rest of the computing system. For host software, we also provide the CPRtree stack, including API functions to manage checkpoint/restart procedures on FPGA. Our experimental results show that the checkpointing architecture causes up to 9.73% maximum clock frequency degradation, small breakdown, and small data footprint, while the LUT overhead varies from 17.98% (Dijkstra) to 160.67% (Matrix Multiplication).

Citations: 8

Service Identification by Packet Inspection Based on N-grams in Multiple Connections

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0123

Masaki Hara, Shinnosuke Nirasawa, A. Nakao, M. Oguchi, Shu Yamamoto, Saneyasu Yamaguchi

Citations: 5

The Importance of Dynamic Load Balancing among OpenMP Thread Teams for Irregular Workloads

2016 Fourth International Symposium on Computing and Networking (CANDAR) Pub Date : 2016-11-01 DOI: 10.1109/CANDAR.2016.0097

Xiong Xiao, S. Hirasawa, H. Takizawa, Hiroaki Kobayashi

Citations: 3