{"title":"An Adaptive Routing Algorithm of 2-D Torus Network Based on Turn Model: The Communication Performance","authors":"Y. Miura, Kentaro Shimozono, Naohisa Fukase, S. Watanabe, Kazuya Matoyama","doi":"10.15803/IJNC.5.1_223","DOIUrl":"https://doi.org/10.15803/IJNC.5.1_223","url":null,"abstract":"A 2-D torus network is one of the most popular networks for parallel processing. Many algorithms have been proposed based on the turn model, but most of them cannot be applied to a torus network without modification. In this paper, we propose North-South First (NSF) routing that is applicable to a 2-D torus and combines the north-first method (NF) and the south-first method (SF). NF and SF are algorithms yielded by the turn model. A software simulation comparing NSF routing with other forms of deterministic and adaptive routing showed that NSF routing improves throughput in three types of communication patterns, but yields no improvement for one other communication pattern.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128204551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Composing resilience techniques: ABFT, periodic and incremental checkpointing","authors":"G. Bosilca, Aurélien Bouteiller, T. Hérault, Y. Robert, J. Dongarra","doi":"10.15803/IJNC.5.1_2","DOIUrl":"https://doi.org/10.15803/IJNC.5.1_2","url":null,"abstract":"Algorithm Based Fault Tolerant (ABFT) approaches promise unparalleled scalability and performance in failure-prone environments. Thanks to recent advances in the understanding of the involved mechanisms, a growing number of important algorithms (including all widely used factorizations) have been proven ABFT-capable. In the context of larger applications, these algorithms provide a temporal section of the execution, where the data is protected by its own intrinsic properties, and can therefore be algorithmically recomputed without the need of checkpoints. However, while typical scientific applications spend a significant fraction of their execution time in library calls that can be ABFT-protected, they interleave sections that are difficult or even impossible to protect with ABFT. As a consequence, the only practical fault-tolerance approach for these applications is checkpoint/restart. In this paper we propose a model to investigate the efficiency of a composite protocol, that alternates between ABFT and checkpoint/restart for the effective protection of an iterative application composed of ABFT-aware and ABFT-unaware sections. We also consider an incremental checkpointing composite approach in which the algorithmic knowledge is leveraged by a novel optimal dynamic programming to compute checkpoint dates. We validate these models using a simulator. The model and simulator show that the composite approach drastically increases the performance delivered by an execution platform, especially at scale, by providing the means to increase the interval between checkpoints while simultaneously decreasing the volume of each checkpoint.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122409355","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Novel Computational Model for GPUs with Applications to Efficient Algorithms","authors":"A. Koike, K. Sadakane","doi":"10.15803/IJNC.5.1_26","DOIUrl":"https://doi.org/10.15803/IJNC.5.1_26","url":null,"abstract":"We propose a novel computational model for GPUs. Known parallel computational models such as the PRAM model are not appropriate for evaluating GPU-based algorithms. Our model, called AGPU, abstracts the essence of current GPU architectures such as global and shared memory, memory coalescing and bank conflicts. Using our model, we can evaluate asymptotic behavior of GPU algorithms more efficiently than the known models and we can develop algorithms that run fast on real GPU devices. As a showcase, we analyze the asymptotic behavior of basic existing algorithms including reduction, prefix scan, and comparison sorting. We further develop new algorithms by detecting and resolving performance bottlenecks of the existing algorithms. Our reduction algorithm has the optimal time and I/O complexities and works with non-commutative operators. Our comparison sorting algorithm has the optimal I/O complexity. Additionally, we show our algorithms run faster than the existing algorithms not only in theory but also in practice.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127743206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Development of an Algorithm for Extracting Parallelism and Pipeline Structure from Stream-based Processing flow with Spanning Tree","authors":"S. Yamagiwa, G. Wang, K. Wada","doi":"10.15803/IJNC.5.1_159","DOIUrl":"https://doi.org/10.15803/IJNC.5.1_159","url":null,"abstract":"It is a fashion to use the manycore accelerators to promote the computing power in a computing platform. Especially GPU is one of the main series of the high performance computing, which is also employed by top supercomputers in the world. Programming methods on such accelerators includes development of control programs which accelerators executes to schedule the invocation of the accelerator’s kernel program. The kernel program needs to be written based on the stream computing paradigm. Connecting I/Os of the kernel programs, we can develop a large application. When we consider the processing flow as a directed graph, we can implement a GUI-based programming tool for the accelerators. It visualizes a pipeline-based processing flow. However, it is very hard to find a starting point of a complex processing flow. Moreover, although the processing pipeline include the potential parallelism, it is hard for the programmer to exploit it intuitively. This paper proposes an algorithm applying the spanning tree that mechanically exploits the parallelism and determines an execution order. To verify the algorithm, this paper performs evaluation with realistic applications. The algorithm exploits effectively the parallelism and construct the optimal pipeline processing flow.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133980091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Handling Non-determinism with Description Logics using a Fork/Join Approach","authors":"J. Faddoul, W. MacCaull","doi":"10.15803/IJNC.5.1_61","DOIUrl":"https://doi.org/10.15803/IJNC.5.1_61","url":null,"abstract":"The increasing use of Ontologies, formulated using expressive Description Logics, for time sensitive applications necessitates the development of fast (near realtime) reasoning tools. Multicore processors are nowadays widespread across desktop, laptop, server, and even smartphone and tablets devices. The rise of such powerful execution environments calls for new parallel and distributed Description Logics (DLs) reasoning algorithms. Many sophisticated optimizations have been explored and have considerably enhanced DL reasoning with light ontologies. Non-determinism remains a main source of complexity for implemented systems handling ontologies relying on more expressive logics. In this work, we explore handling non-determinism with DL languages enabling qualified cardinality restrictions. We implement a fork/join parallel framework into our tableau-based algebraic reasoner, which handles qualified cardinality restrictions and nominals using in-equation solving. The preliminary results are encouraging and show that using a parallel framework with algebraic reasoning is worth investigating and more promising than parallelizing standard tableau-based reasoning.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127088587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Employing Cooperative Communication to Recover Network Connectivity in Ad Hoc Networks","authors":"U. R. Afonseca, Thiago F. Neves, J. Bordim","doi":"10.15803/IJNC.4.2_336","DOIUrl":"https://doi.org/10.15803/IJNC.4.2_336","url":null,"abstract":"In wireless ad hoc networks, bridges and articulation nodes are critical elements that, in case of failure, render the network disconnected. Owing to their relevance, a number of works try to extend the life span of these elements. Nevertheless, in critical situations, such as the unavailability of a critical link, ways to reestablish the communication, even if for short periods of time, can be of importance in a number of urgent tasks. In this context, this work explores the concept of Cooperative Communication (CC) to monitor critical nodes and links and recover network connectivity in case of disruption. Unlike other works that perform exhaustive search to locate suitable CC-links that require global topology information, the proposed scheme identifies critical nodes and links based solely on local information. Compared to other prominent works, the proposed solution was able to reduce the computing cost to create CC-links in ≈ 67 times in the evaluated scenarios while persisting a lower message overhead.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114834940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Optical-Approaches to Raise Energy Efficiency of Future Central and/or Linked Distributed Data Center Network Services","authors":"N. Yamanaka, H. Takeshita, S. Okamoto, Takehiro Sato","doi":"10.15803/IJNC.4.2_209","DOIUrl":"https://doi.org/10.15803/IJNC.4.2_209","url":null,"abstract":"Novel optical network architectures are proposed for creating future network services. The first architecture is a centralized approach for higher energy efficiency; it yields a data center-centric metro/access optical aggregation network based on wavelength/time-multiplexing. Not only higher application layer functions but also all layer-3 or upper traffic are transferred through the simple metro/access optical aggregation network and switched in a huge centralized giant router at the data center. Its simple optical switching is 200 times more energy efficient and only one electrical router is needed, so power consumption of the network can be reduced ten or twenty fold compared to the existing Internet. The second is service mash-up by linked data through a network that uses broadband optical wire for the IoT era. All service contents,","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124275140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Novel List Scheduling Strategies for Data Parallelism Task Graphs","authors":"Yang Liu, Lin Meng, Ittetsu Taniguchi, H. Tomiyama","doi":"10.15803/IJNC.4.2_279","DOIUrl":"https://doi.org/10.15803/IJNC.4.2_279","url":null,"abstract":"This paper studies task scheduling algorithms which schedule a set of tasks on multiple cores so that the total scheduling length is minimized. Most of the algorithms developed in the past assume that a task is executed on a single core. Unlike the previous algorithms, the algorithms studied in this paper allow a task to be executed on multiple cores. This paper proposes six algorithms. All of the six algorithms are based on list scheduling, but the strategy for priority assignment is different. In our experiments, the six algorithms as well as an integer linear programming method are evaluated.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127971138","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pronto: A Low Overhead Message Passing System for High Performance Many-Core Processors","authors":"Sumeet S. Kumar, M. T. A. Djie, R. V. Leuken","doi":"10.15803/IJNC.4.2_307","DOIUrl":"https://doi.org/10.15803/IJNC.4.2_307","url":null,"abstract":"Many-core processors provide the raw computation power required by modern high-performance multimedia and signal processing workloads. The conversion of this computation power into execution performance is often constrained by the overheads of communication between concurrent tasks. This paper presents Pronto, a low overhead message passing system which simplifies the semantics of data movement between communicating tasks by performing buffer management, message synchronization and address translation directly in hardware. The integration of these functions into hardware results in transfer latencies up to 30% shorter than state of the art MPI derivatives. The overheads for communication with Pronto in an 18-core processor array are under 5% for 64-word burst transfers, and less than 0.5% of total execution time using workloads such as the JPEG decoder and FIR filter. Furthermore, this paper also studies the effect of task mapping and interconnect traffic on the predictability of data block arrival times, and provides insight on where interconnect contention can be tolerated.","PeriodicalId":270166,"journal":{"name":"Int. J. Netw. Comput.","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121429715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}