{"title":"Exploiting Long-Term Temporal Cache Access Patterns for LRU Insertion Prioritization","authors":"Shane Carroll, Wei-Ming Lin","doi":"10.1142/S0129626421500109","DOIUrl":"https://doi.org/10.1142/S0129626421500109","url":null,"abstract":"In a CPU cache utilizing least recently used (LRU) replacement, cache sets manage a buffer which orders all cache lines in the set from LRU to most recently used (MRU). When a cache line is brought into cache, it is placed at the MRU and the LRU line is evicted. When re-accessed, a line is promoted to the MRU position. LRU replacement provides a simple heuristic to predict the optimal cache line to evict. However, LRU utilizes only simple, short-term access patterns. In this paper, we propose a method that uses a buffer called the history queue to record longer-term access-eviction patterns than the LRU buffer can capture. Using this information, we make a simple modification to LRU insertion policy such that recently-recalled blocks have priority over others. As lines are evicted, their addresses are recorded in a FIFO history queue. Incoming lines that have recently been evicted and now recalled (those in the history queue at recall time) remain in the MRU for an extended period of time as non-recalled lines entering the cache thereafter are placed below the MRU. We show that the proposed LRU insertion prioritization increases performance in single-threaded and multi-threaded workloads in simulations with simple adjustments to baseline LRU.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129360478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LSTM Cell Implementation on FPGAs","authors":"G. Dec","doi":"10.1142/S0129626421500110","DOIUrl":"https://doi.org/10.1142/S0129626421500110","url":null,"abstract":"This paper presents and discusses the implementation of an LSTM cell on an FPGA with an activation function inspired by the CORDIC algorithm. The realization is performed using both IEEE754 standard and 32-bit integer numbers. The case with floating-point arithmetic is analyzed with and without DSP blocks provided by the Xilinx design suite. The alternative implementation using integer arithmetic was optimized for a minimal number of clock cycles. The presented implementation uses xc6slx150t-2fgg900 and achieves high calculation accuracy in both cases.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121746566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Versatility of Bracha's Byzantine Reliable Broadcast Algorithm","authors":"M. Raynal","doi":"10.1142/S0129626421500067","DOIUrl":"https://doi.org/10.1142/S0129626421500067","url":null,"abstract":"G. Bracha presented in 1987 a simple and efficient reliable broadcast algorithm for [Formula: see text]-process asynchronous message-passing systems, which tolerates up to [Formula: see text] Byzantine processes. Following an idea recently introduced by Hirt, Kastrato and Liu-Zhang (OPODIS 2020), instead of considering the upper bound on the number of Byzantine processes [Formula: see text], the present short article considers two types of Byzantine behavior: the ones that can prevent the safety property from being satisfied, and the ones that can prevent the liveness property from being satisfied (a Byzantine process can exhibit only one or both types of failures). This Byzantine differentiated failure model is captured by two associated upper bounds denoted [Formula: see text] (for safety) and [Formula: see text] (for liveness). The article shows that only the threshold values used in the predicates of Bracha’s algorithm must be modified to obtain an algorithm that works with this differentiated Byzantine failure model.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128778342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Data-Parallel Neural Network Training with Weighted-Averaging Reparameterisation","authors":"Sterling Ramroach, A. Joshi","doi":"10.1142/S0129626421500092","DOIUrl":"https://doi.org/10.1142/S0129626421500092","url":null,"abstract":"Recent advances in artificial intelligence have shown a direct correlation between the performance of a network and the number of hidden layers within the network. The Compute Unified Device Architecture (CUDA) framework facilitates the movement of heavy computation from the CPU to the graphics processing unit (GPU) and is used to accelerate the training of neural networks. In this paper, we consider the problem of data-parallel neural network training. We compare the performance of training the same neural network on the GPU with and without data parallelism. When data parallelism is used, we compare with both the conventional averaging of coefficients and our proposed method. We set out to show that not all sub-networks are equal and thus should not be treated as equals when normalising weight vectors. The proposed method achieved state-of-the-art accuracy faster than conventional training along with better classification performance in some cases.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133145599","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Real-Time Learning-Based Super-Resolution System on FPGA","authors":"Daolu Zha, Xi Jin, Rui Shang, Pengfei Yang","doi":"10.1142/S0129626420500115","DOIUrl":"https://doi.org/10.1142/S0129626420500115","url":null,"abstract":"This paper proposes a real-time super-resolution (SR) system. The proposed system performs a fast SR algorithm that generates a high-resolution image from a low-resolution image using direct regression functions with an up-scaling factor of 2. The algorithm consists of two stages: feature learning and SR image prediction. The feature learning stage is performed offline, in which several regression functions are trained. The SR image prediction stage is implemented on the proposed system to generate high-resolution image patches. The system implemented on a Xilinx Virtex 7 field-programmable gate array achieves an output resolution of [Formula: see text] (UHD) at 85 fps and a throughput of 700 Mpixels/s. Structural similarity (SSIM) is used to measure image quality. Experimental results show that the proposed system provides high image quality for real-time applications and is highly scalable with respect to resolution.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128511455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fault-Tolerant Maximal Local-Edge-Connectivity of Augmented Cubes","authors":"Liyang Zhai, Liqiong Xu, Weihua Yang","doi":"10.1142/S0129626420400010","DOIUrl":"https://doi.org/10.1142/S0129626420400010","url":null,"abstract":"An interconnection network is usually modeled as a graph, in which vertices and edges correspond to processors and communication links, respectively. Connectivity is an important metric for fault t...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127636747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uniformly Connected Graphs - A Survey","authors":"G. Chartrand, Ping Zhang","doi":"10.1142/S0129626420400022","DOIUrl":"https://doi.org/10.1142/S0129626420400022","url":null,"abstract":"A graph G of order n ≥ 2 is k-uniformly connected for an integer k with 1 ≤ k ≤ n − 1 if for every pair u, v of distinct vertices of G, there is a u − v path of length k. A number of results, conje...","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125088728","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The g-Extra Conditional Diagnosability of Graphs in Terms of g-Extra Connectivity","authors":"Aixia Liu, Jun Yuan, Shiying Wang","doi":"10.1142/S012962642040006X","DOIUrl":"https://doi.org/10.1142/S012962642040006X","url":null,"abstract":"The [Formula: see text]-extra conditional diagnosability and [Formula: see text]-extra connectivity are two important parameters for measuring the ability to diagnose faulty processors and the fault tolerance of a multiprocessor system. The [Formula: see text]-extra conditional diagnosability [Formula: see text] of graph [Formula: see text] is defined as the diagnosability of a multiprocessor system under the assumption that every fault-free component contains more than [Formula: see text] vertices. The [Formula: see text]-extra connectivity [Formula: see text] of graph [Formula: see text] is the minimum number [Formula: see text] for which there is a vertex cut [Formula: see text] with [Formula: see text] such that every component of [Formula: see text] has more than [Formula: see text] vertices. In this paper, we study the [Formula: see text]-extra conditional diagnosability of graph [Formula: see text] in terms of its [Formula: see text]-extra connectivity, and show that [Formula: see text] under the MM* model with some acceptable conditions. As applications, the [Formula: see text]-extra conditional diagnosability is determined for some BC networks such as hypercubes, varietal hypercubes, and [Formula: see text]-ary [Formula: see text]-cubes under the MM* model.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133696185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Brief Account on the Development and Future Research Directions of Connectivity Properties of Interconnection Networks","authors":"E. Cheng, K. Qiu, Z. Shen, Weihua Yang","doi":"10.1142/S0129626420400095","DOIUrl":"https://doi.org/10.1142/S0129626420400095","url":null,"abstract":"Connectivity type measures form an important topic in graph theory. Such measures provide an important part of analyzing the vulnerability and resilience of interconnection networks. In this short commentary, we outline our perspective on the development of this topic with respect to interconnection networks.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"296 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132551083","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fractional Matching Preclusion for Data Center Networks","authors":"Bo Zhu, Tianlong Ma, Shuangshuang Zhang, He Zhang","doi":"10.1142/s0129626420500103","DOIUrl":"https://doi.org/10.1142/s0129626420500103","url":null,"abstract":"An edge subset [Formula: see text] of [Formula: see text] is a fractional matching preclusion set (FMP set for short) if [Formula: see text] has no fractional perfect matchings. The fractional matching preclusion number (FMP number for short) of [Formula: see text], denoted by [Formula: see text], is the minimum size of FMP sets of [Formula: see text]. A set [Formula: see text] of edges and vertices of [Formula: see text] is a fractional strong matching preclusion set (FSMP set for short) if [Formula: see text] has no fractional perfect matchings. The fractional strong matching preclusion number (FSMP number for short) of [Formula: see text], denoted by [Formula: see text], is the minimum size of FSMP sets of [Formula: see text]. Data center networks have been proposed for data centers as a server-centric interconnection network structure, which can support millions of servers with high network capacity by only using commodity switches. In this paper, we obtain the FMP number and the FSMP number for data center networks [Formula: see text], and show that [Formula: see text] for [Formula: see text], [Formula: see text] and [Formula: see text] for [Formula: see text], [Formula: see text]. In addition, all the optimal fractional strong matching preclusion sets of these graphs are categorized.","PeriodicalId":422436,"journal":{"name":"Parallel Process. Lett.","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116288042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}