{"title":"A Three-Dimensional Networks-on-Chip Architecture with Dynamic Buffer Sharing","authors":"Seyyed Hossein Seyyedaghaei Rezaei, M. Modarressi, M. Daneshtalab, Shervin Roshanisefat","doi":"10.1109/PDP.2016.124","DOIUrl":"https://doi.org/10.1109/PDP.2016.124","url":null,"abstract":"3D integration is a practical solution for overcoming the failure of Dennard scaling in future technology generations. This emerging technology stacks several die slices on top of each other on a single chip in order to provide higher bandwidth and lower latency than a 2D design, due to the much shorter inter-layer distances in the third dimension. In this paper, we leverage the low-latency vertical links to address buffer management, one of the most important design and management issues in Network-on-Chip (NoC). To this end, we present VerBuS, an architecture for 3D routers with Vertical BUffer Sharing capability enabled by ultra-low latency vertical links of a 3D chip. VerBuS can share virtual channels (VC) between vertically stacked routers. This way, the buffering capacity of a highly loaded router is increased by using idle VCs of vertically adjacent routers. Experimental results show up to 20% improvement in NoC performance metrics over state-of-the-art 3D router designs.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"72 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126254817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Aware Programming Model for Distributed Infrastructures","authors":"F. Lordan, J. Ejarque, R. Sirvent, Rosa M. Badia","doi":"10.1109/PDP.2016.39","DOIUrl":"https://doi.org/10.1109/PDP.2016.39","url":null,"abstract":"Cloud technologies are increasingly adopted by very diverse types of stakeholders, and this success creates a side-effect problem: the energy consumed by this kind of infrastructure grows every day. With the objective of reducing energy consumption when programming applications for cloud infrastructures, we have implemented energy-aware mechanisms in the COMPSs Programming Model, within the context of the ASCETiC Project. In this paper, we demonstrate that application-level scheduling can have a big impact on the energy consumed by an application when executed in a heterogeneous cloud. We have implemented an energy-aware scheduling mechanism in COMPSs, together with a versioning technique, and we have run experiments with a use case coming from the real estate sector that proves our hypotheses.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126661622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomizing Packet Memory Networks for Low-Latency Processor-Memory Communication","authors":"Daichi Fujiki, Hiroki Matsutani, M. Koibuchi, H. Amano","doi":"10.1109/PDP.2016.18","DOIUrl":"https://doi.org/10.1109/PDP.2016.18","url":null,"abstract":"Three-dimensional stacked memory is considered to be one of the innovative elements for the next-generation computing system, for it provides high bandwidth and energy efficiency. Particularly, packet routing ability of Hybrid Memory Cubes (HMCs) enables new interconnects for the memories, giving flexibility to its topological design space. Since memory-processor communication is latency-sensitive, our challenge is to alleviate latency of the memory interconnection network, which is subject to high overheads from hop-count increase. Interestingly, random network topologies are known to have remarkably low diameter that is even comparable to theoretical Moore graph. In this context, we first propose to exploit the random topologies for the memory networks. Second, we also propose several optimizations to leverage the random topologies to be further adaptive to the latency-sensitive memory-processor communication: communication path length based selection, deterministic minimal routing, and page-size granularity memory mapping. Finally, we present interesting results of our evaluation: the random networks with universal memory access outperformed non-random networks of which memory access was optimally localized.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134085223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Quantitative Performance Evaluation of Fast on-Chip Memories of GPUs","authors":"E. Konstantinidis, Y. Cotronis","doi":"10.1109/PDP.2016.56","DOIUrl":"https://doi.org/10.1109/PDP.2016.56","url":null,"abstract":"Modern Graphics Processing Units (GPUs) have evolved into high performance general purpose processors, forming an alternative to CPUs. However, programming them effectively has proven to be a challenge, not only due to the mandatory requirement of extracting massive fine-grained parallelism but also due to the sensitivity of their performance to memory traffic. Apart from regular memory caches, GPUs feature other types of fast memories as well, for instance scratchpads, texture caches, etc. In order to gain more insight into the efficient usage of these memory types some quantitative performance measures could be beneficial. In this paper we describe a set of micro-benchmarks which aim to provide effective bandwidth performance measurements of the on-chip special memories of GPUs. We compare the peak measurements of different memory types and the use of different data type sizes. In addition, we validate the peak measurements on real world problems as provided by the polybench-gpu benchmark suite. We compare the profiling bandwidth of on-chip memories with the peak measurements as captured with the proposed micro-benchmarks. The source code of the micro-benchmark suite is publicly available.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125639609","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lessons Learned from Spatial and Temporal Correlation of Node Failures in High Performance Computers","authors":"Siavash Ghiasvand, F. Ciorba, R. Tschüter, W. Nagel","doi":"10.1109/PDP.2016.101","DOIUrl":"https://doi.org/10.1109/PDP.2016.101","url":null,"abstract":"In this paper we study the correlation of node failures in time and space. Our study is based on measurements of a production high performance computer over an 8-month time period. We draw possible types of correlations between node failures and show that, in many cases, there are direct correlations between observed node failures. The significance of such a study is twofold: achieving a clearer understanding of correlations between node failures and enabling failure detection as early as possible. The results of this study are aimed at helping the system administrators minimize (or even prevent) the destructive effects of correlated node failures.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115629398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cloud-Based NoSQL Data Migration","authors":"Aryan Bansel, H. González-Vélez, Adriana E. Chis","doi":"10.1109/PDP.2016.111","DOIUrl":"https://doi.org/10.1109/PDP.2016.111","url":null,"abstract":"Cloud computing has enabled the Database-as-a-Service (DBaaS) model to manage large volumes of user-generated data using NoSQL data repositories. There are several NoSQL implementations such as document, columnar, and key-value which ensure high availability, fault tolerance and scalability to serve distinct client requirements. Nonetheless, different NoSQL data models may also introduce unnecessary heterogeneity in DBaaS, which further restricts the user from migrating the application services according to business or technology changes. In this paper, we propose a NoSQL data migration framework to foster data portability across cloud-based heterogeneous NoSQL data repositories. The proposed approach involves data standardisation and classification stages to render an efficient mapping and translation between different cloud-based NoSQL data stores. The current implementation of the framework supports three different data models: document, columnar and graph. Moreover, the framework is meta-model driven, and therefore allows developers to extend the support for new database models. Our approach includes an online compression algorithm for data migration (document to graph) whereby a graph database requires up to 46% less space. There is also a significant reduction (37% to 55%) in the number of nodes in the compressed graph database.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"261 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115283766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Suitability of the Random Topology for HPC Applications","authors":"Fabien Chaix, I. Fujiwara, M. Koibuchi","doi":"10.1109/PDP.2016.10","DOIUrl":"https://doi.org/10.1109/PDP.2016.10","url":null,"abstract":"With each technology improvement, parallel systems get larger, and the impact of interconnection networks becomes more prominent. Random topologies and their variants have received more and more attention lately due to their low diameter, low average shortest path length and high scalability. However, existing supercomputers still prefer torus and fat-tree topologies, because a number of existing parallel algorithms are optimized for them and the interconnect implementation is more straightforward in terms of floor layout. In this paper, we investigate the performance of traditional and emerging parallel workloads on these network topologies, using a discrete-event simulator called SimGrid. We observe that random topology is better for the Fourier Transform (FT), Graph500, and Himeno benchmarks, and its improvement over the counterpart torus is 18 percent on average. Through this study, our recommendation is to use random topology in current and future supercomputers for these scientific and big-data analysis parallel applications.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123908344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Machine Learning Approach for the Integration of miRNA-Target Predictions","authors":"S. Beretta, M. Castelli, Yuliana Martínez, Luis Muñoz Delgado, Sara Silva, L. Trujillo, L. Milanesi, I. Merelli","doi":"10.1109/PDP.2016.125","DOIUrl":"https://doi.org/10.1109/PDP.2016.125","url":null,"abstract":"Although several computational methods have been developed for predicting interactions between miRNA and target genes, there are substantial differences in the achieved results. For this reason, machine learning approaches are widely used for integrating the predictions obtained from different tools. In this work we adopt a method, called M3GP, which relies on a genetic programming approach, to classify results from three tools: miRanda, TargetScan, and RNAhybrid. This algorithm is highly parallelizable and its adoption provides great advantages when handling problems involving big datasets, since it is independent of the implementation and of the architecture on which it is executed. More precisely, we apply this technique for the classification of the achieved miRNA target predictions and we compare its results with those obtained with other classifiers.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124007806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Efficient In-band Management for Interconnect Network in Tianhe-2 System","authors":"Jijun Cao, Liquan Xiao, Zhengbin Pang, Kefei Wang, Jiaqing Xu","doi":"10.1109/PDP.2016.58","DOIUrl":"https://doi.org/10.1109/PDP.2016.58","url":null,"abstract":"The interconnect network plays an important role in high performance computing systems, and its manageability directly affects the RAS (i.e., Reliability, Availability, and Serviceability) of the whole system. The Tianhe-2 system located in NSCC-gz (i.e., National Supercomputing Center of China in Guangzhou) uses a proprietary interconnect network, which includes 5,856 high-radix network router chips (i.e., NRC) and 18,304 network interface chips (i.e., NIC). For such a very large-scale interconnect network, it is a great challenge to manage (for example, configure, monitor, and debug) the numerous network chips and their network ports in an efficient way. By implementing the in-band management with very few hardware resources, the interconnect network in the Tianhe-2 system achieves highly efficient network management. In this paper, we introduce the design and implementation of the in-band management for the interconnect network in the Tianhe-2 system, emphasizing several key features, including the set of achieved management functionalities, the architecture of network management, the format of management packets, the data flow and processing of management packets, etc. We also evaluate the performance of in-band management, mainly by comparing it with an out-band management scheme. The preliminary results demonstrate the efficiency of the in-band management for the interconnect network in the Tianhe-2 system.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130621202","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transient Temperature Prediction for Aging Thermal Sensors Using Artificial Neural Network","authors":"Kameswar Rao Vaddina, J. M. Cebrian, L. Natvig","doi":"10.1109/PDP.2016.89","DOIUrl":"https://doi.org/10.1109/PDP.2016.89","url":null,"abstract":"As technology scales down and power density increases, the temperature sensor characteristics will drift, leading to temperature errors which increase over time. Transistor aging is one of the leading contributors to temperature sensing inaccuracies. The prominent aging failure mechanisms like Negative Bias Temperature Instability (NBTI), Hot Carrier Injection (HCI) and electromigration have emerged as the main sources of system unreliability which manifest as an increase in the propagation delay over time. On-chip thermal sensors are not immune to this phenomenon and are affected by these aging mechanisms. Thermal sensor aging exacerbated by increased temperatures leads to temperature sensing inaccuracies requiring repeated sensor calibration. In this work, we propose a novel approach of using performance metrics to predict the transient temperature profile of an application as seen by the aging thermal sensor. First, we profile applications offline and then cluster them into groups using k-means clustering. Then we use a neural network model to predict the thermal profile of a new application given its performance metrics. The forecasting ability of our model is assessed using MSE and RMSE. This approach is highly scalable and can be used to predict future temperatures which can then be used for run-time dynamic thermal management of multi-core systems.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"348 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132697432","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}