{"title":"Compound Analytics using Combinatorics for Feature Selection: A Case Study in Biomarker Detection","authors":"Ronald D. Hagan, Brett D. Hagan, C. Phillips, B. Rhodes, M. Langston","doi":"10.1109/IPDPSW.2019.00050","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00050","url":null,"abstract":"Computer and data scientists are increasingly tasked with analyzing data growing at unprecedented rates. These data frequently involve a high level of dimensionality. In this work, we present a novel method for dimension reduction that combines statistical scoring with graph theoretical filtering to distill salient features for machine learning. We apply this method to the timely problem of detecting epigenetic biomarkers in DNA methylation data.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114153647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Methodology for Benchmarking Edge Processing Frameworks","authors":"Pedro Silva, Alexandru Costan, Gabriel Antoniu","doi":"10.1109/IPDPSW.2019.00149","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00149","url":null,"abstract":"With the spectacular growth of the Internet of Things, edge processing emerged as a relevant means to offload data processing and analytics from centralized Clouds to the devices that serve as data sources (often provided with some processing capabilities). While a plethora of frameworks for edge processing has recently been proposed, the distributed systems community currently has no clear means to discriminate between them. Some preliminary surveys exist, focusing on a feature-based comparison. We claim that a further step is needed to enable a performance-based comparison. To this end, the definition of a benchmark is a necessity. In this short paper, we take a step towards the definition of a methodology for benchmarking Edge processing frameworks.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132731454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Inspection of Partial Bitstreams for FPGAs Using Artificial Neural Networks","authors":"J. Rettkowski, Safdar Mahmood, Arij Shallufa, M. Hübner, D. Göhringer","doi":"10.1109/IPDPSW.2019.00023","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00023","url":null,"abstract":"Incorporating FPGAs in embedded designs, both for research and industry-related applications, is becoming increasingly common. Due to the inherent capability of an FPGA to reconfigure itself during run-time, entirely or partially, it has become a very cost-effective and time-efficient solution for end-users with ever-changing needs for their embedded and custom hardware designs. Unfortunately, this capability for dynamic reconfiguration also poses a threat to hardware security in the form of malicious bitstream manipulation, which can include attacks through intended hardware changes by insertion of hardware trojans, spyware, or even energy-thirsty hardware modules that eventually have adverse effects on energy-critical applications. In this paper, we introduce a novel approach to tackle this problem using machine learning techniques for FPGA bitstream analysis. By making use of different neural networks, we show how this approach paves the way to analyze partial FPGA bitstreams to trace a certain module or to find inconsistencies that can be malicious to the target hardware. 
In contrast to traditional methods to inspect bitstreams, our method saves a significant amount of time.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126207372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data Reliability and Redundancy Optimization of a Secure Multi-cloud Storage Under Uncertainty of Errors and Falsifications","authors":"A. Tchernykh, M. Babenko, V. Kuchukov, V. Miranda-López, A. Avetisyan, R. Rivera-Rodríguez, G. Radchenko","doi":"10.1109/IPDPSW.2019.00099","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00099","url":null,"abstract":"Despite all the benefits cloud data storage offers to customers, there is a high risk of breach of confidentiality, integrity, and availability related to the uncertainty of errors and falsifications, loss of information, denial of access for a long time, information leakage, conspiracy, and technical failures. In this article, we propose a configurable, reliable, and secure distributed data storage scheme with improved data redundancy, reliability, and encoding/decoding speed. Our system utilizes a Polynomial Residue Number System (PRNS) with a new method of error correction codes and secret sharing schemes. We introduce the concept of an approximate value of a rank (AR) of a polynomial. It reduces the computational complexity of the encoding/decoding and the size of the PRNS coefficients. Based on the properties of the approximate value and PRNS, we introduce the AR-PRNS method for error detection, correction, and controlling computational results with capabilities of scalable parallel computing. We provide a theoretical basis to configure and optimize the redundancy of stored data and encoding/decoding speed to cope with different objective preferences, workloads, and storage properties. 
Theoretical analysis shows that, by appropriate selection of AR-PRNS parameters, the proposed scheme increases safety and reliability and reduces the overhead of data storage.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122212723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Clustering using Approximate Spanning Tree and Prime Number Based Filter","authors":"D. Rao, Sutharzan Sreeskandarajan, C. Liang","doi":"10.1109/IPDPSW.2019.00037","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00037","url":null,"abstract":"Motivation: Clustering genomic data, including those generated via high-throughput sequencing, is an important preliminary step for assembly and analysis. However, clustering a large number of sequences is time-consuming. Methods: In this paper, we discuss algorithmic performance improvements to our existing clustering system called PEACE via the following two new approaches: (1) using an Approximate Spanning Tree (AST) that is computed much faster than the currently used Minimum Spanning Tree (MST) approach, and (2) a novel Prime Numbers based Heuristic (PNH) for generating features and comparing them to further reduce comparison overheads. Results: Experiments conducted using a variety of data sets show that the proposed method significantly improves performance for datasets with large clusters with only minimal degradation in clustering quality. We also compare our methods against wcd-kaboom, a state-of-the-art clustering software. Our experiments show that AST and PNH underperform wcd-kaboom for datasets that have many small clusters. However, they significantly outperform wcd-kaboom for datasets with large clusters by a conspicuous ~550x with comparable clustering quality. 
The results indicate that the proposed methods hold considerable promise for accelerating clustering of genomic data with large clusters.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127689807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Edge-Based Framework for Enabling Data-Driven Pipelines for IoT Systems","authors":"E. G. Renart, Daniel Balouek-Thomert, M. Parashar","doi":"10.1109/IPDPSW.2019.00146","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00146","url":null,"abstract":"Due to the proliferation of the Internet of Things (IoT) paradigm, the number of devices connected to the Internet is growing. These devices are generating unprecedented amounts of data at the edges of the infrastructure. Although the generated data provides great potential, identifying and processing relevant data points hidden in streams of unimportant data, and doing this in near real time, remains a significant challenge. Existing stream processing platforms require the data to be transported to the cloud for processing, resulting in latencies that can prevent timely decision making or may reduce the amount of data processed. To tackle this problem, we designed an IoT Edge Framework, called R-Pulsar, that extends cloud capabilities to local devices and provides a programming model for deciding what, when, and where data get collected and processed. In this paper, we discuss motivating use cases and the architectural design of R-Pulsar. 
We have deployed and tested R-Pulsar on embedded devices (Raspberry Pi and Android phone) and present an experimental evaluation that demonstrates that R-Pulsar can enable timely data analytics by effectively leveraging edge and cloud resources.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126437846","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Container-Based Framework to Facilitate Reproducibility in Employing Stochastic Process Algebra for Modeling Parallel Computing Systems","authors":"W. Sanders, Srishti Srivastava, I. Banicescu","doi":"10.1109/IPDPSW.2019.00070","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00070","url":null,"abstract":"Scientific applications are increasingly complex and domain specific, and the underlying architectures of the parallel and distributed systems on which they are executed also continue to grow in complexity. As these high performance parallel and distributed computing applications and environments continue to grow both in complexity and computing power, there is an increasing financial cost associated with both the acquisition and maintenance of those systems. Therefore, the ability to model the performance of these applications and systems before and during their development and deployment to guide cost-effective decisions about their resources and configurations is highly important to the designers of those applications and systems. Performance Evaluation Process Algebra (PEPA) is a modeling language and framework for modeling parallel and distributed computing and communication applications and systems, and numerous examples are present in the literature where PEPA has been utilized to model these systems for evaluating or predicting their performance using various metrics, including throughput, utilization, and robustness. Since its development, the PEPA modeling framework has been expanded to model biological systems and networks (Bio-PEPA), and massive (on the order of ~10^129 components) homogeneous systems with Grouped PEPA (GPEPA). PEPA and its derivatives are implemented in a variety of ways, ranging from plug-ins integrated with the Eclipse integrated development environment to standalone command-line based interpreters, each with their own unique and often challenging installation and configuration requirements. 
To help enable other researchers to more easily utilize these frameworks and facilitate increased and robust reproducibility across end-user platforms, we present and make available containerized versions of a number of these PEPA frameworks. We have validated the functionality of these containers by testing them with models available from the research community that utilizes PEPA. These containers serve as a readily available resource for the community and can be executed on any environment capable of executing the underlying containerization framework.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128978324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Smart-Cache: Optimising Memory Accesses for Arbitrary Boundaries and Stencils on FPGAs","authors":"S. Nabi, W. Vanderbauwhede","doi":"10.1109/IPDPSW.2019.00024","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00024","url":null,"abstract":"A key requirement for high performance on FPGAs is to maintain continuous data streaming from the DRAM. An impediment in many computations, especially in the scientific computing domain, is irregular stencils and boundary conditions, requiring memory accesses that are random, redundant, or both. To address this problem, we present Smache, a novel smart-caching framework that uses FPGA on-chip memory resources for optimising access for arbitrary stencil shapes and boundary conditions. We propose a combination of stream and static buffers, and it is the latter that allows arbitrarily large offsets in stencils. The architecture is complemented by a formal model for determining buffer configuration. We propose a hybrid use of the block and distributed RAM on the FPGA. The design is validated for a 2D grid, 4-point stencil with circular boundaries.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114065356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A GPU Inference System Scheduling Algorithm with Asynchronous Data Transfer","authors":"Qin Zhang, L. Zha, Xiaohua Wan, Boqun Cheng","doi":"10.1109/IPDPSW.2019.00083","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00083","url":null,"abstract":"With the rapid expansion of its application range, deep learning has increasingly become an indispensable practical method to solve problems in various industries. In different application scenarios, especially in high-concurrency areas such as search and recommendation, a deep learning inference system is required to have high throughput and low latency, which cannot easily be obtained at the same time. In this paper, we build a model to quantify the relationship between concurrency, throughput and job latency. Then we implement a GPU scheduling algorithm for inference jobs in a deep learning inference system based on the model. The algorithm predicts the completion time of batch jobs being executed, chooses the batch size of the next batch of jobs according to the concurrency, and uploads data to GPU memory ahead of time. 
As a result, the system can hide the data transfer delay of the GPU and achieve the minimum job latency under the premise of meeting the throughput requirements. Experiments show that the proposed GPU asynchronous data transfer scheduling algorithm improves throughput by 9% compared with the traditional synchronous algorithm, reduces the latency by 3%-76% under different concurrency levels, and can better suppress the job latency fluctuation caused by changes in concurrency.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115305491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Lock-Free Skiplist for Integrated Graphics Processing Units","authors":"J. Fuentes, Weiyu Chen, Guei-Yuan Lueh, I. Scherson","doi":"10.1109/IPDPSW.2019.00015","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00015","url":null,"abstract":"With the advent of computing systems with on-die integrated graphics processing unit (iGPU), new general-purpose GPU programming challenges have emerged from these heterogeneous processors. We propose a lock-free skiplist for Intel's integrated graphics processor that is optimized to achieve the best performance using the C for Media framework. To the best of our knowledge, this is the first implementation of a lock-free data structure for iGPU. Experimental results show that our proposal is more compute-efficient than an existing discrete GPU implementation and outperforms state-of-the-art lock-free and lock-based skiplists for multi-core CPU, achieving up to 3.5x speedup. Additionally, energy savings of up to 300% are obtained when running different skiplist workloads on iGPU instead of CPU cores, hence further improving energy efficiency.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115730414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}