{"title":"High Performance Analysis of Omics Data: Experiences at University Magna Graecia of Catanzaro","authors":"Giuseppe Agapito, P. Guzzi, M. Cannataro","doi":"10.1109/HPCS.2017.157","DOIUrl":"https://doi.org/10.1109/HPCS.2017.157","url":null,"abstract":"Several omics disciplines, such as genomics, proteomics, and interactomics, are gaining increasing interest in the scientific community due to the availability of high throughput experimental platforms (e.g. next generation sequencing, microarray, mass spectrometry, to cite a few) that are producing an overwhelming amount of experimental omics data. However, efficient analysis of omics data requires large data stores as well as novel algorithms and data structures for data preprocessing, analysis, and integration. As a result, parallel bioinformatics tools for the analysis of omics data, often hosted on the Cloud, are becoming available. This paper surveys some parallel and distributed bioinformatics tools for the preprocessing and analysis of omics data. The description includes some tools developed at the Bioinformatics Laboratory of the University Magna Graecia of Catanzaro and validated using real data made available by the University Hospital.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128506505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Picos, A Hardware Task-Dependence Manager for Task-Based Dataflow Programming Models","authors":"Xubin Tan, Jaume Bosch, Miquel Vidal Piñol, C. Álvarez, Daniel Jiménez-González, E. Ayguadé, M. Valero","doi":"10.1109/HPCS.2017.134","DOIUrl":"https://doi.org/10.1109/HPCS.2017.134","url":null,"abstract":"Task-based programming models such as OpenMP, Intel TBB and OmpSs are widely used to extract high levels of parallelism from applications executed on multi-core and manycore platforms. These programming models allow applications to be expressed as a set of tasks with dependences to drive their execution at runtime. While managing these dependences for tasks with coarse granularity proves to be highly beneficial, it introduces noticeable overheads when targeting fine-grained tasks, diminishing the potential speedups or even introducing performance losses. To overcome this drawback, we propose a hardware/software co-design, Picos, that manages inter-task dependences efficiently. In this paper we describe the main ideas of our proposal and a prototype implementation. This prototype is integrated with a parallel task-based programming model and evaluated with real executions in a Linux embedded system with two ARM Cortex-A9 and an FPGA. When compared with a software runtime, our solution results in more than 1.8x speedup and 40% energy savings with only 2 threads.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114362385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Parallel and Distributed Future of Data Series Mining","authors":"Themis Palpanas","doi":"10.1109/HPCS.2017.155","DOIUrl":"https://doi.org/10.1109/HPCS.2017.155","url":null,"abstract":"There is an increasingly pressing need, by several applications in diverse domains, for developing techniques able to index and mine very large collections of sequences, or data series. Examples of such applications come from biology, astronomy, entomology, the web, and other domains. It is not unusual for these applications to involve numbers of data series in the order of hundreds of millions to billions, which are often not analyzed in their full detail due to their sheer size. In this work, we describe past efforts in designing techniques for indexing and mining truly massive collections of data series, based on indexing techniques for fast similarity search, an operation that lies at the core of many mining algorithms. We show that there are two bottlenecks in mining such massive datasets, namely, the time taken to build the index, and the time required to answer similarity queries exactly. In response to these challenges, we discuss novel techniques that adaptively create data series indexes, allowing users to correctly answer queries before the indexing task is finished. We also show how our methods allow mining on datasets that would otherwise be completely untenable, including the first published experiments using one billion data series. Moreover, we present our vision for the future in big sequence management and mining research: we argue that more efforts should concentrate on parallel (including modern hardware optimization opportunities) and distributed solutions, which have until now been largely unexploited.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124381464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using Virtualisation for Reproducible Research and Code Portability","authors":"Svetlana Sveshnikova, I. Gankevich","doi":"10.1109/HPCS.2017.139","DOIUrl":"https://doi.org/10.1109/HPCS.2017.139","url":null,"abstract":"Research reproducibility is an emerging topic in computer science. One of the problems in research reproducibility is the absence of tools to reproduce a specified operating system with specific versions of the software installed. In the proposal reported here we investigate how a tool based on lightweight virtualisation technologies reproduces such environments. The experiments show that creating a reproducible environment adds significant overhead only on the first run of the application, and we propose a number of ways to improve the tool.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121718859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling the Internet of Things: a simulation perspective","authors":"Gabriele D’angelo, S. Ferretti, V. Ghini","doi":"10.1109/HPCS.2017.13","DOIUrl":"https://doi.org/10.1109/HPCS.2017.13","url":null,"abstract":"This paper deals with the problem of properly simulating the Internet of Things (IoT). Simulating an IoT allows evaluating strategies that can be employed to deploy smart services over different kinds of territories. However, the heterogeneity of scenarios seriously complicates this task. This imposes the use of sophisticated modeling and simulation techniques. We discuss novel approaches for the provision of scalable simulation scenarios that enable the real-time execution of massively populated IoT environments. Attention is given to novel hybrid and multi-level simulation techniques that, when combined with agent-based, adaptive Parallel and Distributed Simulation (PADS) approaches, can provide means to perform highly detailed simulations on demand. To support this claim, we detail a use case concerned with the simulation of vehicular transportation systems.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130044534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of Different Varactor Models on Antenna Tunability","authors":"M. Madi, K. Kabalan, M. Al‐Husseini","doi":"10.1109/HPCS.2017.50","DOIUrl":"https://doi.org/10.1109/HPCS.2017.50","url":null,"abstract":"This paper explains why antennas achieve only limited frequency tunability when varactor diodes are mounted across slots. The reason is that the majority of varactors have a resistance in parallel to the variable capacitor. This resistance mainly acts as a short, and thereby the varactor loses its capacitive behavior at low frequencies ∼ 2 GHz. At higher frequencies, the varactor retains its capacitive effect. Simulations and calculations are done for various configurations of the varactor to investigate the effect of capacitance, resistance and series inductance on tunability and on the capacitive impedance of the varactor. A cedar-shaped antenna [1] is used in simulations, as it offers many possibilities for placing the varactors.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133380140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Word Embeddings in Parallel by Alignment","authors":"Sahil Zubair, M. Zubair","doi":"10.1109/HPCS.2017.90","DOIUrl":"https://doi.org/10.1109/HPCS.2017.90","url":null,"abstract":"Distributed representations have become the de facto standard by which many modern neural network architectures deal with natural language processing tasks. In particular, the word2vec algorithm introduced by Mikolov et al. popularized the use of distributed representations by demonstrating that learned embeddings capture semantic relationships geometrically. Though word2vec addresses some of the scaling issues of earlier approaches, it can still take days to complete the training process for very large data sets. Recently, researchers have tried to address this by proposing parallel variants of the word2vec algorithm. In these approaches, the data set is partitioned among multiple processors that asynchronously update a shared model. We propose a parallel approach for word2vec that is based on instantiating multiple models, each working with its own data set. Our scheme transfers the learning between different models at discrete intervals (synchronously). The frequency with which we transfer the learning between different models is much lower than the frequency of asynchronous updates in existing approaches. In our approach, we treat each of our instantiated word2vec instances as an independent model. This implies that off-the-shelf implementations of word2vec can be used in our parallel approach. The key feature of our algorithm is how we transfer the parameters between different models that have been independently trained using distinct partitions of a large data set. For this we propose a computationally inexpensive alignment and merge step. We validate our algorithm on a publicly available dataset using an implementation of word2vec in Google's TensorFlow software. We evaluate our algorithm by comparing its runtime with the runtime of the sequential algorithm for a given training loss. Our results show that our parallel algorithm is able to achieve efficiency up to 57%.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"os-30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127864012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Chunk-Wise Parallelization Based on Dynamic Performance Prediction on Heterogeneous Multicores","authors":"A. Dab, Y. Slama","doi":"10.1109/HPCS.2017.28","DOIUrl":"https://doi.org/10.1109/HPCS.2017.28","url":null,"abstract":"Multicore machines are becoming more and more common. Ideally, all applications benefit from these advances in computer architecture. A complex challenge in parallel computing is balancing load across cores to minimize the overall execution time, called the makespan, of the parallel program. As multicores may have different architectures, an effective mapping should support this unknown variation to avoid degrading the makespan. In fact, a mapping or static load balancing method may not be effective when the state of the target machine changes during program execution. Thread affinity has emerged as an important technique to improve program performance and performance stability. In this context, we propose a predictive approach using iteration chunking at runtime, allowing parallel code to adapt to processor performance. Our approach is based on thread pinning and performance detection at execution time. From a parallel program, we define a set of loop nest iterations, forming what is called a chunk, and we run it using a first mapping assuming homogeneous cores. Then, a performance assessment corrects the mapping by speculating on the future state of the cores. The new mapping is then applied to a new chunk for further evaluation and prediction. The process stops when the program is fully executed or when chunking is judged to be no longer effective.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"45 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115752318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Efficient Algorithms for Compressed Sparse-Sparse Matrix Product","authors":"S. Ezouaoui, O. Hamdi-Larbi, Z. Mahjoub","doi":"10.1109/HPCS.2017.101","DOIUrl":"https://doi.org/10.1109/HPCS.2017.101","url":null,"abstract":"We study the sparse matrix product problem where the input matrices are sparse. Starting with an original DO-loop nest structured algorithm, different versions involving body kernels such as GAXPY, AXPY and DOT are generated by the loop interchange technique. We particularly focus on the GAXPY-Row body kernel where the matrices are accessed row-wise. Various versions corresponding to the most used sparse matrix compression formats are designed. We then derive other versions by applying improving techniques such as loop invariant motion and loop unrolling. A theoretical multi-fold performance study makes it possible to establish accurate comparisons between the different versions. Our contribution is validated through experiments carried out on two input sets, i.e. a set of randomly generated matrices and a set of benchmark matrices of different sizes and densities. These experiments showed that the improvement procedure led to an efficient version that reduces the run time by up to 98%. Our algorithms were also compared with kernels from the NIST Sparse BLAS, CSparse and SPARSKIT2 libraries.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124342938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatic Generation of Wireless Sensor Networks Scheduling","authors":"Anis Mezni, E. Dumitrescu, É. Niel, S. Ahmed","doi":"10.1109/HPCS.2017.32","DOIUrl":"https://doi.org/10.1109/HPCS.2017.32","url":null,"abstract":"In this paper, the Discrete Controller Synthesis (DCS) technique is applied in order to obtain a correct-by-construction automatic distributed scheduling of a Wireless Sensor Network (WSN). Our approach starts from an abstract formal model that assumes communication between nodes is instantaneous. Then, a refined model is obtained by adding a realistic communication mechanism while preserving the global controlled behavior generated initially by automatic synthesis. This communication mechanism is called a synchronization “barrier”: a software mechanism constraining a set of sensors to wait for each other before each makes its own local decision and before deactivation. The approach is illustrated by a WSN model with two communicating nodes.","PeriodicalId":115758,"journal":{"name":"2017 International Conference on High Performance Computing & Simulation (HPCS)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117008790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}