{"title":"An Extensible Approach for Materialized Big Data Integration in Distributed Computation Environments","authors":"V. Sazontev, S. Stupnikov","doi":"10.1109/IVMEM.2019.00011","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00011","url":null,"abstract":"Modern IT world requires data integration systems to deal with the large number of heterogeneous data sources. Such systems should perform not only data extraction, but also schema alignment, entity resolution and data fusion. In the world of big data with large number of heterogenous data sources, there are number of methods that address various aspects of integration, to make the system automatic and less user-dependent. This work proposes an extensible approach for development of data integration system to perform materialized integration of heterogenous sources in a distributed computation environment. A prototype of the system with implementation of advanced methods for big data integration has been developed. The system is applied in e-commerce domain.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123120872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ISP-Fuzzer: Extendable Fuzzing Framework","authors":"Sevak Sargsyan, Jivan Hakobyan, Matevos Mehrabyan, Maxim Mishechkin, Vitaliy Akozin, Shamil Kurmangaleev","doi":"10.1109/IVMEM.2019.00017","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00017","url":null,"abstract":"In this paper we introduce ISP-Fuzzer, an extendable fuzzing framework. The framework supports plugins which makes possible to tune it for any fuzzing task. ISP-Fuzzer capable of performing fuzzing for: files, standard input, network, network protocols. As well it can generate BNF structured data for compilers and interpreters fuzzing. The framework supports number of plugins for performing: code static analysis, dynamic symbolic execution, directed fuzzing etc. ISP-Fuzzer designed to run on multiprocessor and distributed systems. During experimental setup the tool has detected number of defects in binary files from different Linux distributions.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123326281","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labelling Hierarchical Clusters of Scientific Articles","authors":"Irina Peganova, A. Rebrova, Y. Nedumov","doi":"10.1109/IVMEM.2019.00010","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00010","url":null,"abstract":"Exploration of document collections is a complex task. One way to do this is to cluster the initial collection hierarchically and then label each cluster with a set of extracted terms. Good labelling should help exploration. We focus on the scientific domain and particularly on collections of abstracts of articles. Abstract is commonly a brief of a paper that outlines the research area, the challenge, the proposed solution and the results; so it could be used instead of a full article despite the difficulties related to its shortness. In this paper, we propose a new method HCBasic for labelling hierarchical clusters. It is particularly tuned for articles' abstracts and compared to three other methods: MTWL, hierMTWL and ComboBasic. To evaluate the quality of the labelling algorithms we did A/B testing in which eight volunteers searched for the articles that they were familiar with in the labelled cluster tree. We show that there is no single winner in terms of quality, and different methods are preferable in different cases.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116820572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Selective Instrumentation Mechanism and its Application in a Virtual Machine","authors":"I. Vasiliev, V. Makarov, P. Dovgalyuk, M. Klimushenkova","doi":"10.1109/IVMEM.2019.00018","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00018","url":null,"abstract":"Among existing approaches to software analysis one stands out: dynamic binary analysis, implemented with dynamic binary instrumentation (DBI). Instrumentation allows to perform analysis by inserting user-defined instructions into examined code flow. DBI frameworks allows to perform analysis in the absence of original source code, as well as providing functionality to change and supplement analysis conditions on-the-go. These capabilities provide performing analysis of any complexity and for any software. However, analysis quality and ease of use of dynamic binary instrumentation directly depends on implemented functionality in a chosen framework. One of the key features, allowing convenient analysis process is a possibility to specify and to narrow instrumentation target from operating system to smaller and more precise entities in system, like: process, thread, memory range. This ability is called selective instrumentation. Having this feature analyst may switch freely between whole system instrumentation and selective instrumentation both ways, which allows to benefit from both approaches while using the same framework. Whole system instrumentation affords the most comprehensive overview of all running applications in the system and the system itself. However the downside is a noticeable slowdown of the analyzed system, which can lead to malfunctioning of the system, and excessive amount of data that needs to be processed and analyzed. Selective instrumentation allows one to specify the area of interest for analysis routines. This can be performed at the right time and for specific entities, which provides a more accurate result depending on the goals.\nIn this paper we are going to look through existing approaches for selective instrumentation and define their flaws. Then we will propose an approach for instrumentation of processes, threads, fibers and memory, and will describe test implementation for ARM and x86 architectures. In the last part of the paper we will describe application examples of developed selective instrumentation approaches.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124156060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Performance Computations for Short-Lived Plasmas","authors":"O. Olkhovskaya, G. Bagdasarov, V. Gasilov, Y. Sharova","doi":"10.1109/IVMEM.2019.00021","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00021","url":null,"abstract":"Supercomputer simulations are of fundamental importance for understanding the physics of nonlinear processes in high temperature pulse plasma. The development of predictive codes is an urgent problem in computational plasma physics, as well as many other fields of science. Radiative magneto-hydrodynamics 3D code MARPLE is a full-scale multiphysics research code using the state-of-the-art physics, mathematics, and numerics as well as the up-to-date high performance computing functionality. Scalability study demonstrated that the code can fit existing petaFLOPS supercomputers as well as next-generation exaFLOPS ones. The code is currently used for multiphysics simulations, specifically for high energy density plasma in pulsed-power facilities. Compression of a wire array by a high-current discharge is a valuable tool for fundamental study of matter in extreme states. Different configurations of wire arrays were investigated numerically. A series of high resolution computations helped to create a very compact spherical bright radiation source using dedicated design of the electrodes, the wire array, and the mass distribution along the wires.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127674821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Analysis of Consistency for In Memory Data Grid Apache Ignite","authors":"Andrey Tapekhin, I. Bogomolov, Oleg Velikanov","doi":"10.1109/IVMEM.2019.00013","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00013","url":null,"abstract":"The requirements for speed and capacity of data storage systems permanently increase. As a result, NoSQL and NewSQL database management systems became more popular nowadays. The CAP theorem implies that for a distributed data storage, one usually has to choose a tradeoff between its availability and consistency. However, the developers of distributed data storages could use these terms not in their original meanings, making the end users misunderstand the limitations of the systems. Moreover, even if the limitations are described in detail in the documentation, the software could have errors. Therefore a need for testing of data storage systems, not only from the performance point of view, but also for consistency and other properties. In this paper we present our results for consistency analysis for Apache Ignite using Jepsen framework.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127642699","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The VM2D Open Source Code for Incompressible Flow Simulation by Using Meshless Lagrangian Vortex Methods on CPU and GPU","authors":"K. Kuzmina, I. Marchevsky, E. Ryatina","doi":"10.1109/IVMEM.2019.00020","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00020","url":null,"abstract":"The main features and data structure of the VM2D code is described. VM2D is the original code that implements the meshless Lagrangian vortex methods for two-dimensional viscous incompressible flows simulation. The code is open source and supports parallel technologies OpenMP, MPI and Nvidia CUDA. The VM2D code can be useful for flow simulation around airfoils as well as system of airfoils and unsteady hydrodynamic loads computation. It is possible to simulate flow around immovable airfoils, airfoils moving according to the given law and solve fluid-structure interaction problems in weakly coupled and strongly coupled statement. Known algorithms of vortex methods and original improvements developed by authors are implemented. The current version of source code of the VM2D is available on GitHub under GNU GPL license (https://github.com/vortexmethods/VM2D)","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"409 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115240792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Labeled Property Graphs: SQL or NoSQL?","authors":"Dmitry Anikin, O. Borisenko, Y. Nedumov","doi":"10.1109/IVMEM.2019.00007","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00007","url":null,"abstract":"There are two main approaches to graph databases: based on RDF model and based on labeled property graph model. RDF is well known and studied, but modern graph databases with labeled property graph model are studied much lesser. In this paper we evaluated several possible solutions for storing and querying graph data using Gremlin - general purpose graph query language from Apache TinkerPop. We used LDBC Graphalytics framework and compared NoSQL-based setups with SQL-based setups. We evaluated JanusGraph on HBase both on single machine and cluster and SQLG on top of PostgreSQL and H2. We used datasets from the different domains and of different sizes up to tens of millions vertices and edges. Evaluation results show that for the used workload SQLG with PostgreSQL is about ten times faster than JanusGraph on HBase and SQLG with H2 performance is in between.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"20 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131435148","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research of Techniques to Improve the Performance of Explicit Numerical Methods on the CPU","authors":"V. Furgailo, A. Ivanov, Nikolay Khokhlov","doi":"10.1109/IVMEM.2019.00019","DOIUrl":"https://doi.org/10.1109/IVMEM.2019.00019","url":null,"abstract":"This article explores the feasibility of high-performance computing of explicit numerical methods. For this purpose, the use of vector instructions and AVX operations of local and non-local physical representation of data was investigated. The application of different variations of recursive and non-recursive tiling for 2D and 3D stencil methods was also investigated.","PeriodicalId":166102,"journal":{"name":"2019 Ivannikov Memorial Workshop (IVMEM)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123567087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}