{"title":"ARPA: Armenian Paraphrase Detection Corpus and Models","authors":"Arthur Malajyan, K. Avetisyan, Tsolak Ghukasyan","doi":"10.1109/IVMEM51402.2020.00012","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00012","url":null,"abstract":"In this work, we employ a semi-automatic method based on back translation to generate a sentential paraphrase corpus for the Armenian language. The initial collection of sentences is translated from Armenian to English and back twice, resulting in pairs of lexically distant but semantically similar sentences. The generated paraphrases are then manually reviewed and annotated. Using this method, train and test datasets are created, containing 2360 paraphrases in total. In addition, the datasets are used to train and evaluate BERT-based models for detecting paraphrases in Armenian, achieving results comparable to the state of the art for other languages.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130092186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BinSide : Static Analysis Framework for Defects Detection in Binary Code","authors":"H. Aslanyan, Mariam Arutunian, G. Keropyan, S. Kurmangaleev, V. Vardanyan","doi":"10.1109/IVMEM51402.2020.00007","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00007","url":null,"abstract":"Software developers make mistakes that can lead to failures of a software product. One approach to detecting defects is static analysis: examining code without executing it. Currently, various source code static analysis tools are widely used to detect defects. However, source code analysis alone is not enough, due to the use of third-party binary libraries and the unprovability of the correctness of all compiler optimizations. This paper introduces BinSide, a binary static analysis framework for defect detection. It performs interprocedural, context-sensitive and flow-sensitive analysis. The framework uses a platform-independent intermediate representation and provides the opportunity to analyze binaries for various architectures. The framework includes value analysis, reaching definition, taint analysis, freed memory analysis, constant folding, and constant propagation engines. It provides an API (application programming interface) and can be used to develop new analyzers. Additionally, we used the API to develop checkers for detecting classic buffer overflow, format string, command injection, double free and use-after-free defects.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114780259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Architecture and deployment details of scalable Jupyter environment at Kurchatov Institute supercomputing centre","authors":"A. Teslyuk, S. Bobkov, Alexander Belyaev, Alexander Filippov, K. Izotov, I. Lyalin, Andrey Shitov, Leonid Yasnopolsky, V. Velikhov","doi":"10.1109/IVMEM51402.2020.00017","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00017","url":null,"abstract":"Jupyter notebook is a popular framework for interactive application development and data analysis. Deploying JupyterHub on a supercomputer infrastructure would make it possible to combine high computing power and large storage capacity with convenience and ease of use for end users. In this work we present the architecture and deployment details of the Jupyter framework in the Kurchatov Institute computing infrastructure. In our setup we combined JupyterHub with the CEPHfs storage system, the FreeIPA user management system, and a customized CUDA-compatible image with worker applications, and used Kubernetes as a component orchestrator.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124809545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A State-based Refinement Technique for Event-B","authors":"A. Khoroshilov, V. Kuliamin, A. Petrenko, I. Shchepetkov","doi":"10.1109/IVMEM51402.2020.00015","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00015","url":null,"abstract":"Formal models can be used to describe and reason about the behavior and properties of a given system. In some cases, it is even possible to prove that the system satisfies the given properties. This allows detecting design errors and inconsistencies early and fixing them before starting development. Such models are usually created using stepwise refinement: starting with a simple, abstract model of the system and then incrementally refining it, adding more details at each subsequent level of refinement. The top levels of the model usually describe the high-level design or purpose of the system, while the lower levels are more directly comparable with the implementation code. In this paper, we present a new, alternative refinement technique for Event-B which can simplify the development of complicated models with a large gap between high-level design and implementation.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129483255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High Energy Physics Data Popularity : ATLAS Datasets Popularity Case Study","authors":"M. Grigorieva, E. Tretyakov, A. Klimentov, D. Golubkov, T. Korchuganova, A. Alekseev, A. Artamonov, T. Galkin","doi":"10.1109/IVMEM51402.2020.00010","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00010","url":null,"abstract":"The amount of scientific data generated by the LHC experiments has hit the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. The popularity of data among individual physicists and university groups has become one of the key factors of efficient data management and processing. Data popularity was actively used by the experiments during LHC Run 1 and Run 2 for central data processing, and allowed optimizing data placement policies and spreading the workload more evenly over the existing computing resources. Besides central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Taking into account the significant increase in data volume and processing time after the collider upgrade for the High Luminosity Runs (2027–2036), intelligent data placement based on data access patterns becomes even more crucial than at the beginning of the LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of the computing sites where the data were processed, and the locality of the home institutes of users carrying out physics analysis. Cartography visualization based on these data allows correlating the existing data placement with physics needs, providing a better understanding of data utilization by different categories of user tasks.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114823419","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimation of Watermark Embedding Capacity with Line Space Shifting","authors":"A. Kozachok, S. Kopylov","doi":"10.1109/IVMEM51402.2020.00011","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00011","url":null,"abstract":"The article describes an analytical model for evaluating the maximum achievable embedding capacity of a robust watermark, based on an approach to embedding information in text data by line space shifting. The developed model allows assessing the boundary values of the amount of information that a watermark embedded into printed text data may contain. During the development of the analytical model, the dependence of the maximum achievable embedding capacity on the number of lines in a text document and the watermark embedding parameters used was established. The relationship between the parameters of a text document and the number of lines per page is described mathematically. Mathematical calculations of the obtained expressions and the corresponding experimental studies were conducted. The correspondence of the obtained simulation results to the parameters of texts printed on paper is evaluated. The simulation results are analyzed and a linear dependence of the results is established. The obtained values are approximated, and analytical expressions are derived that allow one to quantify the maximum achievable embedding capacity of the developed robust watermark depending on the embedding parameters used. The trade-offs between the following parameters of robust watermarks are estimated: embedding capacity, extractability and robustness. The relationship between the maximum achievable embedding capacity and the extraction accuracy of the developed watermark is determined. Quantitative estimates of the influence of the watermark size on the final extraction accuracy of the embedded information are given. Further research directions are outlined.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133987245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Possibilities of Computer Lexicography in Compiling Highly Specialized Terminological Printed and Electronic Dictionaries (Field of Aviation Engineering)","authors":"V. Ryzhkova","doi":"10.1109/IVMEM51402.2020.00013","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00013","url":null,"abstract":"The article covers modern trends in compiling printed and electronic field-specific dictionaries of technical terms. It discloses both theoretical and practical aspects of compiling such dictionaries.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116800654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification of pseudo-random sequences based on the random forest algorithm","authors":"A. Kozachok, A. Spirin, Alexander I. Kozachok, Alexey N. Tsibulia","doi":"10.1109/IVMEM51402.2020.00016","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00016","url":null,"abstract":"Due to the increased number of information leaks caused by internal violators and the lack of mechanisms in modern DLP systems to counter information leaks in encrypted or compressed form, a method for classifying sequences formed by encryption and data compression algorithms was proposed. An algorithm for constructing a random forest was proposed, and the choice of classifier hyperparameters was justified. The presented approach achieved a classification accuracy of 0.98 on the sequences specified in the work.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127685838","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Determining Soil Parameters","authors":"S. Zasukhin, E. Zasukhina","doi":"10.1109/IVMEM51402.2020.00020","DOIUrl":"https://doi.org/10.1109/IVMEM51402.2020.00020","url":null,"abstract":"The problem of determining soil parameters is considered. Exact knowledge of these parameters is of great importance for planning and managing water systems, assessing the possible size of catastrophic floods, etc. These parameters are proposed to be found by solving an optimal control problem in which the controlled process is described by the Richards equation. The objective function is the mean-square deviation of the observed soil moisture values from the simulated values, which are obtained from the solution of the Richards equation with the selected parameter values. Numerical optimization is performed using Newton's method. Derivatives of the objective function are calculated using fast automatic differentiation techniques.","PeriodicalId":325794,"journal":{"name":"2020 Ivannikov Memorial Workshop (IVMEM)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2020-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113944920","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}