{"title":"Assessing Big Data SQL Frameworks for Analyzing Event Logs","authors":"Markku Hinkka, Teemu Lehto, Keijo Heljanko","doi":"10.1109/PDP.2016.26","DOIUrl":null,"url":null,"abstract":"Performing Process Mining by analyzing event logs generated by various systems is a very computation and I/O intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task in a single computer. This paper assesses whether contemporary structured query language (SQL) supporting Big Data processing frameworks are mature enough to be efficiently used to distribute computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed by using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made on how well selected Big Data processing frameworks manage to process and to parallelize the analysis work required by Process Mining tasks.","PeriodicalId":192273,"journal":{"name":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 24th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDP.2016.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 8
Abstract
Performing Process Mining by analyzing event logs generated by various systems is a very computation and I/O intensive task. Distributed computing and Big Data processing frameworks make it possible to distribute all kinds of computation tasks to multiple computers instead of performing the whole task in a single computer. This paper assesses whether contemporary structured query language (SQL) supporting Big Data processing frameworks are mature enough to be efficiently used to distribute computation of two central Process Mining tasks to two dissimilar clusters of computers providing BPM as a service in the cloud. Tests are performed by using a novel automatic testing framework detailed in this paper and its supporting materials. As a result, an assessment is made on how well selected Big Data processing frameworks manage to process and to parallelize the analysis work required by Process Mining tasks.