Re-Imagining data analytics software development
Asha Rajbhoj, Prateek Dhawan, T. Vishnu, Pankaj Malhotra, V. Kulkarni
14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference), published 2021-02-25
DOI: 10.1145/3452383.3452403 (https://doi.org/10.1145/3452383.3452403)
Abstract
Creating a data analytics pipeline is a tedious task. The search space of algorithms that could yield a suitable solution for a given goal on a given constrained infrastructure is generally very large, and the exploratory work of choosing the best possible solution is effort-, time-, and intellect-intensive. Current industry practice relies largely on domain experts for this work. To improve a domain expert's productivity, we propose a model- and rule-based system that automates the creation of data analytics pipelines. The proposed system provides a mechanism for specifying domain knowledge as an object model and a set of rules defined over it. Based on the problem context, it recommends suitable algorithms for carrying out the various data analytics tasks. Once the pipeline has been created successfully, the system generates the pipeline code. The system also generates trace data to help in cognitive knowledge upgrade. We discuss the approach through a case study of a sensor data-based health monitoring system and showcase its efficacy and the lessons learnt.
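To make the idea in the abstract concrete, the following is a minimal sketch of a model- and rule-based recommender, not the authors' implementation. It assumes a tiny object model (`ProblemContext`), rules expressed as predicates over that model, a trace of which rules fired, and a trivial code generator. All class names, rule names, and algorithm names here are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProblemContext:
    """Hypothetical object model describing the problem context."""
    task: str        # e.g. "anomaly_detection"
    labelled: bool   # whether labelled data is available
    n_samples: int   # number of records in the data set


@dataclass
class Rule:
    """A rule defined over the object model: a condition plus a recommendation."""
    name: str
    condition: Callable[[ProblemContext], bool]
    algorithm: str


@dataclass
class Recommendation:
    algorithms: List[str] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)  # record of every rule evaluated


def recommend(ctx: ProblemContext, rules: List[Rule]) -> Recommendation:
    """Evaluate every rule against the context and log the outcome as trace data."""
    rec = Recommendation()
    for rule in rules:
        fired = rule.condition(ctx)
        rec.trace.append(f"{rule.name}: {'fired' if fired else 'skipped'}")
        if fired and rule.algorithm not in rec.algorithms:
            rec.algorithms.append(rule.algorithm)
    return rec


def generate_pipeline_code(steps: List[str]) -> str:
    """Emit source text for a pipeline that applies the chosen steps in order."""
    lines = ["def run_pipeline(data):"]
    for step in steps:
        lines.append(f"    data = {step}(data)")
    lines.append("    return data")
    return "\n".join(lines)


# Illustrative rules only; the paper's actual rule base is not reproduced here.
RULES = [
    Rule("R1-unlabelled-anomaly",
         lambda c: c.task == "anomaly_detection" and not c.labelled,
         "isolation_forest"),
    Rule("R2-labelled-anomaly",
         lambda c: c.task == "anomaly_detection" and c.labelled,
         "random_forest_classifier"),
    Rule("R3-small-data",
         lambda c: c.n_samples < 1000,
         "k_nearest_neighbours"),
]

if __name__ == "__main__":
    ctx = ProblemContext(task="anomaly_detection", labelled=False, n_samples=50_000)
    rec = recommend(ctx, RULES)
    print(rec.algorithms)
    print(rec.trace)
    print(generate_pipeline_code(rec.algorithms))
```

In this sketch, keeping the rules as data separate from the evaluation loop means a domain expert could add or revise rules without touching the engine, and the trace records every rule evaluation so that a recommendation can be reviewed afterwards.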