Re-Imagining data analytics software development

14th Innovations in Software Engineering Conference (formerly known as India Software Engineering Conference) Pub Date : 2021-02-25 DOI:10.1145/3452383.3452403

Asha Rajbhoj, Prateek Dhawan, T. Vishnu, Pankaj Malhotra, V. Kulkarni



Abstract

Creating a data analytics pipeline is a tedious task. The algorithm search space for a suitable solution to a given goal on a given constrained infrastructure is generally very large, and the exploratory work of choosing the best possible solution is effort-, time- and intellect-intensive. Current industry practice relies largely on domain experts for this work. To improve a domain expert's productivity, we propose a model- and rule-based system that automates the creation of data analytics pipelines. The proposed system provides a mechanism to specify domain knowledge as an object model and a set of rules defined over it. Based on the problem context, it recommends suitable algorithms for the various data analytics tasks. On successful creation of the pipeline, the system generates the pipeline code. The system also generates trace data to help in cognitive knowledge upgrade. We discuss the approach using a case study of a sensor-data-based health monitoring system and present its efficacy and the lessons learnt.
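To make the idea of "an object model and a set of rules defined over it" concrete, here is a minimal Python sketch. It is purely illustrative and is not the authors' system: the `ProblemContext` fields, the rule names, the thresholds, and the algorithm labels are all hypothetical assumptions chosen for the example. It shows how rules evaluated over a context object could yield algorithm recommendations together with a simple trace of which rules fired.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ProblemContext:
    """Hypothetical object model describing the analytics problem."""
    task: str        # e.g. "anomaly_detection"
    labeled: bool    # whether labeled training data is available
    n_samples: int   # number of available samples
    memory_mb: int   # assumed infrastructure constraint


@dataclass
class Rule:
    """A named condition over the context that recommends one algorithm."""
    name: str
    condition: Callable[[ProblemContext], bool]
    algorithm: str


# Illustrative rules only; thresholds and algorithm names are assumptions.
RULES: List[Rule] = [
    Rule(
        "labeled-large",
        lambda c: c.task == "anomaly_detection" and c.labeled and c.n_samples >= 10_000,
        "gradient_boosted_trees",
    ),
    Rule(
        "unlabeled-low-memory",
        lambda c: c.task == "anomaly_detection" and not c.labeled and c.memory_mb < 512,
        "rolling_z_score",
    ),
    Rule(
        "unlabeled",
        lambda c: c.task == "anomaly_detection" and not c.labeled and c.memory_mb >= 512,
        "isolation_forest",
    ),
]


def recommend(ctx: ProblemContext) -> Tuple[List[str], List[str]]:
    """Evaluate every rule; return recommended algorithms and a per-rule trace."""
    recommendations: List[str] = []
    trace: List[str] = []
    for rule in RULES:
        fired = rule.condition(ctx)
        trace.append(f"{rule.name}: {'fired' if fired else 'skipped'}")
        if fired:
            recommendations.append(rule.algorithm)
    return recommendations, trace


if __name__ == "__main__":
    ctx = ProblemContext(task="anomaly_detection", labeled=False,
                         n_samples=5_000, memory_mb=256)
    algorithms, trace = recommend(ctx)
    print(algorithms)
    for line in trace:
        print(line)
```

Keeping the rules as data rather than hard-coded branches is one plausible way to let a domain expert extend the knowledge base without touching the evaluation loop. The trace list is a simplified stand-in for the trace data the abstract mentions.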


