Software is data too: how should we deal with it?

IWPSE-EVOL '10 | Pub Date: 2010-09-20 | DOI: 10.1145/1862372.1862374

Andrian Marcus


Citations: 4

Abstract

Software systems are designed and engineered to process data. However, software is data too. The size and variety of today's software artifacts and the multitude of stakeholder activities result in so much data that individuals can no longer reason about all of it. Software evolution is no longer just about writing code; it is becoming an information management problem.

Analysis and management of software data are activities that software engineers are not trained to do. We have to look for solutions outside software engineering, adopt them, and make them our own. These solutions can come from data mining, information retrieval, machine learning, and statistical analysis, among others. This is not the first time software engineers have looked at such solutions; the effort has been going on for about two decades, in one form or another. The results so far indicate that software engineering is facing a paradigm shift, in which more and more software engineering tasks are reinterpreted as optimization, search, retrieval, or classification problems. Despite this experience, applications of data analysis, data integration, and data mining in software engineering are in their infancy compared with other research fields. New research is needed to adapt existing algorithms and tools to software engineering data and processes, and new ones will have to be created. This research has to be supported by integration with software development processes and with education as well. More than that, for this type of research to succeed, it should be supported by new approaches to empirical work, in which data and results are shared globally among researchers and practitioners.

The talk will focus on arguing for and mapping out (part of) this research agenda, while looking back at (some of) the existing work in the area.

