S. Popoola, Xin Zhao, J. Gray, A. García-Domínguez
Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings. Published 2022-10-23. DOI: 10.1145/3550356.3561563
Classifying changes to models via changeset metrics
Automated classification of software changes can help explain why a change was made, guide the adoption of quality control practices as bugfix trends are observed, and cluster related sets of changes so that the changed artifacts can be managed together, thereby reducing maintenance effort. A number of change classification techniques have been developed based on information extracted from the change author, the change message, the change size, or the changed file. However, most of these approaches target textual general-purpose programming languages. Furthermore, some are computationally expensive because they require analyzing the whole source code, while others rely on the developers' ability to describe a commit in a well-written message. In this paper, we present an approach that classifies changes to models into the appropriate maintenance type via a set of metrics extracted from the version history of the models. We developed seven metrics that characterize changes applied to models and model elements. We then conducted an empirical study involving 10 classifiers to determine which offers the best performance for automating the change classification process. The classifiers were trained on over 300 changesets extracted from the version history of 28 Simulink repositories. The results show that the Random Forest classifier performs best. The Random Forest classifier was also evaluated by comparing its predictions with labels extracted from the discussions within issues reported in a similar time frame. This evaluation shows that the Random Forest classifier achieves an F-1 score of 0.83, demonstrating its ability to classify changes into the categories intended by the original developers.
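The evaluation pipeline the abstract describes — numeric changeset metrics as features, a Random Forest among the candidate classifiers, and F1 as the quality measure — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the paper's seven metrics are not enumerated in the abstract, so the feature matrix and maintenance-type labels below are synthetic placeholders with the same shape (roughly 300 changesets, 7 metrics each).

```python
# Hypothetical sketch of the evaluation setup, assuming scikit-learn.
# Features and labels are random stand-ins, NOT the paper's data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for ~300 changesets, each described by 7 numeric
# change metrics (e.g. counts of added/removed/modified model elements).
X = rng.random((300, 7))
# Maintenance-type labels (e.g. 0=corrective, 1=adaptive, 2=perfective).
y = rng.integers(0, 3, size=300)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# Macro-averaged F1 via 5-fold cross-validation; with real metrics and
# labels, this is the kind of score the abstract reports (0.83).
scores = cross_val_score(clf, X, y, scoring="f1_macro", cv=5)
print(round(scores.mean(), 3))
```

In practice the same loop would be repeated over all 10 candidate classifiers (swapping `clf`), and the one with the highest cross-validated F1 — Random Forest, per the study — would be selected.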