Defect Data Analysis Based on Extended Association Rule Mining

Fourth International Workshop on Mining Software Repositories (MSR'07: ICSE Workshops 2007). Publication date: 2007-05-20. DOI: 10.1109/MSR.2007.5

Shuji Morisaki, Akito Monden, Tomoko Matsumura, Haruaki Tamada, Ken-ichi Matsumoto


Citations: 30


Abstract

This paper describes an empirical study to reveal rules associated with defect correction effort. We defined defect correction effort as a quantitative (ratio scale) variable, and extended conventional (nominal scale based) association rule mining to directly handle such quantitative variables. An extended rule describes the statistical characteristics of a ratio or interval scale variable in the consequent part of the rule by its mean value and standard deviation, so that conditions producing distinctive statistics can be discovered. As an analysis target, we collected various attributes of about 1,200 defects found in a typical medium-scale, multi-vendor (distance development) information system development project in Japan. Our findings based on the extracted rules include: (1) Defects detected in coding/unit testing were easily corrected (less than 7% of mean effort) when they were related to data output or validation of input data. (2) Nevertheless, they sometimes required much more effort (lift of standard deviation was 5.845) in cases of low reproducibility. (3) Defects introduced in coding/unit testing often required large correction effort (mean was 12.596 staff-hours and standard deviation was 25.716) when they were related to data handling. From these findings, we confirmed that we need to pay attention to types of defects having large mean effort as well as those having large standard deviation of effort, since such defects sometimes cause excess effort.
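The abstract describes the extended rule only in prose. The antecedent is a set of nominal conditions. The consequent summarizes a ratio-scale variable (here, correction effort) by its mean and standard deviation over the defects that match the antecedent.

The Python sketch below illustrates that idea. It is a minimal sketch under several assumptions that the abstract does not state:

- Candidate rules are found by brute-force enumeration of up to two conditions, not by the paper's own mining procedure.
- The "lift" of a statistic is taken to be the subset statistic divided by the same statistic over all defects.
- The population standard deviation is used.
- `mine_extended_rules`, `ExtendedRule`, the attribute names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical record shape: nominal attributes plus one numeric
# (ratio-scale) field such as correction effort in staff-hours.
Defect = Dict[str, object]
Condition = Tuple[str, str]  # (attribute, value)


@dataclass
class ExtendedRule:
    antecedent: Tuple[Condition, ...]  # conditions joined by AND
    support: float    # fraction of all defects matching the antecedent
    mean: float       # mean of the target over matching defects
    std: float        # population std. deviation over matching defects
    mean_lift: float  # mean / overall mean (assumed definition of lift)
    std_lift: float   # std / overall std (assumed definition of lift)


def _ratio(x: float, base: float) -> float:
    # Guard against a zero baseline (e.g. all efforts identical).
    return x / base if base else float("nan")


def mine_extended_rules(
    defects: Sequence[Defect],
    attributes: Sequence[str],
    target: str = "effort",
    max_len: int = 2,
    min_support: float = 0.05,
) -> List[ExtendedRule]:
    values = [float(d[target]) for d in defects]
    base_mean, base_std = mean(values), pstdev(values)
    # Every (attribute, value) pair that occurs is a candidate condition.
    items = sorted({(a, str(d[a])) for d in defects for a in attributes})
    rules: List[ExtendedRule] = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            # Skip antecedents that constrain the same attribute twice.
            if len({a for a, _ in combo}) < k:
                continue
            matched = [float(d[target]) for d in defects
                       if all(str(d[a]) == v for a, v in combo)]
            support = len(matched) / len(defects)
            # Require enough matches for a meaningful mean and spread.
            if support < min_support or len(matched) < 2:
                continue
            m, s = mean(matched), pstdev(matched)
            rules.append(ExtendedRule(combo, support, m, s,
                                      _ratio(m, base_mean),
                                      _ratio(s, base_std)))
    return rules


# Hypothetical toy data, NOT the dataset analyzed in the paper.
defects = [
    {"phase": "coding", "type": "output", "effort": 1.0},
    {"phase": "coding", "type": "output", "effort": 1.0},
    {"phase": "coding", "type": "data_handling", "effort": 10.0},
    {"phase": "coding", "type": "data_handling", "effort": 30.0},
    {"phase": "design", "type": "output", "effort": 4.0},
    {"phase": "design", "type": "data_handling", "effort": 2.0},
]

rules = mine_extended_rules(defects, ["phase", "type"])
# Rank rules so that conditions with an unusually large mean OR spread
# of effort come first.
ranked = sorted(rules, key=lambda r: max(r.mean_lift, r.std_lift),
                reverse=True)
for r in ranked[:3]:
    print(r.antecedent, round(r.mean, 2), round(r.std, 2),
          round(r.mean_lift, 2), round(r.std_lift, 2))
```

The ranking step reflects the abstract's conclusion that both large mean effort and large spread of effort deserve attention. Under the assumed definition of lift, a finding such as "less than 7% of mean effort" would correspond to a `mean_lift` below 0.07. A "lift of standard deviation" of 5.845 would mean that the matching defects' effort varied about 5.8 times as much as that of all defects.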

