Filter-INC: Handling Effort-Inconsistency in Software Effort Estimation Datasets
Passakorn Phannachitta, J. Keung, K. E. Bennin, Akito Monden, Ken-ichi Matsumoto
2016 23rd Asia-Pacific Software Engineering Conference (APSEC)
DOI: 10.1109/APSEC.2016.035
Citations: 4
Abstract
Effort-inconsistency is a situation where historical software project data used for software effort estimation (SEE) are contaminated by many project cases that have similar characteristics but were completed with significantly different amounts of effort. Using such data for SEE generally produces inaccurate results, yet no effective technique for handling the problem is available. This study approaches the problem differently from common solutions, in which existing techniques typically attempt to remove every project case they detect as an outlier. Instead, we hypothesize that data inconsistency is caused by only a few deviant project cases, and that removing the remaining detected cases reduces accuracy, largely through the loss of useful information and data diversity. Filter-INC (short for Filtering technique for handling effort-INConsistency in SEE datasets) implements this hypothesis to decide whether a project case detected by an existing technique should actually be removed. We evaluate Filter-INC by comparing the performance of two filtering techniques before and after applying it. Results from eight real-world datasets, three machine-learning models, and four performance measures show a significant accuracy improvement at the 95% confidence level. Based on these results, we recommend the proposed hypothesis as an important instrument for designing data preprocessing techniques that handle effort-inconsistency in SEE datasets, an important step toward more accurate SEE models.
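The abstract does not give Filter-INC's internals, but the core idea, flag effort-inconsistent cases with any detector, then remove only the few most deviant ones rather than all flagged cases, can be sketched. Everything below (the neighbour-based detector, the `z_thresh` and `n_remove` parameters, all function names) is an illustrative assumption, not the authors' implementation.

```python
# Hypothetical sketch of the paper's filtering hypothesis: an outlier
# detector flags many "effort-inconsistent" project cases, but only the
# few most deviant ones are removed, preserving the rest for information
# and data diversity. All names and thresholds here are assumptions.
import math

def detect_inconsistent(projects, k=2, z_thresh=1.0):
    """Flag projects whose effort deviates from that of their k nearest
    neighbours in feature space. Returns (index, deviation) pairs."""
    flagged = []
    for i, p in enumerate(projects):
        # distance in feature space to every other project
        dists = sorted(
            (math.dist(p["features"], q["features"]), j)
            for j, q in enumerate(projects) if j != i
        )
        neigh = [projects[j]["effort"] for _, j in dists[:k]]
        mean = sum(neigh) / len(neigh)
        sd = (sum((e - mean) ** 2 for e in neigh) / len(neigh)) ** 0.5 or 1.0
        dev = abs(p["effort"] - mean) / sd  # standardized effort deviation
        if dev > z_thresh:
            flagged.append((i, dev))
    return flagged

def filter_inc(projects, flagged, n_remove=1):
    """Remove only the n_remove most deviant flagged cases, keeping the
    other flagged cases in the dataset (the paper's hypothesis)."""
    worst = sorted(flagged, key=lambda t: -t[1])[:n_remove]
    to_remove = {i for i, _ in worst}
    return [p for i, p in enumerate(projects) if i not in to_remove]
```

A conventional filter would drop every case returned by `detect_inconsistent`; the sketch instead caps removal at `n_remove`, reflecting the claim that wholesale removal sacrifices useful information.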