Utilizing cluster quality in hierarchical clustering for analogy-based software effort estimation

2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS). Pub Date: 2017-11-01. DOI: 10.1109/ICSESS.2017.8342851

Jack H. C. Wu, J. Keung


Citations: 1

Abstract

Analogy-based software effort estimation is one of the most popular estimation methods. It applies the principle of case-based reasoning (CBR): the effort of a new project is estimated from the k most similar projects completed in the past. The choice of k is therefore crucial to prediction performance. Many studies have used a single, fixed k value in their experiments, although it is known that allocating k dynamically can yield better performance. This paper proposes a technique based on hierarchical clustering that produces a range for k from various cluster quality criteria. We find that complete linkage clustering is more suitable for large datasets, while single linkage clustering is suitable for small datasets. The method searches for optimized k values using the proposed heuristic optimization technique, which has the advantages of being easy to compute and being tuned to the dataset under investigation. Datasets from the PROMISE repository were used to evaluate the proposed technique. The experimental results show that the proposed method is able to determine an optimized set of k values for analogy-based prediction and to give estimates that outperform traditional models based on a fixed k value. The implication is significant: the analogy-based model is optimized according to the dataset being used, without the need to ask an expert to determine a single, fixed k value.
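To make the role of k concrete, the following is a minimal sketch of plain analogy-based estimation, not the clustering method proposed in the paper. It assumes Euclidean distance over normalized features, the mean effort of the k nearest past projects as the estimate, and leave-one-out MMRE (mean magnitude of relative error) as the accuracy measure. The abstract specifies none of these choices, and the data and the names `HISTORY`, `estimate_effort`, `loo_mmre`, and `best_k` are hypothetical.

```python
from math import dist
from statistics import mean

# Hypothetical toy history: (normalized feature vector, actual effort).
# The values are invented for illustration and do not come from PROMISE.
HISTORY = [
    ((0.10, 0.20), 10.0),
    ((0.15, 0.25), 12.0),
    ((0.20, 0.10), 11.0),
    ((0.80, 0.90), 40.0),
    ((0.85, 0.80), 44.0),
    ((0.90, 0.95), 48.0),
]


def estimate_effort(target, history, k):
    """Return the mean effort of the k past projects closest to target."""
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank past projects by Euclidean distance to the target features.
    ranked = sorted(history, key=lambda case: dist(case[0], target))
    return mean(effort for _, effort in ranked[:k])


def loo_mmre(history, k):
    """Leave-one-out MMRE: estimate each project from all the others."""
    errors = []
    for i, (features, actual) in enumerate(history):
        rest = history[:i] + history[i + 1:]
        predicted = estimate_effort(features, rest, k)
        errors.append(abs(actual - predicted) / actual)
    return mean(errors)


def best_k(history, candidates):
    """Pick the candidate k with the lowest leave-one-out MMRE."""
    return min(candidates, key=lambda k: loo_mmre(history, k))


if __name__ == "__main__":
    print(estimate_effort((0.12, 0.22), HISTORY, 2))  # 11.0
    for k in (2, 3, 4, 5):
        print(k, round(loo_mmre(HISTORY, k), 3))
```

On this toy data the error grows sharply once k exceeds the number of genuinely similar projects, because the estimate starts averaging in projects from the other group. That sensitivity is what motivates selecting k from the structure of the dataset rather than fixing it in advance.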


