{"title":"Utilizing cluster quality in hierarchical clustering for analogy-based software effort estimation","authors":"Jack H. C. Wu, J. Keung","doi":"10.1109/ICSESS.2017.8342851","DOIUrl":null,"url":null,"abstract":"Analogy-based software effort estimation is one of the most popular estimation methods. It is built upon the principle of case-based reasoning (CBR) based on the k-th similar projects completed in the past. Therefore the determination of the k value is crucial to the prediction performance. Various research have been carried out to use a single and fixed k value for experiments, and it is known that dynamically allocated k values in an experiment will produce the optimized performance. This paper proposes an interesting technique based on hierarchical clustering to produce a range for k through various cluster quality criteria. We find that complete linkage clustering is more suitable for large datasets while single linkage clustering is suitable for small datasets. The method searches for optimized k values based on the proposed heuristic optimization technique, which have the advantages of easy computation and optimized for the dataset being investigated. Datasets from the PROMISE repository have been used to evaluate the proposed technique. The results of the experiments show that the proposed method is able to determine an optimized set of k values for analogy-based prediction, and to give estimates that outperformed traditional models based on a fixed k value. 
The implication is significant in that the analogy-based model will be optimized according the dataset being used, without the need to ask an expert to determining a single, fixed k value.","PeriodicalId":179815,"journal":{"name":"2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS)","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 8th IEEE International Conference on Software Engineering and Service Science (ICSESS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSESS.2017.8342851","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 1
Abstract
Analogy-based software effort estimation is one of the most popular estimation methods. It is built on the principle of case-based reasoning (CBR): an estimate for a new project is derived from the k most similar projects completed in the past. The choice of k is therefore crucial to prediction performance. Many studies have used a single, fixed k value in their experiments, although it is known that dynamically allocating k values within an experiment produces optimized performance. This paper proposes a technique based on hierarchical clustering that produces a range for k through various cluster quality criteria. We find that complete linkage clustering is more suitable for large datasets, while single linkage clustering is suitable for small datasets. The method searches for optimized k values using the proposed heuristic optimization technique, which has the advantages of being easy to compute and of being optimized for the dataset under investigation. Datasets from the PROMISE repository were used to evaluate the proposed technique. The experimental results show that the proposed method is able to determine an optimized set of k values for analogy-based prediction and to give estimates that outperform traditional models based on a fixed k value. The implication is significant: the analogy-based model is optimized according to the dataset being used, without the need to ask an expert to determine a single, fixed k value.
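To make the role of k concrete, the following is a minimal sketch of analogy-based estimation and of selecting k from a candidate range. It is not the paper's method: the abstract does not specify the heuristic or the cluster quality criteria, so the candidate range is simply passed in (in the paper it would come from hierarchical clustering), and the selection rule used here, leave-one-out mean absolute error, is an assumption chosen for illustration. The function names and the toy data are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimate_effort(target, cases, k):
    """Estimate effort as the mean effort of the k most similar past projects.

    `cases` is a list of (features, effort) pairs.
    """
    ranked = sorted(cases, key=lambda case: euclidean(target, case[0]))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


def loo_mae(cases, k):
    """Leave-one-out mean absolute error of the k-analogy estimator."""
    errors = []
    for i, (features, actual) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]  # hold out project i
        errors.append(abs(estimate_effort(features, rest, k) - actual))
    return sum(errors) / len(errors)


def best_k(cases, k_range):
    """Pick the k in `k_range` with the lowest leave-one-out error.

    Assumption: in the paper, `k_range` is produced by hierarchical
    clustering and cluster quality criteria; here it is just an input.
    """
    return min(k_range, key=lambda k: loo_mae(cases, k))


# Hypothetical toy data: two groups of similar projects, (features, effort).
cases = [
    ((1.0, 1.0), 10.0),
    ((1.1, 0.9), 12.0),
    ((0.9, 1.2), 11.0),
    ((5.0, 5.0), 50.0),
    ((5.2, 4.8), 54.0),
    ((4.9, 5.1), 52.0),
]

if __name__ == "__main__":
    k = best_k(cases, range(1, 5))
    print("selected k:", k)
    print("estimate:", estimate_effort((1.0, 1.0), cases, k))
```

In this toy data each group holds three projects, so once one is held out, any k above 2 pulls in a project from the other group and the error rises sharply. This illustrates why a k value tied to the structure of the dataset can beat an arbitrary fixed value.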