Classification and metaclassification in large scale data mining application for estimation of software projects

D. Dzega, W. Pietruszkiewicz
Published in: 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems, 2010-09-01
DOI: 10.1109/UKRICIS.2010.5898136 (https://doi.org/10.1109/UKRICIS.2010.5898136)
Citations: 5

Abstract

In this article we present an application of Artificial Intelligence to the estimation of software projects. The research was based on several methods of classification and metaclassification. Because of the increasing significance of Open Source, we selected projects hosted on the leading platform for Open Source projects, Sourceforge.net. In the first part of the article, we describe the steps of data extraction, which was a large-scale task because the data source contained tens of tables and hundreds of fields that were originally gathered for use by a web-based project management system. Extracting meaningful data therefore required analysis of the database structure and transformation of sets of records into four datasets. These datasets were used to predict four factors important to project management, i.e., skills, time, costs and effectiveness. We then present the results of experiments performed using the C4.5, RandomTree and CART algorithms. In the final part of the article, we describe how boosting and bagging metaclassifiers were applied to improve the results, and we analyse the influence of their parameters on generalization ability and prediction accuracy.
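The bagging idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract names C4.5, RandomTree and CART as base learners, whereas the sketch below swaps in a one-level decision stump to stay short and dependency-free. The feature meanings (number of active developers, project age in months), the toy records, the class labels, and all function names are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels):
    """Fit a one-level tree: choose the (feature, threshold) split with fewest training errors."""
    default = majority(labels)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            l_lab = majority(left)  # never empty: t is an observed value
            r_lab = majority(right) if right else default
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

def fit_bagging(rows, labels, n_estimators=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement) of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    ensemble = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return ensemble

def predict_bagging(ensemble, row):
    """Combine the base classifiers by unweighted majority vote."""
    return majority([predict_stump(s, row) for s in ensemble])

# Hypothetical toy records: (active developers, project age in months).
rows = [(2, 3), (3, 12), (4, 5), (5, 9), (9, 4), (10, 11), (12, 6), (15, 8)]
labels = ["low"] * 4 + ["high"] * 4  # hypothetical effectiveness classes

ensemble = fit_bagging(rows, labels, n_estimators=25)
print(predict_bagging(ensemble, (1, 7)), predict_bagging(ensemble, (20, 7)))
```

The `n_estimators` argument is the kind of metaclassifier parameter whose influence on accuracy and generalization the abstract refers to; boosting differs in that it reweights the training records between rounds instead of resampling them uniformly.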