{"title":"Sampling open source projects from portals: some preliminary investigations","authors":"A. Rainer, Stephen Gale","doi":"10.1109/METRICS.2005.41","DOIUrl":"https://doi.org/10.1109/METRICS.2005.41","url":null,"abstract":"In this paper, we provide a preliminary evaluation of the quality and quantity of data on 50000 open source (OS) projects hosted at the SourceForge.net portal. Using several indicators of project activity, we identify one sample from the entire dataset: the 'most-broadly-active' OS projects. The number of projects that are active across all of our main indicators of activity account for less than 1% of the projects on the portal. 75% of the projects currently hosted on the SourceForge.net portal are not, and have never really been, active on the portal. Furthermore, whilst there has been a substantial increase in the number of projects being added to SourceForge.net over time, the number of projects being added that then go on to become most-broadly-active projects seems to be decreasing over time. Finally, we recognise that care needs to be taken in defining samples, such as the most-broadly-active projects, as these definitions raise implications for the conclusions that one makes and the generalisations that one should draw","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"98 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126092862","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A replicated comparison of cross-company and within-company effort estimation models using the ISBSG database","authors":"E. Mendes, C. Lokan, R. Harrison, Chris Triggs","doi":"10.1109/METRICS.2005.4","DOIUrl":"https://doi.org/10.1109/METRICS.2005.4","url":null,"abstract":"Four years ago was the last time the ISBSG database was used to compare the effort prediction accuracy between cross-company and within-company cost models. Since then more than 2,000 projects have been volunteered to this database, which may have changed the trends previously observed. This paper therefore replicates a previous study by investigating how successful a cross-company cost model is: i) to estimate effort for projects that belong to a single company and were not used to build the cross-company model; ii) compared to a within-company cost model. Our within-company data set had data on 184 software projects from a single company and our cross-company data set employed data on 672 software projects. Our results did not corroborate those from the previous study, showing that predictions based on the within-company model were not significantly more accurate than those based on the cross-company model. We analysed the data using forward stepwise regression","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122511779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Visualizing historical data using spectrographs","authors":"A. Hassan, Jingwei Wu, R. Holt","doi":"10.1109/METRICS.2005.54","DOIUrl":"https://doi.org/10.1109/METRICS.2005.54","url":null,"abstract":"Studying the evolution of long-lived processes, such as the development history of a software system or the publication history of a research community, requires the analysis of a vast amount of data. Aggregation techniques and data specific techniques are usually used to cope with the large amount of data. In this paper, we introduce a general technique to study historical data derived from tracking the evolution of long-lived processes. We present a visualization approach (evolution spectrographs) to assist in identifying interesting patterns and events during evolutionary analysis of such historical data. We demonstrate the usefulness of spectrographs through several case studies. The data for the case studies are derived from the publication history of conferences in the area of software engineering and from the source control of several large open source projects. Our case studies reveal interesting patterns such as the increase of collaboration over time in the area of software engineering, and the emergence of new research topics. The spectrographs give an overview of the change activities for the subsystems in large software projects","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115460669","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using software development progress data to understand threats to project outcomes","authors":"T. Hall, A. Rainer, D. Jagielska","doi":"10.1109/METRICS.2005.52","DOIUrl":"https://doi.org/10.1109/METRICS.2005.52","url":null,"abstract":"In this paper we describe our on-going longitudinal study of a large complex software development project. We discuss how we used project metrics data collected by the development team to identify threats to project outcomes. Identifying and addressing threats to projects early in the development process should significantly reduce the chances of project failure. We have analysed project data to pinpoint the sources of threats to the project. The data we have used is embedded in the project's fortnightly progress reports produced by the project team. The progress reports are part of the software measurement program this company operates. The company has highly mature development processes which were assessed at CMM level 5 in 2004. Our analysis shows that standard project progress data can generate rich insights into the project; insights that go beyond those anticipated when the metrics were originally specified. Our results reveal a pattern of threats to the project that the project team can focus on mitigating. The project team is already aware of some threats, for example that communication with the customer is a significant threat to the project. But there are other threats the team is not aware of, for example that people issues within the software team are not a significant threat to the project","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123824904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ensemble imputation methods for missing software engineering data","authors":"Bhekisipho Twala, M. Cartwright","doi":"10.1109/METRICS.2005.21","DOIUrl":"https://doi.org/10.1109/METRICS.2005.21","url":null,"abstract":"One primary concern of software engineering is prediction accuracy. We use datasets to build and validate prediction systems of software development effort, for example. However it is not uncommon for datasets to contain missing values. When using machine learning techniques to build such prediction systems, handling of incomplete data is an important issue for classifier learning since missing values in either training or test set or in both sets can affect prediction accuracy. Many works in machine learning and statistics have shown that combining (ensemble) individual classifiers is an effective technique for improving accuracy of classification. The ensemble strategy is investigated in the context of incomplete data and software prediction. An ensemble Bayesian multiple imputation and nearest neighbour single imputation method, BAMINNSI, is proposed that constructs ensembles based on two imputation methods. Strong results on two benchmark industrial datasets using decision trees support the method","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131274559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Measuring fine-grained change in software: towards modification-aware change metrics","authors":"D. Germán, Abram Hindle","doi":"10.1109/METRICS.2005.32","DOIUrl":"https://doi.org/10.1109/METRICS.2005.32","url":null,"abstract":"In this paper we propose the notion of change metrics, those that measure change in a project or its entities. In particular we are interested in measuring fine-grained changes, such as those stored by version control systems (such as CVS). A framework for the classification of change metrics is provided. We discuss the idea of change metrics which are modification aware, that is metrics which evaluate the change itself and not just the change in a measurement of the system before and after the change. We then provide examples of the use of these metrics on two mature projects","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127642004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Butterflies: a visual approach to characterize packages","authors":"Stéphane Ducasse, Michele Lanza, María Laura Ponisio","doi":"10.1109/METRICS.2005.15","DOIUrl":"https://doi.org/10.1109/METRICS.2005.15","url":null,"abstract":"Understanding sets of classes, or packages, is an important activity in the development and reengineering of large object-oriented systems. Packages represent the coarse grained structure of an application. They are artefacts to deploy and structure software, and therefore more than a simple generalization of classes. The relationships between packages and their contained classes are key in the decomposition of an application and its (re)-modularisation. However, it is difficult to quickly grasp the structure of a package and to understand how a package interacts with the rest of the system. We tackle this problem using butterfly visualizations, i.e., dedicated radar charts built from simple package metrics based on a language-independent meta-model. We illustrate our approach on two applications and show how we can retrieve the relevant characteristics of packages","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"17 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123659910","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An outsourcing model of software development","authors":"Rasvan Constantinescu","doi":"10.1109/METRICS.2005.11","DOIUrl":"https://doi.org/10.1109/METRICS.2005.11","url":null,"abstract":"Software engineering is concerned with the theories, methods and tools which are needed to develop software for computers. Brooks (1987) also pointed out that software engineering is a man-made discipline that does not have any universal constants or \"natural laws\" that would provide a clear theoretical platform or anchor points for the discipline. Many of the standards and practices in software engineering have been established or agreed upon through de facto market domination or through negotiation among key players in the industry. As a result, these standards and \"laws\" are not necessarily compatible with each other or constant. The overall goal of this research is to improve industrial practice of software engineering by presenting a new model of software development. For this purpose, a framework, tentatively named WARP-X, will be developed","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121541855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survival analysis for the duration of software projects","authors":"Panagiotis Sentas, L. Angelis","doi":"10.1109/METRICS.2005.45","DOIUrl":"https://doi.org/10.1109/METRICS.2005.45","url":null,"abstract":"In the area of software engineering various methods have been proposed in order to predict the cost of a software project in terms of the effort or of the productivity. An important feature which is closely related to the cost is the duration of a software project. In this paper we deal with the problem of studying and modeling the distribution of the time from specification until delivery of a software product. Specifically, we investigate the use of a statistical methodology known from biostatistics as survival analysis. The purpose of such an analysis is to describe the distribution of the duration and also to identify important factors that affect it. The great advantage of survival analysis is that we can utilize information not only from the completed projects in a dataset but also from ongoing projects. The general principles of the methodology are described with examples from applications to known data sets","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125531130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic code coverage metrics: a lognormal perspective","authors":"S. Gokhale, R. Mullen","doi":"10.1109/METRICS.2005.17","DOIUrl":"https://doi.org/10.1109/METRICS.2005.17","url":null,"abstract":"The logical interrelationship between different code coverage types has been well studied, but less so their evolution through time or test. We study the dynamic relationship of four coverage types, namely, block, decision, c-use and p-use by comparing their growth using empirical coverage data generated from extensive testing of a software application with 35 KLOC of code. Our results indicate that as testing increases, the growth trends for each coverage type are surprisingly similar. Not only is each trend consistent with an underlying lognormal distribution of event rate, but also the parameters of the fitted lognormal distributions are closely related. Within the limits of the data, we find quantitative relations between the four coverage types. The paper thus takes a significant step in linking concepts from prior studies of software test sufficiency, test efficiency, and reliability in the context of software execution","PeriodicalId":402415,"journal":{"name":"11th IEEE International Software Metrics Symposium (METRICS'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131188614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}