Towards mining informal online data to guide component-reuse decisions

Sanchit Karve, Christopher Scaffidi
International Symposium on Component-Based Software Engineering, published 2013-06-17
DOI: 10.1145/2465449.2465459 (https://doi.org/10.1145/2465449.2465459)
Citations: 1

Abstract

Online repositories provide components available for reuse, but this does not mean all such components are equally reusable. Components might be unreliable, overly specialized, or otherwise inappropriate for reuse. Repositories collect reviews, ratings, and other data intended to help software engineers choose components. But do these data actually provide any information related to reusability? If so, then how can such information be extracted from the data?

To address these questions, we analyzed online ratings, reviews, and other data for nearly 1200 online components, computed statistics for each component based on these data, and used factor analysis to identify three groups of statistics (factors) that were each internally correlated. We then interviewed software engineers about the reusability of 36 other components and used linear regression to test how well the 3 factors actually corresponded to component reusability.

We found that 2 of the 3 factors were indeed related to reusability. Specifically, the reusability of components could be predicted on the basis of component authors' prior work and the documentation provided about components. This result could be used in future work to develop enhanced search engines that highlight components which are potentially reusable and perhaps worthy of more time-consuming evaluation, such as by applying formal methods. Additionally, our results reveal opportunities to improve online repositories through specific simplifications as well as enhancements.
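The analysis pipeline described in the abstract (per-component statistics, factor analysis on a large set of components, then linear regression against reusability judgments for a separate, smaller set) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the function names and the choice of nine statistics are hypothetical, and principal-component extraction from the correlation matrix stands in for whatever factor-analysis procedure the paper actually used.

```python
import numpy as np


def fit_factors(stats, n_factors=3):
    """Derive factor weights from a (components x statistics) matrix.

    Uses principal-component extraction on the correlation matrix,
    an assumed stand-in for the paper's factor analysis.
    """
    mean = stats.mean(axis=0)
    std = stats.std(axis=0)
    z = (stats - mean) / std
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_factors]
    return mean, std, eigvecs[:, top]


def factor_scores(stats, mean, std, weights):
    """Project components onto previously fitted factors."""
    return ((stats - mean) / std) @ weights


def fit_linear_regression(scores, reusability):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(X, reusability, rcond=None)
    pred = X @ coef
    ss_res = np.sum((reusability - pred) ** 2)
    ss_tot = np.sum((reusability - reusability.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Synthetic stand-in for statistics computed from ratings, reviews, etc.
    mixing = rng.normal(size=(3, 9))  # 9 hypothetical statistics
    latent = rng.normal(size=(1200, 3))
    repo_stats = latent @ mixing + 0.3 * rng.normal(size=(1200, 9))
    mean, std, weights = fit_factors(repo_stats, n_factors=3)

    # Synthetic stand-in for the 36 separately evaluated components.
    held_latent = rng.normal(size=(36, 3))
    held_stats = held_latent @ mixing + 0.3 * rng.normal(size=(36, 9))
    ratings = (3.0 + 0.8 * held_latent[:, 0] + 0.5 * held_latent[:, 1]
               + 0.2 * rng.normal(size=36))

    scores = factor_scores(held_stats, mean, std, weights)
    coef, r2 = fit_linear_regression(scores, ratings)
    print("intercept and factor coefficients:", coef)
    print("R^2:", r2)
```

Fitting the factors on one set of components and scoring a disjoint set mirrors the abstract's separation between the roughly 1200 mined components and the 36 components discussed in interviews. A real analysis would also test each coefficient for statistical significance, which this sketch omits.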