2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR): Latest Papers

A Study on the Use of IDE Features for Debugging
Afsoon Afzal, Claire Le Goues
{"title":"A Study on the Use of IDE Features for Debugging","authors":"Afsoon Afzal, Claire Le Goues","doi":"10.1145/3196398.3196468","DOIUrl":"https://doi.org/10.1145/3196398.3196468","url":null,"abstract":"Integrated development environments (IDEs) provide features to help developers both create and understand code. As maintenance and bug repair are time-consuming and costly activities, IDEs have long integrated debugging features to simplify these tasks. In this paper we investigate the impact of using IDE debugger features on different aspects of programming and debugging. Using the data set provided by MSR challenge track, we compared debugging tasks performed with or without the IDE debugger. We find, on average, that developers spend more time and effort on debugging when they use the debugger. Typically, developers start using the debugger early, at the beginning of a debugging session, and that their editing behavior does not appear to significantly change when they are debugging regardless of whether debugging features are in use.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"157 1","pages":"114-117"},"PeriodicalIF":0.0,"publicationDate":"2018-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89122864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 8
Mining the Mind, Minding the Mine: Grand Challenges in Comprehension and Mining
Amy J. Ko
{"title":"Mining the Mind, Minding the Mine: Grand Challenges in Comprehension and Mining","authors":"Amy J. Ko","doi":"10.1145/3196398.3196477","DOIUrl":"https://doi.org/10.1145/3196398.3196477","url":null,"abstract":"The program comprehension and mining software repository communities are, in practice, two separate research endeavors. One is concerned with what's happening in a developer's mind, while the other is concerned with what's happening in a team. And yet, implicit in these fields is a common goal to make better software and the common approach of influencing developer decisions. In this keynote, I provide several examples of this overlap, suggesting several grand challenges in comprehension and mining.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"75 1","pages":"118-118"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77923254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Revisiting "Programmers' Build Errors" in the Visual Studio Context
Noam Rabbani, Michael S. Harvey, Sadnan Saquif, Keheliya Gallaba, Shane McIntosh
{"title":"Revisiting \"Programmers' Build Errors\" in the Visual Studio Context","authors":"Noam Rabbani, Michael S. Harvey, Sadnan Saquif, Keheliya Gallaba, Shane McIntosh","doi":"10.1145/3196398.3196469","DOIUrl":"https://doi.org/10.1145/3196398.3196469","url":null,"abstract":"Build systems translate sources into deliverables. Developers execute builds on a regular basis in order to integrate their personal code changes into testable deliverables. Prior studies have evaluated the rate at which builds in large organizations fail. A recent study at Google has analyzed (among other things) the rate at which builds in developer workspaces fail. In this paper, we replicate the Google study in the Visual Studio context of the MSR challenge. We extract and analyze 13,300 build events, observing that builds are failing 67%-76% less frequently and are fixed 46%-78% faster in our study context. Our results suggest that build failure rates are highly sensitive to contextual factors. Given the large number of factors by which our study contexts differ (e.g., system size, team size, IDE tooling, programming languages), it is not possible to trace the root cause for the large differences in our results. Additional data is needed to arrive at more complete conclusions.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"100 1","pages":"98-101"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73214171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Bayesian Hierarchical Modelling for Tailoring Metric Thresholds
Neil A. Ernst
{"title":"Bayesian Hierarchical Modelling for Tailoring Metric Thresholds","authors":"Neil A. Ernst","doi":"10.1145/3196398.3196443","DOIUrl":"https://doi.org/10.1145/3196398.3196443","url":null,"abstract":"Software is highly contextual. While there are cross-cutting 'global' lessons, individual software projects exhibit many 'local' properties. This data heterogeneity makes drawing local conclusions from global data dangerous. A key research challenge is to construct locally accurate prediction models that are informed by global characteristics and data volumes. Previous work has tackled this problem using clustering and transfer learning approaches, which identify locally similar characteristics. This paper applies a simpler approach known as Bayesian hierarchical modeling. We show that hierarchical modeling supports cross-project comparisons, while preserving local context. To demonstrate the approach, we conduct a conceptual replication of an existing study on setting software metrics thresholds. Our emerging results show our hierarchical model reduces model prediction error compared to a global approach by up to 50%.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"251 1","pages":"587-591"},"PeriodicalIF":0.0,"publicationDate":"2018-04-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80719815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 11
Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval
Md Masudur Rahman, J. Barson, Sydney Paul, Joshua Kayan, F. Lois, S. Quezada, Chris Parnin, Kathryn T. Stolee, Baishakhi Ray
{"title":"Evaluating How Developers Use General-Purpose Web-Search for Code Retrieval","authors":"Md Masudur Rahman, J. Barson, Sydney Paul, Joshua Kayan, F. Lois, S. Quezada, Chris Parnin, Kathryn T. Stolee, Baishakhi Ray","doi":"10.1145/3196398.3196425","DOIUrl":"https://doi.org/10.1145/3196398.3196425","url":null,"abstract":"Search is an integral part of a software development process. Developers often use search engines to look for information during development, including reusable code snippets, API understanding, and reference examples. Developers tend to prefer general-purpose search engines like Google, which are often not optimized for code related documents and use search strategies and ranking techniques that are more optimized for generic, non-code related information. In this paper, we explore whether a general purpose search engine like Google is an optimal choice for code-related searches. In particular, we investigate whether the performance of searching with Google varies for code vs. non-code related searches. To analyze this, we collect search logs from 310 developers that contains nearly 150,000 search queries from Google and the associated result clicks. To differentiate between code-related searches and non-code related searches, we build a model which identifies code intent of queries. Leveraging this model, we build an automatic classifier that detects a code and non-code related query. We confirm the effectiveness of the classifier on manually annotated queries where the classifier achieves a precision of 87%, a recall of 86%, and an F1-score of 87%. We apply this classifier to automatically annotate all the queries in the dataset. Analyzing this dataset, we observe that code related searching often requires more effort (e.g., time, result clicks, and query modifications) than general non-code search, which indicates code search performance with a general search engine is less effective.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"16 1","pages":"465-475"},"PeriodicalIF":0.0,"publicationDate":"2018-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77150431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 49
Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline
M. Mäntylä, Fabio Calefato, Maëlick Claes
{"title":"Natural Language or Not (NLoN) - A Package for Software Engineering Text Analysis Pipeline","authors":"M. Mäntylä, Fabio Calefato, Maëlick Claes","doi":"10.1145/3196398.3196444","DOIUrl":"https://doi.org/10.1145/3196398.3196444","url":null,"abstract":"The use of natural language processing (NLP) is gaining popularity in software engineering. In order to correctly perform NLP, we must pre-process the textual information to separate natural language from other information, such as log messages, that are often part of the communication in software engineering. We present a simple approach for classifying whether some textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, we are able to achieve an area under the ROC curve performances between 0.976-0.987 on three different data sources, with Lasso regression from Glmnet as our learner and two human raters for providing ground truth. Cross-source prediction performance is lower and has more fluctuation with top ROC performances from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R-package for further improvements.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"18 1","pages":"387-391"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88847628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
Public Git Archive: A Big Code Dataset for All
Vadim Markovtsev, Waren Long
{"title":"Public Git Archive: A Big Code Dataset for All","authors":"Vadim Markovtsev, Waren Long","doi":"10.1145/3196398.3196464","DOIUrl":"https://doi.org/10.1145/3196398.3196464","url":null,"abstract":"The number of open source software projects has been growing exponentially. The major online software repository host, GitHub, has accumulated tens of millions of publicly available Git version-controlled repositories. Although the research potential enabled by the available open source code is clearly substantial, no significant large-scale open source code datasets exist. In this paper, we present the Public Git Archive – dataset of 182,014 top-bookmarked Git repositories from GitHub. We describe the novel data retrieval pipeline to reproduce it. We also elaborate on the strategy for performing dataset updates and legal issues. The Public Git Archive occupies 3.0 TB on disk and is an order of magnitude larger than the current source code datasets. The dataset is made available through HTTP and provides the source code of the projects, the related metadata, and development history. The data retrieval pipeline employs an optimized worker queue model and an optimized archive format to efficiently store forked Git repositories, reducing the amount of data to download and persist. Public Git Archive aims to open a myriad of new opportunities for \"Big Code\" research.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"37 1","pages":"34-37"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89571683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts
Sebastian Baltes, Lorik Dumani, Christoph Treude, S. Diehl
{"title":"SOTorrent: Reconstructing and Analyzing the Evolution of Stack Overflow Posts","authors":"Sebastian Baltes, Lorik Dumani, Christoph Treude, S. Diehl","doi":"10.1145/3196398.3196430","DOIUrl":"https://doi.org/10.1145/3196398.3196430","url":null,"abstract":"Stack Overflow (SO) is the most popular question-and-answer website for software developers, providing a large amount of code snippets and free-form text on a wide variety of topics. Like other software artifacts, questions and answers on SO evolve over time, for example when bugs in code snippets are fixed, code is updated to work with a more recent library version, or text surrounding a code snippet is edited for clarity. To be able to analyze how content on SO evolves, we built SOTorrent, an open dataset based on the official SO data dump. SOTorrent provides access to the version history of SO content at the level of whole posts and individual text or code blocks. It connects SO posts to other platforms by aggregating URLs from text blocks and by collecting references from GitHub files to SO posts. In this paper, we describe how we built SOTorrent, and in particular how we evaluated 134 different string similarity metrics regarding their applicability for reconstructing the version history of text and code blocks. Based on a first analysis using the dataset, we present insights into the evolution of SO posts, e.g., that post edits are usually small, happen soon after the initial creation of the post, and that code is rarely changed without also updating the surrounding text. Further, our analysis revealed a close relationship between post edits and comments. Our vision is that researchers will use SOTorrent to investigate and understand the evolution of SO posts and their relation to other platforms such as GitHub.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"18 1","pages":"319-330"},"PeriodicalIF":0.0,"publicationDate":"2018-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81874278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 104
A Benchmark Study on Sentiment Analysis for Software Engineering Research
Nicole Novielli, Daniela Girardi, F. Lanubile
{"title":"A Benchmark Study on Sentiment Analysis for Software Engineering Research","authors":"Nicole Novielli, Daniela Girardi, F. Lanubile","doi":"10.1145/3196398.3196403","DOIUrl":"https://doi.org/10.1145/3196398.3196403","url":null,"abstract":"A recent research trend has emerged to identify developers' emotions, by applying sentiment analysis to the content of communication traces left in collaborative development environments. Trying to overcome the limitations posed by using off-the-shelf sentiment analysis tools, researchers recently started to develop their own tools for the software engineering domain. In this paper, we report a benchmark study to assess the performance and reliability of three sentiment analysis tools specifically customized for software engineering. Furthermore, we offer a reflection on the open challenges, as they emerge from a qualitative analysis of misclassified texts.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"1 1","pages":"364-375"},"PeriodicalIF":0.0,"publicationDate":"2018-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76813219","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 96
A Gold Standard for Emotion Annotation in Stack Overflow
Nicole Novielli, Fabio Calefato, F. Lanubile
{"title":"A Gold Standard for Emotion Annotation in Stack Overflow","authors":"Nicole Novielli, Fabio Calefato, F. Lanubile","doi":"10.1145/3196398.3196453","DOIUrl":"https://doi.org/10.1145/3196398.3196453","url":null,"abstract":"Software developers experience and share a wide range of emotions throughout a rich ecosystem of communication channels. A recent trend that has emerged in empirical software engineering studies is leveraging sentiment analysis of developers' communication traces. We release a dataset of 4,800 questions, answers, and comments from Stack Overflow, manually annotated for emotions. Our dataset contributes to the building of a shared corpus of annotated resources to support research on emotion awareness in software development.","PeriodicalId":6639,"journal":{"name":"2018 IEEE/ACM 15th International Conference on Mining Software Repositories (MSR)","volume":"79 1","pages":"14-17"},"PeriodicalIF":0.0,"publicationDate":"2018-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81635557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 53