2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR): Latest Papers

Import2vec: Learning Embeddings for Software Libraries
B. Theeten, Frederik Vandeputte, T. V. Cutsem
Abstract: We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) or applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent libraries by similar context of use as determined by import statements present in source code. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript) and data science toolkits (in Python).
DOI: https://doi.org/10.1109/MSR.2019.00014
Pages: 18-28
Published: 2019-03-27
Citations: 30
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Christoph Gote, Ingo Scholtes, F. Schweitzer
Abstract: Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication, from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable Python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
DOI: https://doi.org/10.1109/MSR.2019.00070
Pages: 433-444
Published: 2019-03-25
Citations: 19
Identifying Experts in Software Libraries and Frameworks Among GitHub Users
João Eduardo Montandon, L. L. Silva, M. T. Valente
Abstract: Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) to identify experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries, using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
DOI: https://doi.org/10.1109/MSR.2019.00054
Pages: 276-287
Published: 2019-03-19
Citations: 38
Automatically Generating Documentation for Lambda Expressions in Java
Anwar Alqaimi, Patanamon Thongtanunam, Christoph Treude
Abstract: When lambda expressions were introduced to the Java programming language as part of the release of Java 8 in 2014, they were the language's first step into functional programming. Since lambda expressions are still relatively new, not all developers use or understand them. In this paper, we first present the results of an empirical study to determine how frequently developers of GitHub repositories make use of lambda expressions and how they are documented. We find that 11% of Java GitHub repositories use lambda expressions, and that only 6% of the lambda expressions are accompanied by source code comments. We then present a tool called LambdaDoc which can automatically detect lambda expressions in a Java repository and generate natural language documentation for them. Our evaluation of LambdaDoc with 23 professional developers shows that they perceive the generated documentation to be complete, concise, and expressive, while the majority of the documentation produced by our participants without tool support was inadequate. Our contribution builds an important step towards automatically generating documentation for functional programming constructs in an object-oriented language.
DOI: https://doi.org/10.1109/MSR.2019.00057
Pages: 310-320
Published: 2019-03-15
Citations: 13
The Emergence of Software Diversity in Maven Central
César Soto-Valero, Amine Benelallam, Nicolas Harrand, Olivier Barais, B. Baudry
Abstract: Maven artifacts are immutable: an artifact that is uploaded on Maven Central cannot be removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency towards a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
DOI: https://doi.org/10.1109/MSR.2019.00059
Pages: 333-343
Published: 2019-03-13
Citations: 26
A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software
Serena Elisa Ponta, H. Plate, A. Sabetta, M. Bezzi, Cédric Dangremont
Abstract: Advancing our understanding of software vulnerabilities, automating their identification, the analysis of their impact, and ultimately their mitigation is necessary to enable the development of software that is more secure. While operating a vulnerability assessment tool, which we developed, and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software, and the commits fixing them. The data were obtained both from the National Vulnerability Database (NVD), and from project-specific web resources, which we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE (Common Vulnerability and Exposure) identifier at all, and 46, which do have such an identifier assigned by a numbering authority, are not available in the NVD yet. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories, and to augment the attributes available for each instance. Moreover, these scripts make it possible to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open-source will allow future research to be based on data of industrial relevance; it also represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry.
DOI: https://doi.org/10.1109/MSR.2019.00064
Pages: 383-387
Published: 2019-02-07
Citations: 82
The Maven Dependency Graph: A Temporal Graph-Based Representation of Maven Central
Amine Benelallam, Nicolas Harrand, César Soto-Valero, B. Baudry, Olivier Barais
Abstract: The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled piece of code implemented in a JVM-based language), each of which is characterized with metadata such as exact version, date of upload and list of dependencies towards other artifacts. Today, one who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge data set; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source data set provides two contributions: a snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; an open source infrastructure to query this huge dataset.
DOI: https://doi.org/10.1109/MSR.2019.00060
Pages: 344-348
Published: 2019-01-16
Citations: 30
Recommending Energy-Efficient Java Collections
Wellington de Oliveira Júnior, R. Oliveira dos Santos, Fernando José Castor de Lima Filho, Benito Fernandes de Araújo Neto, Gustavo Henrique Lima Pinto
Abstract: Over the last years, increasing attention has been given to creating energy-efficient software systems. However, developers still lack the knowledge and the tools to support them in that task. In this work, we explore our vision that energy consumption non-specialists can build software that consumes less energy by alternating, at development time, between third-party, readily available, diversely-designed pieces of software, without increasing the development complexity. To support our vision, we propose an approach for energy-aware development that combines the construction of application-independent energy profiles of Java collections and static analysis to produce an estimate of in which ways and how intensively a system employs these collections. By combining these two pieces of information, it is possible to produce energy-saving recommendations for alternative collection implementations to be used in different parts of the system. We implement this approach in a tool named CT+ that works with both desktop and mobile Java systems, and is capable of analyzing 40 different collection implementations of lists, maps, and sets. We applied CT+ to twelve software systems: two mobile-based, seven desktop-based, and three that can run in both environments. Our evaluation infrastructure involved a high-end server, a notebook, and three mobile devices. When applying the (mostly trivial) recommendations, we achieved up to 17.34% reduction in energy consumption just by replacing collection implementations. Even for a real world, mature, highly-optimized system such as Xalan, CT+ could achieve a 5.81% reduction in energy consumption. Our results indicate that some widely used collections, e.g., ArrayList, HashMap, and HashTable, are not energy-efficient and sometimes should be avoided when energy consumption is a major concern.
DOI: https://doi.org/10.1109/MSR.2019.00033
Pages: 160-170
Published: 2019-01-01
Citations: 0
Splitting APIs: An Exploratory Study of Software Unbundling
Anderson Severo de Matos, João Bosco Ferreira Filho, Lincoln Souza Rocha
Abstract: Software unbundling consists of dividing an existing software artifact into smaller ones. Unbundling can be useful for removing clutter from the original application or separating different features that may not share the same purpose, or simply for isolating an emergent functionality that merits being an application on its own. This phenomenon is frequent with mobile apps and it is also propagating to APIs. This paper proposes a first empirical study on unbundling to understand its effects on popular APIs. We explore the possibilities of splitting libraries into 2 or more bundles based on the use that their client projects make of them. We mine more than 71,000 client projects of 10 open source APIs and automatically generate 2,090 sub-APIs to then study their properties. We find that it is possible to have sets of different ways of using a given API and to unbundle it accordingly; the bundles can vary their representativeness and uniqueness, which is analyzed thoroughly in this study.
DOI: https://doi.org/10.1109/MSR.2019.00062
Pages: 360-370
Published: 2019-01-01
Citations: 0
Program Committee
Rui Abreu
Members:
Rui Abreu, University of Lisbon, Portugal
Jun Ai, Beihang University, China
Domenico Amalfitano, University of Naples Federico II, Italy
Doo-Hwan Bae, Korea Advanced Institute of Science and Technology, Korea
Xiaoying Bai, Tsinghua University, China
Lingfeng Bao, Zhejiang University City College, China
David Benavides, University of Seville, Spain
Antonia Bertolino, Italian National Research Council, Italy
Mario Bravetti, Università di Bologna, Italy
Christof Budnik, Siemens, Germany
Yan Cai, Chinese Academy of Sciences, China
Emilia Cambronero, Universidad Castilla-La Mancha, Spain
Ana Cavalli, IT SudParis, France
Arun Chakrapani Rao, University of Warwick, UK
W.K. Chan, City University of Hong Kong, Hong Kong
Junjie Chen, Peking University, China
Yue Chen, Palo Alto Networks, USA
William Chu, Tunghai University, Taiwan
Sunita Chulani, Cisco, USA
Frederic Dadeau, University of Franche-Comté, France
Yuanshun Dai, University of Electronic Science and Technology of China, China
Junhua Ding, East Carolina University, USA
Tadashi Dohi, Hiroshima University, Japan
Wei Dong, National University of Defense Technology, China
Yunwei Dong, Northwestern Polytechnical University, China
Benedikt Eberhardinger, MHP - A Porsche Company, Germany
Khaled El-Fakih, American University of Sharjah, UAE
Sadik Esmelioglu, Middle East Technical University, Turkey
Hugues Evrard, Imperial College London, UK
Joao Pascoal Faria, University of Porto, Portugal
Thoshitha Gamage, Southern Illinois University Edwardsville, USA
Sudipto Ghosh, Colorado State University, USA
Arnaud Gotlieb, Simula Research Laboratory, Norway
Matthias Güdemann, Input Output Hong Kong, Hong Kong
Rajiv Gupta, University of California, Riverside, USA
Chin-Yu Huang, National Tsing-Hua University, Taiwan
Song Huang, Army Engineering University, China
Ali Hurson, Missouri University of Science and Technology, USA
Bo Jiang, Beihang University, China
He Jiang, Dalian University of Technology, China
Yu Jiang, Tsinghua University, China
Xiaoyuan Jing, Wuhan University, China
Roland Jochem, TU Berlin, Germany
Sun Jun, Singapore University of Technology and Design, Singapore
Jacky Keung, City University of Hong Kong, Hong Kong
Pavneet Kochhar, Microsoft, USA
Xuan-Bach Le, Carnegie Mellon University, USA
DOI: https://doi.org/10.1109/eitt.2018.00007
Published: 2018-12-01
Citations: 0