2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR): Latest Papers

Improving Agile Planning for Reliable Software Delivery

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00017

Jirat Pasuksmit, Fan Jiang, Kemp Thornton, A. Friedman, Natalija Fuksmane, Isabelle Kohout, Julian Connor

Citations: 0

Automating Arduino Programming: From Hardware Setups to Sample Source Code Generation

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00069

Imam Nur Bani Yusuf, Diyanah Binte Abdul Jamal, Lingxiao Jiang

Abstract: An embedded system is a system consisting of software code, controller hardware, and I/O (Input/Output) hardware that performs a specific task. Developing an embedded system presents several challenges. First, the development often involves configuring hardware that requires domain-specific knowledge. Second, the library for the hardware may have API usage patterns that must be followed. To overcome such challenges, we propose a framework called ArduinoProg towards the automatic generation of Arduino applications. ArduinoProg takes a natural language query as input and outputs the configuration and API usage pattern for the hardware described in the query. Motivated by our findings on the characteristics of real-world queries posted in the official Arduino forum, we formulate ArduinoProg as three components, i.e., Library Retriever, Configuration Classifier, and Pattern Generator. First, Library Retriever preprocesses the input query and retrieves a set of relevant libraries using either lexical matching or vector-based similarity. Second, given Library Retriever’s output, Configuration Classifier infers the hardware configuration by classifying the method definitions found in the library’s implementation files into a hardware configuration class. Third, Pattern Generator also takes Library Retriever’s output as input and leverages a sequence-to-sequence model to generate the API usage pattern. Having instantiated each component of ArduinoProg with various machine learning models, we have evaluated ArduinoProg on real-world queries. Library Retriever achieves a Precision@K range of 44.0%-97.1%; Configuration Classifier achieves an Area under the Receiver Operating Characteristics curve (AUC) of 0.79-0.95; Pattern Generator yields a Normalized Discounted Cumulative Gain (NDCG)@K of 0.45-0.73. Such results indicate that ArduinoProg can generate practical and useful hardware configurations and API usage patterns to guide developers in writing Arduino code.
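The ArduinoProg abstract reports Precision@K and NDCG@K without defining them. The sketch below shows the textbook forms of both metrics, which may differ in detail from the authors' evaluation scripts. The library names, the relevance set, and the gain values are hypothetical illustration data. Two conventions are assumed: Precision@K always divides by K, and the ideal ordering for NDCG is taken over the ranked list itself.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that are relevant (divides by k)."""
    top_k = ranked[:k]
    return sum(1 for item in top_k if item in relevant) / k

def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at 1-based rank r is divided by log2(r + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """DCG normalised by the DCG of the same gains sorted in descending order."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Hypothetical retrieval result for one query (illustrative names only).
ranked_libraries = ["Servo", "Wire", "LiquidCrystal", "SD", "Stepper"]
relevant_libraries = {"Servo", "LiquidCrystal"}
p_at_3 = precision_at_k(ranked_libraries, relevant_libraries, 3)

# Hypothetical relevance gains for a ranked list of generated outputs.
gains = [1, 0, 1, 0, 0]
ndcg_at_3 = ndcg_at_k(gains, 3)
```

Here two of the top three libraries are relevant, so Precision@3 is 2/3. NDCG@3 falls below 1 because the second relevant item sits at rank 3 rather than rank 2.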

Citations: 1

MANDO-HGT: Heterogeneous Graph Transformers for Smart Contract Vulnerability Detection

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00052

Hoang H. Nguyen, Nhat-Minh Nguyen, Chunyao Xie, Zahra Ahmadi, Daniel Kudendo, Thanh-Nam Doan, Lingxiao Jiang

Abstract: Smart contracts in blockchains have been increasingly used for high-value business applications. It is essential to check smart contracts' reliability before and after deployment. Although various program analysis and deep learning techniques have been proposed to detect vulnerabilities in either Ethereum smart contract source code or bytecode, their detection accuracy and scalability are still limited. This paper presents a novel framework named MANDO-HGT for detecting smart contract vulnerabilities. Given Ethereum smart contracts, either in source code or bytecode form, and vulnerable or clean, MANDO-HGT custom-builds heterogeneous contract graphs (HCGs) to represent control-flow and/or function-call information of the code. It then adapts heterogeneous graph transformers (HGTs) with customized meta relations for graph nodes and edges to learn their embeddings and train classifiers for detecting various vulnerability types in the nodes and graphs of the contracts more accurately. We have collected more than 55K Ethereum smart contracts from various data sources and verified the labels for 423 buggy and 2,742 clean contracts to evaluate MANDO-HGT. Our empirical results show that MANDO-HGT can significantly improve the detection accuracy of other state-of-the-art vulnerability detection techniques that are based on either machine learning or conventional analysis techniques. The accuracy improvements in terms of F1-score range from 0.7% to more than 76% at either the coarse-grained contract level or the fine-grained line level for various vulnerability types in either source code or bytecode. Our method is general and can be retrained easily for different vulnerability types without the need for manually defined vulnerability patterns.
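The MANDO-HGT abstract mentions heterogeneous graphs with "meta relations" for nodes and edges. As a rough illustration of the idea, a meta relation can be read as a triple of (source node type, edge type, target node type). The sketch below enumerates these triples for a tiny typed graph. The node names, node types, and edge types are hypothetical and are not the schema of the paper's heterogeneous contract graphs.

```python
from collections import Counter

# Hypothetical node types; the paper's actual HCG schema may differ.
node_types = {
    "f_transfer": "FUNCTION",
    "f_withdraw": "FUNCTION",
    "b_entry": "BASIC_BLOCK",
    "b_call": "BASIC_BLOCK",
}

# Each edge is (source node, edge type, target node); edge types are illustrative.
edges = [
    ("f_withdraw", "INTERNAL_CALL", "f_transfer"),
    ("f_withdraw", "CONTAINS", "b_entry"),
    ("b_entry", "NEXT", "b_call"),
    ("f_transfer", "CONTAINS", "b_call"),
]

def meta_relations(node_types, edges):
    """Count (source type, edge type, target type) triples in a typed graph."""
    return Counter(
        (node_types[src], edge_type, node_types[dst])
        for src, edge_type, dst in edges
    )

relations = meta_relations(node_types, edges)
```

A model that treats each distinct triple separately can learn different behaviour for, say, a call between functions and a control-flow step between basic blocks, which a homogeneous graph would merge.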

Citations: 1

Message from the MSR 2023 Mining Challenge Co-Chairs

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/msr59073.2023.00008

Citations: 0

An Empirical Study on the Performance of Individual Issue Label Prediction

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00041

Jueun Heo, Seonah Lee

Citations: 0

UnGoML: Automated Classification of unsafe Usages in Go

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00050

A. Wickert, C. Damke, Lars Baumgärtner, E. Hüllermeier, M. Mezini

Abstract: The Go programming language offers strong protection from memory corruption. As an escape hatch of these protections, it provides the unsafe package. Previous studies identified that this unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Due to the variety of these reasons, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, the classification of unsafe usages is challenging and requires the context of the call and the program’s structure. In this paper, we present the first automated classifier for unsafe usages in Go, UnGoML, to identify what is done with the unsafe package and why it is used. For UnGoML, we built four custom deep learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UnGoML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.
UnGoML: https://github.com/stg-tud/UnGoML Artifact: https://dx.doi.org/10.6084/m9.figshare.22293052
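The UnGoML abstract refers to a "set-valued conformal prediction setting", in which the classifier returns a set of candidate labels instead of a single label. A minimal sketch of one common variant, split conformal prediction with the score 1 - p(true label), is given below. It is not taken from the UnGoML code base, and the label names and probabilities are hypothetical.

```python
import math

def calibrate_threshold(true_label_probs, alpha):
    """Conformal quantile of the nonconformity scores 1 - p(true label).

    true_label_probs: probability the model gave the correct label on each
    held-out calibration example. alpha: tolerated miscoverage rate.
    """
    scores = sorted(1.0 - p for p in true_label_probs)
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the quantile
    if rank > n:
        return 1.0  # too few calibration points: every label is admitted
    return scores[rank - 1]

def prediction_set(probs, threshold):
    """All labels whose nonconformity score does not exceed the threshold."""
    return {label for label, p in probs.items() if 1.0 - p <= threshold}

# Hypothetical calibration data (values chosen to be exact in binary floating point).
calibration = [0.5, 0.75, 0.875, 0.25, 0.9375, 0.625, 0.8125]
threshold = calibrate_threshold(calibration, alpha=0.25)

# Hypothetical model outputs for two new usages; label names are illustrative.
confident = prediction_set(
    {"cast": 0.625, "pointer-arithmetic": 0.25, "memory-access": 0.125}, threshold
)
ambiguous = prediction_set(
    {"cast": 0.5, "pointer-arithmetic": 0.5, "memory-access": 0.0}, threshold
)
```

With these numbers the threshold is 0.5, so the confident case yields a single label while the ambiguous case yields a set of two. This is the trade-off the abstract describes: higher accuracy in exchange for larger label sets.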

Citations: 0

A Dataset of Bot and Human Activities in GitHub

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00070

Natarajan Chidambaram, Alexandre Decan, T. Mens

Abstract: Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort intensive and error-prone tasks. To understand and study how these bots are used, state-of-the-art bot identification tools have been developed to detect bots based on their comments in commits, issues and pull requests. Given that bots can be involved in many other activity types, there is a need to consider more activities that they are carrying out in the software repositories they are involved in. We therefore propose a curated dataset of such activities carried out by bots and humans involved in GitHub repositories. The dataset was constructed by identifying 24 high-level activity types that could be extracted from 15 lower-level event types that were queried from GitHub’s event stream API for all considered bots and humans. The proposed dataset contains around 834K activities performed by 385 bots and 616 humans involved in GitHub repositories, during an observation period ranging from 25 November 2022 to 9 March 2023. By analysing the activity patterns of bots and humans, this dataset could lead to better bot identification tools and empirical studies on how bots play a role in collaborative software development.

Citations: 1

An Empirical Study to Investigate Collaboration Among Developers in Open Source Software (OSS)

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00054

Weijie Sun, Samuel Iwuchukwu, A. A. Bangash, Abram Hindle

Abstract: The value of teamwork is being recognized by project owners, resulting in an increased acknowledgement of collaboration among developers in software engineering. A good understanding of how developers work together could positively impact software development practices. In this paper, we investigate the collaboration habits of developers in project files by leveraging the World of Code (WoC) dataset and GitHub API. We first identify the collaboration level of developers within the project files, such as the source, test, documentation, and build files, using the Author Cross Entropy (ACE). From the results we find out that test files report the highest degree of collaboration among the developers, perhaps because collaboration is critical to ensure convergence of functionality tests. Furthermore, the source code files show the least degree of collaboration, perhaps because of code ownership and the complexity and difficulty in code modification. Secondly, given the widespread usage of the Python programming language, we investigate the Python code tokens that are more prone to change and collaboration. Our findings offer insights into the specific project files and Python code tokens that developers typically collaborate on in the open-source community. This information can be used by researchers and developers to enhance existing collaboration platforms and tools.
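The abstract above measures collaboration with Author Cross Entropy (ACE) but does not define it. The sketch below therefore does not implement ACE. It uses plain Shannon entropy over the authors of a file's commits, a simpler related measure, to show how an entropy-style score separates single-owner files from shared ones. The author names and commit lists are hypothetical.

```python
import math
from collections import Counter

def author_entropy(commit_authors):
    """Shannon entropy (in bits) of the distribution of commits over authors.

    0.0 means one author made every commit; higher values mean the commits
    are spread more evenly across more authors.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical commit histories for two files.
owned_file = ["alice", "alice", "alice", "alice"]
shared_file = ["alice", "bob", "alice", "bob"]

owned_score = author_entropy(owned_file)
shared_score = author_entropy(shared_file)
```

A file touched by one author scores 0 bits, and a file split evenly between two authors scores 1 bit. Under the abstract's findings one would expect test files to behave more like the shared case and source files more like the owned case, although the paper's own metric may weight contributions differently.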

Citations: 0

What Warnings Do Engineers Really Fix? The Compiler That Cried Wolf

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/MSR59073.2023.00068

Gunnar Kudrjavets, Aditya Kumar, Ayushi Rastogi

Citations: 0

Message from the MSR 2023 Industry Track Co-Chairs

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) Pub Date : 2023-05-01 DOI: 10.1109/msr59073.2023.00007

Citations: 0