UnGoML: Automated Classification of unsafe Usages in Go

2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR)
Pub Date: 2023-05-01
DOI: 10.1109/MSR59073.2023.00050

A. Wickert, C. Damke, Lars Baumgärtner, E. Hüllermeier, M. Mezini


Abstract

The Go programming language offers strong protection from memory corruption. As an escape hatch from these protections, it provides the unsafe package. Previous studies found that the unsafe package is frequently used in real-world code for several purposes, e.g., serialization or casting types. Because these purposes vary, it may be possible to refactor specific usages to avoid potential vulnerabilities. However, classifying unsafe usages is challenging and requires the context of the call and the program's structure. In this paper, we present UnGoML, the first automated classifier for unsafe usages in Go, which identifies what is done with the unsafe package and why it is used. For UnGoML, we built four custom deep learning classifiers trained on a manually labeled data set. We represent Go code as enriched control-flow graphs (CFGs) and solve the label prediction task with one single-vertex and three context-aware classifiers. All three context-aware classifiers achieve a top-1 accuracy of more than 86% for both dimensions, WHAT and WHY. Furthermore, in a set-valued conformal prediction setting, we achieve accuracies of more than 93% with mean label set sizes of 2 for both dimensions. Thus, UnGoML can be used to efficiently filter unsafe usages for use cases such as refactoring or a security audit.

UnGoML: https://github.com/stg-tud/UnGoML
Artifact: https://dx.doi.org/10.6084/m9.figshare.22293052
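For readers unfamiliar with the unsafe package, the following is a minimal sketch of the kind of usage the abstract describes as casting types, next to a safe alternative that a refactoring might substitute. It is an illustration only and is not taken from the UnGoML data set or tooling; the function names bytesToStringUnsafe and bytesToStringSafe are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesToStringUnsafe (hypothetical name) reinterprets the slice header of b
// as a string header. No bytes are copied, so the returned string shares
// memory with b, which bypasses Go's guarantee that strings are immutable.
func bytesToStringUnsafe(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

// bytesToStringSafe (hypothetical name) is the plain conversion. It copies
// the bytes, so later writes to b cannot affect the returned string.
func bytesToStringSafe(b []byte) string {
	return string(b)
}

func main() {
	buf := []byte("hello")

	aliased := bytesToStringUnsafe(buf)
	copied := bytesToStringSafe(buf)

	// Writing to buf after the conversions shows the difference: the copied
	// string is unaffected, while the aliased string may observe the write.
	buf[0] = 'j'

	fmt.Println("unsafe cast:", aliased)
	fmt.Println("safe copy:  ", copied)
}
```

Whether such a cast can be replaced depends on why it is there, for example to avoid a copy on a hot path, which is the sort of distinction the WHAT and WHY dimensions are meant to capture.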
