Analyzing Hot Bugs in the Linux Kernel by Clustering Fixing Commit Messages

Trudy Instituta sistemnogo programmirovaniia RAN. Pub Date: 2023-01-01. DOI: 10.15514/ispras-2023-35(3)-16

Nikita Alexandrovich Starovoytov, Nikolay Andreevich Golovnev, Sergey Mikhailovich Staroletov

Abstract

System software produces a large amount of information that can be used to improve how such systems operate. The Linux kernel is a case in point: it is fully open source, and its git repository records a complete history in which every logical code change is accompanied by a developer-written message in natural language. Within this repository we focus on the messages of bug-fixing commits, since analyzing their text can help identify the most common types of errors. Building on our previous work, this paper proposes applying data analysis methods to that task. We explore several techniques for processing repository messages and automated methods for finding the prevalent bugs they describe. By computing distances between vectorized bug-fixing messages and grouping the messages into clusters, we categorize and isolate the most frequently occurring errors. We apply the approach to several prominent parts of the Linux kernel, which gives insight into the bugs typical of different subsystems. As a result, we present a summary of bug fixes in the kernel, sched, mm, net, irq, x86, and arm64 parts of the Linux kernel.
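
The abstract does not say which vectorization, distance, or clustering algorithm the authors use. The following is therefore only a minimal sketch of the general idea (vectorize fixing-commit messages, compute pairwise distances, group close messages), under assumptions of our own: TF-IDF weights, cosine distance, and single-linkage grouping with a fixed threshold. The function names, the keyword filter for fixing commits, the stop-word list, the threshold value, and the sample messages are all hypothetical and do not come from the paper or from the real kernel history.

```python
import math
import re
from collections import Counter

# Hypothetical stop words: tokens too generic to separate bug types.
STOP_WORDS = {"fix", "fixes", "fixed", "in", "on", "the", "a", "of", "to", "for"}

# Invented sample messages in the style of kernel commit subjects.
SAMPLE_MESSAGES = [
    "sched: fix null pointer dereference in pick_next_task",
    "mm: fix null pointer dereference in page fault handler",
    "net: fix memory leak in socket teardown",
    "irq: fix memory leak on error path",
    "x86: fix typo in comment",
    "docs: update scheduler documentation",
]


def is_fixing_commit(message):
    """Assumed heuristic: a message that mentions 'fix' describes a bug fix."""
    return re.search(r"\bfix(es|ed)?\b", message.lower()) is not None


def tokenize(message):
    """Lowercase the message, split into word tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9_]+", message.lower())
            if t not in STOP_WORDS]


def tfidf_vectors(messages):
    """Return one sparse TF-IDF vector (token -> weight) per message."""
    docs = [Counter(tokenize(m)) for m in messages]
    doc_freq = Counter(tok for doc in docs for tok in doc)
    n = len(docs)
    vectors = []
    for doc in docs:
        total = sum(doc.values()) or 1
        vectors.append({
            tok: (count / total) * (math.log((1 + n) / (1 + doc_freq[tok])) + 1)
            for tok, count in doc.items()
        })
    return vectors


def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors; 1.0 if either is empty."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def cluster_messages(messages, threshold=0.8):
    """Group fixing-commit messages whose cosine distance is below threshold.

    Uses union-find, so the result is single-linkage clustering.
    Clusters are returned largest first.
    """
    fixes = [m for m in messages if is_fixing_commit(m)]
    vectors = tfidf_vectors(fixes)
    parent = list(range(len(fixes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(fixes)):
        for j in range(i + 1, len(fixes)):
            if cosine_distance(vectors[i], vectors[j]) < threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, message in enumerate(fixes):
        groups.setdefault(find(i), []).append(message)
    return sorted(groups.values(), key=len, reverse=True)


if __name__ == "__main__":
    for cluster in cluster_messages(SAMPLE_MESSAGES):
        print(len(cluster), cluster)
```

On the invented sample, the two null-pointer messages and the two memory-leak messages end up in separate clusters, the typo fix stays alone, and the documentation commit is filtered out before clustering. On real repository data the threshold and stop-word list would need tuning, and a library implementation of vectorization and clustering would likely replace these hand-written helpers.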
