Bo Wang;Chong Chen;Junjie Chen;Bowen Xu;Chen Ye;Youfang Lin;Guoliang Dong;Jun Sun
DOI: 10.1109/TSE.2025.3566490
Journal: IEEE Transactions on Software Engineering, vol. 51, no. 6, pp. 1762-1782
Publication date: 2025-03-05
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10985855/
Impact factor: 6.5 (JCR Q1, Computer Science, Software Engineering)
A Comprehensive Study of OOP-Related Bugs in C++ Compilers
Modern C++, a programming language characterized by its extensive use of object-oriented programming (OOP) features, is widely used for system programming. However, C++ compilers often struggle to correctly handle these sophisticated OOP features, resulting in numerous high-profile compiler bugs that can lead to crashes or miscompilation. Despite the significance of OOP-related bugs, existing studies largely overlook OOP features, hindering their ability to discover such bugs. To assist both compiler fuzzer designers and compiler developers, we conduct a comprehensive study of the compiler bugs caused by incorrectly handling C++ OOP-related features. First, we systematically extract 788 OOP-related C++ compiler bugs from GCC and LLVM. Second, starting from the core concepts of OOP and C++, we manually derive a two-level taxonomy of the OOP-related features leading to compiler bugs, which consists of 6 primary categories (e.g., Abstraction & Encapsulation, Inheritance, and Runtime Polymorphism), along with 17 secondary categories (e.g., Constructors & Destructors and Multiple Inheritance). Third, we systematically analyze the root causes, symptoms, fixes, options, and C++ standard versions of these bugs. Our analysis yields 13 key findings, highlighting that features related to the construction and destruction of objects lead to the highest number of bugs, that crashes are the most frequent symptom, and that while the average time from bug introduction to discovery is 1856 days, fixing a bug once it is discovered takes only 174 days on average. Additionally, more than half of the bugs can be triggered without any compiler options. These findings offer valuable insights not only for developing new compiler testing approaches but also for improving language design and compiler engineering. Inspired by these findings, we developed a proof-of-concept compiler fuzzer, OOPFuzz, specifically targeting OOP-related bugs in C++ compilers. We applied it to the latest release versions of GCC and LLVM. In about 3 hours, it detected 9 bugs, of which 3 have been confirmed by the developers, including a bug in LLVM that had persisted for 13 years. The results indicate that our taxonomy and analysis provide valuable insights for future research in compiler testing.
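To make the feature categories named in the abstract concrete, the sketch below combines three of them in one small program: constructors and destructors, multiple inheritance (here with a virtual base), and runtime polymorphism. It is only an illustration of the kind of code a compiler must handle in these categories. It is not taken from the study, it is not a reproducer for any reported bug, and it is not output of OOPFuzz. All names (`Base`, `Left`, `Right`, `Diamond`, `construction_log`) are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical illustration only: these types do not come from the study.
using Log = std::vector<std::string>;

struct Base {
    Log& log;  // shared event log, so construction order can be observed
    explicit Base(Log& l) : log(l) { log.push_back("Base()"); }
    virtual ~Base() { log.push_back("~Base()"); }
    virtual std::string name() const { return "Base"; }
};

// Virtual inheritance: Left and Right share a single Base subobject.
struct Left : virtual Base {
    explicit Left(Log& l) : Base(l) { log.push_back("Left()"); }
    ~Left() override { log.push_back("~Left()"); }
    std::string name() const override { return "Left"; }
};

struct Right : virtual Base {
    explicit Right(Log& l) : Base(l) { log.push_back("Right()"); }
    ~Right() override { log.push_back("~Right()"); }
    std::string name() const override { return "Right"; }
};

// Multiple inheritance: the most-derived class initializes the virtual base.
struct Diamond final : Left, Right {
    explicit Diamond(Log& l) : Base(l), Left(l), Right(l) {
        log.push_back("Diamond()");
    }
    ~Diamond() override { log.push_back("~Diamond()"); }
    std::string name() const override { return "Diamond"; }
};

// Builds and destroys one Diamond and returns the recorded events.
Log construction_log() {
    Log log;
    {
        Diamond d(log);
        const Base& b = d;  // view the object through its virtual base
        // Runtime polymorphism: a dynamic_cast from the virtual base
        // recovers the most-derived object.
        assert(dynamic_cast<const Diamond*>(&b) == &d);
        log.push_back("name=" + b.name());  // virtual dispatch
    }  // d is destroyed here, in reverse order of construction
    return log;
}
```

Calling `construction_log()` from a `main` function and printing each entry shows the virtual base constructed first and destroyed last, with the direct bases in declaration order in between. Each of those steps depends on compiler-generated code for object layout, virtual tables, and constructor and destructor calls.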
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.