{"title":"MoCo: Fuzzing Deep Learning Libraries via Assembling Code","authors":"Pin Ji;Yang Feng;Duo Wu;Lingyue Yan;Penglin Chen;Jia Liu;Zhihong Zhao","doi":"10.1109/TSE.2024.3509975","DOIUrl":null,"url":null,"abstract":"The rapidly developing Deep Learning (DL) techniques have been applied in software systems of various types. However, they can also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behaviors of DL systems. Previous research on fuzzing DL libraries still has limitations in generating tests corresponding to crucial testing scenarios and constructing test oracles. In this paper, we propose <monospace>MoCo</monospace>, a novel fuzzing testing method for DL libraries via assembling code. The seed tests used by <monospace>MoCo</monospace> are code files that implement DL models, covering both model construction and training in the most common real-world application scenarios for DL libraries. <monospace>MoCo</monospace> first disassembles the seed code files to extract templates and code blocks, then applies code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks that fit the template. To ensure the correctness of the code block mutation, we employ the Large Language Model to parse the official documents of DL libraries for information about the parameters and the constraints between them. By inserting context-appropriate code blocks into the template, <monospace>MoCo</monospace> can generate a tree of code files with intergenerational relations. According to the derivation relations in this tree, we construct the test oracle based on the execution state consistency and the calculation result consistency. Since the granularity of code assembly is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where the bugs are located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of <monospace>MoCo</monospace> using three widely-used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiments, <monospace>MoCo</monospace> detects 77 new bugs of four types in three DL libraries, where 55 bugs have been confirmed, and 39 bugs have been fixed by developers. The experimental results demonstrate that <monospace>MoCo</monospace> can generate high-quality tests that cover crucial testing scenarios and detect different types of bugs, which helps developers improve the reliability of DL libraries.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 2","pages":"371-388"},"PeriodicalIF":6.5000,"publicationDate":"2024-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10772249/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
Citations: 0
Abstract
The rapidly developing Deep Learning (DL) techniques have been applied in software systems of various types. However, they can also pose new safety threats with potentially serious consequences, especially in safety-critical domains. DL libraries serve as the underlying foundation for DL systems, and bugs in them can have unpredictable impacts that directly affect the behavior of DL systems. Previous research on fuzzing DL libraries still has limitations in generating tests that correspond to crucial testing scenarios and in constructing test oracles. In this paper, we propose MoCo, a novel fuzz testing method for DL libraries that works by assembling code. The seed tests used by MoCo are code files that implement DL models, covering both model construction and training, the most common real-world application scenarios for DL libraries. MoCo first disassembles the seed code files to extract templates and code blocks, then applies code block mutation operators (e.g., API replacement, random generation, and boundary checking) to generate new code blocks that fit the template. To ensure the correctness of code block mutation, we employ a Large Language Model to parse the official documentation of DL libraries for information about the parameters and the constraints between them. By inserting context-appropriate code blocks into the template, MoCo generates a tree of code files with intergenerational relations. According to the derivation relations in this tree, we construct the test oracle based on execution state consistency and calculation result consistency. Since the granularity of code assembly is controlled rather than randomly divergent, we can quickly pinpoint the lines of code where a bug is located and the corresponding triggering conditions. We conduct a comprehensive experiment to evaluate the efficiency and effectiveness of MoCo using three widely used DL libraries (i.e., TensorFlow, PyTorch, and Jittor). During the experiments, MoCo detected 77 new bugs of four types in the three DL libraries, of which 55 have been confirmed and 39 have been fixed by developers. The experimental results demonstrate that MoCo can generate high-quality tests that cover crucial testing scenarios and detect different types of bugs, which helps developers improve the reliability of DL libraries.
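To make the workflow described above concrete, the following minimal sketch (PyTorch-style Python) illustrates the general idea of mutating a single code block within a fixed template and then checking the two consistency oracles named in the abstract: execution state consistency and calculation result consistency between a parent code file and its derived child. The template/block split, the helper names (run_block, seed_block, mutated_block), and the specific API-replacement mutation shown here are illustrative assumptions for this sketch, not MoCo's actual implementation.

# Hypothetical sketch of MoCo-style code-block mutation and consistency oracles.
import torch
import torch.nn as nn

# "Template": a fixed model-construction and training skeleton with a hole
# for one mutable code block (the layer returned by build_layer).
def run_block(build_layer):
    """Execute a model built from the given code block; return (state, result)."""
    torch.manual_seed(0)                      # fixed seed so results are comparable
    model = nn.Sequential(build_layer(), nn.Flatten(), nn.Linear(8 * 28 * 28, 10))
    x = torch.randn(4, 1, 28, 28)
    try:
        out = model(x)
        out.sum().backward()                  # cover training, not just inference
        return "ok", out.detach()
    except Exception as exc:                  # feeds the execution-state oracle
        return f"crash: {exc}", None

# Seed code block, as it might be extracted from a model-building file.
def seed_block():
    return nn.Conv2d(1, 8, kernel_size=3, padding=1)

# Child code block produced by an API-replacement-style mutation that keeps
# the computation semantically equivalent (same hyperparameters, tuple form).
def mutated_block():
    return nn.Conv2d(1, 8, kernel_size=(3, 3), padding=(1, 1))

state_a, result_a = run_block(seed_block)
state_b, result_b = run_block(mutated_block)

# Oracle 1: execution state consistency between parent and child code files.
if (state_a == "ok") != (state_b == "ok"):
    print("execution-state inconsistency:", state_a, "vs", state_b)
# Oracle 2: calculation result consistency for equivalent blocks.
elif result_a is not None and not torch.allclose(result_a, result_b, atol=1e-5):
    print("calculation-result inconsistency")
else:
    print("consistent:", state_a)

In MoCo itself, the mutated blocks are guided by parameter constraints parsed from official documentation by an LLM and are inserted into templates extracted from real model code; the sketch above only mirrors the parent/child comparison that underlies the test oracle.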
About the Journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.