Ruijie Cai, Zhaowei Zhang, Xiaoya Zhu, Yongguang Zhang, Xiaokang Yin, Shengli Liu
Journal of Systems and Software, Volume 228, Article 112472. Published 2025-05-05. DOI: 10.1016/j.jss.2025.112472. https://www.sciencedirect.com/science/article/pii/S0164121225001402
Coding style matters: Scalable and efficient identification of memory management functions in monolithic firmware
Memory corruption vulnerabilities are often closely associated with improper use or implementation of memory management functions. Monolithic firmware typically uses custom memory management functions and lacks information such as function names, which poses significant challenges for vulnerability detection. Identifying memory management functions is therefore crucial. Existing methods are rendered ineffective by the absence of metadata, and the diversity of implementations across firmware images further complicates identification. To address this problem, we introduce MemIdent, a new method that leverages coding style to identify memory management functions. MemIdent is engineered to be scalable and efficient, and it can discern consistent call features across compiler optimizations and instruction architectures. It leverages three key observations derived from an in-depth analysis of monolithic firmware: the regularity in memory allocation calls, the co-occurrence of allocation and deallocation functions, and the statistical prominence of these features. MemIdent uses data flow analysis to extract call-site features such as function parameter types and return values, and then analyzes them through statistical patterns to identify memory allocation and deallocation functions. We evaluate MemIdent’s performance using 44 firmware images covering 6 vendors (Tenda, Cisco, SonicWall, D-Link, TP-Link, and Comtech) across 3 architectures (MIPS, ARM, and PPC). The experimental results demonstrate that MemIdent has higher accuracy, greater efficiency, and better generality than state-of-the-art (SOTA) approaches, including Heapster, IDA Lumina, and MLM, offering a significant advancement in memory management function identification for monolithic firmware.
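The idea of scoring functions by call-site regularity and allocation/deallocation co-occurrence can be illustrated with a toy sketch. This is not the authors' implementation: the `CallSite` fields, the `identify` function, the thresholds, and the `sub_…` names are all hypothetical, and the features are assumed to have been extracted already by some data flow analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallSite:
    """One observed call; every field is a hypothetical, simplified feature."""
    caller: str            # function containing the call
    callee: str            # called function (stripped firmware has no real names)
    const_int_arg: bool    # an argument is an integer constant (size-like)
    ret_used_as_ptr: bool  # the return value is later dereferenced
    ptr_arg_from: str      # callee whose return value flows into a pointer argument, or ""


def identify(sites, min_calls=3, threshold=0.8):
    """Map each allocator candidate to its most frequent deallocator-like partner."""
    calls = Counter(s.callee for s in sites)
    alloc_like = Counter(
        s.callee for s in sites if s.const_int_arg and s.ret_used_as_ptr
    )
    # Regularity: fraction of a function's call sites that look like
    # "size in, pointer out". Rarely called functions are ignored.
    allocators = {
        f for f in calls
        if calls[f] >= min_calls and alloc_like[f] / calls[f] >= threshold
    }
    # Co-occurrence: count functions that receive a pointer produced by a candidate.
    pairs = Counter(
        (s.ptr_arg_from, s.callee) for s in sites if s.ptr_arg_from in allocators
    )
    result = {}
    for a in allocators:
        partners = {d: n for (src, d), n in pairs.items() if src == a and d != a}
        # Prominence: the partner seen most often is taken as the deallocator.
        result[a] = max(partners, key=partners.get) if partners else None
    return result


# Hand-written toy data, not taken from any real firmware image.
SITES = [
    CallSite("f1", "sub_1000", True, True, ""),
    CallSite("f1", "sub_2000", False, False, "sub_1000"),
    CallSite("f2", "sub_1000", True, True, ""),
    CallSite("f2", "sub_3000", True, False, "sub_1000"),  # memset-like use
    CallSite("f2", "sub_2000", False, False, "sub_1000"),
    CallSite("f3", "sub_1000", True, True, ""),
    CallSite("f3", "sub_2000", False, False, "sub_1000"),
    CallSite("f3", "sub_4000", True, False, ""),
]

print(identify(SITES))  # {'sub_1000': 'sub_2000'}
```

In this toy data `sub_1000` behaves like an allocator at every call site, and `sub_2000` is the function that most often consumes its result, so the pair is reported together. A real analysis would need far richer features and would have to separate deallocators from other frequent pointer consumers such as copy routines.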
About the journal:
The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.