Thumb: A forensic automation framework leveraging MLLMs and OCR on Android device
Authors: Dingjie Shang, Amin Sakzad, Stuart W. Hall
Journal: Forensic Science International-Digital Investigation, volume 54, Article 301949
Published: 2025-06-17
DOI: 10.1016/j.fsidi.2025.301949
URL: https://www.sciencedirect.com/science/article/pii/S2666281725000885
Citations: 0
Abstract
Forensic analysis of Android devices is challenging because applications and the operating system generate thumbnails automatically, which complicates attributing those files to specific user actions. This paper presents the design, implementation, and evaluation of a forensic framework, Thumb, which performs real-time experiments on physical Android devices. Thumb integrates multimodal large language models (MLLMs) and Optical Character Recognition (OCR) to capture on-screen information and simulate user interactions, while extracting data from internal storage to monitor changes in cached and thumbnail files. A proof-of-concept implementation demonstrates the framework's accuracy across various applications, highlighting its potential to simplify Android forensic analysis. However, current MLLM limitations and the framework's structure pose challenges in complex scenarios and detailed data analysis.
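The abstract's idea of monitoring changes in cached and thumbnail files can be illustrated with a before/after comparison of file hashes. The following is a minimal sketch, not the authors' implementation: it assumes the relevant directory has already been copied from the device to a local folder before and after a simulated user action (the abstract does not say how Thumb extracts the data), and the function names `snapshot` and `diff_snapshots` are hypothetical.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (POSIX-style relative path) to its SHA-256 digest.

    Assumption: root is a local copy of a cache or thumbnail directory.
    """
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshots(before, after):
    """Classify paths as created, deleted, or modified between two snapshots."""
    shared = before.keys() & after.keys()
    return {
        "created": sorted(after.keys() - before.keys()),
        "deleted": sorted(before.keys() - after.keys()),
        "modified": sorted(k for k in shared if before[k] != after[k]),
    }
```

Taking one snapshot before an interaction and one after it, then passing both to `diff_snapshots`, yields the set of files that the interaction may have produced or altered. Hashing file contents rather than comparing timestamps is a design choice of this sketch; it detects content changes even when modification times are unreliable.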