Hengjian Gao, Jianfeng Chen, Mohan He, Jingqi Wang, Shukun Wu, Hua Zhong, Bo Jin, Yuan Zhou, Lei Fan
Displays, Volume 91, Article 103181. Published 2025-08-22. DOI: 10.1016/j.displa.2025.103181. Impact factor: 3.4 (JCR Q1, Computer Science, Hardware & Architecture). URL: https://www.sciencedirect.com/science/article/pii/S0141938225002185
UTTBench: Benchmarking Large Multimodal Models for text recognition in underwater thermal turbulent environments
The rapid advancement of Large Multimodal Models (LMMs) has significantly expanded their potential for complex real-world applications. However, their effectiveness in extreme physical conditions, such as underwater thermal turbulence, remains understudied due to the lack of standardized evaluation benchmarks. To address this gap, we introduce the Underwater Thermal Turbulence Benchmark (UTTBench), the first comprehensive benchmark designed to evaluate text recognition in underwater thermal turbulent environments. We conduct a detailed evaluation of four popular LMMs, namely LLaVA-Onevision, Qwen2.5-VL, InternVL 2.5, and DeepSeek-VL2, on this benchmark. Our experiments reveal that even advanced LMMs face substantial challenges in accurately recognizing text under thermal turbulence. This study underscores the critical need for further research to enhance the robustness and reliability of LMMs in such challenging environments.
About the journal:
Displays is the international journal covering the research and development of display technology, its effective presentation and perception of information, and applications and systems including display-human interface.
Technical papers on practical developments in Displays technology provide an effective channel to promote greater understanding and cross-fertilization across the diverse disciplines of the Displays community. Original research papers solving ergonomics issues at the display-human interface advance effective presentation of information. Tutorial papers covering fundamentals, intended for display technologists and human factors engineers new to the field, will also occasionally be featured.