Multitasking vision language models for vehicle plate recognition with VehiclePaliGemma.

IF 3.9 | Zone 2 (multidisciplinary journals) | Q1 MULTIDISCIPLINARY SCIENCES

Scientific Reports Pub Date : 2025-07-18 DOI:10.1038/s41598-025-10774-9

Nouar AlDahoul, Myles Joshua Toledo Tan, Raghava Reddy Tera, Hezerul Abdul Karim, Chee How Lim, Manish Kumar Mishra, Yasir Zaki

Scientific Reports, volume 15, issue 1, article 26189. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12274586/pdf/

Citations: 0

Abstract

License Plate Recognition (LPR) automates vehicle identification using cameras and computer vision. It compares captured plates against databases to detect stolen vehicles, uninsured drivers, and crime suspects. Traditionally reliant on Optical Character Recognition (OCR), LPR faces challenges such as noise, blurring, weather effects, and closely spaced characters, all of which complicate accurate recognition. Existing LPR methods still require significant improvement, especially for distorted images. To fill this gap, we propose utilizing visual language models (VLMs) such as OpenAI GPT-4o (Generative Pre-trained Transformer 4 Omni), Google Gemini 1.5, Google PaliGemma (Pathways Language and Image model + Gemma model), Meta Llama (Large Language Model Meta AI) 3.2, Anthropic Claude 3.5 Sonnet, LLaVA (Large Language and Vision Assistant), NVIDIA VILA (Visual Language), and moondream2 to recognize such unclear plates with closely spaced characters. This paper evaluates the VLMs' capability to address the aforementioned problems. Additionally, we introduce "VehiclePaliGemma", a fine-tuned open-source PaliGemma VLM designed to recognize plates under challenging conditions. We compared our proposed VehiclePaliGemma with state-of-the-art methods and other VLMs using a dataset of Malaysian license plates collected under complex conditions. The results indicate that VehiclePaliGemma achieved superior performance with an accuracy of 87.6%. Moreover, it can predict a car's plate at a speed of 7 frames per second using an A100-80GB GPU. Finally, we explored the multitasking capability of the VehiclePaliGemma model to accurately identify plates in images containing multiple cars of various models and colors, with plates positioned and oriented in different directions.
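The two headline figures in the abstract (87.6% accuracy and 7 frames per second) can be made concrete with a small measurement sketch. The abstract does not define how accuracy was scored, so the code below assumes plate-level exact match after stripping spaces and case, which is a common convention but not necessarily the authors' protocol. The function names `normalize_plate`, `plate_accuracy`, and `frames_per_second` are hypothetical and introduced here for illustration only.

```python
import time
from typing import Callable, Iterable, Sequence


def normalize_plate(text: str) -> str:
    """Uppercase the string and keep only letters and digits."""
    return "".join(ch for ch in text.upper() if ch.isalnum())


def plate_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of plates whose normalized prediction matches the reference exactly.

    Assumption: a plate counts as correct only if every character matches.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one sample is required")
    correct = sum(
        normalize_plate(p) == normalize_plate(r)
        for p, r in zip(predictions, references)
    )
    return correct / len(references)


def frames_per_second(recognize: Callable[[object], str], images: Iterable[object]) -> float:
    """Run a recognizer over the images and return throughput in images per second."""
    start = time.perf_counter()
    count = 0
    for image in images:
        recognize(image)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")
```

Under this reading, a throughput of 7 frames per second corresponds to roughly 1/7 s, or about 143 ms, per plate image. Here `recognize` stands in for whatever model call is being timed; the sketch does not reproduce the paper's inference pipeline.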


Source journal

Scientific Reports (Natural Science Disciplines)

CiteScore: 7.50
Self-citation rate: 4.30%
Articles published: 19567
Review time: 3.9 months

About the journal: We publish original research from all areas of the natural sciences, psychology, medicine and engineering. You can learn more about what we publish by browsing our specific scientific subject areas below or explore Scientific Reports by browsing all articles and collections. Scientific Reports has a 2-year impact factor of 4.380 (2021) and is the 6th most-cited journal in the world, with more than 540,000 citations in 2020 (Clarivate Analytics, 2021).
• Engineering: Engineering covers all aspects of engineering, technology, and applied science. It plays a crucial role in the development of technologies to address some of the world's biggest challenges, helping to save lives and improve the way we live.
• Physical sciences: Physical sciences are those academic disciplines that aim to uncover the underlying laws of nature, often written in the language of mathematics. It is a collective term for areas of study including astronomy, chemistry, materials science and physics.
• Earth and environmental sciences: Earth and environmental sciences cover all aspects of Earth and planetary science and broadly encompass solid Earth processes, surface and atmospheric dynamics, Earth system history, climate and climate change, marine and freshwater systems, and ecology. The field also considers the interactions between humans and these systems.
• Biological sciences: Biological sciences encompass all the divisions of natural sciences examining various aspects of vital processes. The concept includes anatomy, physiology, cell biology, biochemistry and biophysics, and covers all organisms, from microorganisms and animals to plants.
• Health sciences: The health sciences study health, disease and healthcare. This field of study aims to develop knowledge, interventions and technology for use in healthcare to improve the treatment of patients.