FaceXBench: Evaluating Multimodal LLMs on Face Understanding

Impact factor: 5.0

IEEE Transactions on Biometrics, Behavior, and Identity Science. Pub Date: 2026-03-01. Epub Date: 2026-01-19. DOI: 10.1109/TBIOM.2026.3655668

Kartik Narayan;V. S. Vibashan;Vishal M. Patel

Volume 8, Issue 3, pp. 354-364. Publication type: Journal Article. URL: https://ieeexplore.ieee.org/document/11358941/

Citations: 0

Abstract

Multimodal Large Language Models (MLLMs) demonstrate impressive problem-solving abilities across a wide range of tasks and domains. However, their capacity for face understanding has not been systematically studied. To address this gap, we introduce FaceXBench, a comprehensive benchmark designed to evaluate MLLMs on complex face understanding tasks. FaceXBench includes 5,000 multimodal multiple-choice questions derived from 25 public datasets and a newly created dataset, FaceXAPI. These questions cover 14 tasks across 6 broad categories, assessing MLLMs’ face understanding abilities in bias and fairness, face authentication, recognition, analysis, localization, and tool retrieval. Using FaceXBench, we conduct an extensive evaluation of 26 open-source MLLMs alongside 2 proprietary models, revealing the unique challenges in complex face understanding tasks. We analyze the models across three evaluation settings: zero-shot, in-context task description, and chain-of-thought prompting. Our detailed analysis reveals that current MLLMs, including advanced models like GPT-4o and GeminiPro 1.5, show significant room for improvement. We believe FaceXBench will be a crucial resource for developing MLLMs equipped to perform sophisticated face understanding.
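
As a rough illustration of how a multiple-choice benchmark of this kind can be scored, the sketch below computes per-category and overall accuracy. It is not the authors' evaluation code: the function name `accuracy_by_category`, the record layout (category, predicted option, correct option), and the sample records are all assumptions made up for this example; only the category names are taken from the abstract.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for (category, predicted, correct) records.

    Hypothetical helper: option letters are compared case-insensitively
    after stripping surrounding whitespace.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, predicted, correct in records:
        totals[category] += 1
        if predicted.strip().upper() == correct.strip().upper():
            hits[category] += 1
    return {category: hits[category] / totals[category] for category in totals}


# Made-up records for illustration only; these are not benchmark results.
sample = [
    ("recognition", "A", "A"),
    ("recognition", "b", "C"),
    ("localization", "d", "D"),
    ("tool retrieval", "B", "B"),
]

scores = accuracy_by_category(sample)

# Overall accuracy over all questions, regardless of category.
overall = sum(
    predicted.strip().upper() == correct.strip().upper()
    for _, predicted, correct in sample
) / len(sample)

print(scores)
print(overall)
```

Reporting accuracy per category as well as overall keeps a weak category from being hidden behind a strong average, which matters when tasks are as different as localization and tool retrieval.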

Source journal

IEEE transactions on biometrics, behavior, and identity science

CiteScore

10.90

Self-citation rate

0.00%

Articles published