{"title":"Exploring Generative Pre-Trained Transformer-4-Vision for Nystagmus Classification: Development and Validation of a Pupil-Tracking Process.","authors":"Masao Noda, Ryota Koshu, Reiko Tsunoda, Hirofumi Ogihara, Tomohiko Kamo, Makoto Ito, Hiroaki Fushiki","doi":"10.2196/70070","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques have been used to automate nystagmus classification using convolutional and recurrent neural networks. These networks can accurately classify nystagmus patterns using video data. However, associated challenges including the need for large datasets when creating models, limited applicability to address specific image conditions, and the complexity associated with using these models.</p><p><strong>Objective: </strong>This study aimed to evaluate a novel approach for nystagmus classification that used the Generative Pre-trained Transformer 4 Vision (GPT-4V) model, which is a state-of-the-art large-scale language model with powerful image recognition capabilities.</p><p><strong>Methods: </strong>We developed a pupil-tracking process using a nystagmus-recording video and verified the optimization model's accuracy using GPT-4V classification and nystagmus recording. We tested whether the created optimization model could be evaluated in six categories of nystagmus: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input as two-dimensional coordinate data or an image, and multiple in-context learning methods were evaluated.</p><p><strong>Results: </strong>The developed model showed an overall classification accuracy of 37% when using pupil-traced images and a maximum accuracy of 24.6% when pupil coordinates were used as input. Regarding orientation, we achieved a maximum accuracy of 69% for the classification of horizontal nystagmus patterns but a lower accuracy for the vertical and torsional components.</p><p><strong>Conclusions: </strong>We demonstrated the potential of versatile vertigo management in a generative artificial intelligence model that improves the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset size and enhancing input modalities, to improve classification performance across all nystagmus types. The GPT-4V model validated only for recognizing still images can be linked to video classification and proposed as a novel method.</p>","PeriodicalId":14841,"journal":{"name":"JMIR Formative Research","volume":"9 ","pages":"e70070"},"PeriodicalIF":2.0000,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"JMIR Formative Research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2196/70070","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"HEALTH CARE SCIENCES & SERVICES","Score":null,"Total":0}
Citations: 0
Abstract
Background: Conventional nystagmus classification methods often rely on subjective observation by specialists, which is time-consuming and variable among clinicians. Recently, deep learning techniques have been used to automate nystagmus classification with convolutional and recurrent neural networks, which can accurately classify nystagmus patterns from video data. However, these approaches face challenges, including the need for large datasets when creating models, limited applicability to specific image conditions, and the complexity of using the models.
Objective: This study aimed to evaluate a novel approach to nystagmus classification using the Generative Pre-trained Transformer 4 Vision (GPT-4V) model, a state-of-the-art large-scale language model with powerful image recognition capabilities.
Methods: We developed a pupil-tracking process for nystagmus-recording videos and verified the optimized model's classification accuracy with GPT-4V. We tested whether the model could classify nystagmus into six categories: right horizontal, left horizontal, upward, downward, right torsional, and left torsional. The traced trajectory was input either as two-dimensional coordinate data or as an image, and multiple in-context learning methods were evaluated (see the sketch below).
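The abstract does not specify the tracking algorithm or prompt design, so the following is a minimal sketch of one plausible pipeline, not the authors' implementation. It assumes OpenCV dark-blob thresholding for pupil localization, the OpenAI Python client for GPT-4V image input, and illustrative choices throughout: the threshold value, the model name "gpt-4-vision-preview", the prompt wording, and the file name "nystagmus.mp4" are all assumptions.

```python
# Hypothetical sketch: trace the pupil across a nystagmus video, render the
# trajectory as an image, and ask GPT-4V to classify it into six categories.
# Thresholds, model name, and prompt text are assumptions, not the paper's method.
import base64

import cv2
import numpy as np
from openai import OpenAI


def trace_pupil(video_path: str) -> list[tuple[int, int]]:
    """Return per-frame (x, y) pupil centers via dark-blob thresholding."""
    cap = cv2.VideoCapture(video_path)
    centers = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # The pupil is the darkest region; invert-threshold to isolate it.
        _, mask = cv2.threshold(gray, 40, 255, cv2.THRESH_BINARY_INV)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            continue
        largest = max(contours, key=cv2.contourArea)
        m = cv2.moments(largest)
        if m["m00"] > 0:
            centers.append((int(m["m10"] / m["m00"]),
                            int(m["m01"] / m["m00"])))
    cap.release()
    return centers


def trajectory_image(centers: list[tuple[int, int]], size: int = 512) -> bytes:
    """Draw the traced trajectory on a blank canvas and return PNG bytes."""
    canvas = np.full((size, size, 3), 255, dtype=np.uint8)
    pts = np.array(centers, dtype=np.float32)
    # Normalize coordinates to fit the canvas with a small margin.
    pts -= pts.min(axis=0)
    scale = (size - 20) / max(float(pts.max()), 1.0)
    pts = (pts * scale + 10).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(canvas, [pts], isClosed=False, color=(0, 0, 0), thickness=2)
    _, png = cv2.imencode(".png", canvas)
    return png.tobytes()


def classify_with_gpt4v(png_bytes: bytes) -> str:
    """Send the trajectory image to GPT-4V with a six-class prompt."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    b64 = base64.b64encode(png_bytes).decode()
    response = client.chat.completions.create(
        model="gpt-4-vision-preview",  # assumed model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": ("This image traces a pupil trajectory from a "
                          "nystagmus recording. Classify it as one of: "
                          "right horizontal, left horizontal, upward, "
                          "downward, right torsional, left torsional.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content


centers = trace_pupil("nystagmus.mp4")
print(classify_with_gpt4v(trajectory_image(centers)))
```

For the coordinate-data variant described in the Methods, the (x, y) list could instead be serialized as text in the prompt; the in-context learning conditions would correspond to appending labeled example trajectories as additional message turns before the query.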
Results: The developed model achieved an overall classification accuracy of 37% when pupil-traced images were used as input and a maximum accuracy of 24.6% when pupil coordinates were used. By direction, classification of horizontal nystagmus patterns reached a maximum accuracy of 69%, but accuracy was lower for the vertical and torsional components.
Conclusions: We demonstrated the potential of a generative artificial intelligence model for versatile vertigo management by improving the accuracy and efficiency of nystagmus classification. We also highlighted areas for further improvement, such as expanding the dataset and enhancing the input modalities, to improve classification performance across all nystagmus types. Although the GPT-4V model has been validated only for still-image recognition, linking it to video classification, as proposed here, constitutes a novel method.