Large language models can extract metadata for annotation of human neuroimaging publications.

Impact factor 2.5; Zone 4 (Medicine); JCR Q2, MATHEMATICAL & COMPUTATIONAL BIOLOGY

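Frontiers in Neuroinformatics, vol. 19, article 1609077. Pub Date: 2025-08-20; eCollection Date: 2025-01-01; DOI: 10.3389/fninf.2025.1609077

PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12405296/pdf/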

Matthew D Turner, Abhishek Appaji, Nibras Ar Rakib, Pedram Golnari, Arcot K Rajasekar, Anitha Rathnam K V, Satya S Sahoo, Yue Wang, Lei Wang, Jessica A Turner


Citations: 0

Abstract

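We show that recent (mid-to-late 2024) commercial large language models (LLMs) can perform good-quality metadata extraction and annotation on several exemplar real-world annotation tasks from the neuroimaging literature, with very little work on the part of investigators. We investigated OpenAI's GPT-4o, which performed comparably to several groups of specially trained and supervised human annotators. With zero-shot prompts and no feedback to the LLM, it scored between 0.91 and 0.97, similar to the human annotators. We reviewed the disagreements between the LLM and the gold-standard human annotations. In most cases genuine LLM errors were comparable to human errors, and many of the disagreements were not errors at all. For the specific types of annotation we tested, the gold-standard values were reviewed with exceptional care. Measured against them, LLM performance is sufficient for metadata annotation at scale. We encourage other research groups to develop and share more specialized "micro-benchmarks," like the ones we provide here. Such benchmarks can test the annotation performance of both LLMs and more complex agent systems on real-world metadata annotation tasks.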
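The abstract reports agreement scores between 0.91 and 0.97 and a manual review of LLM-versus-gold-standard disagreements, but it does not name the metric. The Python below is a minimal sketch under stated assumptions, not the authors' method. It scores one categorical annotation field by simple percent agreement, which is an assumption; the paper may use a different measure. It also collects the disagreements so they can be reviewed by hand. The function names, the case-insensitive matching rule, and the example labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Disagreement:
    """One item where the LLM label differs from the gold-standard label."""
    item_id: str
    gold: str
    predicted: str


def _normalize(label: str) -> str:
    # Assumption: labels differing only in case or surrounding whitespace match.
    return label.strip().casefold()


def score_annotations(
    gold: dict[str, str], predicted: dict[str, str]
) -> tuple[float, list[Disagreement]]:
    """Return (percent agreement, disagreements) for one annotation field.

    Only items present in `gold` are scored; extra predicted items are ignored.
    """
    if not gold:
        raise ValueError("gold standard is empty")
    missing = gold.keys() - predicted.keys()
    if missing:
        raise ValueError(f"no prediction for: {sorted(missing)}")
    # Disagreements are kept (not just counted) so a human can review them.
    disagreements = [
        Disagreement(item_id, gold[item_id], predicted[item_id])
        for item_id in sorted(gold)
        if _normalize(gold[item_id]) != _normalize(predicted[item_id])
    ]
    agreement = 1 - len(disagreements) / len(gold)
    return agreement, disagreements


if __name__ == "__main__":
    # Hypothetical example: an "imaging modality" field for four papers.
    gold = {"paper1": "fMRI", "paper2": "EEG", "paper3": "MEG", "paper4": "fMRI"}
    llm = {"paper1": "fmri", "paper2": "EEG", "paper3": "fMRI", "paper4": "fMRI"}
    rate, diffs = score_annotations(gold, llm)
    print(f"agreement = {rate:.2f}")  # agreement = 0.75
    for d in diffs:
        print(f"review {d.item_id}: gold={d.gold!r}, llm={d.predicted!r}")
```

Percent agreement does not correct for agreement expected by chance. When label distributions are skewed, a chance-corrected statistic such as Cohen's kappa may be the better choice.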


Source journal: Frontiers in Neuroinformatics (MATHEMATICAL & COMPUTATIONAL BIOLOGY; NEUROSCIENCES)

CiteScore: 4.80
Self-citation rate: 5.70%
Articles published: 132
Review time: 14 weeks

About the journal: Frontiers in Neuroinformatics publishes rigorously peer-reviewed research on the development and implementation of numerical/computational models and analytical tools used to share, integrate and analyze experimental data and advance theories of nervous system function. Specialty Chief Editors Jan G. Bjaalie at the University of Oslo and Sean L. Hill at the École Polytechnique Fédérale de Lausanne are supported by an outstanding Editorial Board of international experts. This multidisciplinary open-access journal is at the forefront of disseminating and communicating scientific knowledge and impactful discoveries to researchers, academics and the public worldwide. Neuroscience is being propelled into the information age as the volume of information explodes, demanding organization and synthesis. Novel synthesis approaches are opening up a new dimension for exploring the components of brain elements and systems and the vast number of variables that underlie their functions. Neural data are highly heterogeneous, with complex inter-relations across multiple levels. This drives the need for innovative organizing and synthesizing approaches from genes to cognition, covering a range of species and disease states. Frontiers in Neuroinformatics therefore welcomes submissions on:

- existing neuroscience databases;
- the development of data and knowledge bases for all levels of neuroscience;
- applications and technologies that facilitate data sharing (interoperability, formats, terminologies, and ontologies);
- novel tools for the acquisition, analysis, visualization, and dissemination of nervous system data.

The journal also welcomes submissions on new tools (software and hardware) that support brain modeling, and on merging neuroscience databases with brain models used for simulation and visualization.