Automating literature screening and curation with applications to computational neuroscience.

IF 4.7, Zone 2 (Medicine), JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association Pub Date : 2024-06-20 DOI:10.1093/jamia/ocae097

Ziqing Ji, Siyan Guo, Yujie Qiao, Robert A McDougal

Pages: 1463-1470. Publication type: Journal Article. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187430/pdf/

Citations: 0

Abstract

Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied through unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of model in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics).

Materials and methods: Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5 and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.
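The abstract does not say how SPECTER2 embeddings were turned into a pre-screening decision. As a minimal sketch of one plausible approach, assuming each paper already has an embedding vector, the code below keeps candidates whose cosine similarity to the centroid of known computational neuroscience papers exceeds a cutoff. The function names, the centroid rule, the `threshold` value, and the toy vectors are all hypothetical illustrations, not the authors' method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def prescreen(known_embeddings, candidates, threshold=0.8):
    """Return ids of candidates whose embedding lies close to the known-work centroid.

    known_embeddings: vectors for known computational neuroscience papers
    candidates: dict mapping a paper id to its embedding vector
    threshold: hypothetical cutoff (assumption); would need tuning on labeled data
    """
    center = centroid(known_embeddings)
    return [pid for pid, vec in candidates.items() if cosine(vec, center) >= threshold]

# Toy 3-dimensional vectors standing in for real document embeddings (assumption).
known = [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]]
candidates = {"paper_a": [0.9, 0.1, 0.0], "paper_b": [0.0, 0.0, 1.0]}
shortlist = prescreen(known, candidates)
```

In a pipeline of this shape, only the shortlisted papers would be passed on to the more expensive language model step, so the cutoff trades recall against the number of model calls.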

Results: SPECTER2, GPT-4, and GPT-3.5 showed varied but strong performance in identifying computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations.

Discussion: Accuracy in identification and extraction might be further improved by resolving ambiguity about what counts as a computational element, including more information from papers (eg, the Methods section), improving prompts, etc.

Conclusion: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.


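Source journal

Journal of the American Medical Informatics Association (category: Medicine, Computer Science: Interdisciplinary Applications)

CiteScore: 14.50

Self-citation rate: 7.80%

Articles published: 230

Review time: 3-8 weeks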

Journal description: JAMIA is AMIA's premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA's articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives, and reviews also help readers stay connected with the most important informatics developments in implementation, policy, and education.