Automating literature screening and curation with applications to computational neuroscience.

IF 4.7 2区 医学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Ziqing Ji, Siyan Guo, Yujie Qiao, Robert A McDougal
{"title":"Automating literature screening and curation with applications to computational neuroscience.","authors":"Ziqing Ji, Siyan Guo, Yujie Qiao, Robert A McDougal","doi":"10.1093/jamia/ocae097","DOIUrl":null,"url":null,"abstract":"<p><strong>Objective: </strong>ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics).</p><p><strong>Materials and methods: </strong>Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.</p><p><strong>Results: </strong>SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations.</p><p><strong>Discussion: </strong>Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc.</p><p><strong>Conclusion: </strong>Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.</p>","PeriodicalId":50016,"journal":{"name":"Journal of the American Medical Informatics Association","volume":null,"pages":null},"PeriodicalIF":4.7000,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11187430/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of the American Medical Informatics Association","FirstCategoryId":"91","ListUrlMain":"https://doi.org/10.1093/jamia/ocae097","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0

Abstract

Objective: ModelDB (https://modeldb.science) is a discovery platform for computational neuroscience, containing over 1850 published model codes with standardized metadata. These codes were mainly supplied from unsolicited model author submissions, but this approach is inherently limited. For example, we estimate we have captured only around one-third of NEURON models, the most common type of models in ModelDB. To more completely characterize the state of computational neuroscience modeling work, we aim to identify works containing results derived from computational neuroscience approaches and their standardized associated metadata (eg, cell types, research topics).

Materials and methods: Known computational neuroscience work from ModelDB and identified neuroscience work queried from PubMed were included in our study. After pre-screening with SPECTER2 (a free document embedding method), GPT-3.5, and GPT-4 were used to identify likely computational neuroscience work and relevant metadata.

Results: SPECTER2, GPT-4, and GPT-3.5 demonstrated varied but high abilities in identification of computational neuroscience work. GPT-4 achieved 96.9% accuracy and GPT-3.5 improved from 54.2% to 85.5% through instruction-tuning and Chain of Thought. GPT-4 also showed high potential in identifying relevant metadata annotations.

Discussion: Accuracy in identification and extraction might further be improved by dealing with ambiguity of what are computational elements, including more information from papers (eg, Methods section), improving prompts, etc.

Conclusion: Natural language processing and large language model techniques can be added to ModelDB to facilitate further model discovery, and will contribute to a more standardized and comprehensive framework for establishing domain-specific resources.

将文献筛选和整理自动化,并应用于计算神经科学。
目的:ModelDB (https://modeldb.science) 是一个用于计算神经科学的发现平台,包含 1850 多个已发布的带有标准化元数据的模型代码。这些代码主要来自模型作者的主动提交,但这种方法本身存在局限性。例如,我们估计只捕获了神经元模型的三分之一左右,而神经元模型是 ModelDB 中最常见的模型类型。为了更全面地描述计算神经科学建模工作的现状,我们的目标是识别包含计算神经科学方法衍生结果的作品及其标准化的相关元数据(如细胞类型、研究课题):我们的研究包括从 ModelDB 中已知的计算神经科学作品和从 PubMed 中查询到的神经科学作品。经过 SPECTER2(一种免费的文档嵌入方法)、GPT-3.5 和 GPT-4 的预筛选,我们确定了可能的计算神经科学工作和相关元数据:结果:SPECTER2、GPT-4 和 GPT-3.5 在识别计算神经科学作品方面表现出了不同但很高的能力。GPT-4 的准确率达到 96.9%,GPT-3.5 通过指令调整和思维链从 54.2% 提高到 85.5%。GPT-4 在识别相关元数据注释方面也表现出了很高的潜力:讨论:通过处理计算要素的模糊性、从论文中纳入更多信息(如方法部分)、改进提示等,识别和提取的准确性可能会进一步提高:结论:自然语言处理和大型语言模型技术可以添加到 ModelDB 中,以促进模型的进一步发现,并将有助于建立一个更加标准化和全面的框架,用于建立特定领域的资源。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of the American Medical Informatics Association
Journal of the American Medical Informatics Association 医学-计算机:跨学科应用
CiteScore
14.50
自引率
7.80%
发文量
230
审稿时长
3-8 weeks
期刊介绍: JAMIA is AMIA''s premier peer-reviewed journal for biomedical and health informatics. Covering the full spectrum of activities in the field, JAMIA includes informatics articles in the areas of clinical care, clinical research, translational science, implementation science, imaging, education, consumer health, public health, and policy. JAMIA''s articles describe innovative informatics research and systems that help to advance biomedical science and to promote health. Case reports, perspectives and reviews also help readers stay connected with the most important informatics developments in implementation, policy and education.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信