Local large language model-assisted literature mining for on-surface reactions

Juan Xiang, Yizhang Li, Xinyi Zhang, Yu He, Qiang Sun
{"title":"Local large language model-assisted literature mining for on-surface reactions","authors":"Juan Xiang,&nbsp;Yizhang Li,&nbsp;Xinyi Zhang,&nbsp;Yu He,&nbsp;Qiang Sun","doi":"10.1002/mgea.88","DOIUrl":null,"url":null,"abstract":"<p>Large language models (LLMs) excel at extracting information from literatures. However, deploying LLMs necessitates substantial computational resources, and security concerns with online LLMs pose a challenge to their wider applications. Herein, we introduce a method for extracting scientific data from unstructured texts using a local LLM, exemplifying its applications to scientific literatures on the topic of on-surface reactions. By combining prompt engineering and multi-step text preprocessing, we show that the local LLM can effectively extract scientific information, achieving a recall rate of 91% and a precision rate of 70%. Moreover, despite significant differences in model parameter size, the performance of the local LLM is comparable to that of GPT-3.5 turbo (81% recall, 84% precision) and GPT-4o (85% recall, 87% precision). The simplicity, versatility, reduced computational requirements, and enhanced privacy of the local LLM makes it highly promising for data mining, with the potential to accelerate the application and development of LLMs across various fields.</p>","PeriodicalId":100889,"journal":{"name":"Materials Genome Engineering Advances","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/mgea.88","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Materials Genome Engineering Advances","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/mgea.88","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) excel at extracting information from the literature. However, deploying LLMs requires substantial computational resources, and security concerns with online LLMs pose a challenge to their wider application. Herein, we introduce a method for extracting scientific data from unstructured text using a local LLM, exemplifying its application on the scientific literature on on-surface reactions. By combining prompt engineering with multi-step text preprocessing, we show that the local LLM can effectively extract scientific information, achieving a recall of 91% and a precision of 70%. Moreover, despite significant differences in model parameter size, the performance of the local LLM is comparable to that of GPT-3.5 Turbo (81% recall, 84% precision) and GPT-4o (85% recall, 87% precision). The simplicity, versatility, reduced computational requirements, and enhanced privacy of the local LLM make it highly promising for data mining, with the potential to accelerate the application and development of LLMs across various fields.
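The paper does not reproduce its extraction code here, but the workflow the abstract describes (multi-step text preprocessing followed by prompt-engineered queries to a locally hosted model) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes an Ollama-style local HTTP endpoint, a generic open-weight model name, and a hypothetical schema of fields (precursor molecule, substrate, reaction type); none of these specifics come from the paper.

```python
import json
import requests

# Hypothetical extraction schema; the fields actually used in the paper may differ.
PROMPT_TEMPLATE = """You are extracting data on on-surface reactions.
From the text below, return a JSON object with the keys
"precursor_molecule", "substrate", and "reaction_type".
Use null for any field not stated in the text.

Text:
{chunk}
"""

def preprocess(full_text: str, max_chars: int = 3000) -> list[str]:
    """Multi-step preprocessing sketch: strip the bibliography, then split
    the body into paragraph chunks that fit the model's context window."""
    body = full_text.split("References")[0]
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for p in paragraphs:
        if len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = f"{current}\n\n{p}" if current else p
    if current:
        chunks.append(current)
    return chunks

def extract(chunk: str, model: str = "llama3") -> dict:
    """Query a locally served LLM (here via Ollama's /api/generate)."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": PROMPT_TEMPLATE.format(chunk=chunk),
            "format": "json",   # ask the server to constrain output to JSON
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return json.loads(resp.json()["response"])

if __name__ == "__main__":
    text = open("paper.txt", encoding="utf-8").read()
    records = [extract(c) for c in preprocess(text)]
    print(json.dumps(records, indent=2, ensure_ascii=False))
```

Running everything against localhost is what yields the privacy benefit the abstract emphasizes: no manuscript text ever leaves the machine.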

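The recall and precision figures quoted above compare model output against hand-curated ground truth. A minimal sketch of that scoring, assuming both sides are reduced to comparable (field, value) tuples per paper (the exact matching criterion used by the authors is not given here):

```python
def score(predicted: set, ground_truth: set) -> tuple:
    """Recall = fraction of true records recovered;
    precision = fraction of extracted records that are correct."""
    true_positives = len(predicted & ground_truth)
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision

# Toy example: 2 of 3 true records found, plus 1 spurious extraction.
truth = {("substrate", "Au(111)"), ("reaction_type", "Ullmann coupling"),
         ("precursor_molecule", "DBBA")}
pred = {("substrate", "Au(111)"), ("reaction_type", "Ullmann coupling"),
        ("substrate", "Cu(100)")}
r, p = score(pred, truth)
print(f"recall={r:.0%}, precision={p:.0%}")  # recall=67%, precision=67%
```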
