Local large language model-assisted literature mining for on-surface reactions
Juan Xiang, Yizhang Li, Xinyi Zhang, Yu He, Qiang Sun
Materials Genome Engineering Advances, vol. 3, no. 1 (published 2025-03-12)
DOI: 10.1002/mgea.88
https://onlinelibrary.wiley.com/doi/10.1002/mgea.88
Citations: 0
Abstract
Large language models (LLMs) excel at extracting information from the literature. However, deploying LLMs requires substantial computational resources, and security concerns with online LLMs hinder their wider application. Herein, we introduce a method for extracting scientific data from unstructured text using a local LLM, and demonstrate it on the scientific literature on on-surface reactions. By combining prompt engineering with multi-step text preprocessing, we show that the local LLM can effectively extract scientific information, achieving a recall of 91% and a precision of 70%. Moreover, despite a large difference in model parameter count, the performance of the local LLM is comparable to that of GPT-3.5 Turbo (81% recall, 84% precision) and GPT-4o (85% recall, 87% precision). The simplicity, versatility, reduced computational requirements, and enhanced privacy of the local LLM make it highly promising for data mining, with the potential to accelerate the application and development of LLMs across various fields.
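As a minimal sketch of how extraction quality figures like those above (e.g., 91% recall, 70% precision) are typically computed, the snippet below scores a set of LLM-extracted records against a hand-curated ground truth. The record fields and example values (molecule, substrate, reaction type) are illustrative assumptions, not taken from the paper.

```python
def extraction_metrics(extracted, ground_truth):
    """Precision and recall over sets of extracted records."""
    extracted, ground_truth = set(extracted), set(ground_truth)
    true_positives = len(extracted & ground_truth)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Illustrative records as (molecule, substrate, reaction) tuples.
truth = {("DBBA", "Au(111)", "Ullmann coupling"),
         ("TPA", "Cu(111)", "dehydrogenation")}
found = {("DBBA", "Au(111)", "Ullmann coupling"),   # correct extraction
         ("TPA", "Ag(111)", "dehydrogenation")}     # wrong substrate

p, r = extraction_metrics(found, truth)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.50, recall=0.50
```

Note that a record counts as correct only if every field matches, so a single wrong field (here the substrate) costs both precision and recall.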