Juan Xiang, Yizhang Li, Xinyi Zhang, Yu He, Qiang Sun
Materials Genome Engineering Advances, 3(1), published 2025-03-12. DOI: 10.1002/mgea.88
https://onlinelibrary.wiley.com/doi/10.1002/mgea.88
Local large language model-assisted literature mining for on-surface reactions
Large language models (LLMs) excel at extracting information from the scientific literature. However, deploying LLMs requires substantial computational resources, and security concerns with online LLMs limit their wider application. Herein, we introduce a method for extracting scientific data from unstructured text using a local LLM and demonstrate it on the literature on on-surface reactions. By combining prompt engineering with multi-step text preprocessing, we show that the local LLM can effectively extract scientific information, achieving a recall of 91% and a precision of 70%. Moreover, despite the large difference in parameter count, the local LLM performs comparably to GPT-3.5 turbo (81% recall, 84% precision) and GPT-4o (85% recall, 87% precision): its recall is higher, while its precision is lower. The simplicity, versatility, reduced computational requirements, and enhanced privacy of the local LLM make it highly promising for data mining, with the potential to accelerate the application and development of LLMs across various fields.
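For readers less familiar with the two metrics quoted above, the following is a minimal sketch of how recall and precision are commonly computed for an extraction task. It assumes extracted items can be compared to a manually curated reference set by exact match; the function name `precision_recall` and the sample records are hypothetical and are not taken from the paper, whose evaluation procedure may differ.

```python
def precision_recall(extracted, reference):
    """Return (precision, recall) of extracted items against a reference set.

    precision = correct extractions / all extractions
    recall    = correct extractions / all reference items
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (precursor, surface) records; illustrative only, not data from the paper.
reference = {
    ("precursor A", "Au(111)"),
    ("precursor A", "Cu(111)"),
    ("precursor B", "Ag(111)"),
    ("precursor C", "Cu(110)"),
}
extracted = {
    ("precursor A", "Au(111)"),
    ("precursor A", "Cu(111)"),
    ("precursor B", "Ag(111)"),
    ("precursor B", "Au(100)"),  # spurious extraction, lowers precision
    ("precursor A", "Ag(110)"),  # spurious extraction, lowers precision
}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints "precision=0.60 recall=0.75"
```

In this toy case the extractor finds three of the four reference records (recall 0.75) but also emits two records that are not in the reference set (precision 0.60), the same pattern of high recall and lower precision reported for the local LLM.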