How Does a Generative Large Language Model Perform on Domain-Specific Information Extraction?─A Comparison between GPT-4 and a Rule-Based Method on Band Gap Extraction

IF 5.3 | Zone 2 (Chemistry) | JCR Q1 (Chemistry, Medicinal)

Journal of Chemical Information and Modeling Pub Date : 2024-10-07 DOI:10.1021/acs.jcim.4c00882

Xin Wang, Liangliang Huang, Shuozhi Xu, Kun Lu


Citations: 0

Abstract

The advent of generative Large Language Models (LLMs) has greatly impacted the field of Natural Language Processing. However, it remains unclear how well generative LLMs perform on domain-specific information extraction tasks. This study compares GPT-4 with a rule-based information extraction method built on ChemDataExtractor for band gap information extraction, a task with important implications for materials science. Neither method requires training data, which is desirable because training data in materials science is scarce relative to the wide variety of material information of interest. Manual evaluation on 415 randomly selected articles showed that GPT-4 extracted materials’ band gap information more accurately than the rule-based method (correct: 87.95% vs 51.08%; partially correct: 11.33% vs 36.87%; incorrect: 0.72% vs 12.05%). Further analysis of the errors reveals the strengths and weaknesses of GPT-4 relative to the rule-based method. GPT-4 is stronger at interdependency resolution and at recognizing complicated material names, but it shows weaknesses in hallucination and in identifying band gap values and band gap types. A revised prompt based on the error analysis improves the accuracy of GPT-4. To the best of our knowledge, this study is the first to compare GPT-4 and ChemDataExtractor on the band gap extraction task. It provides evidence to support using generative LLMs for domain-specific information extraction tasks.
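
As a quick sanity check on the reported figures, the percentages can be converted back into article counts. This is only a sketch: it assumes each of the 415 evaluated articles received exactly one of the three labels per method, which the abstract does not state explicitly, and the names `N_ARTICLES`, `reported`, and `implied_counts` are illustrative.

```python
# Percentages as reported in the abstract: (correct, partially correct, incorrect).
N_ARTICLES = 415
reported = {
    "GPT-4": (87.95, 11.33, 0.72),
    "rule-based": (51.08, 36.87, 12.05),
}


def implied_counts(percentages, total):
    """Convert percentages to the nearest whole number of articles."""
    return tuple(round(p * total / 100) for p in percentages)


# Assumption: one label per article, so the counts should sum to N_ARTICLES.
counts = {name: implied_counts(p, N_ARTICLES) for name, p in reported.items()}

for name, c in counts.items():
    # Recompute the percentages from the integer counts to check the round trip.
    back = tuple(round(100 * k / N_ARTICLES, 2) for k in c)
    print(name, "counts:", c, "total:", sum(c), "percentages:", back)
```

Under that assumption the implied counts are 365, 47, and 3 articles for GPT-4 and 212, 153, and 50 for the rule-based method. Each set sums to 415 and reproduces the reported percentages after rounding to two decimals.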


Source journal

Journal of Chemical Information and Modeling (Chemistry: Multidisciplinary Chemistry)

CiteScore: 9.80

Self-citation rate: 10.70%

Articles published: 529

Review time: 1.4 months

About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.