将机器阅读理解技术应用于材料科学中的命名实体识别。

IF 7.1 2区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY
Zihui Huang, Liqiang He, Yuhang Yang, Andi Li, Zhiwen Zhang, Siwei Wu, Yang Wang, Yan He, Xujie Liu
{"title":"将机器阅读理解技术应用于材料科学中的命名实体识别。","authors":"Zihui Huang,&nbsp;Liqiang He,&nbsp;Yuhang Yang,&nbsp;Andi Li,&nbsp;Zhiwen Zhang,&nbsp;Siwei Wu,&nbsp;Yang Wang,&nbsp;Yan He,&nbsp;Xujie Liu","doi":"10.1186/s13321-024-00874-5","DOIUrl":null,"url":null,"abstract":"<div><p>Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can automatically extract entities in the field of materials science, which have significant value in tasks such as building knowledge graphs. The typically used sequence labeling methods for traditional named entity recognition in material science (MatNER) tasks often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we proposed to convert the sequence labeling task into a machine reading comprehension (MRC) task. MRC method effectively can solve the challenge of extracting multiple overlapping entities by transforming it into the form of answering multiple independent questions. Moreover, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature, by integrating prior knowledge from queries. State-of-the-art (SOTA) performance was achieved on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively in MRC approach. By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in the field of materials science, and thus accelerating the development of material science.</p><p><b>Scientific contribution</b></p><p>We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into a MRC task, this approach provides robust support for constructing knowledge graphs and other data analysis tasks.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00874-5","citationCount":"0","resultStr":"{\"title\":\"Application of machine reading comprehension techniques for named entity recognition in materials science\",\"authors\":\"Zihui Huang,&nbsp;Liqiang He,&nbsp;Yuhang Yang,&nbsp;Andi Li,&nbsp;Zhiwen Zhang,&nbsp;Siwei Wu,&nbsp;Yang Wang,&nbsp;Yan He,&nbsp;Xujie Liu\",\"doi\":\"10.1186/s13321-024-00874-5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can automatically extract entities in the field of materials science, which have significant value in tasks such as building knowledge graphs. The typically used sequence labeling methods for traditional named entity recognition in material science (MatNER) tasks often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we proposed to convert the sequence labeling task into a machine reading comprehension (MRC) task. MRC method effectively can solve the challenge of extracting multiple overlapping entities by transforming it into the form of answering multiple independent questions. Moreover, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature, by integrating prior knowledge from queries. State-of-the-art (SOTA) performance was achieved on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively in MRC approach. By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in the field of materials science, and thus accelerating the development of material science.</p><p><b>Scientific contribution</b></p><p>We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into a MRC task, this approach provides robust support for constructing knowledge graphs and other data analysis tasks.</p></div>\",\"PeriodicalId\":617,\"journal\":{\"name\":\"Journal of Cheminformatics\",\"volume\":\"16 1\",\"pages\":\"\"},\"PeriodicalIF\":7.1000,\"publicationDate\":\"2024-07-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00874-5\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Cheminformatics\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://link.springer.com/article/10.1186/s13321-024-00874-5\",\"RegionNum\":2,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00874-5","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

摘要

材料科学是一门研究不同材料的特性、结构和行为的交叉学科。大量科学文献蕴含着丰富的材料科学领域知识,但手动分析这些论文以查找与材料相关的数据是一项艰巨的任务。在信息处理中,命名实体识别(NER)起着至关重要的作用,因为它可以自动提取材料科学领域的实体,而这些实体在构建知识图谱等任务中具有重要价值。在传统的材料科学命名实体识别(MatNER)任务中,通常使用的序列标注方法往往不能充分利用数据集中的语义信息,也不能有效地提取嵌套实体。在此,我们提出将序列标注任务转换为机器阅读理解(MRC)任务。MRC 方法通过将其转换为回答多个独立问题的形式,有效地解决了提取多个重叠实体的难题。此外,MRC 框架通过整合查询中的先验知识,可以更全面地理解材料科学文献中的上下文信息和语义关系。MRC 方法在 Matscholar、BC4CHEMD、NLMChem、SOFC 和 SOFC-Slot 数据集上取得了最先进(SOTA)的性能,F1 分数分别为 89.64%、94.30%、85.89%、85.95% 和 71.73%。通过有效利用语义信息和提取嵌套实体,该方法对材料科学领域的知识提取和数据分析具有重要意义,从而加速了材料科学的发展。 科学贡献我们开发了一种创新的 NER 方法,通过将序列标注任务转化为 MRC 任务,提高了材料科学领域实体自动提取的效率和准确性,该方法为构建知识图谱和其他数据分析任务提供了强大的支持。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Application of machine reading comprehension techniques for named entity recognition in materials science

Materials science is an interdisciplinary field that studies the properties, structures, and behaviors of different materials. A large amount of scientific literature contains rich knowledge in the field of materials science, but manually analyzing these papers to find material-related data is a daunting task. In information processing, named entity recognition (NER) plays a crucial role as it can automatically extract entities in the field of materials science, which have significant value in tasks such as building knowledge graphs. The typically used sequence labeling methods for traditional named entity recognition in material science (MatNER) tasks often fail to fully utilize the semantic information in the dataset and cannot effectively extract nested entities. Herein, we proposed to convert the sequence labeling task into a machine reading comprehension (MRC) task. MRC method effectively can solve the challenge of extracting multiple overlapping entities by transforming it into the form of answering multiple independent questions. Moreover, the MRC framework allows for a more comprehensive understanding of the contextual information and semantic relationships within materials science literature, by integrating prior knowledge from queries. State-of-the-art (SOTA) performance was achieved on the Matscholar, BC4CHEMD, NLMChem, SOFC, and SOFC-Slot datasets, with F1-scores of 89.64%, 94.30%, 85.89%, 85.95%, and 71.73%, respectively in MRC approach. By effectively utilizing semantic information and extracting nested entities, this approach holds great significance for knowledge extraction and data analysis in the field of materials science, and thus accelerating the development of material science.

Scientific contribution

We have developed an innovative NER method that enhances the efficiency and accuracy of automatic entity extraction in the field of materials science by transforming the sequence labeling task into a MRC task, this approach provides robust support for constructing knowledge graphs and other data analysis tasks.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Journal of Cheminformatics
Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
14.10
自引率
7.00%
发文量
82
审稿时长
3 months
期刊介绍: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信