Band gap information extraction from materials science literature - a pilot study

Satanu Ghosh, Kun Lu
Aslib J. Inf. Manag., published 2022-08-26
DOI: https://doi.org/10.1108/ajim-03-2022-0141
Citations: 1

Abstract

Purpose: This paper presents preliminary work on extracting band gap information for materials from academic papers. With increasing demand for renewable energy, band gap information will help materials scientists design and implement novel photovoltaic (PV) cells.

Design/methodology/approach: The authors collected 1.44 million titles and abstracts of scholarly articles related to materials science, and then filtered the collection to 11,939 articles that potentially contain relevant information about materials and their band gap values. ChemDataExtractor was extended to extract information about PV materials and their band gaps. Evaluation was performed on randomly sampled information records from 415 papers.

Findings: The current system correctly extracts information for 51.32% of articles, with partially correct extraction for 36.62% and incorrect extraction for 12.04%. The authors also identified errors in three main categories: chemical entity identification, band gap information and interdependency resolution. Future work will focus on addressing these errors to improve the performance of the system.

Originality/value: The authors did not find any literature to date on band gap information extraction from academic text using automated methods. This work is unique and original. Band gap information is important to materials scientists in applications such as solar cells, light emitting diodes and laser diodes.
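To illustrate the filter-then-extract idea described in the abstract, here is a minimal Python sketch. It is not the authors' system and does not use the ChemDataExtractor API: the regular expression, the function names `mentions_band_gap` and `extract_band_gaps`, and the sample sentences are all hypothetical, chosen only to show how candidate texts might be filtered by keyword and how numeric band gap values in eV might be pulled out.

```python
import re
from typing import List

# Hypothetical keyword pattern: "band gap", "bandgap", or "band-gap".
KEYWORD_RE = re.compile(r"band\s*-?\s*gap", re.IGNORECASE)

# Hypothetical value pattern: the keyword, up to 40 characters of filler
# that contain no digit, period, or semicolon, then a number followed by "eV".
BAND_GAP_RE = re.compile(
    r"band\s*-?\s*gap[^.;\d]{0,40}?(\d+(?:\.\d+)?)\s*eV",
    re.IGNORECASE,
)


def mentions_band_gap(text: str) -> bool:
    """Coarse filter: does the text mention a band gap at all?"""
    return KEYWORD_RE.search(text) is not None


def extract_band_gaps(text: str) -> List[float]:
    """Return the numeric band gap values (in eV) matched by the pattern."""
    return [float(m.group(1)) for m in BAND_GAP_RE.finditer(text)]


if __name__ == "__main__":
    abstracts = [
        "The TiO2 film shows a band gap of 3.2 eV.",
        "We study the tensile strength of steel alloys.",
        "The bandgap energy is 1.5eV for the absorber layer.",
    ]
    # Step 1: keep only texts that potentially contain band gap information.
    candidates = [a for a in abstracts if mentions_band_gap(a)]
    # Step 2: extract values from the remaining candidates.
    for text in candidates:
        print(extract_band_gaps(text), "<-", text)
```

A pattern this simple does not link a value to the material it describes, and it skips sentences where a formula containing digits sits between the keyword and the value. Those gaps correspond loosely to the chemical entity identification and interdependency resolution error categories named in the findings.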