Advanced scientific information mining using LLM-driven approaches in layered cathode materials for sodium-ion batteries
Youwan Na, Jeffrey J. Kim, Chanhyoung Park, Jaewon Hwang, Changgi Kim, Hokyung Lee and Jehoon Lee
Materials Advances, vol. 8, pp. 2543-2548. Published 2025-03-04. Journal Article.
DOI: 10.1039/D5MA00004A
Article: https://pubs.rsc.org/en/content/articlelanding/2025/ma/d5ma00004a
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/ma/d5ma00004a?page=search
Journal impact factor: 5.2; JCR: Q2 (Materials Science, Multidisciplinary)
Citations: 0
Abstract
Materials informatics (MI) has emerged as a powerful paradigm for accelerating materials discovery and development through data-driven approaches. The scarcity of structured materials data, however, remains a critical bottleneck in minimizing the error between experimental and predicted values. Here, we present an advanced large language model (LLM) framework for building a comprehensive materials database of layered metal oxide (LMO) cathode materials in sodium-ion batteries (SIBs). By implementing optimized advanced retrieval-augmented generation techniques, including the tree of clarity (ToC) methodology, our system achieved an accuracy of 0.8861 and an F1-score of 0.9371 in extracting structured materials data from open-source publications. The framework successfully processed 312 publications, rapidly extracting 945 data points related to material composition, crystallinity, operating voltage, and electrode composition at approximately 20 seconds per paper. This automated approach to materials data acquisition demonstrated here is expected to significantly accelerate the development of comprehensive materials databases and enable rapid materials discovery through MI.
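The figures quoted in the abstract can be put in context with a short sketch. It uses the standard definitions of accuracy and F1-score, and derives the implied throughput from the reported counts. The abstract does not state how the authors tallied true and false positives, so the confusion counts below are hypothetical values chosen only to illustrate the formulas; they are not the paper's data.

```python
def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all extraction decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract.
PAPERS = 312
DATA_POINTS = 945
SECONDS_PER_PAPER = 20  # "approximately 20 seconds per paper"

# Derived throughput (simple arithmetic on the reported figures).
total_minutes = PAPERS * SECONDS_PER_PAPER / 60
points_per_paper = DATA_POINTS / PAPERS

# Hypothetical confusion counts, for illustration only (not from the paper).
example_accuracy = accuracy(tp=90, fp=5, fn=5, tn=0)
example_f1 = f1_score(tp=90, fp=5, fn=5)

print(f"Total processing time: about {total_minutes:.0f} minutes")
print(f"Data points per paper: about {points_per_paper:.2f}")
print(f"Example accuracy: {example_accuracy:.4f}, example F1: {example_f1:.4f}")
```

At the reported rate, the full corpus takes on the order of 104 minutes to process and yields roughly three data points per paper. The example counts also show how F1 can exceed accuracy when true negatives are rare, which matches the direction of the gap between the reported 0.8861 and 0.9371, although the paper's actual counts may differ.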