Enabling large language models for real-world materials discovery
Santiago Miret, N. M. Anoop Krishnan
Nature Machine Intelligence, published 2025-07-10
DOI: 10.1038/s42256-025-01058-y (https://doi.org/10.1038/s42256-025-01058-y)
Large language models (LLMs) create exciting possibilities to accelerate scientific discovery and knowledge dissemination in materials science. While LLMs have been successfully applied to select scientific problems and rudimentary challenges, they currently fall short of being practical materials science tools. In this Perspective, we show relevant failure cases of LLMs in materials science that reveal the current limitations of LLMs related to comprehending and reasoning over complex, interconnected materials science knowledge. Given these shortcomings, we outline a framework for developing materials science LLMs (MatSci-LLMs) that are grounded in domain knowledge, which can enable hypothesis generation followed by hypothesis testing for impactful materials science challenges. The path to attaining performant MatSci-LLMs rests, in large part, on building high-quality, multimodal datasets sourced from scientific literature, where various information extraction challenges persist. As such, we describe key materials science information extraction challenges that need to be overcome to build large-scale, multimodal datasets that capture valuable materials science principles and broader knowledge.
About the journal:
Nature Machine Intelligence publishes original research and reviews on topics in machine learning, robotics, and AI. Our focus extends beyond these fields to their impact on other scientific disciplines, on society, and on industry. We see wide scope for machine intelligence to augment human capabilities and knowledge in domains such as scientific exploration, healthcare, medical diagnostics, and the creation of safe and sustainable cities, transportation, and agriculture. At the same time, we acknowledge the ethical, social, and legal concerns raised by the rapid pace of these advances.
To foster interdisciplinary discussions on these far-reaching implications, Nature Machine Intelligence serves as a platform for dialogue facilitated through Comments, News Features, News & Views articles, and Correspondence. Our goal is to encourage a comprehensive examination of these subjects.
Like all Nature-branded journals, Nature Machine Intelligence operates under the guidance of a team of skilled editors. We adhere to a fair and rigorous peer-review process and maintain high standards of copy-editing and production, swift publication, and editorial independence.