Kamyar Barakati, Aleksander Molak, Chris Nelson, Xiaohang Zhang, Ichiro Takeuchi, Sergei V. Kalinin
Title: Causal discovery from data assisted by large language models
Journal: Applied Physics Letters
Publication date: 2025-09-24
DOI: 10.1063/5.0272287 (https://doi.org/10.1063/5.0272287)
Abstract
Knowledge-driven discovery of novel materials necessitates the development of causal models for property emergence. While in the classical physical paradigm, the causal relationships are deduced based on physical principles or via experiment, the rapid accumulation of observational data necessitates learning causal relationships between dissimilar aspects of material structure and functionalities based on observations. For this, it is essential to integrate experimental data with prior domain knowledge. Here, we demonstrate this approach by combining high-resolution scanning transmission electron microscopy data with insights derived from large language models (LLMs). By applying ChatGPT to domain-specific literature, such as arXiv papers on ferroelectrics, and combining the obtained information with data-driven causal discovery, we construct adjacency matrices for directed acyclic graphs that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3. This approach enables us to hypothesize how synthesis conditions influence material properties and guides experimental validation. The ultimate objective of this work is to develop a unified framework that integrates LLM-driven literature analysis with data-driven discovery, facilitating the precise engineering of ferroelectric materials by establishing clear connections between synthesis conditions and their resulting material properties.
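As a rough illustration of the adjacency-matrix idea in the abstract, the sketch below combines a data-driven edge matrix with a literature-derived prior and checks that the result is a directed acyclic graph. It is not the authors' pipeline: the variable names, both matrices, and the rule "keep a data-driven edge only if the prior allows it" are assumptions chosen for illustration.

```python
import numpy as np


def is_dag(adj):
    """Return True if the graph with adjacency matrix `adj` has no directed cycle.

    Uses Kahn's algorithm: repeatedly remove nodes with no incoming edges.
    """
    adj = np.asarray(adj, dtype=bool)
    in_degree = adj.sum(axis=0).astype(int)
    ready = [i for i in range(len(adj)) if in_degree[i] == 0]
    visited = 0
    while ready:
        node = ready.pop()
        visited += 1
        for child in np.flatnonzero(adj[node]):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(int(child))
    return visited == len(adj)


def combine(data_edges, prior_allowed):
    """Keep a data-driven edge only where the prior allows it (assumed rule)."""
    data_edges = np.asarray(data_edges, dtype=bool)
    prior_allowed = np.asarray(prior_allowed, dtype=bool)
    return (data_edges & prior_allowed).astype(int)


# Hypothetical variables; adj[i, j] = 1 means variables[i] -> variables[j].
variables = ["chemical", "structural", "polarization"]

# Hypothetical output of a data-driven causal discovery step.
# It contains a cycle: chemical -> polarization -> chemical.
data_edges = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
])

# Hypothetical prior from literature analysis: polarization -> chemical is ruled out.
prior_allowed = np.array([
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
])

dag = combine(data_edges, prior_allowed)
print(dag)
print("data-driven graph is a DAG:", is_dag(data_edges))
print("combined graph is a DAG:", is_dag(dag))
```

In this toy case the prior removes the single edge that closes the cycle, so the combined matrix is acyclic. Filtering by a prior does not guarantee acyclicity in general, which is why the check is run explicitly.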