Causal discovery from data assisted by large language models

IF 3.6 | Zone 2: Physics and Astrophysics | JCR Q2: PHYSICS, APPLIED

Applied Physics Letters Pub Date : 2025-09-24 DOI:10.1063/5.0272287

Kamyar Barakati, Aleksander Molak, Chris Nelson, Xiaohang Zhang, Ichiro Takeuchi, Sergei V. Kalinin


Citations: 0

Abstract

Knowledge-driven discovery of novel materials requires causal models of property emergence. In the classical physical paradigm, causal relationships are deduced from physical principles or established by experiment; the rapid accumulation of observational data, however, calls for learning causal relationships between dissimilar aspects of material structure and functionality directly from observations. This requires integrating experimental data with prior domain knowledge. Here, we demonstrate this approach by combining high-resolution scanning transmission electron microscopy data with insights derived from large language models (LLMs). By applying ChatGPT to domain-specific literature, such as arXiv papers on ferroelectrics, and combining the obtained information with data-driven causal discovery, we construct adjacency matrices for directed acyclic graphs that map the causal relationships between structural, chemical, and polarization degrees of freedom in Sm-doped BiFeO3. This approach enables us to hypothesize how synthesis conditions influence material properties and guides experimental validation. The ultimate objective of this work is to develop a unified framework that integrates LLM-driven literature analysis with data-driven discovery, facilitating the precise engineering of ferroelectric materials by establishing clear connections between synthesis conditions and the resulting material properties.
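
The abstract mentions constructing adjacency matrices for directed acyclic graphs by combining literature-derived knowledge with data-driven causal discovery. The sketch below is not the authors' implementation. It only illustrates one simple way such a combination could work: an edge is kept when a prior mask permits it and a data-derived score passes a threshold, and the result is then checked for acyclicity. The variable names, the mask, the scores, and the threshold are all hypothetical values chosen for illustration.

```python
from collections import deque

# Hypothetical variable names, chosen only for illustration.
VARIABLES = ["sm_content", "lattice_strain", "polarization"]


def is_dag(adj):
    """Return True if the adjacency matrix has no directed cycle (Kahn's algorithm)."""
    n = len(adj)
    # In-degree of node j is the number of edges i -> j.
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    queue = deque(j for j in range(n) if indegree[j] == 0)
    visited = 0
    while queue:
        i = queue.popleft()
        visited += 1
        for j in range(n):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    queue.append(j)
    # Every node is visited only when the graph contains no cycle.
    return visited == n


def combine(prior_allowed, data_scores, threshold=0.5):
    """Keep edge i -> j only if the prior permits it and the data score passes the threshold."""
    n = len(prior_allowed)
    return [
        [1 if prior_allowed[i][j] and data_scores[i][j] >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


# Assumed prior mask (e.g. edge directions judged plausible from literature); made up here.
prior_allowed = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
# Assumed edge scores from some data-driven method; made-up numbers.
data_scores = [
    [0.0, 0.8, 0.3],
    [0.6, 0.0, 0.9],
    [0.2, 0.7, 0.0],
]

adjacency = combine(prior_allowed, data_scores)
for name, row in zip(VARIABLES, adjacency):
    print(name, row)
print("acyclic:", is_dag(adjacency))
```

In this toy setup the prior mask acts as a hard constraint on edge direction; a softer weighting of prior and data evidence would be an equally plausible design.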




Source journal

Applied Physics Letters (Physics: Applied)

CiteScore: 6.40

Self-citation rate: 10.00%

Articles published: 1821

Review time: 1.6 months

About the journal: Applied Physics Letters (APL) features concise, up-to-date reports on significant new findings in applied physics. Emphasizing rapid dissemination of key data and new physical insights, APL offers prompt publication of new experimental and theoretical papers reporting applications of physics phenomena to all branches of science, engineering, and modern technology. In addition to regular articles, the journal also publishes invited Fast Track, Perspectives, and in-depth Editorials which report on cutting-edge areas in applied physics. APL Perspectives are forward-looking invited letters which highlight recent developments or discoveries. Emphasis is placed on very recent developments, potentially disruptive technologies, open questions and possible solutions. They also include a mini-roadmap detailing where the community should direct efforts in order for the phenomena to be viable for application and the challenges associated with meeting that performance threshold. Perspectives are characterized by personal viewpoints and opinions of recognized experts in the field. Fast Track articles are invited original research articles that report results that are particularly novel and important or provide a significant advancement in an emerging field. Because of the urgency and scientific importance of the work, the peer review process is accelerated. If, during the review process, it becomes apparent that the paper does not meet the Fast Track criterion, it is returned to a normal track.