Inconsistency of LLMs in molecular representations

Impact factor: 6.2 · Q1 (Chemistry, Multidisciplinary)
Bing Yan, Angelica Chen and Kyunghyun Cho
{"title":"Inconsistency of LLMs in molecular representations","authors":"Bing Yan, Angelica Chen and Kyunghyun Cho","doi":"10.1039/D5DD00176E","DOIUrl":null,"url":null,"abstract":"<p >Large language models (LLM) have demonstrated remarkable capabilities in chemistry, yet their ability to capture intrinsic chemistry remains uncertain. Within any familiar, chemically equivalent representation family, rigorous chemical reasoning should be representation-invariant, yielding consistent predictions across these representations. Here, we introduce the first systematic benchmark to evaluate the consistency of LLMs across key chemistry tasks. We curated the benchmark using paired representations of SMILES strings and IUPAC names. We find that the state-of-the-art general LLMs exhibit strikingly low consistency rates (≤1%). Even after finetuning on our dataset, the models still generate inconsistent predictions. To address this, we incorporate a sequence-level symmetric Kullback–Leibler (KL) divergence loss as a consistency regularizer. While this intervention improves surface-level consistency, it fails to enhance accuracy, suggesting that consistency and accuracy are orthogonal properties. These findings indicate that both consistency and accuracy must be considered to properly assess LLMs' capabilities in scientific reasoning.</p>","PeriodicalId":72816,"journal":{"name":"Digital discovery","volume":" 10","pages":" 2876-2892"},"PeriodicalIF":6.2000,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://pubs.rsc.org/en/content/articlepdf/2025/dd/d5dd00176e?page=search","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital discovery","FirstCategoryId":"1085","ListUrlMain":"https://pubs.rsc.org/en/content/articlelanding/2025/dd/d5dd00176e","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities in chemistry, yet their ability to capture intrinsic chemistry remains uncertain. Within any familiar, chemically equivalent representation family, rigorous chemical reasoning should be representation-invariant, yielding consistent predictions across these representations. Here, we introduce the first systematic benchmark to evaluate the consistency of LLMs across key chemistry tasks. We curated the benchmark using paired representations of SMILES strings and IUPAC names. We find that state-of-the-art general-purpose LLMs exhibit strikingly low consistency rates (≤1%). Even after fine-tuning on our dataset, the models still generate inconsistent predictions. To address this, we incorporate a sequence-level symmetric Kullback–Leibler (KL) divergence loss as a consistency regularizer. While this intervention improves surface-level consistency, it fails to enhance accuracy, suggesting that consistency and accuracy are orthogonal properties. These findings indicate that both consistency and accuracy must be considered to properly assess LLMs' capabilities in scientific reasoning.
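The consistency requirement is easy to picture: a model asked for, say, the boiling point of the SMILES string CCO and of its IUPAC name, ethanol, should give the same answer, because both strings denote the same molecule. The symmetric KL regularizer penalizes disagreement between the model's output distributions under the two prompts. Below is a minimal PyTorch sketch of such a loss; the function name, the masking scheme, and the per-token averaging are illustrative assumptions, not the authors' exact formulation.

```python
# A minimal sketch (not the paper's released code) of a sequence-level
# symmetric KL consistency regularizer between the model's predictions
# for two chemically equivalent prompts.
import torch
import torch.nn.functional as F

def symmetric_kl_consistency(logits_smiles: torch.Tensor,
                             logits_iupac: torch.Tensor,
                             target_mask: torch.Tensor) -> torch.Tensor:
    """Symmetric KL between per-token output distributions.

    logits_smiles, logits_iupac: (batch, seq_len, vocab) logits for the
        same target tokens, conditioned on the SMILES or IUPAC form of
        the prompt.
    target_mask: (batch, seq_len), 1.0 for real target tokens, 0.0 for padding.
    """
    log_p = F.log_softmax(logits_smiles, dim=-1)
    log_q = F.log_softmax(logits_iupac, dim=-1)
    # KL(p||q) and KL(q||p), summed over the vocabulary at each position.
    kl_pq = (log_p.exp() * (log_p - log_q)).sum(dim=-1)
    kl_qp = (log_q.exp() * (log_q - log_p)).sum(dim=-1)
    sym_kl = 0.5 * (kl_pq + kl_qp)
    # Reduce over non-padding positions to a single scalar loss term.
    return (sym_kl * target_mask).sum() / target_mask.sum()
```

During fine-tuning, a term like this would typically be added to the ordinary cross-entropy objective with a weighting coefficient, e.g. loss = ce_loss + lam * symmetric_kl_consistency(...). Consistent with the abstract's finding, driving this term down aligns the two predictions with each other without, by itself, making either prediction more accurate.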


Source journal metrics: CiteScore 2.80 · self-citation rate 0.00%