A Hybrid Transformer Architecture with a Quantized Self-Attention Mechanism Applied to Molecular Generation
Anthony M. Smaldone, Yu Shee, Gregory W. Kyro, Marwa H. Farag, Zohim Chandani, Elica Kyoseva, Victor S. Batista
Journal of Chemical Theory and Computation, pages 5143-5154 (Epub 2025-05-07; published 2025-05-27)
DOI: 10.1021/acs.jctc.5c00331
Citations: 0
Abstract
The success of the self-attention mechanism in classical machine learning models has inspired the development of quantum analogs aimed at reducing the computational overhead. Self-attention integrates learnable query and key matrices to calculate attention scores between all pairs of tokens in a sequence. These scores are then multiplied by a learnable value matrix to obtain the output self-attention matrix, enabling the model to effectively capture long-range dependencies within the input sequence. Here, we propose a hybrid quantum-classical self-attention mechanism as part of a transformer decoder, the architecture underlying large language models (LLMs). To demonstrate its utility in chemistry, we train this model on the QM9 dataset for conditional generation, using SMILES strings as input, each labeled with a set of physicochemical properties that serve as conditions during inference. Our theoretical analysis shows that the time complexity of the query-key dot product is reduced from O(n²d) in a classical model to O(n² log d) in our quantum model, where n and d represent the sequence length and the embedding dimension, respectively. We perform simulations using NVIDIA's CUDA-Q platform, which is designed for efficient GPU scalability. This work provides a promising avenue for quantum-enhanced natural language processing (NLP).
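To make the complexity claim concrete, the following is a minimal NumPy sketch (an illustration only, not the authors' implementation) of classical scaled dot-product self-attention, whose query-key product costs O(n²d), together with a toy amplitude-encoding helper showing that a d-dimensional embedding fits on ⌈log₂ d⌉ qubits, which is the source of the O(n² log d) scaling claimed for the hybrid model. Causal masking, which a decoder would apply, is omitted for brevity, and the function and variable names are hypothetical.

```python
import numpy as np

def classical_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention (single head, no causal mask).

    X: (n, d) token embeddings; Wq, Wk, Wv: (d, d) learnable matrices.
    The Q @ K.T product is the O(n^2 * d) step targeted by the quantum variant.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # each (n, d)
    scores = Q @ K.T / np.sqrt(K.shape[1])        # (n, n) attention scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax
    return weights @ V                            # (n, d) output

def amplitude_encode(v):
    """Pad and normalize a d-dimensional vector into the amplitudes of
    ceil(log2 d) qubits; comparing two such states involves only O(log d)
    qubits, which motivates the O(n^2 log d) query-key cost in the abstract."""
    d = len(v)
    n_qubits = max(1, int(np.ceil(np.log2(d))))
    padded = np.zeros(2 ** n_qubits)
    padded[:d] = v
    return padded / np.linalg.norm(padded), n_qubits

# Toy example: n = 4 SMILES tokens, embedding dimension d = 8.
rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(classical_attention(X, Wq, Wk, Wv).shape)   # (4, 8)
print(amplitude_encode(X[0])[1])                  # 3 qubits for d = 8
```

The paper's quantum circuits are simulated with NVIDIA's CUDA-Q platform; the helper above only mirrors the qubit-count argument and does not reproduce the circuit construction described in the article.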
About the Journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.