Accurate and Rational Collision Cross Section Prediction Using Voxel-Projected Area and Deep Learning

IF 2.3 · Zone 4 (Chemistry) · JCR Q1
Jiongyu Wang, Yuxuan Liao, Ting Xie, Ruixi Chen, Jiahui Lai, Zhimin Zhang, Hongmei Lu
Journal of Chemometrics, volume 39, issue 7. Published 2025-07-08. Journal article.
DOI: 10.1002/cem.70040
https://onlinelibrary.wiley.com/doi/10.1002/cem.70040
Citations: 0

Abstract

Ion mobility spectrometry–mass spectrometry (IMS-MS) enables rapid acquisition of the collision cross section (CCS), a critical physicochemical property for analyte characterization. Although CCS is theoretically defined as the rotationally averaged projected area of 3D atomic spheres, existing models have underutilized this geometric insight. Here, we present a projected area–based CCS prediction method (PACCS) that integrates a voxel-projected area approximation, graph neural network (GNN)–extracted features, and m/z to achieve accurate and rational CCS prediction. A voxel-based algorithm efficiently calculates molecular projected areas by combining Fibonacci grid sampling with the discretization of 3D conformers into voxel grids. PACCS achieves a median relative error (MedRE) of 1.03% and a coefficient of determination (R²) of 0.994 on the test set. On an external test set, PACCS outperforms AllCCS2, GraphCCS, SigmaCCS, CCSbase, and DeepCCS, with 80.1% of its predictions exhibiting < 3% error. PACCS is also broadly applicable across diverse molecular types, including environmental contaminants (R² = 0.954–0.979) and structurally complex phycotoxins (R² = 0.961), which indicates its robustness and versatility. Parallelization improves computational efficiency and enables large-scale CCS database generation (e.g., 5.9 million entries for ChEMBL within 10 h). Ablation studies confirm the pivotal role of voxel-projected areas (Pearson correlation coefficients > 0.988), while stability analyses reveal minimal sensitivity to conformational variability (the standard deviation of R² is 0.00003). PACCS provides an open-source, scalable solution for expanding CCS databases and advancing compound identification in metabolomics and environmental analysis.
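
The geometric idea in the abstract, a rotationally averaged projected area of atomic spheres, can be illustrated with a minimal sketch. It is not the PACCS implementation. It assumes a golden-angle Fibonacci lattice for the projection directions and, as a simplification of the 3D voxel grid, rasterizes the projected atomic disks on a 2D grid. The function names, the `cell` size, and the `n_dirs` default are all hypothetical.

```python
import numpy as np

def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors from a golden-angle Fibonacci lattice."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n                     # evenly spaced heights in (-1, 1)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i    # golden-angle increments in azimuth
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def plane_basis(d):
    """Two orthonormal vectors spanning the plane perpendicular to unit vector d."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(d, helper)
    u /= np.linalg.norm(u)
    v = np.cross(d, u)
    return u, v

def projected_area(coords, radii, d, cell=0.1):
    """Area of the union of atomic disks projected along d, counted on a 2D grid."""
    u, v = plane_basis(d)
    xy = np.stack([coords @ u, coords @ v], axis=1)   # atom centres in the plane
    rmax = radii.max()
    lo = xy.min(axis=0) - rmax
    hi = xy.max(axis=0) + rmax
    nx, ny = np.ceil((hi - lo) / cell).astype(int)
    gx = lo[0] + (np.arange(nx) + 0.5) * cell         # grid-cell centres
    gy = lo[1] + (np.arange(ny) + 0.5) * cell
    X, Y = np.meshgrid(gx, gy, indexing="ij")
    covered = np.zeros((nx, ny), dtype=bool)
    for (cx, cy), r in zip(xy, radii):
        covered |= (X - cx) ** 2 + (Y - cy) ** 2 <= r * r
    return covered.sum() * cell * cell

def mean_projected_area(coords, radii, n_dirs=64, cell=0.1):
    """Average the projected area over n_dirs Fibonacci-lattice directions."""
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    areas = [projected_area(coords, radii, d, cell) for d in fibonacci_sphere(n_dirs)]
    return float(np.mean(areas))

if __name__ == "__main__":
    # A single sphere of radius 1.5 should give an area close to pi * 1.5**2.
    print(mean_projected_area([[0.0, 0.0, 0.0]], [1.5]))
```

For a single sphere the result should approach πr² as `cell` shrinks, which makes a convenient sanity check. In a model of the kind the abstract describes, an area like this would be one input feature alongside GNN-extracted features and m/z.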

Source journal
Journal of Chemometrics (Chemistry: Analytical Chemistry)
CiteScore: 5.20
Self-citation rate: 8.30%
Articles published: 78
Review time: 2 months
About the journal: The Journal of Chemometrics is devoted to the rapid publication of original scientific papers, reviews and short communications on fundamental and applied aspects of chemometrics. It also provides a forum for the exchange of information on meetings and other news relevant to the growing community of scientists who are interested in chemometrics and its applications. Short, critical review papers are a particularly important feature of the journal, in view of the multidisciplinary readership at which it is aimed.