From mundane to surprising nonadditivity: drivers and impact on ML models

IF 3.1 | Region 3: Biology | Q3: BIOCHEMISTRY & MOLECULAR BIOLOGY

Journal of Computer-Aided Molecular Design Pub Date : 2024-07-25 DOI:10.1007/s10822-024-00566-0

Laura Guasch, Niels Maeder, John G. Cumming, Christian Kramer

Volume: 38 (1). Article page: https://link.springer.com/article/10.1007/s10822-024-00566-0

Abstract

Nonadditivity (NA) in Structure-Activity and Structure-Property Relationship (SAR) data is a rare but very information-rich phenomenon. It can indicate conformational flexibility, structural rearrangements, and errors in assay results and structural assignment. While purely ligand-based conformational causes of NA are rather well understood and mundane, other factors are less so and cause surprising NA that has a huge influence on SAR analysis and ML model performance. Here we report a systematic analysis across a wide range of properties (20 on-target biological activities and 4 physicochemical ADME-related properties) to understand the frequency of the various phenomena that may lead to NA. A set of novel descriptors was developed to characterize double transformation cycles and identify trends in NA. Double transformation cycles were classified into “surprising” and “mundane” categories, with the majority being classed as mundane. We also examined commonalities among surprising cycles, finding LogP differences to have the most significant impact on NA. NA was observed to behave distinctly for on-target sets compared to ADME sets. Finally, we show that machine learning models struggle with highly nonadditive data, indicating that a better understanding of NA is an important future research direction.
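The abstract does not spell out how NA is computed for a double transformation cycle. A minimal sketch follows, assuming the commonly used definition: four compounds A, B, C, D, where B and C each carry one of two transformations relative to A and D carries both, with NA taken as the difference between the effect of one transformation measured in the two different contexts. The function names, the example values, and the independent-error noise estimate are illustrative assumptions and are not taken from the paper.

```python
from math import sqrt


def nonadditivity(p_a: float, p_b: float, p_c: float, p_d: float) -> float:
    """Nonadditivity of one double transformation cycle (assumed definition).

    p_a: property of compound A (neither transformation applied)
    p_b: property of compound B (transformation 1 only)
    p_c: property of compound C (transformation 2 only)
    p_d: property of compound D (both transformations)

    Values are expected on a log scale (e.g. pIC50 or logD).
    The result is 0 when the effect of transformation 1 does not
    depend on whether transformation 2 is present.
    """
    effect_with_t2 = p_d - p_c      # transformation 1 applied on top of transformation 2
    effect_without_t2 = p_b - p_a   # transformation 1 applied alone
    return effect_with_t2 - effect_without_t2


def na_noise_sd(sigma: float) -> float:
    """Standard deviation of NA caused by measurement noise alone.

    Assumes four independent measurements with the same standard
    deviation sigma, so the variances add: sd = sqrt(4) * sigma.
    """
    return sqrt(4.0) * sigma


# Hypothetical values, for illustration only.
additive_cycle = nonadditivity(6.0, 7.0, 6.5, 7.5)
nonadditive_cycle = nonadditivity(6.0, 7.0, 6.5, 9.0)
```

Under these assumptions, a cycle would only be considered genuinely nonadditive when the absolute NA clearly exceeds the noise level returned by `na_noise_sd`; where to set that cutoff is a judgment call not specified in the abstract.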

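Source journal

Journal of Computer-Aided Molecular Design (Biology - Computer Science: Interdisciplinary Applications)

CiteScore: 8.00

Self-citation rate: 8.60%

Articles published: (not given)

Review time: 3 months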

About the journal: The Journal of Computer-Aided Molecular Design provides a forum for disseminating information on both the theory and the application of computer-based methods in the analysis and design of molecules. The scope of the journal encompasses papers which report new and original research and applications in the following areas: - theoretical chemistry; - computational chemistry; - computer and molecular graphics; - molecular modeling; - protein engineering; - drug design; - expert systems; - general structure-property relationships; - molecular dynamics; - chemical database development and usage.