CPSign: conformal prediction for cheminformatics modeling

IF 7.1 2区 化学 Q1 CHEMISTRY, MULTIDISCIPLINARY
Staffan Arvidsson McShane, Ulf Norinder, Jonathan Alvarsson, Ernst Ahlberg, Lars Carlsson, Ola Spjuth
{"title":"CPSign: conformal prediction for cheminformatics modeling","authors":"Staffan Arvidsson McShane,&nbsp;Ulf Norinder,&nbsp;Jonathan Alvarsson,&nbsp;Ernst Ahlberg,&nbsp;Lars Carlsson,&nbsp;Ola Spjuth","doi":"10.1186/s13321-024-00870-9","DOIUrl":null,"url":null,"abstract":"<div><p>Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign.</p><p><b>Scientific contribution</b></p><p> CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance—showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.</p></div>","PeriodicalId":617,"journal":{"name":"Journal of Cheminformatics","volume":"16 1","pages":""},"PeriodicalIF":7.1000,"publicationDate":"2024-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://jcheminf.biomedcentral.com/counter/pdf/10.1186/s13321-024-00870-9","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Cheminformatics","FirstCategoryId":"92","ListUrlMain":"https://link.springer.com/article/10.1186/s13321-024-00870-9","RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0

Abstract

Conformal prediction has seen many applications in pharmaceutical science, being able to calibrate outputs of machine learning models and producing valid prediction intervals. We here present the open source software CPSign that is a complete implementation of conformal prediction for cheminformatics modeling. CPSign implements inductive and transductive conformal prediction for classification and regression, and probabilistic prediction with the Venn-ABERS methodology. The main chemical representation is signatures but other types of descriptors are also supported. The main modeling methodology is support vector machines (SVMs), but additional modeling methods are supported via an extension mechanism, e.g. DeepLearning4J models. We also describe features for visualizing results from conformal models including calibration and efficiency plots, as well as features to publish predictive models as REST services. We compare CPSign against other common cheminformatics modeling approaches including random forest, and a directed message-passing neural network. The results show that CPSign produces robust predictive performance with comparative predictive efficiency, with superior runtime and lower hardware requirements compared to neural network based models. CPSign has been used in several studies and is in production-use in multiple organizations. The ability to work directly with chemical input files, perform descriptor calculation and modeling with SVM in the conformal prediction framework, with a single software package having a low footprint and fast execution time makes CPSign a convenient and yet flexible package for training, deploying, and predicting on chemical data. CPSign can be downloaded from GitHub at https://github.com/arosbio/cpsign.

Scientific contribution

CPSign provides a single software that allows users to perform data preprocessing, modeling and make predictions directly on chemical structures, using conformal and probabilistic prediction. Building and evaluating new models can be achieved at a high abstraction level, without sacrificing flexibility and predictive performance—showcased with a method evaluation against contemporary modeling approaches, where CPSign performs on par with a state-of-the-art deep learning based model.

CPSign:用于化学信息学建模的保形预测
共形预测在制药科学中有许多应用,它能够校准机器学习模型的输出结果,并产生有效的预测区间。我们在此介绍开源软件 CPSign,它是保形预测在化学信息学建模中的完整实现。CPSign 实现了用于分类和回归的归纳和转导保形预测,以及采用 Venn-ABERS 方法的概率预测。主要的化学表征是签名,但也支持其他类型的描述符。主要的建模方法是支持向量机(SVM),但也通过扩展机制支持其他建模方法,例如 DeepLearning4J 模型。我们还介绍了可视化保形模型结果的功能,包括校准和效率图,以及将预测模型发布为 REST 服务的功能。我们将 CPSign 与其他常见的化学信息学建模方法(包括随机森林和有向消息传递神经网络)进行了比较。结果表明,与基于神经网络的模型相比,CPSign 可产生稳健的预测性能,预测效率更高,运行时间更长,硬件要求更低。CPSign 已被用于多项研究,并在多个机构投入生产使用。CPSign 可以直接处理化学输入文件,在保形预测框架内使用 SVM 进行描述符计算和建模,而且只需一个软件包,占用空间小,执行速度快,因此是一个方便灵活的软件包,可用于化学数据的训练、部署和预测。CPSign 可从 GitHub 下载:https://github.com/arosbio/cpsign 。科学贡献 CPSign 提供了一个单一的软件,允许用户使用保形预测和概率预测,直接对化学结构进行数据预处理、建模和预测。在不牺牲灵活性和预测性能的情况下,可以在高抽象级别上构建和评估新模型--通过与当代建模方法的方法评估可以看出,CPSign 的性能与基于深度学习的最先进模型相当。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Journal of Cheminformatics
Journal of Cheminformatics CHEMISTRY, MULTIDISCIPLINARY-COMPUTER SCIENCE, INFORMATION SYSTEMS
CiteScore
14.10
自引率
7.00%
发文量
82
审稿时长
3 months
期刊介绍: Journal of Cheminformatics is an open access journal publishing original peer-reviewed research in all aspects of cheminformatics and molecular modelling. Coverage includes, but is not limited to: chemical information systems, software and databases, and molecular modelling, chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases, computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信