Andrea Hunklinger , Peter Hartog , Martin Šícho , Guillaume Godin , Igor V. Tetko
SLAS Discovery, Volume 29, Issue 2, Article 100144. Published 2024-02-03. DOI: 10.1016/j.slasd.2024.01.005
The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms to predict the aqueous solubility of small molecules using experimental data from 100 K compounds. In total, one hundred teams took part in the challenge to predict compounds of low, medium and high solubility as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM), available at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models calculated using descriptor-based and representation learning methods gave the best score, which was higher than the scores of individual approaches or of consensus models developed using each individual approach. Combining diverse models allowed us to decrease both the bias and the variance of the individual models and to achieve the highest score. The model based on Transformer CNN contributed the best individual score, highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
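The consensus idea in the abstract can be illustrated with a minimal sketch. It assumes simple unweighted averaging of per-class probabilities, which is only one possible way to combine models; the abstract does not state how the 28 models were actually combined. The function names, the three toy models and all probability values below are hypothetical.

```python
from statistics import mean

# Solubility classes named in the abstract.
CLASSES = ("low", "medium", "high")


def consensus(model_outputs):
    """Average class probabilities over models (assumed: unweighted mean).

    model_outputs: one entry per model; each entry holds one
    [p_low, p_medium, p_high] list per compound.
    """
    n_compounds = len(model_outputs[0])
    return [
        [mean(model[i][c] for model in model_outputs) for c in range(len(CLASSES))]
        for i in range(n_compounds)
    ]


def predict(averaged):
    """Pick the class with the highest averaged probability."""
    return [
        CLASSES[max(range(len(CLASSES)), key=row.__getitem__)]
        for row in averaged
    ]


# Hypothetical outputs of three models for two compounds (made-up numbers).
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
model_c = [[0.3, 0.5, 0.2], [0.2, 0.2, 0.6]]

avg = consensus([model_a, model_b, model_c])
labels = predict(avg)
print(labels)
```

In this toy case, model_c alone would label the first compound "medium" and model_a alone would label the second "medium", while the averaged prediction follows the majority signal. This is the sense in which averaging diverse models can dampen the variance of any single model.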
About the journal:
Advancing Life Sciences R&D: SLAS Discovery reports how scientists develop and utilize novel technologies and/or approaches to provide and characterize chemical and biological tools to understand and treat human disease.
SLAS Discovery is a peer-reviewed journal that publishes scientific reports that enable and improve target validation, evaluate current drug discovery technologies, provide novel research tools, and incorporate research approaches that enhance depth of knowledge and drug discovery success.
SLAS Discovery emphasizes scientific and technical advances in target identification/validation (including chemical probes, RNA silencing, gene editing technologies); biomarker discovery; assay development; virtual, medium- or high-throughput screening (biochemical and biological, biophysical, phenotypic, toxicological, ADME); lead generation/optimization; chemical biology; and informatics (data analysis, image analysis, statistics, bio- and chemo-informatics). It also publishes review articles on target biology, new paradigms in drug discovery and advances in drug discovery technologies.
SLAS Discovery is of particular interest to those involved in analytical chemistry, applied microbiology, automation, biochemistry, bioengineering, biomedical optics, biotechnology, bioinformatics, cell biology, DNA science and technology, genetics, information technology, medicinal chemistry, molecular biology, natural products chemistry, organic chemistry, pharmacology, spectroscopy, and toxicology.
SLAS Discovery is a member of the Committee on Publication Ethics (COPE) and was published previously (1996-2016) as the Journal of Biomolecular Screening (JBS).