Rafael Mena-Yedra, Juana López Redondo, Horacio Pérez-Sánchez, Pilar Martinez Ortigosa
{"title":"ALMERIA: Boosting Pairwise Molecular Contrasts with Scalable Methods","authors":"Rafael Mena-Yedra, Juana López Redondo, Horacio Pérez-Sánchez, Pilar Martinez Ortigosa","doi":"10.15388/24-infor558","DOIUrl":null,"url":null,"abstract":"This work introduces ALMERIA, a decision-support tool for drug discovery. It estimates compound similarities and predicts activity, considering conformation variability. The methodology spans from data preparation to model selection and optimization. Implemented using scalable software, it handles large data volumes swiftly. Experiments were conducted on a distributed computer cluster using the DUD-E database. Models were evaluated on different data partitions to assess generalization ability with new compounds. The tool demonstrates excellent performance in molecular activity prediction (ROC AUC: 0.99, 0.96, 0.87), indicating good generalization properties of the chosen data representation and modelling. Molecular conformation sensitivity is also evaluated.\nPDF XML","PeriodicalId":56292,"journal":{"name":"Informatica","volume":"304 1","pages":""},"PeriodicalIF":3.3000,"publicationDate":"2024-05-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Informatica","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.15388/24-infor558","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
引用次数: 0
Abstract
This work introduces ALMERIA, a decision-support tool for drug discovery. It estimates compound similarities and predicts activity, considering conformation variability. The methodology spans from data preparation to model selection and optimization. Implemented using scalable software, it handles large data volumes swiftly. Experiments were conducted on a distributed computer cluster using the DUD-E database. Models were evaluated on different data partitions to assess generalization ability with new compounds. The tool demonstrates excellent performance in molecular activity prediction (ROC AUC: 0.99, 0.96, 0.87), indicating good generalization properties of the chosen data representation and modelling. Molecular conformation sensitivity is also evaluated.
PDF XML
期刊介绍:
The quarterly journal Informatica provides an international forum for high-quality original research and publishes papers on mathematical simulation and optimization, recognition and control, programming theory and systems, automation systems and elements. Informatica provides a multidisciplinary forum for scientists and engineers involved in research and design including experts who implement and manage information systems applications.