Giorgos Papanastasiou, Pedro P Sanchez, Argyrios Christodoulidis, Guang Yang, Walter Hugo Lopez Pinaya
{"title":"Confounder-aware foundation modeling for accurate phenotype profiling in cell imaging.","authors":"Giorgos Papanastasiou, Pedro P Sanchez, Argyrios Christodoulidis, Guang Yang, Walter Hugo Lopez Pinaya","doi":"10.1038/s44303-025-00116-9","DOIUrl":null,"url":null,"abstract":"<p><p>Image-based profiling is rapidly transforming drug discovery, offering unprecedented insights into cellular responses. However, experimental variability hinders accurate identification of mechanisms of action (MoA) and compound targets. Existing methods commonly fail to generalize to novel compounds, limiting their utility in exploring uncharted chemical space. To address this, we present a confounder-aware foundation model integrating a causal mechanism within a latent diffusion model, enabling the generation of balanced synthetic datasets for robust biological effect estimation. Trained on over 13 million Cell Painting images and 107 thousand compounds, our model learns robust cellular phenotype representations, mitigating confounder impact. We achieve state-of-the-art MoA and target prediction for both seen (0.66 and 0.65 ROC-AUC) and unseen compounds (0.65 and 0.73 ROC-AUC), significantly surpassing real and batch-corrected data. This innovative framework advances drug discovery by delivering robust biological effect estimations for novel compounds, potentially accelerating hit expansion. Our model establishes a scalable and adaptable foundation for cell imaging, holding the potential to become a cornerstone in data-driven drug discovery.</p>","PeriodicalId":501709,"journal":{"name":"npj Imaging","volume":"3 1","pages":"52"},"PeriodicalIF":0.0000,"publicationDate":"2025-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"npj Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1038/s44303-025-00116-9","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Image-based profiling is rapidly transforming drug discovery, offering unprecedented insights into cellular responses. However, experimental variability hinders accurate identification of mechanisms of action (MoA) and compound targets. Existing methods commonly fail to generalize to novel compounds, limiting their utility in exploring uncharted chemical space. To address this, we present a confounder-aware foundation model integrating a causal mechanism within a latent diffusion model, enabling the generation of balanced synthetic datasets for robust biological effect estimation. Trained on over 13 million Cell Painting images and 107 thousand compounds, our model learns robust cellular phenotype representations, mitigating confounder impact. We achieve state-of-the-art MoA and target prediction for both seen (0.66 and 0.65 ROC-AUC) and unseen compounds (0.65 and 0.73 ROC-AUC), significantly surpassing real and batch-corrected data. This innovative framework advances drug discovery by delivering robust biological effect estimations for novel compounds, potentially accelerating hit expansion. Our model establishes a scalable and adaptable foundation for cell imaging, holding the potential to become a cornerstone in data-driven drug discovery.