Patterns, Pub Date: 2024-05-08, DOI: 10.1016/j.patter.2024.100991
Zhehuan Fan, Jie Yu, Xiang Zhang, Yijie Chen, Shihui Sun, Yuanyuan Zhang, Mingan Chen, Fu Xiao, Wenyong Wu, Xutong Li, Mingyue Zheng, Xiaomin Luo, Dingyan Wang
{"title":"Reducing overconfident errors in molecular property classification using Posterior Network","authors":"Zhehuan Fan, Jie Yu, Xiang Zhang, Yijie Chen, Shihui Sun, Yuanyuan Zhang, Mingan Chen, Fu Xiao, Wenyong Wu, Xutong Li, Mingyue Zheng, Xiaomin Luo, Dingyan Wang","doi":"10.1016/j.patter.2024.100991","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100991","url":null,"abstract":"<p>Deep-learning-based classification models are increasingly used for predicting molecular properties in drug development. However, traditional classification models using the Softmax function often give overconfident mispredictions for out-of-distribution samples, highlighting a critical lack of accurate uncertainty estimation. Such limitations can result in substantial costs and should be avoided during drug development. Inspired by advances in evidential deep learning and Posterior Network, we replaced the Softmax function with a normalizing flow to enhance the uncertainty estimation ability of the model in molecular property classification. The proposed strategy was evaluated across diverse scenarios, including simulated experiments based on a synthetic dataset, ADMET predictions, and ligand-based virtual screening. The results demonstrate that compared with the vanilla model, the proposed strategy effectively alleviates the problem of giving overconfident but incorrect predictions. Our findings support the promising application of evidential deep learning in drug development and offer a valuable framework for further research.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"29 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140933408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
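As a rough illustration of the contrast drawn in the abstract above, the following Python sketch compares a Softmax confidence with a Dirichlet-based summary of the kind used in evidential deep learning. It is not the authors' implementation: the function names, the toy logits, and the evidence values are all hypothetical, and the per-class evidence (which Posterior Network derives from normalizing-flow densities) is simply passed in as plain numbers.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that always sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dirichlet_summary(evidence):
    """Summarize per-class evidence as a Dirichlet distribution.

    Assumption: alpha_c = 1 + evidence_c (a flat prior plus pseudo-counts).
    Returns the expected class probabilities and a vacuity-style
    uncertainty K / sum(alpha), which equals 1 when there is no evidence.
    """
    alphas = [1.0 + e for e in evidence]
    alpha0 = sum(alphas)
    mean = [a / alpha0 for a in alphas]
    uncertainty = len(alphas) / alpha0
    return mean, uncertainty


# Hypothetical out-of-distribution input: large logits, yet almost no evidence.
ood_logits = [8.0, 0.0]
ood_evidence = [0.2, 0.1]

softmax_probs = softmax(ood_logits)
dirichlet_probs, dirichlet_uncertainty = dirichlet_summary(ood_evidence)

print("softmax confidence:", max(softmax_probs))
print("dirichlet mean:", dirichlet_probs)
print("dirichlet uncertainty:", dirichlet_uncertainty)
```

With these toy numbers the Softmax output is near-certain, while the Dirichlet summary stays close to uniform and reports high uncertainty, because very little evidence backs the prediction.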
Patterns, Pub Date: 2024-05-03, DOI: 10.1016/j.patter.2024.100983
Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi
{"title":"Active sensing with predictive coding and uncertainty minimization","authors":"Abdelrahman Sharafeldin, Nabil Imam, Hannah Choi","doi":"10.1016/j.patter.2024.100983","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100983","url":null,"abstract":"<p>We present an end-to-end architecture for embodied exploration inspired by two biological computations: predictive coding and uncertainty minimization. The architecture can be applied to any exploration setting in a task-independent and intrinsically driven manner. We first demonstrate our approach in a maze navigation task and show that it can discover the underlying transition distributions and spatial features of the environment. Second, we apply our model to a more complex active vision task, whereby an agent actively samples its visual environment to gather information. We show that our model builds unsupervised representations through exploration that allow it to efficiently categorize visual scenes. We further show that using these representations for downstream classification leads to superior data efficiency and learning speed compared to other baselines while maintaining lower parameter complexity. Finally, the modular structure of our model facilitates interpretability, allowing us to probe its internal mechanisms and representations during exploration.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"9 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832974","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-05-02, DOI: 10.1016/j.patter.2024.100987
Kelly Rootes-Murdy, Sandeep Panta, Ross Kelly, Javier Romero, Yann Quidé, Murray J. Cairns, Carmel Loughland, Vaughan J. Carr, Stanley V. Catts, Assen Jablensky, Melissa J. Green, Frans Henskens, Dylan Kiltschewskij, Patricia T. Michie, Bryan Mowry, Christos Pantelis, Paul E. Rasser, William R. Reay, Ulrich Schall, Rodney J. Scott, Vince D. Calhoun
{"title":"Cortical similarities in psychiatric and mood disorders identified in federated VBM analysis via COINSTAC","authors":"Kelly Rootes-Murdy, Sandeep Panta, Ross Kelly, Javier Romero, Yann Quidé, Murray J. Cairns, Carmel Loughland, Vaughan J. Carr, Stanley V. Catts, Assen Jablensky, Melissa J. Green, Frans Henskens, Dylan Kiltschewskij, Patricia T. Michie, Bryan Mowry, Christos Pantelis, Paul E. Rasser, William R. Reay, Ulrich Schall, Rodney J. Scott, Vince D. Calhoun","doi":"10.1016/j.patter.2024.100987","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100987","url":null,"abstract":"<p>Structural neuroimaging studies have identified a combination of shared and disorder-specific patterns of gray matter (GM) deficits across psychiatric disorders. Pooling large data allows for examination of a possible common neuroanatomical basis that may identify a certain vulnerability for mental illness. Large-scale collaborative research is already facilitated by data repositories, institutionally supported databases, and data archives. However, these data-sharing methodologies can suffer from significant barriers. Federated approaches augment these approaches by enabling access or more sophisticated, shareable and scaled-up analyses of large-scale data. We examined GM alterations using Collaborative Informatics and Neuroimaging Suite Toolkit for Anonymous Computation, an open-source, decentralized analysis application. Through federated analysis of eight sites, we identified significant overlap in the GM patterns (<em>n</em> = 4,102) of individuals with schizophrenia, major depressive disorder, and autism spectrum disorder. These results show cortical and subcortical regions that may indicate a shared vulnerability to psychiatric disorders.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"9 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-05-02, DOI: 10.1016/j.patter.2024.100986
Seyednami Niyakan, Jianting Sheng, Yuliang Cao, Xiang Zhang, Zhan Xu, Ling Wu, Stephen T.C. Wong, Xiaoning Qian
{"title":"MUSTANG: Multi-sample spatial transcriptomics data analysis with cross-sample transcriptional similarity guidance","authors":"Seyednami Niyakan, Jianting Sheng, Yuliang Cao, Xiang Zhang, Zhan Xu, Ling Wu, Stephen T.C. Wong, Xiaoning Qian","doi":"10.1016/j.patter.2024.100986","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100986","url":null,"abstract":"<p>Spatially resolved transcriptomics has revolutionized genome-scale transcriptomic profiling by providing high-resolution characterization of transcriptional patterns. Here, we present our spatial transcriptomics analysis framework, MUSTANG (MUlti-sample Spatial Transcriptomics data ANalysis with cross-sample transcriptional similarity Guidance), which is capable of performing multi-sample spatial transcriptomics spot cellular deconvolution by allowing both cross-sample expression-based similarity information sharing as well as spatial correlation in gene expression patterns within samples. Experiments on a semi-synthetic spatial transcriptomics dataset and three real-world spatial transcriptomics datasets demonstrate the effectiveness of MUSTANG in revealing biological insights inherent in the cellular characterization of tissue samples under study.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"32 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-05-02, DOI: 10.1016/j.patter.2024.100985
Guangyu Wang, Kai Wang, Yuanxu Gao, Longbin Chen, Tianrun Gao, Yuanlin Ma, Zeyu Jiang, Guoxing Yang, Fajin Feng, Shuoping Zhang, Yifan Gu, Guangdong Liu, Lei Chen, Li-Shuang Ma, Ye Sang, Yanwen Xu, Ge Lin, Xiaohong Liu
{"title":"A generalized AI system for human embryo selection covering the entire IVF cycle via multi-modal contrastive learning","authors":"Guangyu Wang, Kai Wang, Yuanxu Gao, Longbin Chen, Tianrun Gao, Yuanlin Ma, Zeyu Jiang, Guoxing Yang, Fajin Feng, Shuoping Zhang, Yifan Gu, Guangdong Liu, Lei Chen, Li-Shuang Ma, Ye Sang, Yanwen Xu, Ge Lin, Xiaohong Liu","doi":"10.1016/j.patter.2024.100985","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100985","url":null,"abstract":"<p><em>In vitro</em> fertilization (IVF) has revolutionized infertility treatment, benefiting millions of couples worldwide. However, current clinical practices for embryo selection rely heavily on visual inspection of morphology, which is highly variable and experience dependent. Here, we propose a comprehensive artificial intelligence (AI) system that can interpret embryo-developmental knowledge encoded in vast unlabeled multi-modal datasets and provide personalized embryo selection. This AI platform consists of a transformer-based network backbone named IVFormer and a self-supervised learning framework, VTCLR (visual-temporal contrastive learning of representations), for training multi-modal embryo representations pre-trained on large and unlabeled data. When evaluated on clinical scenarios covering the entire IVF cycle, our pre-trained AI model demonstrates accurate and reliable performance on euploidy ranking and live-birth occurrence prediction. For AI vs. physician for euploidy ranking, our model achieved superior performance across all score categories. The results demonstrate the potential of the AI system as a non-invasive, efficient, and cost-effective tool to improve embryo selection and IVF outcomes.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"75 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832827","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-05-01, DOI: 10.1016/j.patter.2024.100973
Ruoqi Liu, Pin-Yu Chen, Ping Zhang
{"title":"CURE: A deep learning framework pre-trained on large-scale patient data for treatment effect estimation","authors":"Ruoqi Liu, Pin-Yu Chen, Ping Zhang","doi":"10.1016/j.patter.2024.100973","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100973","url":null,"abstract":"<p>Treatment effect estimation (TEE) aims to identify the causal effects of treatments on important outcomes. Current machine-learning-based methods, mainly trained on labeled data for specific treatments or outcomes, can be sub-optimal with limited labeled data. In this article, we propose a new pre-training and fine-tuning framework, CURE (causal treatment effect estimation), for TEE from observational data. CURE is pre-trained on large-scale unlabeled patient data to learn representative contextual patient representations and fine-tuned on labeled patient data for TEE. We present a new sequence encoding approach for longitudinal patient data embedding both structure and time. Evaluated on four downstream TEE tasks, CURE outperforms the state-of-the-art methods, marking a 7% increase in area under the precision-recall curve and an 8% rise in the influence-function-based precision of estimating heterogeneous effects. Validation with four randomized clinical trials confirms its efficacy in producing trial conclusions, highlighting CURE’s capacity to supplement traditional clinical trials.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"2011 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832895","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-05-01, DOI: 10.1016/j.patter.2024.100982
Taykhoom Dalal, Chirag J. Patel
{"title":"PYPE: A pipeline for phenome-wide association and Mendelian randomization in investigator-driven biobank scale analysis","authors":"Taykhoom Dalal, Chirag J. Patel","doi":"10.1016/j.patter.2024.100982","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100982","url":null,"abstract":"<p>Phenome-wide association studies (PheWASs) serve as a way of documenting the relationship between genotypes and multiple phenotypes, helping to uncover unexplored genotype-phenotype associations (known as pleiotropy). Secondly, Mendelian randomization (MR) can be harnessed to make causal statements about a pair of phenotypes by comparing their genetic architecture. Thus, approaches that automate both PheWASs and MR can enhance biobank-scale analyses, circumventing the need for multiple tools by providing a comprehensive, end-to-end tool to drive scientific discovery. To this end, we present PYPE, a Python pipeline for running, visualizing, and interpreting PheWASs. PYPE utilizes input genotype or phenotype files to automatically estimate associations between the chosen independent variables and phenotypes. PYPE can also produce a variety of visualizations and can be used to identify nearby genes and functional consequences of significant associations. Finally, PYPE can identify possible causal relationships between phenotypes using MR under a variety of causal effect modeling scenarios.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"36 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140832826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
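To make the MR idea in the abstract above concrete, here is a minimal Python sketch of two standard estimators: the per-variant Wald ratio and the inverse-variance-weighted (IVW) combination. This is not PYPE's code or API; the function names and the summary statistics are hypothetical, and the abstract does not state which estimators PYPE implements.

```python
def wald_ratio(beta_exposure, beta_outcome):
    """Causal effect estimate from a single genetic variant."""
    return beta_outcome / beta_exposure


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted estimate across variants.

    Equivalent to a weighted regression of outcome effects on exposure
    effects through the origin, with weights 1 / se_outcome**2.
    """
    numerator = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    denominator = sum(
        bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)
    )
    return numerator / denominator


# Hypothetical summary statistics for three variants (made-up numbers).
bx = [0.10, 0.20, 0.40]  # variant effects on the exposure
by = [0.05, 0.10, 0.20]  # variant effects on the outcome (half of bx)
se = [0.01, 0.02, 0.02]  # standard errors of the outcome effects

per_variant = [wald_ratio(x, y) for x, y in zip(bx, by)]
combined = ivw_estimate(bx, by, se)
print(per_variant, combined)
```

Because every made-up outcome effect is half the corresponding exposure effect, both the per-variant ratios and the IVW estimate come out at about 0.5; with real data the ratios differ and the weights determine how much each variant contributes.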
Patterns, Pub Date: 2024-04-09, DOI: 10.1016/j.patter.2024.100968
Rita González-Márquez, Luca Schmidt, Benjamin M. Schmidt, Philipp Berens, Dmitry Kobak
{"title":"The landscape of biomedical research","authors":"Rita González-Márquez, Luca Schmidt, Benjamin M. Schmidt, Philipp Berens, Dmitry Kobak","doi":"10.1016/j.patter.2024.100968","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100968","url":null,"abstract":"<p>The number of publications in biomedicine and life sciences has grown so much that it is difficult to keep track of new scientific works and to have an overview of the evolution of the field as a whole. Here, we present a two-dimensional (2D) map of the entire corpus of biomedical literature, based on the abstract texts of 21 million English articles from the PubMed database. To embed the abstracts into 2D, we used the large language model PubMedBERT, combined with <em>t</em>-SNE tailored to handle samples of this size. We used our map to study the emergence of the COVID-19 literature, the evolution of the neuroscience discipline, the uptake of machine learning, the distribution of gender imbalance in academic authorship, and the distribution of retracted paper mill articles. Furthermore, we present an interactive website that allows easy exploration and will enable further insights and facilitate future research.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"29 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140804659","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-04-04, DOI: 10.1016/j.patter.2024.100967
Sarah M. Burbach, Bryan Briney
{"title":"Improving antibody language models with native pairing","authors":"Sarah M. Burbach, Bryan Briney","doi":"10.1016/j.patter.2024.100967","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100967","url":null,"abstract":"<p>Existing antibody language models are limited by their use of unpaired antibody sequence data. A recently published dataset of ∼1.6 × 10<sup>6</sup> natively paired human antibody sequences offers a unique opportunity to evaluate how antibody language models are improved by training with native pairs. We trained three baseline antibody language models (BALM), using natively paired (BALM-paired), randomly-paired (BALM-shuffled), or unpaired (BALM-unpaired) sequences from this dataset. To address the paucity of paired sequences, we additionally fine-tuned ESM (evolutionary scale modeling)-2 with natively paired antibody sequences (ft-ESM). We provide evidence that training with native pairs allows the model to learn immunologically relevant features that span the light and heavy chains, which cannot be simulated by training with random pairs. We additionally show that training with native pairs improves model performance on a variety of metrics, including the ability of the model to classify antibodies by pathogen specificity.</p>","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"3 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140804621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Patterns, Pub Date: 2024-03-14, DOI: 10.1016/j.patter.2024.100947
Debsindhu Bhowmik, Pei Zhang, Zachary Fox, Stephan Irle, John Gounley
{"title":"Enhancing molecular design efficiency: Uniting language models and generative networks with genetic algorithms","authors":"Debsindhu Bhowmik, Pei Zhang, Zachary Fox, Stephan Irle, John Gounley","doi":"10.1016/j.patter.2024.100947","DOIUrl":"https://doi.org/10.1016/j.patter.2024.100947","url":null,"abstract":"This study examines the effectiveness of generative models in drug discovery, material science, and polymer science, aiming to overcome constraints associated with traditional inverse design methods relying on heuristic rules. Generative models generate synthetic data resembling real data, enabling deep learning model training without extensive labeled datasets. They prove valuable in creating virtual libraries of molecules for material science and facilitating drug discovery by generating molecules with specific properties. While generative adversarial networks (GANs) are explored for these purposes, mode collapse restricts their efficacy, limiting novel structure variability. To address this, we introduce a masked language model (LM) inspired by natural language processing. Although LMs alone can have inherent limitations, we propose a hybrid architecture combining LMs and GANs to efficiently generate new molecules, demonstrating superior performance over standalone masked LMs, particularly for smaller population sizes. This hybrid LM-GAN architecture enhances efficiency in optimizing properties and generating novel samples.","PeriodicalId":36242,"journal":{"name":"Patterns","volume":"2 1","pages":""},"PeriodicalIF":6.5,"publicationDate":"2024-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140169430","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}