{"title":"Unlocking cross-modal interplay of single-cell and spatial joint profiling with CellMATE","authors":"Yang Jingping, Wang Qi, Zhang Bolei, Gong Luyu, Guo Yue, Li Erguang","doi":"10.1101/2024.09.06.610031","DOIUrl":"https://doi.org/10.1101/2024.09.06.610031","url":null,"abstract":"A key advantage of single-cell multimodal joint profiling is the modality interplay, which is essential for deciphering the cell fate. However, while current analytical methods can leverage the additive benefits, they fall short of exploring the synergistic insights of joint profiling, thereby diminishing the advantage of joint profiling. Here, we introduce CellMATE, a Multi-head Adversarial Training-based Early-integration approach specifically developed for multimodal joint profiling. CellMATE can capture both additive and synergistic benefits inherent in joint profiling through auto-learning of multimodal distributions and simultaneously represents all features in a unified latent space. Through extensive evaluation across diverse joint profiling scenarios, CellMATE demonstrated its superiority in ensuring utility of cross-modal properties, uncovering cellular heterogeneity and plasticity, and delineating differentiation trajectories. CellMATE uniquely unlocks the full potential of joint profiling to elucidate the dynamic nature of cells during critical processes such as differentiation, development and diseases.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reversible Transcriptomic Age Shifts from Physiological Stress in Whole Blood","authors":"Jong Bhak, Dougu Nam, Kyungwhan An, Yoonsung Kwon, Jihun Bhak, Sungwon Jeon, Hyojung Ryu","doi":"10.1101/2024.09.08.611853","DOIUrl":"https://doi.org/10.1101/2024.09.08.611853","url":null,"abstract":"We develop a genome-wide transcriptomic clock for predicting chronological age using whole blood samples from 463 healthy individuals. Our findings reveal profound age acceleration, up to 24.47 years, under perturbed homeostasis in COVID-19 patients, which reverted to baseline upon recovery. This study demonstrates that the whole blood transcriptome can track reversible changes in biological age induced by stressors in real physiological time, suggesting a potential role for anti-aging interventions in disease management.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"62 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating adequate contact rates and time of Highly Pathogenic Avian Influenza virus introduction into individual United States commercial poultry flocks during the 2022/24 epizootic","authors":"Amos Ssematimba, Sasidhar Malladi, Peter J Bonney, Kaitlyn M St. Charles, Holden C Hutchinson, Melissa Schoenbaum, Rosemary Marusak, Marie R Culhane, Carol J Cardona","doi":"10.1101/2024.09.08.611909","DOIUrl":"https://doi.org/10.1101/2024.09.08.611909","url":null,"abstract":"Following confirmation of the first case of the ongoing U.S. HPAI H5N1 epizootic in commercial poultry on February 8, 2022, the virus has continued to devastate the U.S. poultry sector and the pathogen has since managed to cross over to livestock and a few human cases have also been reported. Efficient outbreak management benefits greatly from timely detection and proper identification of the pathways of virus introduction and spread. In this study, using changes in mortality rates as a proxy for HPAI incidence in a layer, broiler and turkey flock, mathematical modeling techniques, specifically the Approximate Bayesian Computation algorithm in conjunction with a stochastic within-flock HPAI transmission model, were used to estimate the time window of pathogen introduction into the flock (TOI) and adequate contact rate (ACR) based on the daily mortality and diagnostic test results. The estimated TOI was then used together with the day when the first positive sample was collected to calculate the most likely time to first positive sample (MTFPS) which reflects the time to HPAI detection in the flock. The estimated joint (i.e., all species combined) median of the MTFPS for different flocks was six days, the joint median most likely ACR was 6.8 newly infected birds per infectious bird per day, the joint median was 13 and the joint median number of test days per flock was two. These results were also grouped by species and by epidemic phase and discussed accordingly. 
We conclude that findings from this and related studies are beneficial for the different stakeholders in outbreak management and combining TOI analysis with complementary approaches such as phylogenetic analyses is critically important for improved understanding of disease transmission pathways. The estimated parameters can also inform models used for surveillance design, risk analysis, and emergency preparedness.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"45 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224320","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs with Pixel Masking","authors":"Zhixiang Cheng, Hongxin Xiang, Pengsen Ma, Zeng Li, Xin Jin, Xixi Yang, Jianxin Lin, Bosheng Song, Yang Deng, Xinxin Feng, Changhui Deng, Xiangxiang Zeng","doi":"10.1101/2024.09.04.611324","DOIUrl":"https://doi.org/10.1101/2024.09.04.611324","url":null,"abstract":"Activity cliffs, which refer to pairs of molecules that are structurally similar but show significant differences in their potency, can lead to model representation collapse and make it challenging for the model to distinguish them. Our research indicates that as molecular similarity increases, graph-based methods struggle to capture these nuances, whereas image-based approaches effectively retain the distinctions. Thus, we developed MaskMol, a knowledge-guided molecular image self-supervised learning framework. MaskMol accurately learns the representation of molecular images by considering multiple levels of molecular knowledge, such as atoms, bonds, and substructures. By utilizing pixel masking tasks, MaskMol extracts fine-grained information from molecular images, overcoming the limitations of existing deep learning models in identifying subtle structural changes. Experimental results demonstrate MaskMol's high accuracy and transferability in activity cliff estimation and compound potency prediction across 20 different macromolecular targets, outperforming 25 state-of-the-art deep learning and machine learning approaches. Visualization analyses reveal MaskMol's high biological interpretability in identifying activity cliff-relevant molecular substructures. Notably, through MaskMol, we identified candidate EP4 inhibitors that could be used to treat tumors. 
This study not only raises awareness about activity cliffs but also introduces a novel method for molecular image representation learning and virtual screening, advancing drug discovery and providing new insights into structure-activity relationships (SAR).","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cell identity revealed by precise cell cycle state mapping links data modalities","authors":"Saeed Alahmari, Andrew Schultz, Jordan Albrecht, Vural Tagal, Zaid Siddiqui, Sadhya Prabhakaran, Issam El Naqa, Alexander Anderson, Laura Heiser, Noemi Andor","doi":"10.1101/2024.09.04.610488","DOIUrl":"https://doi.org/10.1101/2024.09.04.610488","url":null,"abstract":"Several methods for cell cycle inference from sequencing data exist and are widely adopted. In contrast, methods for classification of cell cycle state from imaging data are scarce. We have for the first time integrated sequencing and imaging derived cell cycle pseudo-times for assigning 449 imaged cells to 693 sequenced cells at an average resolution of 3.4 and 2.4 cells for sequencing and imaging data respectively. Data integration revealed thousands of pathways and organelle features that are correlated with each other, including several previously known interactions and novel associations. The ability to assign the transcriptome state of a profiled cell to its closest living relative, which is still actively growing and expanding opens the door for genotype-phenotype mapping at single cell resolution forward in time.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A systematic evaluation of the language-of-viral-escape model using multiple machine learning frameworks","authors":"Brent Allman, Luiz Vieira, Daniel J Diaz, Claus O Wilke","doi":"10.1101/2024.09.04.611278","DOIUrl":"https://doi.org/10.1101/2024.09.04.611278","url":null,"abstract":"Predicting the evolutionary patterns of emerging and endemic viruses is key for mitigating their spread in host populations. In particular, it is critical to rapidly identify mutations with the potential for immune escape or increased disease burden (variants of concern). Knowing which circulating mutations are such variants of concern can inform treatment or mitigation strategies such as alternative vaccines or targeted social distancing. A recent study proposed that variants of concern can be identified using two quantities extracted from protein language models, grammaticality and semantic change. These quantities are defined in analogy to concepts from natural language processing. Grammaticality is intended to be a measure of whether a variant viral protein is viable, and semantic change is intended to be a measure of potential for immune escape. Here, we systematically test this hypothesis, taking advantage of several high-throughput datasets that have become available, and also testing additional machine learning models for calculating the grammaticality metric. We find that grammaticality can be a measure of protein viability, though the more traditional metric ΔΔG appears to be more effective. 
By contrast, we do not find compelling evidence that semantic change is a useful tool for identifying immune escape mutations.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185468","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NiCo Identifies Extrinsic Drivers of Cell State Modulation by Niche Covariation Analysis","authors":"Ankit Agrawal, Stefan Thomann, Sukanya Basu, Dominic Grün","doi":"10.1101/2024.09.08.611848","DOIUrl":"https://doi.org/10.1101/2024.09.08.611848","url":null,"abstract":"Cell states are modulated by intrinsic driving forces such as gene expression noise and extrinsic signals from the tissue microenvironment. The distinction between intrinsic and extrinsic cell state determinants is essential for understanding the regulation of cell fate in tissues during development, homeostasis and disease. The rapidly growing availability of single-cell resolution spatial transcriptomics makes it possible to meet this challenge. However, available computational methods to infer topological tissue domains, spatially variable gene expression, or ligand-receptor interactions are limited in capturing cell state changes driven by crosstalk between individual cell types within the same niche. We present NiCo, a computational framework for integrating single-cell resolution spatial transcriptomics with matched single-cell RNA-sequencing reference data to infer the influence of the spatial niche on the cell state. By applying NiCo to mouse embryogenesis, adult small intestine and liver data, we demonstrate the capacity to predict novel niche interactions that govern cell state variation underlying tissue development and homeostasis. In particular, NiCo predicts a feedback mechanism between Kupffer cells and neighboring stellate cells limiting stellate cell activation in the normal liver. 
NiCo provides a powerful tool to elucidate tissue architecture and to identify drivers of cellular states in local niches.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224343","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improvements in Prediction Performance of Ensemble Approaches for Genomic Prediction in Crop Breeding","authors":"Shunichiro Tomura, Mark Cooper, Owen M. Powell","doi":"10.1101/2024.09.06.611589","DOIUrl":"https://doi.org/10.1101/2024.09.06.611589","url":null,"abstract":"The refinement of prediction accuracy in genomic prediction is a key factor in accelerating genetic gain for crop breeding. The mainstream strategy for prediction performance improvement has been developing an individual prediction model outperforming others across diverse prediction scenarios. However, this approach has limitations in situations when there is inconsistency in the superiority of individual models, attributed to the existence of complex nonlinear interactions among genetic markers. This phenomenon is expected given the No Free Lunch Theorem, which states that the average performance of an individual prediction model is expected to be equivalent to the others across all scenarios. Hence, we investigate the potential to leverage the concept of a stacked ensemble as an alternative method. We consider two traits, days to anthesis (DTA) and tiller number (TILN), measured on a Nested Association Mapping study, referred to herein as TeoNAM; a public maize (Zea mays) inbred W22 was crossed to five inbred Teosinte lines. The TeoNAM data set and the two traits were selected as the example of choice based on prior evidence that the traits were under the control of networks of genes and high levels of segregation diversity for the nodes of the genetic networks. Our analysis of both traits for the TeoNAM demonstrated an improvement in prediction performance, measured as the Pearson correlation, for the ensemble approach across all the proposed scenarios, in more than 95% of cases, compared to the six individual prediction models that contributed to the ensemble: rrBLUP, BayesB, RKHS, RF, SVR and GAT. 
The observed result indicates that there is a potential for ensemble approaches to enhance the performance of genomic prediction for crop breeding.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"53 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Vaxign-DL for Vaccine Candidate Prediction with added ESM-Generated Features","authors":"Yichao Chen, Yuhan Zhang, Yongqun He","doi":"10.1101/2024.09.04.611295","DOIUrl":"https://doi.org/10.1101/2024.09.04.611295","url":null,"abstract":"Many vaccine design programs have been developed, including our own machine learning approaches Vaxign-ML and Vaxign-DL. Using deep learning techniques, Vaxign-DL predicts bacterial protective antigens by calculating 509 biological and biomedical features from protein sequences. In this study, we first used the protein folding ESM program to calculate a set of 1,280 features from individual protein sequences, and then utilized the new set of features separately or in combination with the traditional set of 509 features to predict protective antigens. Our results showed that the use of ESM-derived features alone was able to accurately predict vaccine antigens with a performance similar to the original Vaxign-DL prediction method, and the use of the combined ESM-derived and original Vaxign-DL features significantly improved the prediction performance according to a set of seven scores including specificity, sensitivity, and AUROC. To further evaluate the updated methods, we conducted a Leave-One-Pathogen-Out Validation (LOPOV) study, and found that the use of ESM-derived features significantly improved the prediction of vaccine antigens from 10 bacterial pathogens. 
This research is the first reported study demonstrating the added value of protein folding features for vaccine antigen prediction.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"36 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142185469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AI-enabled Alkaline-resistant Evolution of Protein to Apply in Mass Production","authors":"Liqi Kang, Banghao Wu, Bingxin Zhou, Pan Tan, Yun (Kenneth) Kang, Yongzhen Yan, Yi Zong, Shuang Li, Zhuo Liu, Liang Hong","doi":"10.1101/2024.09.04.611192","DOIUrl":"https://doi.org/10.1101/2024.09.04.611192","url":null,"abstract":"Artificial intelligence (AI) models have been used to study the compositional regularities of proteins in nature, enabling them to assist in protein design to improve the efficiency of protein engineering and reduce manufacturing cost. However, in industrial settings, proteins are often required to work in extreme environments where they are relatively scarce or even non-existent in nature. Since such proteins are almost absent in the training datasets, it is uncertain whether an AI model possesses the capability of evolving the protein to adapt to extreme conditions. Antibodies are crucial components of affinity chromatography, and they are hoped to remain active in extreme environments that most proteins cannot tolerate. In this study, we applied an advanced large language model (LLM), the Pro-PRIME model, to improve the alkali resistance of a representative antibody, a VHH antibody capable of binding to growth hormone. Through two rounds of design, we ensured that the selected mutant has enhanced functionality, including higher thermal stability, extreme pH resistance and stronger affinity, thereby validating the generalized capability of the LLM in meeting specific demands. 
To the best of our knowledge, this is the first LLM-designed protein product, which is successfully applied in mass production.","PeriodicalId":501307,"journal":{"name":"bioRxiv - Bioinformatics","volume":"60 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142224391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}