{"title":"DockingGA: enhancing targeted molecule generation using transformer neural network and genetic algorithm with docking simulation","authors":"Changnan Gao, Wenjie Bao, Shuang Wang, Jianyang Zheng, Lulu Wang, Yongqi Ren, Linfang Jiao, Jianmin Wang, Xun Wang","doi":"10.1093/bfgp/elae011","DOIUrl":"https://doi.org/10.1093/bfgp/elae011","url":null,"abstract":"Generative molecular models generate novel molecules with desired properties by searching chemical space. Traditional combinatorial optimization methods, such as genetic algorithms, have demonstrated superior performance in various molecular optimization tasks. However, these methods do not utilize docking simulation to inform the design process, and heavy dependence on the quality and quantity of available data, as well as require additional structural optimization to become candidate drugs. To address this limitation, we propose a novel model named DockingGA that combines Transformer neural networks and genetic algorithms to generate molecules with better binding affinity for specific targets. In order to generate high quality molecules, we chose the Self-referencing Chemical Structure Strings to represent the molecule and optimize the binding affinity of the molecules to different targets. Compared to other baseline models, DockingGA proves to be the optimal model in all docking results for the top 1, 10 and 100 molecules, while maintaining 100% novelty. Furthermore, the distribution of physicochemical properties demonstrates the ability of DockingGA to generate molecules with favorable and appropriate properties. 
This innovation creates new opportunities for the application of generative models in practical drug discovery.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140592804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs","authors":"Biyu Diao, Jin Luo, Yu Guo","doi":"10.1093/bfgp/elae010","DOIUrl":"https://doi.org/10.1093/bfgp/elae010","url":null,"abstract":"Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body’s normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. 
Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140592722","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An improved hierarchical variational autoencoder for cell-cell communication estimation using single-cell RNA-seq data.","authors":"Shuhui Liu, Yupei Zhang, Jiajie Peng, Xuequn Shang","doi":"10.1093/bfgp/elac056","DOIUrl":"10.1093/bfgp/elac056","url":null,"abstract":"<p><p>Analysis of cell-cell communication (CCC) in the tumor micro-environment helps decipher the underlying mechanism of cancer progression and drug tolerance. Currently, single-cell RNA-Seq data are available on a large scale, providing an unprecedented opportunity to predict cellular communications. There have been many achievements and applications in inferring cell-cell communication based on the known interactions between molecules, such as ligands, receptors and extracellular matrix. However, the prior information is not quite adequate and only involves a fraction of cellular communications, producing many false-positive or false-negative results. To this end, we propose an improved hierarchical variational autoencoder (HiVAE) based model to fully use single-cell RNA-seq data for automatically estimating CCC. Specifically, the HiVAE model is used to learn the potential representation of cells on known ligand-receptor genes and all genes in single-cell RNA-seq data, respectively, which are then utilized for cascade integration. Subsequently, transfer entropy is employed to measure the transmission of information flow between two cells based on the learned representations, which are regarded as directed communication relationships. Experiments are conducted on single-cell RNA-seq data of the human skin disease dataset and the melanoma dataset, respectively. 
Results show that the HiVAE model is effective in learning cell representations, and transfer entropy could be used to estimate the communication scores between cell types.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9222533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Recent advances in differential expression analysis for single-cell RNA-seq and spatially resolved transcriptomic studies.","authors":"Xiya Guo, Jin Ning, Yuanze Chen, Guoliang Liu, Liyan Zhao, Yue Fan, Shiquan Sun","doi":"10.1093/bfgp/elad011","DOIUrl":"10.1093/bfgp/elad011","url":null,"abstract":"<p><p>Differential expression (DE) analysis is a necessary step in the analysis of single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data. Unlike traditional bulk RNA-seq, DE analysis for scRNA-seq or SRT data has unique characteristics that may contribute to the difficulty of detecting DE genes. However, the plethora of DE tools that work with various assumptions makes it difficult to choose an appropriate one. Furthermore, a comprehensive review on detecting DE genes for scRNA-seq data or SRT data from multi-condition, multi-sample experimental designs is lacking. To bridge such a gap, here, we first focus on the challenges of DE detection, then highlight potential opportunities that facilitate further progress in scRNA-seq or SRT analysis, and finally provide insights and guidance in selecting appropriate DE tools or developing new computational DE methods.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9258877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating functional scoring and regulatory data to predict the effect of non-coding SNPs in a complex neurological disease.","authors":"Daniela Felício, Miguel Alves-Ferreira, Mariana Santos, Marlene Quintas, Alexandra M Lopes, Carolina Lemos, Nádia Pinto, Sandra Martins","doi":"10.1093/bfgp/elad020","DOIUrl":"10.1093/bfgp/elad020","url":null,"abstract":"<p><p>Most SNPs associated with complex diseases seem to lie in non-coding regions of the genome; however, their contribution to gene expression and disease phenotype remains poorly understood. Here, we established a workflow to provide assistance in prioritising the functional relevance of non-coding SNPs of candidate genes as susceptibility loci in polygenic neurological disorders. To illustrate the applicability of our workflow, we considered the multifactorial disorder migraine as a model to follow our step-by-step approach. We annotated the overlap of selected SNPs with regulatory elements and assessed their potential impact on gene expression based on publicly available prediction algorithms and functional genomics information. Some migraine risk loci have been hypothesised to reside in non-coding regions and to be implicated in the neurotransmission pathway. In this study, we used a set of 22 non-coding SNPs from neurotransmission and synaptic machinery-related genes previously suggested to be involved in migraine susceptibility based on our candidate gene association studies. After prioritising these SNPs, we focused on non-reported ones that demonstrated high regulatory potential: (1) VAMP2_rs1150 (3' UTR) was predicted as a target of hsa-mir-5010-3p miRNA, possibly disrupting its own gene expression; (2) STX1A_rs6951030 (proximal enhancer) may affect the binding affinity of zinc-finger transcription factors (namely ZNF423) and disturb TBL2 gene expression; and (3) SNAP25_rs2327264 (distal enhancer) expected to be in a binding site of ONECUT2 transcription factor. 
This study demonstrated the applicability of our practical workflow to facilitate the prioritisation of potentially relevant non-coding SNPs and predict their functional impact in multifactorial neurological diseases.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9918600","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Single-cell RNA-seq data clustering by deep information fusion.","authors":"Liangrui Ren, Jun Wang, Wei Li, Maozu Guo, Guoxian Yu","doi":"10.1093/bfgp/elad017","DOIUrl":"10.1093/bfgp/elad017","url":null,"abstract":"<p><p>Determining cell types by single-cell transcriptomics data is fundamental for downstream analysis. However, cell clustering and data imputation still face the computation challenges, due to the high dropout rate, sparsity and dimensionality of single-cell data. Although some deep learning based solutions have been proposed to handle these challenges, they still can not leverage gene attribute information and cell topology in a sensible way to explore the consistent clustering. In this paper, we present scDeepFC, a deep information fusion-based single-cell data clustering method for cell clustering and data imputation. Specifically, scDeepFC uses a deep auto-encoder (DAE) network and a deep graph convolution network to embed high-dimensional gene attribute information and high-order cell-cell topological information into different low-dimensional representations, and then fuses them to generate a more comprehensive and accurate consensus representation via a deep information fusion network. In addition, scDeepFC integrates the zero-inflated negative binomial (ZINB) into DAE to model the dropout events. By jointly optimizing the ZINB loss and cell graph reconstruction loss, scDeepFC generates a salient embedding representation for clustering cells and imputing missing data. Extensive experiments on real single-cell datasets prove that scDeepFC outperforms other popular single-cell analysis methods. 
Both the gene attribute and cell topology information can improve the cell clustering.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9489133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Functional characteristics of DNA N6-methyladenine modification based on long-read sequencing in pancreatic cancer.","authors":"Dianshuang Zhou, Shiwei Guo, Yangyang Wang, Jiyun Zhao, Honghao Liu, Feiyang Zhou, Yan Huang, Yue Gu, Gang Jin, Yan Zhang","doi":"10.1093/bfgp/elad021","DOIUrl":"10.1093/bfgp/elad021","url":null,"abstract":"<p><p>Abnormalities of DNA modifications are closely related to the pathogenesis and prognosis of pancreatic cancer. The development of third-generation sequencing technology has brought opportunities for the study of new epigenetic modification in cancer. Here, we screened the N6-methyladenine (6mA) and 5-methylcytosine (5mC) modification in pancreatic cancer based on Oxford Nanopore Technologies sequencing. The 6mA levels were lower compared with 5mC and upregulated in pancreatic cancer. We developed a novel method to define differentially methylated deficient region (DMDR), which overlapped 1319 protein-coding genes in pancreatic cancer. Genes screened by DMDRs were more significantly enriched in the cancer genes compared with the traditional differential methylation method (P < 0.001 versus P = 0.21, hypergeometric test). We then identified a survival-related signature based on DMDRs (DMDRSig) that stratified patients into high- and low-risk groups. Functional enrichment analysis indicated that 891 genes were closely related to alternative splicing. Multi-omics data from the cancer genome atlas showed that these genes were frequently altered in cancer samples. Survival analysis indicated that seven genes with high expression (ADAM9, ADAM10, EPS8, FAM83A, FAM111B, LAMA3 and TES) were significantly associated with poor prognosis. In addition, the distinction for pancreatic cancer subtypes was determined using 46 subtype-specific genes and unsupervised clustering. 
Overall, our study is the first to explore the molecular characteristics of 6mA modifications in pancreatic cancer, indicating that 6mA has the potential to be a target for future clinical treatment.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9588453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quantifying transcriptome diversity: a review.","authors":"Emma F Jones, Anisha Haldar, Vishal H Oza, Brittany N Lasseigne","doi":"10.1093/bfgp/elad019","DOIUrl":"10.1093/bfgp/elad019","url":null,"abstract":"<p><p>Following the central dogma of molecular biology, gene expression heterogeneity can aid in predicting and explaining the wide variety of protein products, functions and, ultimately, heterogeneity in phenotypes. There is currently overlapping terminology used to describe the types of diversity in gene expression profiles, and overlooking these nuances can misrepresent important biological information. Here, we describe transcriptome diversity as a measure of the heterogeneity in (1) the expression of all genes within a sample or a single gene across samples in a population (gene-level diversity) or (2) the isoform-specific expression of a given gene (isoform-level diversity). We first overview modulators and quantification of transcriptome diversity at the gene level. Then, we discuss the role alternative splicing plays in driving transcript isoform-level diversity and how it can be quantified. Additionally, we overview computational resources for calculating gene-level and isoform-level diversity for high-throughput sequencing data. Finally, we discuss future applications of transcriptome diversity. 
This review provides a comprehensive overview of how gene expression diversity arises, and how measuring it determines a more complete picture of heterogeneity across proteins, cells, tissues, organisms and species.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":2.5,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11484519/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10195229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences.","authors":"Sourajyoti Datta, Muhammad Nabeel Asim, Andreas Dengel, Sheraz Ahmed","doi":"10.1093/bfgp/elad018","DOIUrl":"10.1093/bfgp/elad018","url":null,"abstract":"<p><p>Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes, or degrade their activity which leads toward failure of intracellular processes. Tyrosine nitration (NT) modification degrades protein's activity that initiates and propagates various diseases including neurodegenerative, cardiovascular, autoimmune diseases and carcinogenesis. Identification of NT modification supports development of novel therapies and drug discoveries for associated diseases. Identification of NT modification in biochemical labs is expensive, time consuming and error-prone. To supplement this process, several computational approaches have been proposed. However these approaches fail to precisely identify NT modification, due to the extraction of irrelevant, redundant and less discriminative features from protein sequences. This paper presents the NTpred framework that is competent in extracting comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of individual encodings. Furthermore, it eradicates irrelevant and redundant features from eight different feature spaces through a Recursive Feature Elimination process. Selected features of four individual encodings and four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. 
On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation with combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9544857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integrating single-cell RNA sequencing data to genome-wide association analysis data identifies significant cell types in influenza A virus infection and COVID-19.","authors":"Yixin Zou, Xifang Sun, Yifan Wang, Yidi Wang, Xiangyu Ye, Junlan Tu, Rongbin Yu, Peng Huang","doi":"10.1093/bfgp/elad025","DOIUrl":"10.1093/bfgp/elad025","url":null,"abstract":"<p><p>With the global pandemic of COVID-19, the research on influenza virus has entered a new stage, but it is difficult to elucidate the pathogenesis of influenza disease. Genome-wide association studies (GWASs) have greatly shed light on the role of host genetic background in influenza pathogenesis and prognosis, whereas single-cell RNA sequencing (scRNA-seq) has enabled unprecedented resolution of cellular diversity and in vivo following influenza disease. Here, we performed a comprehensive analysis of influenza GWAS and scRNA-seq data to reveal cell types associated with influenza disease and provide clues to understanding pathogenesis. We downloaded two GWAS summary data, two scRNA-seq data on influenza disease. After defining cell types for each scRNA-seq data, we used RolyPoly and LDSC-cts to integrate GWAS and scRNA-seq. Furthermore, we analyzed scRNA-seq data from the peripheral blood mononuclear cells (PBMCs) of a healthy population to validate and compare our results. After processing the scRNA-seq data, we obtained approximately 70 000 cells and identified up to 13 cell types. For the European population analysis, we determined an association between neutrophils and influenza disease. For the East Asian population analysis, we identified an association between monocytes and influenza disease. In addition, we also identified monocytes as a significantly related cell type in a dataset of healthy human PBMCs. In this comprehensive analysis, we identified neutrophils and monocytes as influenza disease-associated cell types. 
More attention and validation should be given in future studies.</p>","PeriodicalId":55323,"journal":{"name":"Briefings in Functional Genomics","volume":null,"pages":null},"PeriodicalIF":4.0,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9669193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}