Chaorui Yan, Aoyun Geng, Zhuoyu Pan, Zilong Zhang, Feifei Cui
{"title":"MultiFeatVotPIP: a voting-based ensemble learning framework for predicting proinflammatory peptides.","authors":"Chaorui Yan, Aoyun Geng, Zhuoyu Pan, Zilong Zhang, Feifei Cui","doi":"10.1093/bib/bbae505","DOIUrl":"https://doi.org/10.1093/bib/bbae505","url":null,"abstract":"<p><p>Inflammatory responses may lead to tissue or organ damage, and proinflammatory peptides (PIPs) are signaling peptides that can induce such responses. Many diseases have been redefined as inflammatory diseases. To identify PIPs more efficiently, we expanded the dataset and designed an ensemble learning model with manually encoded features. Specifically, we adopted a more comprehensive feature encoding method and considered the actual impact of certain features to filter them. Identification and prediction of PIPs were performed using an ensemble learning model based on five different classifiers. The results show that the model's sensitivity, specificity, accuracy, and Matthews correlation coefficient are all higher than those of the state-of-the-art models. We named this model MultiFeatVotPIP, and both the model and the data can be accessed publicly at https://github.com/ChaoruiYan019/MultiFeatVotPIP. Additionally, we have developed a user-friendly web interface for users, which can be accessed at http://www.bioai-lab.com/MultiFeatVotPIP.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479713/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tianyi Chen, Xindian Wei, Lianxin Xie, Yunfei Zhang, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong
{"title":"SELF-Former: multi-scale gene filtration transformer for single-cell spatial reconstruction.","authors":"Tianyi Chen, Xindian Wei, Lianxin Xie, Yunfei Zhang, Cheng Liu, Wenjun Shen, Si Wu, Hau-San Wong","doi":"10.1093/bib/bbae523","DOIUrl":"https://doi.org/10.1093/bib/bbae523","url":null,"abstract":"<p><p>The spatial reconstruction of single-cell RNA sequencing (scRNA-seq) data into spatial transcriptomics (ST) is a rapidly evolving field that addresses the significant challenge of aligning gene expression profiles to their spatial origins within tissues. This task is complicated by the inherent batch effects and the need for precise gene expression characterization to accurately reflect spatial information. To address these challenges, we developed SELF-Former, a transformer-based framework that utilizes multi-scale structures to learn gene representations, while designing spatial correlation constraints for the reconstruction of corresponding ST data. SELF-Former excels in recovering the spatial information of ST data and effectively mitigates batch effects between scRNA-seq and ST data. A novel aspect of SELF-Former is the introduction of a gene filtration module, which significantly enhances the spatial reconstruction task by selecting genes that are crucial for accurate spatial positioning and reconstruction. The superior performance and effectiveness of SELF-Former's modules have been validated across four benchmark datasets, establishing it as a robust and effective method for spatial reconstruction tasks. SELF-Former demonstrates its capability to extract meaningful gene expression information from scRNA-seq data and accurately map it to the spatial context of real ST data. Our method represents a significant advancement in the field, offering a reliable approach for spatial reconstruction.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483138/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing the dynamic transcriptional regulatory networks to identify phenotype-specific transcription regulators.","authors":"Yang Guo, Zhiqiang Xiao","doi":"10.1093/bib/bbae542","DOIUrl":"https://doi.org/10.1093/bib/bbae542","url":null,"abstract":"<p><p>The transcriptional regulatory network (TRN) is a graph framework that helps understand the complex transcriptional regulation mechanisms in the transcription process. Identifying the phenotype-specific transcription regulators is vital to reveal the functional roles of transcription elements in associating the specific phenotypes. Although many methods have been developed towards detecting the phenotype-specific transcription elements based on the static TRN in the past decade, most of them are not satisfactory for elucidating the phenotype-related functional roles of transcription regulators in multiple levels, as the dynamic characteristics of transcription regulators are usually ignored in static models. In this study, we introduce a novel framework called DTGN to identify the phenotype-specific transcription factors (TFs) and pathways by constructing dynamic TRNs. We first design a graph autoencoder model to integrate the phenotype-oriented time-series gene expression data and static TRN to learn the temporal representations of genes. Then, based on the learned temporal representations of genes, we develop a statistical method to construct a series of dynamic TRNs associated with the development of specific phenotypes. Finally, we identify the phenotype-specific TFs and pathways from the constructed dynamic TRNs. Results from multiple phenotypic datasets show that the proposed DTGN framework outperforms most existing methods in identifying phenotype-specific TFs and pathways. Our framework offers a new approach to exploring the functional roles of transcription regulators that associate with specific phenotypes in a dynamic model.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11503644/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142495337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kai Shi, Qiaohui Liu, Qingrong Ji, Qisheng He, Xing-Ming Zhao
{"title":"MicroHDF: predicting host phenotypes with metagenomic data using a deep forest-based framework.","authors":"Kai Shi, Qiaohui Liu, Qingrong Ji, Qisheng He, Xing-Ming Zhao","doi":"10.1093/bib/bbae530","DOIUrl":"10.1093/bib/bbae530","url":null,"abstract":"<p><p>The gut microbiota plays a vital role in human health, and significant effort has been made to predict human phenotypes, especially diseases, with the microbiota as a promising indicator or predictor with machine learning (ML) methods. However, the accuracy is impacted by a lot of factors when predicting host phenotypes with the metagenomic data, e.g. small sample size, class imbalance, high-dimensional features, etc. To address these challenges, we propose MicroHDF, an interpretable deep learning framework to predict host phenotypes, where a cascade layers of deep forest units is designed for handling sample class imbalance and high dimensional features. The experimental results show that the performance of MicroHDF is competitive with that of existing state-of-the-art methods on 13 publicly available datasets of six different diseases. In particular, it performs best with the area under the receiver operating characteristic curve of 0.9182 ± 0.0098 and 0.9469 ± 0.0076 for inflammatory bowel disease (IBD) and liver cirrhosis, respectively. Our MicroHDF also shows better performance and robustness in cross-study validation. Furthermore, MicroHDF is applied to two high-risk diseases, IBD and autism spectrum disorder, as case studies to identify potential biomarkers. In conclusion, our method provides an effective and reliable prediction of the host phenotype and discovers informative features with biological insights.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11500453/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142516299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking cross-modal interplay of single-cell joint profiling with CellMATE.","authors":"Qi Wang, Bolei Zhang, Yue Guo, Luyu Gong, Erguang Li, Jingping Yang","doi":"10.1093/bib/bbae582","DOIUrl":"https://doi.org/10.1093/bib/bbae582","url":null,"abstract":"<p><p>A key advantage of single-cell multimodal joint profiling is the modality interplay, which is essential for deciphering the cell fate. However, while current analytical methods can leverage the additive benefits, they fall short to explore the synergistic insights of joint profiling, thereby diminishing the advantage of joint profiling. Here, we introduce CellMATE, a Multi-head Adversarial Training-based Early-integration approach specifically developed for multimodal joint profiling. CellMATE can capture both additive and synergistic benefits inherent in joint profiling through auto-learning of multimodal distributions and simultaneously represents all features into a unified latent space. Through extensive evaluation across diverse joint profiling scenarios, CellMATE demonstrated its superiority in ensuring utility of cross-modal properties, uncovering cellular heterogeneity and plasticity, and delineating differentiation trajectories. CellMATE uniquely unlocks the full potential of joint profiling to elucidate the dynamic nature of cells during critical processes as differentiation, development, and diseases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xueping Zhou, Manqi Cai, Molin Yue, Juan C Celedón, Jiebiao Wang, Ying Ding, Wei Chen, Yanming Li
{"title":"Molecular group and correlation guided structural learning for multi-phenotype prediction.","authors":"Xueping Zhou, Manqi Cai, Molin Yue, Juan C Celedón, Jiebiao Wang, Ying Ding, Wei Chen, Yanming Li","doi":"10.1093/bib/bbae585","DOIUrl":"10.1093/bib/bbae585","url":null,"abstract":"<p><p>We propose a supervised learning bioinformatics tool, Biological gRoup guIded muLtivariate muLtiple lIneAr regression with peNalizaTion (Brilliant), designed for feature selection and outcome prediction in genomic data with multi-phenotypic responses. Brilliant specifically incorporates genome and/or phenotype grouping structures, as well as phenotype correlation structures, in feature selection, effect estimation, and outcome prediction under a penalized multi-response linear regression model. Extensive simulations demonstrate its superior performance compared to competing methods. We applied Brilliant to two omics studies. In the first study, we identified novel association signals between multivariate gene expressions and high-dimensional DNA methylation profiles, providing biological insights for the baseline CpG-to-gene regulation patterns in a Puerto Rican children asthma cohort. The second study focused on cell-type deconvolution prediction using high-dimensional gene expression profiles. Using Brilliant, we improved the accuracy for cell-type fraction prediction and identified novel cell-type signature genes.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562839/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Limuxuan He, Quan Zou, Qi Dai, Shuang Cheng, Yansu Wang
{"title":"Adversarial regularized autoencoder graph neural network for microbe-disease associations prediction.","authors":"Limuxuan He, Quan Zou, Qi Dai, Shuang Cheng, Yansu Wang","doi":"10.1093/bib/bbae584","DOIUrl":"10.1093/bib/bbae584","url":null,"abstract":"<p><strong>Background: </strong>Microorganisms inhabit various regions of the human body and significantly contribute to numerous diseases. Predicting the associations between microbes and diseases is crucial for understanding pathogenic mechanisms and informing prevention and treatment strategies. Biological experiments to determine these associations are time-consuming and costly. Therefore, integrating deep learning with biological networks can efficiently identify potential microbe-disease associations on a large scale.</p><p><strong>Methods: </strong>We propose an adversarial regularized autoencoder graph neural network algorithm, named Stacked Adversarial Regularization for Microbe-Disease Associations Prediction (SARMDA), for predicting associations between microbes and diseases. First, we integrate topological structural similarity and functional similarity metrics of microbes and diseases to construct a heterogeneous network. Then, utilizing an autoencoder based on GraphSAGE, we learn both the topological and attribute representations of nodes within the constructed network. Finally, we introduce an adversarial regularized autoencoder graph neural network embedding model to address the inherent limitations of traditional GraphSAGE autoencoders in capturing global information.</p><p><strong>Results: </strong>Under the five-fold cross-validation on microbe-disease pairs, SARMDA was compared with eight advanced methods using the Human Microbe-Disease Association Database (HMDAD) and Disbiome databases. The best area under the ROC curve (AUC) achieved by SARMDA on HMDAD was 0.9891$pm$0.0057, and the best area under the precision-recall curve (AUPR) was 0.9902$pm$0.0128. On the Disbiome dataset, the AUC was 0.9328$pm$0.0072, and the best AUPR was 0.9233$pm$0.0089, outperforming the other eight MDAs prediction methods. Furthermore, the effectiveness of our model was demonstrated through a detailed analysis of asthma and inflammatory bowel disease cases.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11554402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142614872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jessica Butts, Leif Verace, Christine Wendt, Russel P Bowler, Craig P Hersh, Qi Long, Lynn Eberly, Sandra E Safo
{"title":"HIP: a method for high-dimensional multi-view data integration and prediction accounting for subgroup heterogeneity.","authors":"Jessica Butts, Leif Verace, Christine Wendt, Russel P Bowler, Craig P Hersh, Qi Long, Lynn Eberly, Sandra E Safo","doi":"10.1093/bib/bbae470","DOIUrl":"10.1093/bib/bbae470","url":null,"abstract":"<p><p>Epidemiologic and genetic studies in many complex diseases suggest subgroup disparities (e.g. by sex, race) in disease course and patient outcomes. We consider this from the standpoint of integrative analysis where we combine information from different views (e.g. genomics, proteomics, clinical data). Existing integrative analysis methods ignore the heterogeneity in subgroups, and stacking the views and accounting for subgroup heterogeneity does not model the association among the views. We propose Heterogeneity in Integration and Prediction (HIP), a statistical approach for joint association and prediction that leverages the strengths in each view to identify molecular signatures that are shared by and specific to a subgroup. We apply HIP to proteomics and gene expression data pertaining to chronic obstructive pulmonary disease (COPD) to identify proteins and genes shared by, and unique to, males and females, contributing to the variation in COPD, measured by airway wall thickness. Our COPD findings have identified proteins, genes, and pathways that are common across and specific to males and females, some implicated in COPD, while others could lead to new insights into sex differences in COPD mechanisms. HIP accounts for subgroup heterogeneity in multi-view data, ranks variables based on importance, is applicable to univariate or multivariate continuous outcomes, and incorporates covariate adjustment. With the efficient algorithms implemented using PyTorch, this method has many potential scientific applications and could enhance multiomics research in health disparities. HIP is available at https://github.com/lasandrall/HIP, a video tutorial at https://youtu.be/O6E2OLmeMDo and a Shiny Application at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/ for users with limited programming experience.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11440091/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142341944","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Miao Cui, Yadong Liu, Xian Yu, Hongzhe Guo, Tao Jiang, Yadong Wang, Bo Liu
{"title":"miniSNV: accurate and fast single nucleotide variant calling from nanopore sequencing data.","authors":"Miao Cui, Yadong Liu, Xian Yu, Hongzhe Guo, Tao Jiang, Yadong Wang, Bo Liu","doi":"10.1093/bib/bbae473","DOIUrl":"https://doi.org/10.1093/bib/bbae473","url":null,"abstract":"<p><p>Nanopore sequence technology has demonstrated a longer read length and enabled to potentially address the limitations of short-read sequencing including long-range haplotype phasing and accurate variant calling. However, there is still room for improvement in terms of the performance of single nucleotide variant (SNV) identification and computing resource usage for the state-of-the-art approaches. In this work, we introduce miniSNV, a lightweight SNV calling algorithm that simultaneously achieves high performance and yield. miniSNV utilizes known common variants in populations as variation backgrounds and leverages read pileup, read-based phasing, and consensus generation to identify and genotype SNVs for Oxford Nanopore Technologies (ONT) long reads. Benchmarks on real and simulated ONT data under various error profiles demonstrate that miniSNV has superior sensitivity and comparable accuracy on SNV detection and runs faster with outstanding scalability and lower memory than most state-of-the-art variant callers. miniSNV is available from https://github.com/CuiMiao-HIT/miniSNV.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11428505/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142341946","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yetong Zhou, Shengming Zhou, Yue Bi, Quan Zou, Cangzhi Jia
{"title":"A two-task predictor for discovering phase separation proteins and their undergoing mechanism.","authors":"Yetong Zhou, Shengming Zhou, Yue Bi, Quan Zou, Cangzhi Jia","doi":"10.1093/bib/bbae528","DOIUrl":"10.1093/bib/bbae528","url":null,"abstract":"<p><p>Liquid-liquid phase separation (LLPS) is one of the mechanisms mediating the compartmentalization of macromolecules (proteins and nucleic acids) in cells, forming biomolecular condensates or membraneless organelles. Consequently, the systematic identification of potential LLPS proteins is crucial for understanding the phase separation process and its biological mechanisms. A two-task predictor, Opt_PredLLPS, was developed to discover potential phase separation proteins and further evaluate their mechanism. The first task model of Opt_PredLLPS combines a convolutional neural network (CNN) and bidirectional long short-term memory (BiLSTM) through a fully connected layer, where the CNN utilizes evolutionary information features as input, and BiLSTM utilizes multimodal features as input. If a protein is predicted to be an LLPS protein, it is input into the second task model to predict whether this protein needs to interact with its partners to undergo LLPS. The second task model employs the XGBoost classification algorithm and 37 physicochemical properties following a three-step feature selection. The effectiveness of the model was validated on multiple benchmark datasets, and in silico saturation mutagenesis was used to identify regions that play a key role in phase separation. These findings may assist future research on the LLPS mechanism and the discovery of potential phase separation proteins.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"25 6","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11492799/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}