{"title":"Enhancing LncRNA-miRNA interaction prediction with multimodal contrastive representation learning.","authors":"Zhixia Teng, Zhaowen Tian, Murong Zhou, Guohua Wang, Zhen Tian, Yuming Zhao","doi":"10.1093/bib/bbaf281","DOIUrl":"https://doi.org/10.1093/bib/bbaf281","url":null,"abstract":"<p><p>Interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) play an important role in the development of complex human diseases by collaboratively regulating gene transcription and expression. Therefore, identifying lncRNA-miRNA interactions (LMIs) is essential for diagnosing and treating complex human diseases. Because identifying LMIs with wet experiments is time-consuming and labor-intensive, some computational methods have been developed to infer LMIs. However, these approaches excel at utilizing single-modal information but struggle to integrate multimodal data from lncRNAs and miRNAs, which is essential for uncovering complex patterns in LMIs, ultimately limiting their performance. Therefore, this article proposes a novel multimodal contrastive representation learning model (MCRLMI) for LMI predictions. The model fully integrates multi-source similarity information and sequence encodings of lncRNAs and miRNAs. It leverages a graph convolutional network (GCN) and a Transformer to capture local neighborhood structural features and long-distance dependencies, respectively, enabling the collaborative modeling of structural and semantic information. Subsequently, to effectively integrate multimodal characteristics with encoded information, a multichannel attention mechanism and contrastive learning are introduced to fuse the extracted features. Finally, a Kolmogorov-Arnold Network (KAN) is trained with the optimized embeddings to predict LMIs. Extensive experiments show that the proposed MCRLMI consistently outperforms existing methods. 
Moreover, case studies further validate the potential of MCRLMI to identify novel LMIs in practical applications.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative prediction of real-world prevalent SARS-CoV-2 mutation with in silico virus evolution.","authors":"Xudong Liu, Zhiwei Nie, Haorui Si, Xurui Shen, Yutian Liu, Xiansong Huang, Tianyi Dong, Fan Xu, Zhixiang Ren, Peng Zhou, Jie Chen","doi":"10.1093/bib/bbaf276","DOIUrl":"https://doi.org/10.1093/bib/bbaf276","url":null,"abstract":"<p><p>Predicting the mutation prevalence trends of emerging viruses in the real world is an efficient means to update vaccines or drugs in advance. It is crucial to develop a computational method for the prediction of real-world prevalent SARS-CoV-2 mutations considering the impact of multiple selective pressures within and between hosts. Here, a deep-learning generative framework for real-world prevalent SARS-CoV-2 mutation prediction, named ViralForesight, is developed on top of protein language models and in silico virus evolution. Through the paradigm of host-to-herd in silico virus evolution, ViralForesight reproduced previous real-world prevalent SARS-CoV-2 mutations for multiple lineages with superior performance. More importantly, ViralForesight correctly predicted the future prevalent mutations that dominated the COVID-19 pandemic in the real world more than half a year in advance with in vitro experimental validation. 
Overall, ViralForesight demonstrates a proactive approach to the prevention of emerging viral infections, accelerating the process of discovering future prevalent mutations with the power of generative deep learning.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144324526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editor's Note on 'Bioinformatics in Russia: history and present-day landscape'.","authors":"","doi":"10.1093/bib/bbaf181","DOIUrl":"10.1093/bib/bbaf181","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144131840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery.","authors":"Saiyam Jogani, Anand Santosh Pol, Mayur Prajapati, Amit Samal, Kriti Bhatia, Jayendra Parmar, Urvik Patel, Falak Shah, Nisarg Vyas, Saurabh Gupta","doi":"10.1093/bib/bbaf243","DOIUrl":"10.1093/bib/bbaf243","url":null,"abstract":"<p><p>Single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) produces vast amounts of individual cell profiling data. Its analysis presents a significant challenge in accurately annotating cell types and their associated biomarkers. Different pipelines based on deep neural network (DNN) methods have been employed to tackle these issues. These pipelines have arisen as a promising resource and can extract meaningful and concise features from noisy, diverse, and high-dimensional data to enhance annotations and subsequent analysis. Existing tools require high computational resources to execute large sample datasets. We have developed a cutting-edge platform known as scaLR (Single-cell analysis using low resource) that efficiently processes data into feature subsets, samples in batches to reduce the required memory for processing large datasets, and running DNN models in multiple central processing units. scaLR is equipped with data processing, feature extraction, training, evaluation, and downstream analysis. Its novel feature extraction algorithm first trains the model on a feature subset and stores the importance of the features for all the features in that subset. At the end of the training of all subsets, the top-K features are selected based on their importance. The final model is trained on top-K features; its performance evaluation and associated downstream analysis provide significant biomarkers for different cell types and diseases/traits. 
Our findings indicate that scaLR offers comparable prediction accuracy and requires less model training time and computational resources than existing Python-based pipelines. We present scaLR, a Python-based platform, engineered to utilize minimal computational resources while maintaining comparable execution times and analysis costs to existing frameworks.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupled GNNs based on multi-view contrastive learning for scRNA-seq data clustering.","authors":"Xiaoyan Yu, Yixuan Ren, Min Xia, Zhenqiu Shu, Liehuang Zhu","doi":"10.1093/bib/bbaf198","DOIUrl":"10.1093/bib/bbaf198","url":null,"abstract":"<p><p>Clustering is pivotal in deciphering cellular heterogeneity in single-cell RNA sequencing (scRNA-seq) data. However, it suffers from several challenges in handling the high dimensionality and complexity of scRNA-seq data. Especially when employing graph neural networks (GNNs) for cell clustering, the dependencies between cells expand exponentially with the number of layers. This results in high computational complexity, negatively impacting the model's training efficiency. To address these challenges, we propose a novel approach, called decoupled GNNs, based on multi-view contrastive learning (scDeGNN), for scRNA-seq data clustering. Firstly, this method constructs two adjacency matrices to generate distinct views, and trains them using decoupled GNNs to derive the initial cell feature representations. These representations are then refined through a multilayer perceptron and a contrastive learning layer, ensuring the consistency and discriminability of the learned features. Finally, the learned representations are fused and applied to the cell clustering task. 
Extensive experimental results on nine real scRNA-seq datasets from various organisms and tissues show that the proposed scDeGNN method significantly outperforms other state-of-the-art scRNA-seq data clustering algorithms across multiple evaluation metrics.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077398/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144076070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge-guided multi-level network modeling with experimental characterization identifies PRKCA as a novel biomarker and tumor suppressor triggering ferroptosis in prostate cancer.","authors":"Yuxin Lin, Zongming Jia, Jixiang Wu, Hubo Yang, Xin Chen, He Wang, Xuedong Wei, Wenying Yan, Xin Qi, Yuhua Huang","doi":"10.1093/bib/bbaf220","DOIUrl":"10.1093/bib/bbaf220","url":null,"abstract":"<p><p>Prostate cancer (PCa) is observed with high incidence in men worldwide. Ferroptosis, occurred from disorders in a series of gene and pathway regulation, is an emerging target against cancer. However, most of the computational approaches solely treated ferroptosis-related genes (FRGs) as independent variables in model training, and the interactions among FRGs and other candidates were not fully deciphered in a disease-specific content. In this study, a novel network-based and knowledge-guided bioinformatics model was proposed by integrating ferroptosis-related prior knowledge with topological and functional characterization on a protein-protein interaction network for biomarker discovery in PCa development and ferroptosis. The model started at a random walk with restart algorithm for weighting genes close to known FRGs in the PCa-specific network to extract a core subnetwork for robustness and vulnerability analysis. Then key regulatory modules and a candidate gene, i.e. PRKCA, were respectively identified using a multi-level prioritization strategy with hub-bottleneck node filtering, edge-based gene co-expression measuring, community module detecting and a newly defined Ferr.neighbor functional score. The experimental validation using human clinical samples, cell lines, and nude mice convinced the role of PRKCA as a latent biomarker and a tumor suppressor in PCa carcinogenesis with a potential mechanism on triggering GPX4-mediated ferroptosis of PCa cells. 
This study provides a general-purpose systems biology framework for significant FRG screening, and future translational perspectives of PRKCA as a novel diagnostic and therapeutic signature for PCa management should be explored.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12090055/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144109750","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing promiscuous aggregating inhibitor analysis with intelligent machine learning classification.","authors":"Luxuan Wang, Beihong Ji, Jingchen Zhai, Junmei Wang","doi":"10.1093/bib/bbaf205","DOIUrl":"10.1093/bib/bbaf205","url":null,"abstract":"<p><p>Small molecules have been playing a crucial role in drug discovery; however, some exhibit nonspecific inhibitory effects during hit screening due to the formation of colloidal aggregators. Such false positives often lead to significant research costs and time investment. Therefore, to identify potential aggregating compounds efficiently and accurately at an early stage of drug discovery, we employed several machine learning techniques to develop classification models for identifying promiscuous aggregating inhibitors. Using a training dataset of 10 000 aggregators and 10 000 nonaggregators, models were trained by combining four different molecular representations with various machine learning algorithms. We found that the best-performing model is the one that employs path-based FP2 fingerprints in conjunction with the cubic support vector machine algorithm, which achieved the highest accuracy and area under the receiver operating characteristic curve values for both the validation and test datasets while maintaining high sensitivity and specificity levels (>0.93). Additionally, we have proposed a new model interpretation method, global sensitivity analysis (GSA), to complement the well-recognized SHapley Additive exPlanations analysis. Several comparative studies have shown that GSA is a time-efficient and accurate approach for identifying crucial descriptors that contribute to model prediction, especially in the scenario where the dataset contains a substantial number of data entries with a limited set of descriptors. 
Our models as well as GSA findings can provide useful guidance on screening library design to minimize false positives.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12056367/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scValue: value-based subsampling of large-scale single-cell transcriptomic data for machine and deep learning tasks.","authors":"Li Huang, Weikang Gong, Dongsheng Chen","doi":"10.1093/bib/bbaf279","DOIUrl":"10.1093/bib/bbaf279","url":null,"abstract":"<p><p>Large single-cell ribonucleic acid-sequencing (scRNA-seq) datasets offer unprecedented biological insights but present substantial computational challenges for visualization and analysis. While existing subsampling methods can enhance efficiency, they may not ensure optimal performance in downstream machine learning and deep learning (ML/DL) tasks. Here, we introduce scValue, a novel approach that ranks individual cells by 'data value' using out-of-bag estimates from a random forest model. scValue prioritizes high-value cells and allocates greater representation to cell types with higher variability in data value, effectively preserving key biological signals within subsamples. We benchmarked scValue on automatic cell-type annotation tasks across four large datasets, paired with distinct ML/DL models. Our method consistently outperformed existing subsampling methods, closely matching full-data performance across all annotation tasks. In three additional case studies-label transfer learning, cross-study label harmonization, and bulk RNA-seq deconvolution-scValue more effectively preserved T-cell annotations across human gut-colon datasets, more accurately reproduced T-cell subtype relationships in a human spleen dataset, and constructed a more reliable single-cell immune reference for cell-type deconvolution in simulated bulk tissue samples. Finally, using 16 public datasets ranging from tens of thousands to millions of cells, we evaluated subsampling quality based on computational time, Gini coefficient, and Hausdorff distance. scValue demonstrated fast execution, well-balanced cell-type representation, and distributional properties akin to uniform sampling. 
Overall, scValue provides a robust and scalable solution for subsampling large scRNA-seq data in ML/DL workflows. It is available as an open-source Python package installable via pip, with source code at https://github.com/LHBCB/scvalue.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144293342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2OM-Pred: prediction of 2-O-methylation sites in ribonucleic acid using diverse classifiers.","authors":"Anas Bilal, Muhammad Taseer Suleman, Khalid Almohammadi, Abdulkareem Alzahrani, Xiaowen Liu","doi":"10.1093/bib/bbaf282","DOIUrl":"https://doi.org/10.1093/bib/bbaf282","url":null,"abstract":"<p><p>2-O-methylation (2OM) is a vital post-transcriptional modification which is formed by a functional group through the attachment of a methyl (-CH3) group to the second position of an aromatic ring hydroxyl group (-OH). It plays an active part in RNA physical configuration stability and the way different RNA molecules interrelate. Further, this modification plays a pivotal role in changing the epigenetic regulation of cellular processes. Previous approaches like mass spectrometry could not fully enhance the identification of RNA-modified sites. Sequence data were useful in the development of measures that meant the use of computationally intelligent system to identify 2OM sites quickly. This research proposed a new novel method of feature extraction and generation from the available sequences, and the feature dimensionality reduction has been done through the incorporation of statistical moments. The final feature vectors were developed and used to train prediction models. The assessment of prediction models was carried out through independent set tests and k-fold cross-validation. Through rigorous testing, the bagging ensemble model outperformed and revealed optimal accuracy scores. 
A publicly accessible web-based application has been developed which can be accessed via https://2om-pred-webapp.streamlit.app/.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Network analysis of multivariate time series data in biological systems: methods and applications.","authors":"Hao Mei, Zhiyuan Wang, Hang Yang, Xiaoke Li, Yaqing Xu","doi":"10.1093/bib/bbaf223","DOIUrl":"10.1093/bib/bbaf223","url":null,"abstract":"<p><p>Network analysis has become an essential tool in biological and biomedical research, providing insights into complex biological mechanisms. Since biological systems are inherently time-dependent, incorporating time-varying methods is crucial for capturing temporal changes, adaptive interactions, and evolving dependencies within networks. Our study explores key time-varying methodologies for network structure estimation and network inference based on observed structures. We begin by discussing approaches for estimating network structures from data, focusing on the time-varying Gaussian graphical model, dynamic Bayesian network, and vector autoregression-based causal analysis. Next, we examine analytical techniques that leverage pre-specified or observed networks, including other autoregression-based methods and latent variable models. Furthermore, we explore practical applications and computational tools designed for these methods. 
By synthesizing these approaches, our study provides a comprehensive evaluation of their strengths and limitations in the context of biological data analysis.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12096012/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144118818","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}