{"title":"A novel deep learning framework with dynamic tokenization for identifying chromatin interactions along with motif importance investigation.","authors":"Liangcan Li, Xin Li, Hao Wu","doi":"10.1093/bib/bbaf289","DOIUrl":"10.1093/bib/bbaf289","url":null,"abstract":"<p><p>A comprehensive understanding of chromatin interaction networks is crucial for unraveling the regulatory mechanisms of gene expression. While various computational methods have been developed to predict chromatin interactions and address the limitations and high costs of high-throughput experimental techniques, their performance is often overestimated due to the specificity of chromatin interaction data. In this study, we proposed Inter-Chrom, a novel deep learning model integrating dynamic tokenization, DNABERT's word embedding, and the efficient channel attention mechanism to identify chromatin interactions using sequence and genomic features, leveraging a newly curated dataset. Experimental results demonstrate that Inter-Chrom outperforms existing methods on three cell line datasets. Additionally, we proposed a novel method for calculating motif importance and analyzed the motifs with high importance scores identified through this method, including those that have been extensively studied and others that have received limited attention to date. Inter-Chrom's robustness for input variations and superior ability to leverage sequence features position it as a powerful tool for advancing chromatin interaction research. 
The source code of Inter-Chrom is freely available at https://github.com/HaoWuLab-Bioinformatics/Inter-Chrom.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204613/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144332453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GC-balanced polar codes correcting insertions, deletions and substitutions for DNA storage.","authors":"Rui Zhang, Huaming Wu","doi":"10.1093/bib/bbaf278","DOIUrl":"10.1093/bib/bbaf278","url":null,"abstract":"<p><p>In order to address the insertion, deletion, and substitution (IDS) errors inherent in deoxyribonucleic acid (DNA) storage channels during DNA synthesis and sequencing, we propose a novel GC-balanced polar code scheme tailored to rectify these errors by incorporating the unique characteristics of the DNA storage channel into the polar code design. The innovation lies in modeling errors as a drift vector, reflecting deviations from the desired DNA sequence, aiming to improve the reliability of DNA-based data storage. In this paper, we developed a GC-balanced polar code scheme named DNA-BP Code, which stands for balanced polar code for DNA storage, that effectively rectifies IDS errors in DNA storage. The computational complexity of the proposed encoding and decoding algorithms is $\\mathcal{O}(N\\log N)$ with respect to the code length $N$. Simulation results show the bit error rate and block error rate as functions of the code length and IDS probability, demonstrating the efficacy of our approach in enhancing the accuracy of DNA storage systems.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204671/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144332456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alpha_Mesh_Swc: automatic and robust surface mesh generation from the skeleton description of brain cells.","authors":"Alex McSweeney-Davis, Chengran Fang, Emmanuel Caruyer, Anne Kerbrat, Jing-Rebecca Li","doi":"10.1093/bib/bbaf258","DOIUrl":"10.1093/bib/bbaf258","url":null,"abstract":"<p><p>In recent years, there has been a significant increase in publicly available skeleton descriptions of real brain cells from laboratories all over the world. In theory, this should make it possible to perform large-scale realistic simulations on brain cells. However, currently there is still a gap between the skeleton descriptions and high-quality simulation-ready surface and volume meshes of brain cells. We propose and implement a tool called Alpha_Mesh_Swc (AMS) to automatically and efficiently generate triangular surface meshes that are optimized for finite element simulations. We use an Alpha Wrapping method with an offset parameter on component surface meshes to efficiently generate a global watertight mesh. Then mesh simplification and re-meshing are used to produce an optimal surface mesh. Our methodology limits the number of surface triangles while preserving geometrical accuracy, permits cutting and gluing of cell components, is robust to imperfect skeleton descriptions, and allows mixed cell descriptions (surface meshes combined with skeletons). We compared the robustness, performance and accuracy of AMS against existing tools and found significant improvement in terms of mesh accuracy. We show that, on average, we can fully automatically generate a brain cell (neuron or glia) surface mesh in a couple of minutes on a laptop computer, resulting in a simplified surface mesh with only around 10k nodes. The resulting meshes were used to perform diffusion MRI simulations in neurons and microglia. 
The code and a number of sample brain cell surface meshes have been made publicly available.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12146268/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144246483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated-GPS: enhancing protein-protein interaction site prediction with scalable learning and imbalance-aware optimization.","authors":"Xin Gao, Hanqun Cao, Jinpeng Li, Jiezhong Qiu, Guangyong Chen, Pheng-Ann Heng","doi":"10.1093/bib/bbaf248","DOIUrl":"10.1093/bib/bbaf248","url":null,"abstract":"<p><p>In protein-protein interaction site (PPIS) prediction, existing machine learning models struggle with small datasets, limiting their predictive accuracy for unseen proteins. Additionally, class imbalance in protein complexes, where binding residues constitute a small fraction of all residues, hinders model performance. To address these challenges, we constructed a training dataset 9$\\times$ larger than previous benchmarks by filtering the latest protein-protein complex data, improving diversity and generalization. We propose Gated-GPS, a Graph Transformer model with a novel gating mechanism designed to effectively leverage this expanded dataset. Additionally, we integrate cross-entropy loss with Tversky Loss to adjust sensitivity to positive and negative samples, mitigating class imbalance by emphasizing underrepresented binding residues. Experimental results show that Gated-GPS outperforms state-of-the-art (SOTA) models across four test sets. Notably, on the UBTest dataset, designed to evaluate generalization on unbounded proteins, our method improves MCC and AUPRC by 18.5% and 21.4%, respectively, over the previous SOTA. 
In a case study of snake venom toxin-protein interactions, our model accurately identified interaction sites, demonstrating its potential for therapeutic design and advancing the understanding of complex protein interactions.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133684/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepExDC interprets genomic compartmentalization changes in single-cell Hi-C data.","authors":"Hongqiang Lyu, Pei Cao, Wenyao Long, Xiaoran Yin, Shengjun Xu, Laiyi Fu","doi":"10.1093/bib/bbaf301","DOIUrl":"10.1093/bib/bbaf301","url":null,"abstract":"<p><p>Single-cell Hi-C (scHi-C) technology enables probing of higher-order chromatin structures in individual cells. It provides an opportunity to get a deeper insight into genomic compartmentalization changes of single cells across different conditions, paving the way to a common understanding of the interplay among compartmental organization, genome functions, and cellular phenotypes. Unfortunately, there are only a few methods currently available for the differential analysis of A/B compartments on Hi-C data at the bulk level; the computational analysis of compartmentalization changes at the single-cell level is a field in its infancy. Herein, we propose DeepExDC, an interpretable 1D convolutional neural network for differential analysis of A/B compartments in scHi-C data on a genome-wide scale. It accepts Hi-C contact matrices at the single-cell level, runs without any distribution assumption and differential pattern limitation, and interprets genomic compartmentalization changes across multiple conditions. The results on simulated and experimental scHi-C data show that our DeepExDC has higher accuracies in detecting different types of compartmentalization changes, and the interpretation values are demonstrated to be able to reflect compartment changes across cell types. It is also observed that the differential compartments given by DeepExDC agree well with those by state-of-the-art methods at the bulk level, help to characterize heterogeneity of single cells, and exhibit a reasonable biological relevance in multiple regards. 
In addition, considering that DeepExDC is free of distribution assumptions and differential patterns, we attempted to transfer it onto scRNA-seq and scATAC-seq data; it is interesting that our method also presents considerable power compared with the competing methods.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206447/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144526478","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Editor's Note on 'Bioinformatics in Russia: history and present-day landscape'.","authors":"","doi":"10.1093/bib/bbaf181","DOIUrl":"10.1093/bib/bbaf181","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101724/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144131840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scaLR: a low-resource deep neural network-based platform for single cell analysis and biomarker discovery.","authors":"Saiyam Jogani, Anand Santosh Pol, Mayur Prajapati, Amit Samal, Kriti Bhatia, Jayendra Parmar, Urvik Patel, Falak Shah, Nisarg Vyas, Saurabh Gupta","doi":"10.1093/bib/bbaf243","DOIUrl":"10.1093/bib/bbaf243","url":null,"abstract":"<p><p>Single-cell ribonucleic acid (RNA) sequencing (scRNA-seq) produces vast amounts of individual cell profiling data. Its analysis presents a significant challenge in accurately annotating cell types and their associated biomarkers. Different pipelines based on deep neural network (DNN) methods have been employed to tackle these issues. These pipelines have arisen as a promising resource and can extract meaningful and concise features from noisy, diverse, and high-dimensional data to enhance annotations and subsequent analysis. Existing tools require high computational resources to process datasets with large numbers of samples. We have developed a platform known as scaLR (Single-cell analysis using low resource) that efficiently processes data into feature subsets, samples in batches to reduce the memory required for processing large datasets, and runs DNN models on multiple central processing units. scaLR is equipped with data processing, feature extraction, training, evaluation, and downstream analysis. Its novel feature extraction algorithm first trains the model on a feature subset and stores the importance of every feature in that subset. Once all subsets have been trained, the top-K features are selected based on their importance. The final model is trained on the top-K features; its performance evaluation and associated downstream analysis provide significant biomarkers for different cell types and diseases/traits. 
Our findings indicate that scaLR offers comparable prediction accuracy and requires less model training time and computational resources than existing Python-based pipelines. We present scaLR, a Python-based platform, engineered to utilize minimal computational resources while maintaining comparable execution times and analysis costs to existing frameworks.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121358/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2OM-Pred: prediction of 2-O-methylation sites in ribonucleic acid using diverse classifiers.","authors":"Anas Bilal, Muhammad Taseer Suleman, Khalid Almohammadi, Abdulkareem Alzahrani, Xiaowen Liu","doi":"10.1093/bib/bbaf282","DOIUrl":"10.1093/bib/bbaf282","url":null,"abstract":"<p><p>2-O-methylation (2OM) is a vital post-transcriptional modification which is formed by a functional group through the attachment of a methyl (-CH3) group to the second position of an aromatic ring hydroxyl group (-OH). It plays an active part in the stability of RNA physical configuration and in the way different RNA molecules interrelate. Further, this modification plays a pivotal role in changing the epigenetic regulation of cellular processes. Previous approaches like mass spectrometry could not fully enhance the identification of RNA-modified sites. Sequence data have enabled the development of computationally intelligent systems that identify 2OM sites quickly. This research proposed a novel method of feature extraction and generation from the available sequences, with feature dimensionality reduction performed through the incorporation of statistical moments. The final feature vectors were developed and used to train prediction models. The assessment of prediction models was carried out through independent set tests and k-fold cross-validation. Through rigorous testing, the bagging ensemble model outperformed the other classifiers and achieved the best accuracy scores. 
A publicly accessible web-based application has been developed which can be accessed via https://2om-pred-webapp.streamlit.app/.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199913/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scValue: value-based subsampling of large-scale single-cell transcriptomic data for machine and deep learning tasks.","authors":"Li Huang, Weikang Gong, Dongsheng Chen","doi":"10.1093/bib/bbaf279","DOIUrl":"10.1093/bib/bbaf279","url":null,"abstract":"<p><p>Large single-cell ribonucleic acid-sequencing (scRNA-seq) datasets offer unprecedented biological insights but present substantial computational challenges for visualization and analysis. While existing subsampling methods can enhance efficiency, they may not ensure optimal performance in downstream machine learning and deep learning (ML/DL) tasks. Here, we introduce scValue, a novel approach that ranks individual cells by 'data value' using out-of-bag estimates from a random forest model. scValue prioritizes high-value cells and allocates greater representation to cell types with higher variability in data value, effectively preserving key biological signals within subsamples. We benchmarked scValue on automatic cell-type annotation tasks across four large datasets, paired with distinct ML/DL models. Our method consistently outperformed existing subsampling methods, closely matching full-data performance across all annotation tasks. In three additional case studies-label transfer learning, cross-study label harmonization, and bulk RNA-seq deconvolution-scValue more effectively preserved T-cell annotations across human gut-colon datasets, more accurately reproduced T-cell subtype relationships in a human spleen dataset, and constructed a more reliable single-cell immune reference for cell-type deconvolution in simulated bulk tissue samples. Finally, using 16 public datasets ranging from tens of thousands to millions of cells, we evaluated subsampling quality based on computational time, Gini coefficient, and Hausdorff distance. scValue demonstrated fast execution, well-balanced cell-type representation, and distributional properties akin to uniform sampling. 
Overall, scValue provides a robust and scalable solution for subsampling large scRNA-seq data in ML/DL workflows. It is available as an open-source Python package installable via pip, with source code at https://github.com/LHBCB/scvalue.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12165832/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144293342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Decoupled GNNs based on multi-view contrastive learning for scRNA-seq data clustering.","authors":"Xiaoyan Yu, Yixuan Ren, Min Xia, Zhenqiu Shu, Liehuang Zhu","doi":"10.1093/bib/bbaf198","DOIUrl":"10.1093/bib/bbaf198","url":null,"abstract":"<p><p>Clustering is pivotal in deciphering cellular heterogeneity in single-cell RNA sequencing (scRNA-seq) data. However, it suffers from several challenges in handling the high dimensionality and complexity of scRNA-seq data. Especially when employing graph neural networks (GNNs) for cell clustering, the dependencies between cells expand exponentially with the number of layers. This results in high computational complexity, negatively impacting the model's training efficiency. To address these challenges, we propose a novel approach, called decoupled GNNs, based on multi-view contrastive learning (scDeGNN), for scRNA-seq data clustering. Firstly, this method constructs two adjacency matrices to generate distinct views, and trains them using decoupled GNNs to derive the initial cell feature representations. These representations are then refined through a multilayer perceptron and a contrastive learning layer, ensuring the consistency and discriminability of the learned features. Finally, the learned representations are fused and applied to the cell clustering task. 
Extensive experimental results on nine real scRNA-seq datasets from various organisms and tissues show that the proposed scDeGNN method significantly outperforms other state-of-the-art scRNA-seq data clustering algorithms across multiple evaluation metrics.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077398/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144076070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}