{"title":"Accurate Tracking of Arabidopsis Root Cortex Cell Nuclei in 3D Time-Lapse Microscopy Images Based on Genetic Algorithm.","authors":"Yu Song, Tatsuaki Goh, Yinhao Li, Jiahua Dong, Shunsuke Miyashima, Yutaro Iwamoto, Yohei Kondo, Keiji Nakajima, Yen-Wei Chen","doi":"10.1109/TCBBIO.2025.3617480","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3617480","url":null,"abstract":"<p><p>Arabidopsis is a widely used model plant to study physiology and development. Live imaging is an important technique to visualize and quantify processes in plant growth and cell division, where accurate cell tracking is essential. The commonly used software TrackMate adopts a tracking-by-detection approach, applying Laplacian of Gaussian (LoG) for blob detection and a Linear Assignment Problem (LAP) tracker for tracking. However, its performance declines when cells are densely arranged. To overcome this limitation, we propose an accurate tracking method based on a Genetic Algorithm (GA) that incorporates knowledge of Arabidopsis root cellular patterns and spatial relationships among volumes. Our method follows a coarse-to-fine strategy: first performing relatively simple line-level tracking of nuclei, then refining associations based on the linear arrangement of cell files and their spatial relationships. We evaluated the method on longterm live imaging datasets of Arabidopsis root tips, and with minor manual correction, it achieved accurate nuclear tracking. To the best of our knowledge, this represents the first successful attempt to address a long-standing problem in time-lapse microscopy of the root meristem by providing an accurate tracking method for Arabidopsis root nuclei.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning universal knowledge graph embedding for predicting biomedical pairwise interactions.","authors":"Siyu Tao, Yang Yang, Xin Liu, Yimiao Feng, And Jie Zheng","doi":"10.1109/TCBBIO.2025.3617331","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3617331","url":null,"abstract":"<p><p>Predicting biomedical interactions is crucial for understanding various biological processes and drug discovery. Graph neural networks (GNNs) are promising in identifying novel interactions when extensive labeled data are available. However, labeling biomedical interactions is often time-consuming and labor-intensive, resulting in low-data scenarios. Furthermore, distribution shifts between training and test data in real-world applications pose a challenge to the generalizability of GNN models. Recent studies suggest that pre-training GNN models with self-supervised learning on unlabeled data can enhance their performance in predicting biomedical interactions. Here, we propose LukePi, a novel self-supervised pre-training framework that pre-trains GNN models on biomedical knowledge graphs (BKGs). LukePi is trained with two self-supervised tasks: topology-based node degree classification and semantics-based edge recovery. The former is to predict the degree of a node from its topological context and the latter is to infer both type and existence of a candidate edge by learning semantic information in the BKG. By integrating the two complementary tasks, LukePi effectively captures the rich information from the BKG, thereby enhancing the quality of node representations. We evaluate the performance of LukePi on two critical link prediction tasks: predicting synthetic lethality and drug-target interactions, using four benchmark datasets. In both distribution-shift and low-data scenarios, LukePi significantly outperforms 22 baseline models, demonstrating the power of the graph pre-training strategy when labeled data are sparse.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Germline Variation Calling from Long Reads' Alignment Data through Spatiotemporal Attention.","authors":"Ying Shi, Shifu Luo, Yi Pan, Hao Wu, Wenjian Wang, Jinyan Li","doi":"10.1109/TCBBIO.2025.3617798","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3617798","url":null,"abstract":"<p><p>Although Oxford Nanopore Technologies (ONT) long-read sequencing with the Q20 chemistry has reduced the raw error rate to 1%, this erring rate in base insertions, deletions, and substitutions remains significantly inferior to that of NGS short-read sequencing (0.1% error rate). This limitation poses substantial challenges for variant calling in complicated genomic regions from deeply sequenced long-read whole-genome alignments. Current deep learning methods exhibit 20,000-30,000 false variant calls per chromosome in Q20-calibrated ONT longread data. For Guppy v5.0.14 basecalled data, the number of errors or false calls exceeds 30,000 in structurally complicated regions. We present Attdeepcaller, a spatiotemporal attention based deep learning model that dynamically disentangles genuine sequencing errors from true germline variants in the alignment data, thereby significantly improving the prediction robustness at the complicated regions. On the HG002 chr1 Q20 data, the false calls are decreased from 26,043 to 22,739 (12.69% improvement); on the HG003 and HG004 datasets, there are 16.49% and 23.58% reductions in misidentifications, respectively. In addition, Attdeepcaller has significantly improved performance on crossversion data benchmarking tests. On the Guppy v5.0.14 datasets, the accuracy on the complicated areas has increased by 3%, and the recall rate has increased by 1%. On the Guppy v3.4.5 datasets, the accuracy is improved by 16%, and the recall rate is increased by 10%, proving the algorithm's strong adaptability to low-quality sequencing data.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145240806","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Transcription Factor Prediction via Domain Knowledge Integration with Logic Tensor Networks.","authors":"Liyuan Gao, Linpeng Sun, Victor S Sheng","doi":"10.1109/TCBBIO.2025.3617864","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3617864","url":null,"abstract":"<p><p>Transcription factors (TFs) are crucial regulators of gene expression, making their accurate prediction essential for understanding gene regulation mechanisms. However, traditional methods suffer from limited accuracy, while deep learning approaches require large training datasets and often lack interpretability, particularly in the absence of domain-specific knowledge. To address these challenges, we propose LTN-TFpredict, a novel neurosymbolic model that integrates Logic Tensor Networks (LTNs) with deep learning to enhance both prediction accuracy and interpretability. Our approach leverages pre-trained protein language models to generate high-dimensional sequence embeddings, which are then refined using logical constraints derived from five key TF-related motifs: zinc fingers, leucine zippers, basic helix-loop-helix, forkhead, and winged helix-turn-helix domains. These biologically informed constraints improve model training by enforcing known TF characteristics. Experimental evaluations demonstrate that LTN-TFpredict outperforms traditional models (TFpredict, XGBoost), CNN-based methods (DeepTFactor, ProtCNN), and transformer-based architectures (ESM-TFpredict, ProtT5), achieving superior prediction accuracy while maintaining logical consistency with domain knowledge. By combining deep learning with symbolic reasoning, LTN-TFpredict provides a robust, interpretable, and biologically grounded approach to TF prediction, advancing the role of neurosymbolic AI in computational biology.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145226576","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GenoGraph: an Interpretable Graph Contrastive Learning Approach for Identifying Breast Cancer Risk Variants.","authors":"Naga Raju Gudhe, Jaana M Hartikainen, Maria Tengstrom, Katri Pylkas, Robert Winqvist, Veli-Matti Kosma, Hamid Behravan, Arto Mannermaa","doi":"10.1109/TCBBIO.2025.3617088","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3617088","url":null,"abstract":"<p><p>Genome-wide association studies (GWASs) have identified over 2,400 genetic variants associated to breast cancer. Conventional GWASs methods that analyze variants independently often overlook the complex genetic interactions underlying disease susceptibility. Recent advancements such as Machine learning and deep learning approaches present promising alternatives, yet encounter challenges, including overfitting due to high dimensionality (∼10 million variants) and limited sample sizes, as well as limited interpretability. Here, we present GenoGraph, a graph-based contrastive learning framework designed to address these limitations by modeling high-dimensional genetic data in low-sample-size scenarios. We demonstrate GenoGraph's efficacy in breast cancer case-control classification task, achieving accuracy of 0.96 using the Biobank of Eastern Finland dataset. GenoGraph identified rs11672773 as a key risk variant in Finnish population, with significant interactions with rs10759243 and rs3803662. Furthermore, in-silico validation confirmed the biological relevance of these findings, underscoring GenoGraph's potential to advance breast cancer risk prediction and inform genetic interaction discoveries within population-specific contexts, with future extensions toward personalized medicine.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145215471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HiADN:Lightweight Resolution Enhancement of Hi-C Data Using High Information Attention Distillation Network.","authors":"Pingjing Li, Jiuxin Feng, Jun Guo, Jian Liu","doi":"10.1109/TCBBIO.2025.3614663","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3614663","url":null,"abstract":"<p><p>Due to limitations in experimental library preparation and in practical sequencing cost, currently available Hi-C data is often sparse, affecting the precise characterization of complex 3D chromatin structures. Providing efficaciously computational models to elevate the quality of sparse Hi-C sequencing data for restoring the fundamental traits of 3D chromatin is of substantial significance. Herein, we introduce HiADN, a deep learning-based approach to infer dense high-resolution matrices from sparse Hi-C matrices. In particular, we firstly design a specialized architecture HiFM to captures local spatial structures and the patterns of Hi-C data. Then, we develop large kernel convolutional decomposition and attention mechanisms to effectively explore global patterns across longer genomic distances. Using HiADN, it is possible to construct biologically significant regions at high-resolution (e.g., 10Kb) while only using the 1/100 of original sequencing reads. The experimental results demonstrate that the effect of in silico libraries forecasted by computational models using HiADN is commensurate with that of experimental libraries, surpassing the state-of-the-art (SOTA) models. We further validated the effectiveness of HiADN in reconstructing the three-dimensional spatial structure of chromosomes on the GM12878, K562, and CH12-LX cell line datasets.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MTF-hERG: a Multi-type Features Fusion-based Framework for Predicting hERG Cardiotoxicity of Compounds.","authors":"Liwei Liu, Qi Zhang, Yuxiao Wei","doi":"10.1109/TCBBIO.2025.3614696","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3614696","url":null,"abstract":"<p><p>The human ether-a-go-go-related gene (hERG) cardiac toxicity of a compound refers to its inhibitory effect on the hERG potassium channel. The hERG channel is crucial for cardiac depolarization, and its blockage can lead to prolongation of the QT interval, triggering arrhythmias and posing life-threatening risks. Therefore, assessing hERG cardiac toxicity is a vital consideration in drug development. Traditional assessment methods are complex and have low throughput, making the development of deep learning models to predict this toxicity essential for enhancing drug development efficiency, reducing risks, and promoting personalized treatment. In this paper, we propose a novel multi-type feature fusion framework, MTF-hERG, for accurately predicting the cardiac toxicity of hERG compounds. This framework integrates various molecular features such as molecular fingerprints, 2D molecular images, and 3D molecular graphs to comprehensively capture the intrinsic structures and properties of compounds. By utilizing fully connected neural networks, DenseNet, and Equivariant Graph Neural Networks for feature extraction, we ensure that the model can precisely identify molecular characteristics associated with hERG blocking activity. Through deep fusion of extracted features and the construction of fully connected layers with different activation functions, we achieve classification predictions of whether a compound is an hERG blocker and regression predictions of its hERG inhibitory capacity. When comparing MTF-hERG with other state-of-the-art methods using benchmark datasets, we found that the average ACC, AUC, AUPR, RMSE, and R² values of MTF-hERG were 0.926, 0.943, 0.913, 0.453, and 0.681, respectively. The results demonstrate that MTF-hERG exhibits excellent predictive performance in various scenarios, significantly outperforming the existing baseline models. Furthermore, the visualization results of MTF-hERG not only reveal the key features and decision mechanisms of the model but also provide valuable support for further optimization of molecular structures. Therefore, the MTF-hERG framework is poised to become a powerful tool for predicting the hERG cardiac toxicity of compounds, offering robust support for drug development and exerting a profound impact on human health.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152949","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Graph Attention Meets Pretrained Language Models: Adaptive K-Mer Decomposition for LncRNA-Protein Interaction Prediction.","authors":"Zeyuan Zeng, Jingxian Zeng, Defu Li, Qinke Peng, Haozhou Li, Ruimeng Li, Wentong Sun, Qingbo Zhang, Jinzhi Wang","doi":"10.1109/TCBBIO.2025.3614443","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3614443","url":null,"abstract":"<p><p>Protein-RNA complexes, particularly those involving RNA-binding proteins and long non-coding RNAs (lncRNA), are commonly found to influence gene expression and mediate fundamental cellular processes. Despite significant advances in representations for these biological sequences, sequence decomposition based on k-mer generally results in fix-length substrings, failing to detect the information of variable-length biological functional regions. In this paper, we develop a concept of expressiveness for k-mer decompositions as a theoretical underpinning for traversing all k-mer decompositions. Based on this concept, we propose an advanced approach, BERTDGA-LPI, to detect the information of variable-length biological functional regions utilizing dynamic graph attention and to capture the influence of RNA and protein context leveraging pretrained language models. The experimental results demonstrate the outperformance of BERTDGA-LPI over state-of-the-art methods across two homo sapiens datasets, one plant species dataset, and two species-unspecific datasets. Furthermore, BERTDGA-LPI is validated as effective in predicting unknown RNA-protein interactions (RPI) with 100% prediction accuracy in six independent validation sets from different species. This study lays a theoretical underpinning for traversing all k-mer decompositions and innovatively offers a broadly applicable and efficient tool for LPI prediction and RPI prediction based only on sequences.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145153013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SeqDA-HLA: Language model and dual attention-based network to predict peptide-HLA class I binding.","authors":"Gihyeon Kim, Geonhui Jo, Minjeong Kim, Soo Young Cho, Jang-Hwan Choi","doi":"10.1109/TCBBIO.2025.3614457","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3614457","url":null,"abstract":"<p><p>Accurate prediction of peptide-HLA class I binding is crucial for immunotherapy and vaccine development, but existing methods often struggle to capture the intricate biological relationships between peptides and diverse HLA alleles. Here, we introduce SeqDA-HLA, a pan-specific prediction model that combines language model-based embeddings (ELMo) with a dual attention mechanism-self-aligned cross-attention and self-attention-to capture rich contextual features and pairwise interactions. Evaluations against 14 state-of-the-art methods on multiple benchmark datasets demonstrate that SeqDA-HLA consistently outperforms competing approaches, achieving an AUC value up to 0.9856 and accuracy as high as 0.9408. Notably, SeqDA-HLA maintains robust performance across peptide lengths (8-14) and HLA alleles, showcasing its generalizability. Beyond predictive accuracy, SeqDA-HLA offers interpretability by highlighting essential anchor residues and revealing key binding motifs, thereby aligning with experimentally validated biological insights. As a further demonstration of practical impact, we fine-tune SeqDA-HLA on an Influenza virus dataset, successfully predicting binding changes induced by single amino acid mutations. Overall, SeqDA-HLA serves as a powerful and interpretable tool for peptide-HLA binding prediction, with potential applications in epitope-based vaccine design and precision immunotherapy. The software is available open-source at https://github.com/Ewha-AI/SeqDA-HLA and as a web server at http://runai.ewha.ac.kr/seqda.</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152964","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepGene: An Efficient Foundation Model for Genomics based on Pan-genome Graph Transformer.","authors":"Xiang Zhang, Mingjie Yang, Xunhang Yin, Yining Qian, Fei Sun","doi":"10.1109/TCBBIO.2025.3614354","DOIUrl":"https://doi.org/10.1109/TCBBIO.2025.3614354","url":null,"abstract":"<p><p>Decoding the language of DNA sequences is a fundamental problem in genome research. Mainstream pre-trained models like DNABERT-2 and Nucleotide Transformer have demonstrated remarkable achievements across a spectrum of DNA analysis tasks. Yet, these models still face the pivotal challenge of (1) genetic language diversity, or the capability to capture genetic variations across individuals or populations in the foundation models; (2) model efficiency, specifically how to enhance performance at scalable costs for large-scale genetic foundational models; (3) length extrapolation, or the ability to accurately interpret sequences ranging from short to long within a unified model framework. In response, we introduce DeepGene, a model leveraging Pan-genome and Minigraph representations to encompass the broad diversity of genetic language. DeepGene employs the rotary position embedding to improve the length extrapolation in various genetic analysis tasks. On the 28 tasks in Genome Understanding Evaluation, DeepGene achieves the overall best score. DeepGene outperforms other cutting-edge models for its compact model size and superior efficiency in processing sequences of varying lengths. The datasets and source code of DeepGene are available at GitHub (https://github.com/wds-seu/DeepGene).</p>","PeriodicalId":520987,"journal":{"name":"IEEE transactions on computational biology and bioinformatics","volume":"PP ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145152969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}