{"title":"Accurately Predicting Cell Type Abundance from Spatial Histology Image Through HPCell.","authors":"Yongkang Zhao, Youyang Li, Weijiang Yu, Hongyu Zhang, Zheng Wang, Yuedong Yang, Yuansong Zeng","doi":"10.1007/s12539-025-00757-9","DOIUrl":"https://doi.org/10.1007/s12539-025-00757-9","url":null,"abstract":"<p><p>Recent advancements in spatial transcriptomics (ST) have revolutionized our ability to simultaneously profile gene expression, spatial location, and tissue morphology, enabling the precise mapping of cell types and signaling pathways within their native tissue context. However, the high cost of sequencing remains a significant barrier to its widespread adoption. Although existing methods often leverage histopathological images to predict transcriptomic profiles and identify cellular heterogeneity, few approaches directly estimate cell-type abundance from these images. To address this gap, we propose HPCell, a deep learning framework for inferring cell-type abundance directly from H&E-stained histology images. HPCell comprises three key modules: a pathology foundation module, a hypergraph module, and a Transformer module. It begins by dividing whole-slide images (WSIs) into patches, which are processed by the pathology foundation module using a teacher-student framework to extract robust morphological features. These features are used to construct a hypergraph, where each patch (node) connects to its spatial neighbors to model complex many-to-many relationships. The Transformer module applies attention to the hypergraph features to capture long-range dependencies. Finally, features from all modules are integrated to estimate cell-type abundance. 
Extensive experiments show that HPCell consistently outperforms state-of-the-art methods across multiple spatial transcriptomics datasets, offering a scalable and cost-effective approach for investigating tissue structure and cellular interactions.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144992470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hot-Spot-Guided Generative Deep Learning for Drug-Like PPI Inhibitor Design.","authors":"Heqi Sun, Jiayi Li, Yufang Zhang, Shenggeng Lin, Junwei Chen, Hong Tan, Ruixuan Wang, Xueying Mao, Jianwei Zhao, Rongpei Li, Dong-Qing Wei","doi":"10.1007/s12539-025-00756-w","DOIUrl":"https://doi.org/10.1007/s12539-025-00756-w","url":null,"abstract":"<p><p>Protein-protein interactions (PPIs) are essential therapeutic targets, yet their large and relatively flat interfaces hinder the development of small-molecule inhibitors. Traditional computational approaches rely heavily on existing chemical libraries or expert heuristics, restricting exploration of novel chemical space. To address these challenges, we present Hot2Mol, a generative deep learning framework for the de novo design of target-specific and drug-like PPI inhibitors. Hot2Mol captures crucial pharmacophoric features from hot-spot residues, allowing precise targeting of PPI interfaces while eliminating the need for known bioactive ligands. The framework integrates three main components: a conditional transformer for pharmacophore-guided, property-constrained molecular generation; an E(n)-equivariant graph neural network to ensure accurate spatial alignment with PPI hot-spot pharmacophores; a variational autoencoder to sample novel and diverse molecular structures. Comprehensive assessments demonstrate that Hot2Mol outperforms state-of-the-art models in binding affinity, drug-likeness, synthetic accessibility, novelty, and uniqueness. Molecular dynamics simulations further confirm the strong binding stability of generated compounds. 
Case studies underscore Hot2Mol's ability to design high-affinity and selective PPI inhibitors, highlighting its potential to accelerate rational PPI-targeted drug discovery.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144953019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering Single-Cell RNA-Seq Data with Low-Rank Matrix Factorization and Local Graph Regularization.","authors":"Yue Yu, Wei Zhang, Xiaoying Zheng, Juan Shen, Yuanyuan Li","doi":"10.1007/s12539-025-00762-y","DOIUrl":"10.1007/s12539-025-00762-y","url":null,"abstract":"<p><p>Single-cell RNA sequencing (scRNA-seq) offers significant opportunities to reveal cellular heterogeneity and diversity. Accurate cell type identification is critical for downstream analyses and understanding the mechanisms of heterogeneity. However, challenges arise from the high dimensionality, sparsity, and noise of scRNA-seq data. While various low-rank representation (LRR)-based clustering methods have been developed, many existing approaches may inaccurately capture relationships or conflate true patterns with noise. To address these limitations, we introduce a novel clustering algorithm that integrates low-rank matrix decomposition with local graph regularization (LRMGC). This approach applies a tri-decomposition strategy to the representation matrix to derive an aligned core matrix, and characterizes the \"distance\" between cells in a lower-dimensional space through a local manifold regularization term. Rather than relying on the nuclear norm of the representation matrix, the Schatten p-norm is applied to the core matrix to robustly learn the similarity matrix against noise and outliers, while maintaining the high-dimensional noisy data's underlying subspace structure for accurate and robust clustering. Additionally, the final similarity matrix is obtained by applying the angular alignment strategy on the similarity matrix. Comprehensive experiments and comparisons with advanced methods on scRNA-seq datasets demonstrate LRMGC's superior performance and reliability in uncovering cell type composition. 
Furthermore, a variety of downstream analyses, such as marker gene identification, functional enrichment analysis, rare cell recognition, and cell-cell communication, also demonstrate the effectiveness of LRMGC.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":""},"PeriodicalIF":3.9,"publicationDate":"2025-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144952966","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EnDM-CPP: A Multi-view Explainable Framework Based on Deep Learning and Machine Learning for Identifying Cell-Penetrating Peptides with Transformers and Analyzing Sequence Information.","authors":"Lun Zhu, Zehua Chen, Sen Yang","doi":"10.1007/s12539-024-00673-4","DOIUrl":"10.1007/s12539-024-00673-4","url":null,"abstract":"<p><p>Cell-Penetrating Peptides (CPPs) are a crucial carrier for drug delivery. Because synthesizing new CPPs in the laboratory is both time- and resource-consuming, computational methods that predict potential CPPs can accelerate their development for therapy. In this study, EnDM-CPP is proposed, which combines machine learning algorithms (SVM and CatBoost) with convolutional neural networks (CNN and TextCNN). For dataset construction, three previous CPP benchmark datasets, namely CPPsite 2.0, MLCPP 2.0, and CPP924, are merged to improve diversity and reduce homology. For feature generation, two language model-based features obtained from the Transformer architecture, namely ProtT5 and ESM-2, are employed in CNN and TextCNN. Additionally, sequence features, such as CPRS, Hybrid PseAAC, KSC, etc., are input to SVM and CatBoost. Based on the result of each predictor, Logistic Regression (LR) is built to predict the final decision. The experiment results indicate that ProtT5 and ESM-2 fusion features significantly contribute to predicting CPP and that combining employed features and models demonstrates better association. On an independent test dataset comparison, EnDM-CPP achieved an accuracy of 0.9495 and a Matthews correlation coefficient of 0.9008 with an improvement of 2.23%-9.48% and 4.32%-19.02%, respectively, compared with other state-of-the-art methods. 
Code and data are available at https://github.com/tudou1231/EnDM-CPP.git .</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"744-769"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142876973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing the Accuracy of Anti-MRSA Peptide Prediction Through Integrating Multi-Source Protein Language Models.","authors":"Watshara Shoombuatong, Pakpoom Mookdarsanit, Lawankorn Mookdarsanit, Nalini Schaduangrat, Saeed Ahmed, Muhammad Kabir, Pramote Chumnanpuen","doi":"10.1007/s12539-025-00696-5","DOIUrl":"10.1007/s12539-025-00696-5","url":null,"abstract":"<p><p>The emergence of methicillin-resistant Staphylococcus aureus (MRSA) as a recognized cause of community-acquired and hospital infections has brought about a need for the efficient and accurate identification of peptides with anti-MRSA properties in drug discovery and development pipelines. However, current experimental methods often tend to be labor- and resource-intensive. Thus, there is an immediate requirement to develop practical sequence-based computational solutions for identifying anti-MRSA peptides. Lately, pre-trained protein language models (pLMs) have emerged as a remarkable advancement for encoding peptide sequences as discriminative feature embeddings, uncovering plentiful protein-level information and successfully repurposing it for in silico peptide property prediction. In this study, we present pLM4MRSA, a framework based on pLMs designed to enhance the accuracy of predicting anti-MRSA peptides. In this framework, we combine feature embeddings from various pLMs, such as ProtTrans and evolutionary-scale modeling (ESM-2), which provide complementary information for prediction. These individual pLM strengths are integrated to form hybrid feature embeddings. Next, we apply principal component analysis (PCA) to process these hybrid embeddings. The resulting PCA-transformed feature vectors are then used as inputs for constructing the predictive model. 
Experimental results on the independent test dataset showed that the proposed pLM4MRSA approach achieved a balanced accuracy and Matthews correlation coefficient of 0.983 and 0.980, respectively, improving on the state-of-the-art methods by 2.53%-4.83% and 7.73%-13.23%, respectively. This indicates that pLM4MRSA is a high-performance prediction model with excellent scope of applicability. Additionally, comparison with well-known hand-crafted features demonstrated that the proposed hybrid feature embeddings complement each other effectively, capturing discriminative patterns for more accurate anti-MRSA peptide prediction. We anticipate that pLM4MRSA will serve as an effective solution for accurate and high-capacity prediction of anti-MRSA peptides from peptide sequences.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"716-729"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143604811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Self-Supervised Graph Representation Learning for Single-Cell Classification.","authors":"Qiguo Dai, Wuhao Liu, Xianhai Yu, Xiaodong Duan, Ziqiang Liu","doi":"10.1007/s12539-025-00700-y","DOIUrl":"10.1007/s12539-025-00700-y","url":null,"abstract":"<p><p>Accurately identifying cell types in single-cell RNA sequencing data is critical for understanding cellular differentiation and pathological mechanisms in downstream analysis. As traditional biological approaches are laborious and time-intensive, it is imperative to develop computational biology methods for cell classification. However, it remains a challenge for existing methods to adequately utilize the potential gene expression information within the vast amount of unlabeled cell data, which limits their classification and generalization performance. Therefore, we propose a novel self-supervised graph representation learning framework for single-cell classification, named scSSGC. Specifically, in the pre-training stage of self-supervised learning, multiple K-means clustering tasks conducted on unlabeled cell data are jointly employed for model training, thereby mitigating the issue of limited labeled data. To effectively capture the potential interactions among cells, we introduce a locally augmented graph neural network to enhance the information aggregation capability for nodes with fewer neighbors in the cell graph. A range of benchmark experiments demonstrates that scSSGC outperforms existing state-of-the-art cell classification methods. 
More importantly, scSSGC maintains stable performance in cross-dataset settings, indicating better generalization ability.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"566-575"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143780053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predicting CircRNA-Disease Associations Based on Heterogeneous Graph Neural Network and Knowledge Graph Attribute Mining Attention.","authors":"Wei Lan, Cong Peng, Hongyu Zhang, Chunling Li, Qingfeng Chen, Xin Xiao, Zhiqiang Wang","doi":"10.1007/s12539-025-00706-6","DOIUrl":"10.1007/s12539-025-00706-6","url":null,"abstract":"<p><p>The exploration of associations between circular RNAs (circRNAs) and diseases contributes to a deeper understanding of the pathogenesis of diseases. Many computational methods have been proposed for identifying circRNA-disease associations. However, these methods still exhibit some limitations, such as ignoring the effect of noise. In this paper, we propose a new knowledge graph attribute mining attention network (KAATCDA) to predict circRNA-disease associations based on a knowledge graph attribute network (KGA) and an attribute mining attention network (AMA). First, KGA is used to learn the feature representation of diseases. Then, circRNA features, which are similar to the disease feature representations, are obtained using AMA. Finally, the scores of circRNA-disease associations are predicted based on the circRNA and disease feature representations. Experiments of five-fold cross-validation on two datasets demonstrate that KAATCDA outperforms other state-of-the-art methods. 
In addition, the case study shows our method can effectively predict unknown circRNA-disease associations.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"586-597"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144010476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Storage and Analysis of Genomic Data: A k-mer Frequency Mapping and Image Representation Method.","authors":"Hatice Busra Luleci, Selcen Ari Yuka, Alper Yilmaz","doi":"10.1007/s12539-024-00659-2","DOIUrl":"10.1007/s12539-024-00659-2","url":null,"abstract":"<p><p>k-mer frequencies are crucial for understanding DNA sequence patterns and structure, with applications in motif discovery, genome classification, and short read assembly. However, the exponential increase in the dimension of frequency tables with increasing k-mer length poses storage challenges. In this study, we present a novel method for compressing k-mer data without information loss, aiming to optimize storage and analysis processes. We employed Chaos Game Representation (CGR) to map k-mers to coordinates and used these components to generate raster images of k-mers. The CGR maps were partitioned and labeled based on substrings, with each substring mapped to a subframe, creating a fractal-like structure. The entire k-mer frequency set of each genomic sequence was represented as a single image, with each pixel corresponding to a specific k-mer and its occurrence. This approach reduced file size by up to 16-fold compared to plain text and 3-fold compared to binary format. Furthermore, we demonstrated the feasibility of performing alignment-free similarity analyses on images derived from k-mer frequencies of whole genome sequences from 14 plant species. 
Our results highlight the potential of this method as a fast and efficient tool for accessing, processing, and analyzing large biological sequence datasets, enabling the retrieval of k-mer frequencies and image reconstruction.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"691-697"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142464357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MultiKD-DTA: Enhancing Drug-Target Affinity Prediction Through Multiscale Feature Extraction.","authors":"Riqian Hu, Ruiquan Ge, Guojian Deng, Jin Fan, Bowen Tang, Changmiao Wang","doi":"10.1007/s12539-025-00697-4","DOIUrl":"10.1007/s12539-025-00697-4","url":null,"abstract":"<p><p>The discovery and development of novel pharmaceutical agents is characterized by high costs, lengthy timelines, and significant safety concerns. Traditional drug discovery involves pharmacologists manually screening drug molecules against protein targets, focusing on binding within protein cavities. However, this manual process is slow and inherently limited. Given these constraints, the use of deep learning techniques to predict drug-target interaction (DTI) affinities is both significant and promising for future applications. This paper introduces an innovative deep learning architecture designed to enhance the prediction of DTI affinities. The model ingeniously combines graph neural networks, pre-trained large-scale protein models, and attention mechanisms to improve performance. In this framework, molecular structures are represented as graphs and processed through graph neural networks and multiscale convolutional networks to facilitate feature extraction. Simultaneously, protein sequences are encoded using pre-trained ESM-2 large models and processed with bidirectional long short-term memory networks. Subsequently, the molecular and protein embeddings derived from these processes are integrated within a fusion module to compute affinity scores. 
Experimental results demonstrate that our proposed model outperforms existing methods on two publicly available datasets.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"555-565"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143523301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing Waddington Landscape from Cell Migration and Proliferation.","authors":"Yourui Han, Bolin Chen, Zhongwen Bi, Jianjun Zhang, Youpeng Hu, Jun Bian, Ruiming Kang, Xuequn Shang","doi":"10.1007/s12539-024-00686-z","DOIUrl":"10.1007/s12539-024-00686-z","url":null,"abstract":"<p><p>The Waddington landscape was initially proposed to depict cell differentiation, and has been extended to explain phenomena such as reprogramming. The landscape serves as a concrete representation of cellular differentiation potential, yet the precise representation of this potential remains an unsolved problem, posing significant challenges to reconstructing the Waddington landscape. The characterization of cellular differentiation potential typically relies on transcriptomic signatures of known markers. Numerous computational models based on various energy indicators, such as Shannon entropy, have been proposed. While these models can effectively characterize cellular differentiation potential, most of them lack corresponding dynamical interpretations, which are crucial for enhancing our understanding of cell fate transitions. Therefore, from the perspective of cell migration and proliferation, a feasible framework was developed for calculating a dynamically interpretable energy indicator to reconstruct the Waddington landscape based on sparse autoencoders and the reaction-diffusion-advection equation. Within this framework, typical cellular developmental processes, such as hematopoiesis and reprogramming, were dynamically simulated and their corresponding Waddington landscapes were reconstructed. Furthermore, dynamic simulation and reconstruction were also conducted for special developmental processes, such as embryogenesis and the epithelial-mesenchymal transition. 
Ultimately, these diverse cell fate transitions were amalgamated into a unified Waddington landscape.</p>","PeriodicalId":13670,"journal":{"name":"Interdisciplinary Sciences: Computational Life Sciences","volume":" ","pages":"541-554"},"PeriodicalIF":3.9,"publicationDate":"2025-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142948366","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}