{"title":"SpatialCTD: A Large-Scale Tumor Microenvironment Spatial Transcriptomic Dataset to Evaluate Cell Type Deconvolution for Immuno-Oncology.","authors":"Jiayuan Ding, Lingxiao Li, Qiaolin Lu, Julian Venegas, Yixin Wang, Lidan Wu, Wei Jin, Hongzhi Wen, Renming Liu, Wenzhuo Tang, Xinnan Dai, Zhaoheng Li, Wangyang Zuo, Yi Chang, Yu Leo Lei, Lulu Shang, Patrick Danaher, Yuying Xie, Jiliang Tang","doi":"10.1089/cmb.2024.0532","DOIUrl":"10.1089/cmb.2024.0532","url":null,"abstract":"<p><p>Recent technological advancements have enabled spatially resolved transcriptomic profiling but at a multicellular resolution that is more cost-effective. The task of cell type deconvolution has been introduced to disentangle discrete cell types from such multicellular spots. However, existing benchmark datasets for cell type deconvolution are either generated from simulation or limited in scale, predominantly encompassing data on mice and are not designed for human immuno-oncology. To overcome these limitations and promote comprehensive investigation of cell type deconvolution for human immuno-oncology, we introduce a large-scale spatial transcriptomic deconvolution benchmark dataset named SpatialCTD, encompassing 1.8 million cells and 12,900 pseudo spots from the human tumor microenvironment across the lung, kidney, and liver. In addition, SpatialCTD provides more realistic reference than those generated from single-cell RNA sequencing (scRNA-seq) data for most reference-based deconvolution methods. To utilize the location-aware SpatialCTD reference, we propose a graph neural network-based deconvolution method (i.e., GNNDeconvolver). Extensive experiments show that GNNDeconvolver often outperforms existing state-of-the-art methods by a substantial margin, without requiring scRNA-seq data. To enable comprehensive evaluations of spatial transcriptomics data from flexible protocols, we provide an online tool capable of converting spatial transcriptomic data from various platforms (e.g., 10× Visium, MERFISH, and sci-Space) into pseudo spots, featuring adjustable spot size. The SpatialCTD dataset and GNNDeconvolver implementation are available at https://github.com/OmicsML/SpatialCTD, and the online converter tool can be accessed at https://omicsml.github.io/SpatialCTD/.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"871-885"},"PeriodicalIF":1.4,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141906766","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SuperTAD-Fast: Accelerating Topologically Associating Domains Detection Through Discretization.","authors":"Zhao Ling, Yu Wei Zhang, Shuai Cheng Li","doi":"10.1089/cmb.2024.0490","DOIUrl":"10.1089/cmb.2024.0490","url":null,"abstract":"<p><p>High-throughput chromosome conformation capture (Hi-C) technology captures spatial interactions of DNA sequences into matrices, and software tools are developed to identify topologically associating domains (TADs) from the Hi-C matrices. With structural information theory, SuperTAD adopted a dynamic programming approach to find the TAD hierarchy with minimal structural entropy. However, the algorithm suffers from high time complexity. To accelerate this algorithm, we design and implement an approximation algorithm with a theoretical performance guarantee. We implemented a package, SuperTAD-Fast. Using Hi-C matrices and simulated data, we demonstrated that SuperTAD-Fast achieved great runtime improvement compared with SuperTAD. SuperTAD-Fast shows high consistency and significant enrichment of structural proteins from Hi-C data of human cell lines in comparison with the existing six hierarchical TADs detecting methods.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"784-796"},"PeriodicalIF":1.4,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141759026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiao-Yan Sun, Zhen-Jie Hou, Wen-Guang Zhang, Yan Chen, Hai-Bin Yao
{"title":"HTFSMMA: Higher-Order Topological Guided Small Molecule-MicroRNA Associations Prediction.","authors":"Xiao-Yan Sun, Zhen-Jie Hou, Wen-Guang Zhang, Yan Chen, Hai-Bin Yao","doi":"10.1089/cmb.2024.0587","DOIUrl":"10.1089/cmb.2024.0587","url":null,"abstract":"<p><p>Small molecules (SMs) play a pivotal role in regulating microRNAs (miRNAs). Existing prediction methods for associations between SM-miRNA have overlooked crucial aspects: the incorporation of local topological features between nodes, which represent either SMs or miRNAs, and the effective fusion of node features with topological features. This study introduces a novel approach, termed high-order topological features for SM-miRNA association prediction (HTFSMMA), which specifically addresses these limitations. Initially, an association graph is formed by integrating SM-miRNA association data, SM similarity, and miRNA similarity. Subsequently, we focus on the local information of links and propose target neighborhood graph convolutional network for extracting local topological features. Then, HTFSMMA employs graph attention networks to amalgamate these local features, thereby establishing a platform for the acquisition of high-order features through random walks. Finally, the extracted features are integrated into the multilayer perceptron to derive the association prediction scores. To demonstrate the performance of HTFSMMA, we conducted comprehensive evaluations including five-fold cross-validation, leave-one-out cross-validation (LOOCV), SM-fixed local LOOCV, and miRNA-fixed local LOOCV. The area under receiver operating characteristic curve values were 0.9958 ± 0.0024 (0.8722 ± 0.0021), 0.9986 (0.9504), 0.9974 (0.9111), and 0.9977 (0.9074), respectively. Our findings demonstrate the superior performance of HTFSMMA over existing approaches. In addition, three case studies and the DeLong test have confirmed the effectiveness of the proposed method. These results collectively underscore the significance of HTFSMMA in facilitating the inference of associations between SMs and miRNAs.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"886-906"},"PeriodicalIF":1.4,"publicationDate":"2024-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141897568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
JūratĖ ŠaltytĖ Benth, Fred Espen Benth, Espen Rostrup Nakstad
{"title":"Nearly Instantaneous Time-Varying Reproduction Number for Contagious Diseases-a Direct Approach Based on Nonlinear Regression.","authors":"JūratĖ ŠaltytĖ Benth, Fred Espen Benth, Espen Rostrup Nakstad","doi":"10.1089/cmb.2023.0414","DOIUrl":"10.1089/cmb.2023.0414","url":null,"abstract":"<p><p>While the world recovers from the COVID-19 pandemic, another outbreak of contagious disease remains the most likely future risk to public safety. Now is therefore the time to equip health authorities with effective tools to ensure they are operationally prepared for future events. We propose a direct approach to obtain reliable nearly instantaneous time-varying reproduction numbers for contagious diseases, using only the number of infected individuals as input and utilising the dynamics of the susceptible-infected-recovered (SIR) model. Our approach is based on a multivariate nonlinear regression model simultaneously assessing parameters describing the transmission and recovery rate as a function of the SIR model. Shortly after start of a pandemic, our approach enables estimation of daily reproduction numbers. It avoids numerous sources of additional variation and provides a generic tool for monitoring the instantaneous reproduction numbers. We use Norwegian COVID-19 data as case study and demonstrate that our results are well aligned with changes in the number of infected individuals and the change points following policy interventions. Our estimated reproduction numbers are notably less volatile, provide more credible short-time predictions for the number of infected individuals, and are thus clearly favorable compared with the results obtained by two other popular approaches used for monitoring a pandemic. The proposed approach contributes to increased preparedness to future pandemics of contagious diseases, as it can be used as a simple yet powerful tool to monitor the pandemics, provide short-term predictions, and thus support decision making regarding timely and targeted control measures.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"727-741"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141457118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M Tamilarasi, S Kumarganesh, K Martin Sagayam, J Andrew
{"title":"Detection and Segmentation of Glioma Tumors Utilizing a UNet Convolutional Neural Network Approach with Non-Subsampled Shearlet Transform.","authors":"M Tamilarasi, S Kumarganesh, K Martin Sagayam, J Andrew","doi":"10.1089/cmb.2023.0339","DOIUrl":"10.1089/cmb.2023.0339","url":null,"abstract":"<p><p>The prompt and precise identification and delineation of tumor regions within glioma brain images are critical for mitigating the risks associated with this life-threatening ailment. In this study, we employ the UNet convolutional neural network (CNN) architecture for glioma tumor detection. Our proposed methodology comprises a transformation module, a feature extraction module, and a tumor segmentation module. The spatial domain representation of brain magnetic resonance imaging images undergoes decomposition into low- and high-frequency subbands via a non-subsampled shearlet transform. Leveraging the selective and directive characteristics of this transform enhances the classification efficacy of our proposed system. Shearlet features are extracted from both low- and high-frequency subbands and subsequently classified using the UNet-CNN architecture to identify tumor regions within glioma brain images. We validate our proposed glioma tumor detection methodology using publicly available datasets, namely Brain Tumor Segmentation (BRATS) 2019 and The Cancer Genome Atlas (TCGA). The mean classification rates achieved by our system are 99.1% for the BRATS 2019 dataset and 97.8% for the TCGA dataset. Furthermore, our system demonstrates notable performance metrics on the BRATS 2019 dataset, including 98.2% sensitivity, 98.7% specificity, 98.9% accuracy, 98.7% intersection over union, and 98.5% disc similarity coefficient. Similarly, on the TCGA dataset, our system achieves 97.7% sensitivity, 98.2% specificity, 98.7% accuracy, 98.6% intersection over union, and 98.4% disc similarity coefficient. Comparative analysis against state-of-the-art methods underscores the efficacy of our proposed glioma brain tumor detection approach.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"757-768"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141457116","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Combining Complementarity and Binding Energetics in the Assessment of Protein Interactions: EnCPdock-A Practical Manual.","authors":"Gargi Biswas, Debasish Mukherjee, Sankar Basu","doi":"10.1089/cmb.2024.0554","DOIUrl":"10.1089/cmb.2024.0554","url":null,"abstract":"<p><p>The combined effect of shape and electrostatic complementarities (Sc, EC) at the interface of the interacting protein partners (PPI) serves as the physical basis for such associations and is a strong determinant of their binding energetics. EnCPdock (https://www.scinetmol.in/EnCPdock/) presents a comprehensive web platform for the direct conjoint comparative analyses of complementarity and binding energetics in PPIs. It elegantly interlinks the dual nature of local (Sc) and nonlocal complementarity (EC) in PPIs using the complementarity plot. It further derives an AI-based ΔG<sub>binding</sub> with a prediction accuracy comparable to the <i>state of the art</i>. This book chapter presents a practical manual to conceptualize and implement EnCPdock with its various features and functionalities, collectively having the potential to serve as a valuable protein engineering tool in the design of novel protein interfaces.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"769-781"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141419324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NPI-DCGNN: An Accurate Tool for Identifying ncRNA-Protein Interactions Using a Dual-Channel Graph Neural Network.","authors":"Xin Zhang, Liangwei Zhao, Ziyi Chai, Hao Wu, Wei Yang, Chen Li, Yu Jiang, Quanzhong Liu","doi":"10.1089/cmb.2023.0449","DOIUrl":"10.1089/cmb.2023.0449","url":null,"abstract":"<p><p>Noncoding RNA (NcRNA)-protein interactions (NPIs) play fundamentally important roles in carrying out cellular activities. Although various predictors based on molecular features and graphs have been published to boost the identification of NPIs, most of them often ignore the information between known NPIs or exhibit insufficient learning ability from graphs, posing a significant challenge in effectively identifying NPIs. To develop a more reliable and accurate predictor for NPIs, in this article, we propose NPI-DCGNN, an end-to-end NPI predictor based on a dual-channel graph neural network (DCGNN). NPI-DCGNN initially treats the known NPIs as an ncRNA-protein bipartite graph. Subsequently, for each ncRNA-protein pair, NPI-DCGNN extracts two local subgraphs centered around the ncRNA and protein, respectively, from the bipartite graph. After that, it utilizes a dual-channel graph representation learning layer based on GNN to generate high-level feature representations for the ncRNA-protein pair. Finally, it employs a fully connected network and output layer to predict whether an interaction exists between the pair of ncRNA and protein. Experimental results on four experimentally validated datasets demonstrate that NPI-DCGNN outperforms several state-of-the-art NPI predictors. Our case studies on the NPInter database further demonstrate the prediction power of NPI-DCGNN in predicting NPIs. With the availability of the source codes (https://github.com/zhangxin11111/NPI-DCGNN), we anticipate that NPI-DCGNN could facilitate the studies of ncRNA interactome by providing highly reliable NPI candidates for further experimental validation.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"742-756"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141457119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"QMix: An Efficient Program to Automatically Estimate Multi-Matrix Mixture Models for Amino Acid Substitution Process.","authors":"Nguyen Huy Tinh, Cuong Cao Dang, Le Sy Vinh","doi":"10.1089/cmb.2023.0403","DOIUrl":"10.1089/cmb.2023.0403","url":null,"abstract":"<p><p>The single-matrix amino acid (AA) substitution models are widely used in phylogenetic analyses; however, they are unable to properly model the heterogeneity of AA substitution rates among sites. The multi-matrix mixture models can handle the site rate heterogeneity and outperform the single-matrix models. Estimating multi-matrix mixture models is a complex process and no computer program is available for this task. In this study, we implemented a computer program of the so-called QMix based on the algorithm of LG4X and LG4M with several enhancements to automatically estimate multi-matrix mixture models from large datasets. QMix employs QMaker algorithm instead of XRATE algorithm to accurately and rapidly estimate the parameters of models. It is able to estimate mixture models with different number of matrices and supports multi-threading computing to efficiently estimate models from thousands of genes. We re-estimate mixture models LG4X and LG4M from 1471 HSSP alignments. The re-estimated models (HP4X and HP4M) are slightly better than LG4X and LG4M in building maximum likelihood trees from HSSP and TreeBASE datasets. QMix program required about 10 hours on a computer with 18 cores to estimate a mixture model with four matrices from 200 HSSP alignments. It is easy to use and freely available for researchers.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"703-707"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141300786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating Haplotype Structure and Frequencies: A Bayesian Approach to Unknown Design in Pooled Genomic Data.","authors":"Yuexuan Wang, Ritabrata Dutta, Andreas Futschik","doi":"10.1089/cmb.2023.0211","DOIUrl":"10.1089/cmb.2023.0211","url":null,"abstract":"<p><p>The estimation of haplotype structure and frequencies provides crucial information about the composition of genomes. Techniques, such as single-individual haplotyping, aim to reconstruct individual haplotypes from diploid genome sequencing data. However, our focus is distinct. We address the challenge of reconstructing haplotype structure and frequencies from pooled sequencing samples where multiple individuals are sequenced simultaneously. A frequentist method to address this issue has recently been proposed. In contrast to this and other methods that compute point estimates, our proposed Bayesian hierarchical model delivers a posterior that permits us to also quantify uncertainty. Since matching permutations in both haplotype structure and corresponding frequency matrix lead to the same reconstruction of their product, we introduce an order-preserving shrinkage prior that ensures identifiability with respect to permutations. For inference, we introduce a blocked Gibbs sampler that enforces the required constraints. In a simulation study, we assessed the performance of our method. Furthermore, by using our approach on two distinct sets of real data, we demonstrate that our Bayesian approach can reconstruct the dominant haplotypes in a challenging, high-dimensional set-up.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"708-726"},"PeriodicalIF":1.4,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141492186","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BiRNN-DDI: A Drug-Drug Interaction Event Type Prediction Model Based on Bidirectional Recurrent Neural Network and Graph2Seq Representation.","authors":"GuiShen Wang, Hui Feng, Chen Cao","doi":"10.1089/cmb.2024.0476","DOIUrl":"https://doi.org/10.1089/cmb.2024.0476","url":null,"abstract":"<p><p>Research on drug-drug interaction (DDI) prediction, particularly in identifying DDI event types, is crucial for understanding adverse drug reactions and drug combinations. This work introduces a Bidirectional Recurrent Neural Network model for DDI event type prediction (BiRNN-DDI), which simultaneously considers structural relationships and contextual information. Our BiRNN-DDI model constructs drug feature graphs to mine structural relationships. For contextual information, it transforms drug graphs into sequences and employs a two-channel structure, integrating BiRNN, to obtain contextual representations of drug-drug pairs. The model's effectiveness is demonstrated through comparisons with state-of-the-art models on two DDI event-type benchmarks. Extensive experimental results reveal that BiRNN-DDI surpasses other models in accuracy, AUPR, AUC, F1 score, Precision, and Recall metrics on both small and large datasets. Additionally, our model exhibits a lower parameter space, indicating more efficient learning of drug feature representations and prediction of potential DDI event types.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2024-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141759025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}