IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles, Page 10

Analyzing Large-Scale Single-Cell RNA-Seq Data Using Coreset

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-24 DOI: 10.1109/TCBB.2024.3418078

Khalid Usman;Fangping Wan;Dan Zhao;Jian Peng;Jianyang Zeng

Abstract: The recent boom in single-cell sequencing technologies provides valuable insights into the transcriptomes of individual cells. Through single-cell data analyses, a number of biological discoveries, such as novel cell types, developmental cell lineage trajectories, and gene regulatory networks, have been uncovered. However, the massive and steadily accumulating single-cell datasets also pose a serious computational and analytical challenge for researchers. To address this issue, one typically applies dimensionality reduction approaches to reduce the large-scale datasets. However, these approaches are generally computationally infeasible for tall matrices. In addition, downstream data analysis tasks such as clustering still have high time complexity even on the dimension-reduced datasets. We present single-cell Coreset (scCoreset), a data summarization framework that extracts a small weighted subset of cells from a huge, sparse single-cell RNA-seq dataset to facilitate downstream data analysis tasks. Single-cell data analyses run on the extracted subset yield similar results to those derived from the original uncompressed data. Tests on various single-cell datasets show that scCoreset outperforms existing data summarization approaches for common downstream tasks such as visualization and clustering. We believe that scCoreset can serve as a useful plug-in tool to improve the efficiency of current single-cell RNA-seq data analyses.

Vol. 21, No. 6, pp. 1784-1793
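The abstract does not describe how scCoreset selects and weights cells, so the following is only a generic illustration of what a small weighted subset can look like, not the authors' algorithm. It is a minimal Python sketch of importance sampling in the style of lightweight coresets: cells far from the dataset mean are drawn more often, and each drawn cell gets weight 1/(m*q) so that weighted sums remain unbiased estimates of sums over the full data. The function name `lightweight_coreset`, its parameters, and the dense list-of-lists input are assumptions made for illustration.

```python
import random


def lightweight_coreset(points, m, seed=0):
    """Draw m weighted points by importance sampling (generic sketch).

    points: list of equal-length numeric vectors (one per cell).
    Returns (indices, weights); indices may repeat because sampling
    is done with replacement.
    """
    rng = random.Random(seed)
    n = len(points)
    dim = len(points[0])

    # Squared distance of every point to the dataset mean.
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    sq_dist = [sum((p[j] - mean[j]) ** 2 for j in range(dim)) for p in points]
    total = sum(sq_dist)

    # Mix a uniform term with a distance term so every point can be drawn.
    if total > 0:
        q = [0.5 / n + 0.5 * d / total for d in sq_dist]
    else:
        q = [1.0 / n] * n  # all points identical: fall back to uniform

    indices = rng.choices(range(n), weights=q, k=m)
    weights = [1.0 / (m * q[i]) for i in indices]
    return indices, weights
```

A downstream step (for example weighted k-means) would then run on `[points[i] for i in indices]` with the returned weights instead of on all cells.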

Citations: 0

BLAM6A-Merge: Leveraging Attention Mechanisms and Feature Fusion Strategies to Improve the Identification of RNA N6-Methyladenosine Sites

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-24 DOI: 10.1109/TCBB.2024.3418490

Yunpeng Xia;Ying Zhang;Dian Liu;Yi-Heng Zhu;Zhikang Wang;Jiangning Song;Dong-Jun Yu

Abstract: RNA N6-methyladenosine is a prevalent and abundant type of RNA modification that exerts significant influence on diverse biological processes. To date, numerous computational approaches have been developed for predicting methylation, with most of them ignoring the correlations of different encoding strategies and failing to explore the adaptability of various attention mechanisms for methylation identification. To solve the above issues, we proposed an innovative framework for predicting RNA m6A modification sites, termed BLAM6A-Merge. Specifically, it utilized a multimodal feature fusion strategy to combine the classification results of four features and the Blastn tool. Apart from this, different attention mechanisms were employed for extracting higher-level features on specific features after the screening process. Extensive experiments on 12 benchmarking datasets demonstrated that BLAM6A-Merge achieved superior performance (average AUC: 0.849 for the full transcript mode and 0.784 for the mature mRNA mode). Notably, the Blastn tool was employed for the first time in the identification of methylation sites.

Vol. 21, No. 6, pp. 1803-1815

Citations: 0

SeedHit: A GPU Friendly Pre-Align Filtering Algorithm

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-21 DOI: 10.1109/TCBB.2024.3417517

Zhen Ju;Jingjing Zhang;Xuelei Li;Jintao Meng;Yanjie Wei

Abstract: The amount of genetic data generated by Next Generation Sequencing (NGS) technologies grows faster than Moore's law. This necessitates the development of efficient NGS data processing and analysis algorithms. A filter before the computationally costly analysis step can significantly reduce the run time of NGS data analysis. As GPUs are orders of magnitude more powerful than CPUs, this paper proposes a GPU-friendly pre-align filtering algorithm named SeedHit for the fast processing of NGS data. Inspired by BLAST, SeedHit counts seed hits between two sequences to determine their similarity. In SeedHit, each nucleotide in a gene sequence is represented in binary format. By packing data and generating a lookup table that fits into the L1 cache, SeedHit is GPU-friendly and high-throughput. Using three 16S rRNA datasets from Greengenes as input, SeedHit can reject 84%–89% of dissimilar sequence pairs on average when the similarity is 0.9–0.99. The throughput of SeedHit reached 1 T/s (tera bases per second) on a 3080 Ti. Compared with the other two GPU-based filtering algorithms, GateKeeper and SneakySnake, SeedHit has the highest rejection rate and throughput. By incorporating SeedHit into our in-house clustering algorithm nGIA, the modified nGIA achieved a 1.6–2.1 times speedup compared to the original version.

Vol. 21, No. 6, pp. 1794-1802
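As a rough illustration of the ideas named in the abstract (a binary encoding of bases, a lookup table of seeds, and counting seed hits between two sequences), here is a minimal CPU-only Python sketch. It is not the authors' GPU implementation: the 2-bit encoding, the seed length `k`, the threshold `min_hits`, and all function names are hypothetical, and only the bases A, C, G, and T are handled.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit code per base


def packed_kmers(seq, k):
    """Yield every k-mer of seq as an integer with 2 bits per base."""
    mask = (1 << (2 * k)) - 1
    value = 0
    for i, base in enumerate(seq):
        # Shift in the new base and drop the base that left the window.
        value = ((value << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            yield value


def seed_hits(a, b, k=4):
    """Count the k-mers of b that also occur anywhere in a."""
    table = bytearray(1 << (2 * k))  # lookup table indexed by packed k-mer
    for kmer in packed_kmers(a, k):
        table[kmer] = 1
    return sum(table[kmer] for kmer in packed_kmers(b, k))


def passes_filter(a, b, k=4, min_hits=3):
    """Keep the pair for alignment only if enough seeds are shared."""
    return seed_hits(a, b, k) >= min_hits
```

With `k=4` the table has 256 one-byte entries, which hints at why a small seed table can stay in fast on-chip memory; the actual table layout used by SeedHit is not given in the abstract.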

Citations: 0

DDI Prediction With Heterogeneous Information Network - Meta-Path Based Approach

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-21 DOI: 10.1109/TCBB.2024.3417715

Farhan Tanvir;Khaled Mohammed Saifuddin;Muhammad Ifte Khairul Islam;Esra Akbas

Citations: 0

Calculation of the Weight of Evidence for Combined Single-Cell and Extracellular Forensic DNA

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-19 DOI: 10.1109/TCBB.2024.3416877

Desmond S. Lun;Catherine M. Grgicak

Abstract: The weight of DNA evidence for forensic applications is typically assessed through the calculation of the likelihood ratio (LR). In the standard workflow, DNA is extracted from a collection of cells where the cells of an unknown number of donors are mixed. The DNA is then genotyped, and the LR is calculated through well-established methods. Recently, a method for calculating the LR from single-cell data has been presented. Rather than extracting the DNA while the cells are still mixed, single-cell data is procured by first isolating each cell. Extraction and fragment analysis of relevant forensic loci follow, such that individual cells are genotyped. This workflow leads to significantly stronger weights of evidence, but it does not account for extracellular DNA that could also be present in the sample. In this paper, we present a method for calculation of an LR that combines single-cell and extracellular data. We demonstrate the calculation on example data and show that the combined LR can lead to stronger conclusions than would be obtained from calculating LRs on the single-cell and extracellular DNA separately.

Vol. 21, No. 6, pp. 2587-2591

Citations: 0

Intra-Inter Graph Representation Learning for Protein-Protein Binding Sites Prediction

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-19 DOI: 10.1109/TCBB.2024.3416341

Wenting Zhao;Gongping Xu;Long Wang;Zhen Cui;Tong Zhang;Jian Yang

Abstract: Graph neural networks have drawn increasing attention and achieved remarkable progress recently due to their potential applications for a large amount of irregular data. It is natural to represent a protein as a graph. In this work, we focus on protein-protein binding sites prediction between the ligand and receptor proteins. Previous work simply adopts graph convolution to learn residue representations of ligand and receptor proteins, then concatenates them and feeds the concatenated representation into a fully connected layer to make predictions, losing much of the information contained in complexes and failing to obtain an optimal prediction. In this paper, we present Intra-Inter Graph Representation Learning for protein-protein binding sites prediction (IIGRL). Specifically, for intra-graph learning, we maximize the mutual information between the local node representation and the global graph summary to encourage the node representation to embody the global information of the protein graph. Then we explore fusing the two separate ligand and receptor graphs into a whole graph and learning affinities between their residues/nodes to propagate information to each other, which could effectively capture inter-protein information and further enhance the discrimination of residue pairs. Extensive experiments on multiple benchmarks demonstrate that the proposed IIGRL model outperforms state-of-the-art methods.

Vol. 21, No. 6, pp. 1685-1696

Citations: 0

SAGCN: Using Graph Convolutional Network With Subgraph-Aware for circRNA-Drug Sensitivity Identification

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-17 DOI: 10.1109/TCBB.2024.3415058

Weicheng Sun;Chengjuan Ren;Jinsheng Xu;Ping Zhang

Abstract: Circular RNAs (circRNAs) play a significant role in cancer development and therapy resistance. There is substantial evidence indicating that the expression of circRNAs affects the sensitivity of cells to drugs. Identifying circRNA-drug sensitivity associations (CDAs) is helpful for disease treatment and drug discovery. However, the identification of CDAs through conventional biological experiments is both time-consuming and costly. Therefore, it is urgent to develop computational methods to predict CDAs. In this study, we propose a new computational method, the subgraph-aware graph convolutional network (SAGCN), for predicting CDAs. SAGCN first constructs a heterogeneous network composed of a circRNA similarity network, a drug similarity network, and a circRNA-drug bipartite network. Then, a subgraph extractor is proposed to learn the latent subgraph structure of the heterogeneous network using a graph convolutional network. The extractor can capture 1-hop and 2-hop information, and a fusing attention mechanism is designed to integrate them adaptively. Simultaneously, a novel subgraph-aware attention mechanism is proposed to detect intrinsic subgraph structure. The final node feature representation is obtained to make the CDA prediction. Experimental results demonstrate that SAGCN obtained an average AUC of 0.9120 and AUPR of 0.8693, exceeding the performance of the most advanced models under 10-fold cross-validation. Case studies have demonstrated the potential of SAGCN in identifying associations between circRNA and drug sensitivity.

Vol. 21, No. 6, pp. 1765-1774

Citations: 0

Recursive Self-Composite Approach Toward Structural Understanding of Boolean Networks

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-17 DOI: 10.1109/TCBB.2024.3415352

Jongrae Kim;Woojeong Lee;Kwang-Hyun Cho

Citations: 0

SIG: Graph-Based Cancer Subtype Stratification With Gene Mutation Structural Information

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-14 DOI: 10.1109/TCBB.2024.3414498

Chengcheng Zhang;Wei Li;Ming Deng;Yizhang Jiang;Xiaohui Cui;Ping Chen

Abstract: Somatic tumors have a high-dimensional, sparse, and small sample size nature, making cancer subtype stratification based on somatic genomic data a challenge. Current methods for improving cancer clustering performance focus on dimension reduction, integrating multi-omics data, or generating realistic samples, yet ignore the associations between mutated genes within the patient-gene matrix. We refer to these associations as gene mutation structural information, which implicitly includes cancer subtype information and can enhance subtype clustering. We introduce a novel method for cancer subtype clustering called SIG (Structural Information within Graph). As cancer is driven by a combination of genes, we establish associations between mutated genes within the same patient sample, pair by pair, and use a graph to represent them. An association between two mutated genes corresponds to an edge in the graph. We then merge these associations among all mutated genes to obtain a structural information graph, which enriches the gene network and improves its relevance to cancer clustering. We integrate the somatic tumor genome with the enriched gene network and propagate it to cluster patients with mutations in similar network regions. Our method achieves superior clustering performance compared to SOTA methods, as demonstrated by clustering experiments on ovarian and LUAD datasets.

Vol. 21, No. 6, pp. 1752-1764
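The pairwise construction described in the abstract (one edge per pair of genes mutated in the same patient, merged over all patients) can be sketched in a few lines of Python. This covers only the graph-building step; the later enrichment of a gene network and the propagation step are not shown. The input layout (a dict from patient ID to mutated genes), the function name, and the choice to store a co-occurrence count on each edge are assumptions made for illustration, not details from the paper.

```python
from collections import Counter
from itertools import combinations


def structural_information_graph(patient_mutations):
    """Merge per-patient co-mutation pairs into one edge table.

    patient_mutations: dict mapping a patient ID to an iterable of
    mutated gene names. Returns a Counter keyed by sorted gene pairs,
    with the number of patients in which both genes are mutated.
    """
    edges = Counter()
    for genes in patient_mutations.values():
        # Each unordered pair of mutated genes in one patient is one edge.
        for g1, g2 in combinations(sorted(set(genes)), 2):
            edges[(g1, g2)] += 1
    return edges
```

Sorting the gene names before pairing keeps each undirected edge under a single key, so `("KRAS", "TP53")` and `("TP53", "KRAS")` are never counted separately.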

Citations: 0

PRFold-TNN: Protein Fold Recognition With an Ensemble Feature Selection Method Using PageRank Algorithm Based on Transformer

IF 3.6, Zone 3 (Biology)

IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-14 DOI: 10.1109/TCBB.2024.3414497

Xinyi Qin;Lu Zhang;Min Liu;Guangzhong Liu

Abstract: Understanding the tertiary structures of proteins is of great benefit to function in many aspects of human life. Protein fold recognition is a vital and salient means to know protein structure. Until now, researchers have successively proposed a variety of methods to realize protein fold recognition, but novel and effective computational methods are still needed to handle this problem as protein structure databases are continuously updated. In this study, we develop a new protein structure dataset named AT and propose the PRFold-TNN model for protein fold recognition. First, different types of feature extraction methods, including AAC, HMM, HMM-Bigram, and ACC, are selected to extract corresponding features for protein sequences. Then an ensemble feature selection method based on the PageRank algorithm, integrating various tree-based algorithms, is used to screen the fusion features. Ultimately, the classifier based on the Transformer model achieves the final prediction. Experiments show that the prediction accuracy is 86.27% on the AT dataset and 88.91% on the independent test set, indicating that the model can demonstrate superior performance and generalization ability in the problem of protein fold recognition. Furthermore, we also carry out research on the DD, EDD, and TG benchmark datasets and achieve prediction accuracies of 88.41%, 97.91%, and 95.16%, which are at least 3.0%, 0.8%, and 2.5% higher than those of the state-of-the-art methods. It can be concluded that the PRFold-TNN model is more prominent.

Vol. 21, No. 6, pp. 1740-1751

Citations: 0