IEEE/ACM Transactions on Computational Biology and Bioinformatics: Latest Articles

Bi-SeqCNN: A Novel Light-Weight Bi-Directional CNN Architecture for Protein Function Prediction
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-11 DOI: 10.1109/TCBB.2024.3426491
Vikash Kumar;Akshay Deepak;Ashish Ranjan;Aravind Prakash
Volume 21, Issue 6, pp. 1922-1933
Abstract: Deep learning approaches, such as convolutional neural networks (CNNs) and deep recurrent neural networks (RNNs), have been the backbone for predicting protein function, with promising state-of-the-art (SOTA) results. RNNs offer a strong sequential processing mechanism through their built-in ability to (i) focus on past information, (ii) collect both short- and long-range dependency information, and (iii) process sequences bi-directionally. CNNs, however, are confined to focusing on short-term information from both the past and the future, although they offer parallelism. Therefore, a novel bi-directional CNN that strictly complies with the sequential processing mechanism of RNNs is introduced and is used for developing a protein function prediction framework, Bi-SeqCNN. This is a sub-sequence-based framework. Further, Bi-SeqCNN$^+$ is an ensemble approach to better the prediction results. To our knowledge, this is the first time bi-directional CNNs are employed for general temporal data analysis and not just for protein sequences. The proposed architecture produces improvements up to +5.5% over contemporary SOTA methods on three benchmark protein sequence datasets. Moreover, it is substantially lighter and attains these results with (0.50–0.70 times) fewer parameters than the SOTA methods.
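The abstract does not spell out how a CNN can "comply with the sequential processing mechanism of RNNs". One common way to get that behaviour is a causal convolution run once in each direction, so the sketch below is offered only under that assumption and is not the authors' architecture. The function names and the toy kernels are hypothetical.

```python
from typing import List, Sequence, Tuple


def causal_conv1d(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: output[t] uses only x[t-k+1 .. t], never the future."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(x)  # left-pad so position t sees no later input
    return [
        sum(kernel[j] * padded[t + j] for j in range(k))
        for t in range(len(x))
    ]


def bidirectional_causal_conv1d(
    x: Sequence[float],
    fwd_kernel: Sequence[float],
    bwd_kernel: Sequence[float],
) -> List[Tuple[float, float]]:
    """Pair a left-to-right causal pass with a right-to-left causal pass.

    Assumption (not from the paper): the two directions are simply
    concatenated per position, as in a bi-directional RNN.
    """
    forward = causal_conv1d(x, fwd_kernel)
    # Reverse the input, convolve causally, then restore the original order.
    backward = causal_conv1d(list(x)[::-1], bwd_kernel)[::-1]
    return list(zip(forward, backward))


if __name__ == "__main__":
    print(bidirectional_causal_conv1d([1, 2, 3, 4], [1, 1], [1, 1]))
```

In this sketch the forward value at position t never changes when a later input changes, and the backward value never changes when an earlier input changes, which is the property a uni-directional RNN has in each direction.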
Citations: 0
SCRN: Single-Cell Gene Regulatory Network Identification in Alzheimer's Disease
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-08 DOI: 10.1109/TCBB.2024.3424400
Wentao Zhu;Zhiqiang Du;Ziang Xu;Defu Yang;Minghan Chen;Qianqian Song
Volume 21, Issue 6, pp. 1886-1896
Abstract: Alzheimer's disease (AD) is the most common neurodegenerative disease, and it consumes considerable medical resources with an increasing number of patients every year. Mounting evidence shows that regulatory disruptions altering the intrinsic activity of genes in brain cells contribute to AD pathogenesis. To gain insights into the underlying gene regulation in AD, we proposed a graph learning method, Single-Cell based Regulatory Network (SCRN), to identify the regulatory mechanisms based on single-cell data. SCRN implements the γ-decaying heuristic link prediction based on graph neural networks and can identify reliable gene regulatory networks using locally closed subgraphs. In this work, we first performed UMAP dimension reduction analysis on single-cell RNA sequencing (scRNA-seq) data of AD and normal samples. Then we used SCRN to construct the gene regulatory network based on three well-recognized AD genes (APOE, CX3CR1, and P2RY12). Enrichment analysis of the regulatory network revealed significant pathways including NGF signaling, ERBB2 signaling, and hemostasis. These findings demonstrate the feasibility of using SCRN to uncover potential biomarkers and therapeutic targets related to AD.
Citations: 0
Improved Fuzzy Cognitive Maps for Gene Regulatory Networks Inference Based on Time Series Data
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-04 DOI: 10.1109/TCBB.2024.3423383
Marzieh Emadi;Farsad Zamani Boroujeni;Jamshid Pirgazi
Volume 21, Issue 6, pp. 1816-1829
Abstract: Microarray data provide a large amount of information regarding gene expression levels. Due to the volume of such data, their analysis requires sufficient computational methods for identifying and analyzing gene regulation networks; however, researchers in this field face numerous challenges, such as having to consider too many genes while the number of samples is limited and the data are noisy. In this paper, a hybrid method based on fuzzy cognitive maps and compressed sensing is used to identify interactions between genes. For this purpose, in the inference of the gene regulation network, Ensemble Kalman filtered compressed sensing is used to learn the fuzzy cognitive map. Using the Ensemble Kalman filter and compressed sensing, the fuzzy cognitive map will be robust against noise. The proposed algorithm is evaluated using several metrics and compared with several well-known methods such as LASSOFCM, KFRegular, and CMI2NI. The experimental results show that the proposed method outperforms methods proposed in recent years in terms of SSmean, Data Error and accuracy.
Citations: 0
AnglesRefine: Refinement of 3D Protein Structures Using Transformer Based on Torsion Angles
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-03 DOI: 10.1109/TCBB.2024.3422288
Lei Zhang;Junyong Zhu;Sheng Wang;Jie Hou;Dong Si;Renzhi Cao
Volume PP (pages not yet assigned)
Abstract: The goal of protein structure refinement is to enhance the precision of predicted protein models, particularly at the residue level of the local structure. Existing refinement approaches primarily rely on physics, whereas molecular simulation methods are resource-intensive and time-consuming. In this study, we employ deep learning methods to extract structural constraints from protein structure residues to assist in protein structure refinement. We introduce a novel method, AnglesRefine, which focuses on a protein's secondary structure and employs a transformer to refine various protein structure angles (psi, phi, omega, CA_C_N_angle, C_N_CA_angle, N_CA_C_angle), ultimately generating a superior protein model based on the refined angles. We evaluate our approach against other cutting-edge methods using the CASP11-14 and CASP15 datasets. Experimental outcomes indicate that our method generally surpasses other techniques on the CASP11-14 test dataset, while performing comparably or marginally better on the CASP15 test dataset. Our method consistently demonstrates the least likelihood of model quality degradation, e.g., the degradation percentage of our method is less than 10%, while that of other methods is about 50%. Furthermore, as our approach eliminates the need for conformational search and sampling, it significantly reduces computational time compared to existing refinement methods.
Citations: 0
Prediction of Potential miRNA-Disease Associations Based on a Masked Graph Autoencoder
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-02 DOI: 10.1109/TCBB.2024.3421924
Hailin Feng;Chenchen Ke;Quan Zou;Zhechen Zhu;Tongcun Liu
Volume 21, Issue 6, pp. 1874-1885
Abstract: Biomedical evidence has demonstrated the relevance of microRNA (miRNA) dysregulation in complex human diseases, and determining the relationship between miRNAs and diseases can aid in the early detection and prevention of diseases. Traditional biological experimental methods have the disadvantages of high cost and low efficiency, which are well compensated by computational methods. However, many computational methods have the challenge of excessively focusing on the neighbor relationship, ignoring the structural information of the graph, and belittling the redundant information of the graph structure. This study proposed a computational model based on a graph-masking autoencoder named MGAEMDA. MGAEMDA is an asymmetric framework in which the encoder maps partially observed graphs into latent representations. The decoder reconstructs the masked structural information based on the edge and node levels and combines it with linear matrices to obtain the result. The empirical results on the two datasets reveal that the MGAEMDA model performs better than its counterparts. We also demonstrated the predictive performance of MGAEMDA using a case study of four diseases, and all the top 30 predicted miRNAs were validated in the database, providing further evidence of the excellent performance of the model.
Citations: 0
Graph Convolutional Network With Self-Supervised Learning for Brain Disease Classification
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-02 DOI: 10.1109/TCBB.2024.3422152
Guangyu Wang;Ying Chu;Qianqian Wang;Limei Zhang;Lishan Qiao;Mingxia Liu
Volume 21, Issue 6, pp. 1830-1841
Abstract: Brain functional network (BFN) analysis has become a popular method for identifying neurological diseases at their early stages and revealing sensitive biomarkers related to these diseases. Because a BFN is a graph with complex structure, graph convolutional networks (GCNs) can be naturally used in the identification of BFNs, and can generally achieve an encouraging performance if given large amounts of training data. In practice, however, it is very difficult to obtain sufficient brain functional data, especially from subjects with brain disorders. As a result, GCNs usually fail to learn a reliable feature representation from limited BFNs, leading to overfitting issues. In this paper, we propose an improved GCN method to classify brain diseases by introducing a self-supervised learning (SSL) module for assisting the graph feature representation. We conduct experiments to classify subjects with mild cognitive impairment (MCI) and autism spectrum disorder (ASD), respectively, from normal controls (NCs). Experimental results on two benchmark databases demonstrate that our proposed scheme tends to obtain higher classification accuracy than the baseline methods.
Citations: 0
Optimal Structured Matrix Approximation for Robustness to Incomplete Biosequence Data
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-01 DOI: 10.1109/TCBB.2024.3420903
Chris Salahub;Jeffrey Uhlmann
Volume 21, Issue 6, pp. 2592-2597
Abstract: We propose a general method for optimally approximating an arbitrary matrix $\mathbf{M}$ by a structured matrix $\mathbf{T}$ (circulant, Toeplitz/Hankel, etc.) and examine its use for estimating the spectra of genomic linkage disequilibrium matrices. This application is prototypical of a variety of genomic and proteomic problems that demand robustness to incomplete biosequence information. We perform a simulation study and corroborative test of our method using real genomic data from the Mouse Genome Database (Bult et al., 2019). The results confirm the predicted utility of the method and provide strong evidence of its potential value to a wide range of bioinformatics applications. Our optimal general matrix approximation method is expected to be of independent interest to an even broader range of applications in applied mathematics and engineering.
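To make the idea of approximating a matrix by a structured one concrete, the sketch below computes the Toeplitz matrix closest to M in the least-squares (Frobenius) sense, which amounts to averaging each diagonal. Missing entries are marked with None and skipped, so the fit is over observed entries only. This is a textbook construction chosen for illustration; the abstract does not say which optimality criterion the paper uses, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def nearest_toeplitz(m: Sequence[Sequence[Optional[float]]]) -> List[List[float]]:
    """Least-squares Toeplitz fit: replace every diagonal by the mean of its observed entries."""
    n_rows, n_cols = len(m), len(m[0])
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for i in range(n_rows):
        for j in range(n_cols):
            value = m[i][j]
            if value is not None:  # None marks a missing (unobserved) entry
                d = j - i          # a Toeplitz matrix is constant along each offset d
                sums[d] = sums.get(d, 0.0) + value
                counts[d] = counts.get(d, 0) + 1
    # Assumption: a diagonal with no observed entry is filled with 0.0.
    return [
        [sums[j - i] / counts[j - i] if (j - i) in counts else 0.0
         for j in range(n_cols)]
        for i in range(n_rows)
    ]


if __name__ == "__main__":
    incomplete = [
        [1.0, 2.0, None],
        [4.0, 3.0, 4.0],
        [None, 2.0, 5.0],
    ]
    for row in nearest_toeplitz(incomplete):
        print(row)
```

Averaging works here because the Toeplitz matrices form a linear subspace whose natural basis (one indicator matrix per diagonal) is orthogonal, so the least-squares projection decouples diagonal by diagonal.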
Citations: 0
Ense-i6mA: Identification of DNA N6-Methyladenine Sites Using XGB-RFE Feature Selection and Ensemble Machine Learning
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-07-01 DOI: 10.1109/TCBB.2024.3421228
Xueqiang Fan;Bing Lin;Jun Hu;Zhongyi Guo
Volume 21, Issue 6, pp. 1842-1854
Abstract: DNA N6-methyladenine (6mA) is an important epigenetic modification that plays a vital role in various cellular processes. Accurate identification of 6mA sites is fundamental to elucidating the biological functions and mechanisms of this modification. However, experimental methods for detecting 6mA sites are expensive and time-consuming. In this study, we propose a novel computational method, called Ense-i6mA, to predict 6mA sites. Firstly, five encoding schemes, i.e., one-hot encoding, gcContent, Z-Curve, K-mer nucleotide frequency, and K-mer nucleotide frequency with gap, are employed to extract DNA sequence features. Secondly, eXtreme gradient boosting coupled with recursive feature elimination is applied to remove noisy features, which avoids over-fitting and reduces computing time and complexity. Then, the best subset of features is fed into base-classifiers composed of Extra Trees, eXtreme Gradient Boosting, Light Gradient Boosting Machine, and Support Vector Machine. Finally, to minimize generalization errors, the prediction probabilities of the base-classifiers are aggregated by averaging to infer the final 6mA site results. We conduct experiments on two species, i.e., Arabidopsis thaliana and Drosophila melanogaster, to compare the performance of Ense-i6mA against recent 6mA site prediction methods. The experimental results demonstrate that the proposed Ense-i6mA achieves area under the receiver operating characteristic curve values of 0.967 and 0.968, accuracies of 91.4% and 92.0%, and Matthews correlation coefficient values of 0.829 and 0.842 on two benchmark datasets, respectively, and outperforms several existing state-of-the-art methods.
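Two of the steps named in the abstract are simple enough to sketch directly: the K-mer nucleotide frequency encoding and the final averaging of base-classifier probabilities. The code below is a minimal illustration, not the Ense-i6mA implementation. The function names, the default k = 2, the 0.5 decision threshold, and the example probabilities are all assumptions, and no real classifier is trained here.

```python
from itertools import product
from typing import Dict, List, Sequence


def kmer_frequencies(seq: str, k: int = 2) -> Dict[str, float]:
    """Relative frequency of every possible k-mer over the alphabet ACGT."""
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    n_windows = len(seq) - k + 1
    for i in range(max(n_windows, 0)):
        window = seq[i:i + k]
        if window in counts:  # windows with other symbols (e.g. N) are ignored
            counts[window] += 1
    denom = max(n_windows, 1)  # avoid division by zero for very short sequences
    return {kmer: c / denom for kmer, c in counts.items()}


def soft_vote(probabilities: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average per-sample probabilities across classifiers, then threshold.

    probabilities[c][s] is classifier c's predicted probability that
    sample s is a 6mA site.
    """
    n_classifiers = len(probabilities)
    n_samples = len(probabilities[0])
    means = [
        sum(probabilities[c][s] for c in range(n_classifiers)) / n_classifiers
        for s in range(n_samples)
    ]
    return [1 if p >= threshold else 0 for p in means]


if __name__ == "__main__":
    print(kmer_frequencies("ACGTAC")["AC"])
    # Four hypothetical base classifiers scoring two samples.
    print(soft_vote([[0.9, 0.2], [0.7, 0.4], [0.8, 0.3], [0.6, 0.7]]))
```

Averaging probabilities rather than counting hard votes lets a confident classifier outweigh several uncertain ones, which is the usual argument for this kind of aggregation.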
Citations: 0
Haplotype Frequency Inference From Pooled Genetic Data With a Latent Multinomial Model
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-28 DOI: 10.1109/TCBB.2024.3420430
Yong See Foo;Jennifer Flegg
Volume 21, Issue 6, pp. 1864-1873
Abstract: In genetic association studies, haplotype data provide more refined information than data about separate genetic markers. However, large-scale studies that genotype hundreds to thousands of individuals may only provide results of pooled data. Methods for inferring haplotype frequencies from pooled genetic data that scale well with pool size rely on a normal approximation, which we observe to produce unreliable inference when applied to real data. We illustrate cases where the approximation fails, due to the normal covariance matrix being near-singular. As an alternative to approximate methods, in this paper we propose two exact methods to infer haplotype frequencies from pooled genetic data based on a latent multinomial model, where the pooled results are considered integer combinations of latent, unobserved haplotype counts. One of our methods, latent count sampling via Markov bases, achieves approximately linear runtime with respect to pool size. Our exact methods produce more accurate inference than existing approximate methods for synthetic data and for haplotype data from the 1000 Genomes Project. We also demonstrate how our methods can be applied to time-series of pooled genetic data, as a proof of concept of how our methods are relevant to more complex hierarchical settings, such as spatiotemporal models.
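The phrase "integer combinations of latent, unobserved haplotype counts" can be illustrated with a toy case of two biallelic markers. There are four haplotypes (00, 01, 10, 11), but a pool only reveals how many copies of the alternative allele were seen at each marker. The sketch below enumerates every latent count vector compatible with an observation and shows one move that changes the latent counts without changing the observation, which is the kind of step a Markov-basis sampler relies on. It is a brute-force illustration under these assumptions, not the authors' algorithm, and all names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Sequence, Tuple

HAPLOTYPES = ["00", "01", "10", "11"]
# Row m of A says which haplotypes carry the alternative allele at marker m.
A = [[int(h[m]) for h in HAPLOTYPES] for m in range(2)]  # [[0,0,1,1],[0,1,0,1]]

# This move adds one 00 and one 11 while removing one 01 and one 10,
# leaving the total and both per-marker allele counts unchanged.
MOVE = (1, -1, -1, 1)


def pooled_counts(z: Sequence[int]) -> List[int]:
    """Observed per-marker allele counts y = A z for latent haplotype counts z."""
    return [sum(a * c for a, c in zip(row, z)) for row in A]


def compatible_latent_counts(y: Sequence[int], n: int) -> List[Tuple[int, ...]]:
    """All non-negative integer z with sum(z) == n and A z == y (brute force)."""
    return [
        z for z in product(range(n + 1), repeat=len(HAPLOTYPES))
        if sum(z) == n and pooled_counts(z) == list(y)
    ]


def apply_move(z: Sequence[int], sign: int = 1) -> Optional[Tuple[int, ...]]:
    """Apply +MOVE or -MOVE; return None if a count would become negative."""
    new = tuple(c + sign * m for c, m in zip(z, MOVE))
    return new if min(new) >= 0 else None


if __name__ == "__main__":
    # A pool of 2 haplotypes with one alternative allele seen at each marker.
    print(compatible_latent_counts([1, 1], n=2))
    print(apply_move((0, 1, 1, 0)))
```

Brute-force enumeration grows very quickly with pool size and marker count, which is why a sampler that walks between compatible count vectors using such moves is attractive.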
Citations: 0
Tropical Density Estimation of Phylogenetic Trees
IF 3.6 | Zone 3, Biology
IEEE/ACM Transactions on Computational Biology and Bioinformatics Pub Date : 2024-06-28 DOI: 10.1109/TCBB.2024.3420815
Ruriko Yoshida;David Barnhill;Keiji Miura;Daniel Howe
Volume 21, Issue 6, pp. 1855-1863
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10577088
Abstract: Much evidence from biological theory and empirical data indicates that gene trees, i.e., phylogenetic trees reconstructed from different genes (loci), do not have to have exactly the same tree topologies. Such incongruence between gene trees might be caused by some “unusual” evolutionary events, such as meiotic sexual recombination in eukaryotes or horizontal transfers of genetic material in prokaryotes. However, most of the gene trees are constrained by the tree topology of the underlying species tree, that is, the phylogenetic tree depicting the evolutionary history of the set of species under consideration. In order to discover “outlying” gene trees which do not follow the “main distribution(s)” of trees, we propose to apply the “tropical metric” with the max-plus algebra from tropical geometry to a non-parametric estimation of gene trees over the space of phylogenetic trees. In this research we apply the “tropical metric,” a well-defined metric over the space of phylogenetic trees under the max-plus algebra, to non-parametric estimation of the gene tree distribution over the tree space. The kernel density estimator (KDE) is one of the most popular non-parametric estimators of a distribution from a given sample, and we propose an analogue of the classical KDE in the setting of tropical geometry with the tropical metric, which measures the length of an intrinsic geodesic between trees over the tree space. We estimate the probability of an observed tree by empirical frequencies of nearby trees, with the level of influence determined by the tropical metric. Then, with simulated data generated from the multispecies coalescent model, we show that the non-parametric estimation of the gene tree distribution using the tropical metric performs better than one using the Billera-Holmes-Vogtmann (BHV) metric developed by Weyenberg et al. in terms of computational times and accuracy. We then apply it to Apicomplexa data.
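The tropical metric mentioned in the abstract has a short closed form: for two trees represented as vectors u and v of pairwise leaf distances, it is max_i(u_i - v_i) - min_i(u_i - v_i). The sketch below implements that distance and a simple kernel-style score built on it. The exponential kernel, the bandwidth value, and the absence of a normalizing constant are assumptions made for illustration, so the score is only meaningful for ranking trees and should not be read as the estimator defined in the paper.

```python
import math
from typing import Sequence


def tropical_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def tropical_kernel_score(
    x: Sequence[float],
    sample: Sequence[Sequence[float]],
    bandwidth: float = 1.0,
) -> float:
    """Unnormalized density score: mean of exp(-d_tr(x, x_i) / bandwidth).

    Assumption: an exponential kernel is used here for simplicity; trees
    with a low score relative to the rest are candidate outliers.
    """
    return sum(
        math.exp(-tropical_distance(x, t) / bandwidth) for t in sample
    ) / len(sample)


if __name__ == "__main__":
    # Toy "trees" given as vectors of pairwise distances (hypothetical values).
    sample = [[0.0, 1.0, 3.0], [0.0, 1.0, 3.0], [0.0, 2.0, 3.0]]
    print(tropical_distance([0.0, 1.0, 3.0], [0.0, 2.0, 1.0]))
    print(tropical_kernel_score([0.0, 1.0, 3.0], sample))
    print(tropical_kernel_score([0.0, 10.0, 0.0], sample))
```

Note that adding the same constant to every coordinate of u leaves the distance unchanged, which reflects the fact that the metric is defined on vectors modulo a common shift.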
Citations: 0