Journal of Computational Biology: Latest Articles

On Minimizers and Convolutional Filters: Theoretical Connections and Applications to Genome Analysis.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-05-01 Epub Date: 2024-04-30 DOI: 10.1089/cmb.2024.0483
Yun William Yu
Abstract: Minimizers and convolutional neural networks (CNNs) are two quite distinct popular techniques that have both been employed to analyze categorical biological sequences. At face value, the methods seem entirely dissimilar. Minimizers use min-wise hashing on a rolling window to extract a single important k-mer feature per window. CNNs start with a wide array of randomly initialized convolutional filters, paired with a pooling operation, and then multiple additional neural layers to learn both the filters themselves and how they can be used to classify the sequence. In this study, our main result is a careful mathematical analysis of hash function properties showing that for sequences over a categorical alphabet, random Gaussian initialization of convolutional filters with max-pooling is equivalent to choosing a minimizer ordering such that selected k-mers are (in Hamming distance) far from the k-mers within the sequence but close to other minimizers. In empirical experiments, we find that this property manifests as decreased density in repetitive regions, both in simulation and on real human telomeres. We additionally train from scratch a CNN embedding of synthetic short-reads from the SARS-CoV-2 genome into 3D Euclidean space that locally recapitulates the linear sequence distance of the read origins, a modest step toward building a deep learning assembler, although it is at present too slow to be practical. In total, this article provides a partial explanation for the effectiveness of CNNs in categorical sequence analysis.
Pages: 381-395
Citations: 0
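The abstract above describes minimizers as min-wise hashing over a rolling window that keeps one k-mer per window. A minimal sketch of that general idea follows; it is not the paper's code. The function names, the BLAKE2b-based ordering, and the leftmost tie-break are assumptions chosen for illustration.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    # Assumed ordering: a stable 64-bit hash (Python's built-in hash() is salted per process).
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def minimizers(seq: str, k: int, w: int, order=kmer_hash):
    """Return sorted (position, k-mer) pairs chosen as the minimum of each window of w k-mers."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        # Smallest k-mer under the ordering; ties go to the leftmost position.
        best = min(window, key=lambda i: (order(kmers[i]), i))
        selected.add((best, kmers[best]))
    return sorted(selected)


# Example with a lexicographic ordering instead of the hash.
print(minimizers("ACGTACGT", k=3, w=2, order=lambda s: s))
```

Swapping the `order` argument is the only change needed to try a different minimizer ordering, which is the degree of freedom the abstract's analysis concerns.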
Singular Value Decomposition-Based Penalized Multinomial Regression for Classifying Imbalanced Medulloblastoma Subgroups Using Methylation Data.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-05-01 Epub Date: 2024-05-14 DOI: 10.1089/cmb.2023.0198
Isra Mohammed, Murtada K Elbashir, Areeg S Faggad
Abstract: Medulloblastoma (MB) is a molecularly heterogeneous brain malignancy with large differences in clinical presentation. According to genomic studies, there are at least four distinct molecular subgroups of MB: sonic hedgehog (SHH), wingless/INT (WNT), Group 3, and Group 4. The treatment and outcomes depend on appropriate classification. It is difficult for the classification algorithms to identify these subgroups from an imbalanced MB genomic data set, where the distribution of samples among the MB subgroups may not be equal. To overcome this problem, we used singular value decomposition (SVD) and group lasso techniques to find DNA methylation probe features that maximize the separation between the different imbalanced MB subgroups. We used multinomial regression as a classification method to classify the four different molecular subgroups of MB using the reduced DNA methylation data. Coordinate descent is used to solve our loss function associated with the group lasso, which promotes sparsity. By using SVD, we were able to reduce the 321,174 probe features to just 200 features. Less than 40 features were successfully selected after applying the group lasso, which we then used as predictors for our classification models. Our proposed method achieved an average overall accuracy of 99% based on fivefold cross-validation technique. Our approach produces improved classification performance compared with the state-of-the-art methods for classifying MB molecular subgroups.
Pages: 458-471
Citations: 0
More Is Faster: Why Population Size Matters in Biological Search.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-05-01 Epub Date: 2024-05-16 DOI: 10.1089/cmb.2023.0296
Jannatul Ferdous, George Matthew Fricke, Melanie E Moses
Abstract: Many biological scenarios have multiple cooperating searchers, and the timing of the initial first contact between any one of those searchers and its target is critically important. However, we are unaware of biological models that predict how long it takes for the first of many searchers to discover a target. We present a novel mathematical model that predicts initial first contact times between searchers and targets distributed at random in a volume. We compare this model with the extreme first passage time approach in physics that assumes an infinite number of searchers all initially positioned at the same location. We explore how the number of searchers, the distribution of searchers and targets, and the initial distances between searchers and targets affect initial first contact times. Given a constant density of uniformly distributed searchers and targets, the initial first contact time decreases linearly with both search volume and the number of searchers. However, given only a single target and searchers placed at the same starting location, the relationship between the initial first contact time and the number of searchers shifts from a linear decrease to a logarithmic decrease as the number of searchers grows very large. More generally, we show that initial first contact times can be dramatically faster than the average first contact times and that the initial first contact times decrease with the number of searchers, while the average search times are independent of the number of searchers. We suggest that this is an underappreciated phenomenon in biology and other collective search problems.
Pages: 429-444
Citations: 0
The Floor Is Lava: Halving Natural Genomes with Viaducts, Piers, and Pontoons.
IF 1.4, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-04-15 DOI: 10.1089/cmb.2023.0330
Leonard Bohnenkämper
Abstract: Whole Genome Duplications (WGDs) are events that double the content and structure of a genome. In some organisms, multiple WGD events have been observed while loss of genetic material is a typical occurrence following a WGD event. The requirement of classic rearrangement models that every genetic marker has to occur exactly two times in a given problem instance, therefore, poses a serious restriction in this context. The Double-Cut and Join (DCJ) model is a simple and powerful model for the analysis of large structural rearrangements. After being extended to the DCJ-Indel model, capable of handling gains and losses of genetic material, research has shifted in recent years toward enabling it to handle natural genomes, for which no assumption about the distribution of markers has to be made. The traditional theoretical framework for studying WGD events is the Genome Halving Problem (GHP). While the GHP is solved for the DCJ model for genomes without losses, there are currently no exact algorithms utilizing the DCJ-Indel model that are able to handle natural genomes. In this work, we present a general view on the DCJ-Indel model that we apply to derive an exact polynomial time and space solution for the GHP on genomes with at most two genes per family before generalizing the problem to an integer linear program solution for natural genomes.
Vol. 31, No. 4, pp. 294-311
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057688/pdf/
Citations: 0
Computing the Bounds of the Number of Reticulations in a Tree-Child Network That Displays a Set of Trees.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-01-29 DOI: 10.1089/cmb.2023.0309
Yufeng Wu, Louxin Zhang
Abstract: Phylogenetic network is an evolutionary model that uses a rooted directed acyclic graph (instead of a tree) to model an evolutionary history of species in which reticulate events (e.g., hybrid speciation or horizontal gene transfer) occurred. Tree-child network is a kind of phylogenetic network with structural constraints. Existing approaches for tree-child network reconstruction can be slow for large data. In this study, we present several computational approaches for bounding from below the number of reticulations in a tree-child network that displays a given set of rooted binary phylogenetic trees. In addition, we also present some theoretical results on bounding from above the number of reticulations. Through simulation, we demonstrate that the new lower bounds on the reticulation number for tree-child networks can practically be computed for large tree data. The bounds can provide estimates of reticulation for relatively large data.
Pages: 345-359
Citations: 0
Asymmetric Cluster-Based Measures for Comparative Phylogenetics.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-04-17 DOI: 10.1089/cmb.2023.0338
Sanket Wagle, Alexey Markin, Paweł Górecki, Tavis K Anderson, Oliver Eulenstein
Abstract: Phylogenetic inference and reconstruction methods generate hypotheses on evolutionary history. Competing inference methods are frequently used, and the evaluation of the generated hypotheses is achieved using tree comparison costs. The Robinson-Foulds (RF) distance is a widely used cost to compare the topology of two trees, but this cost is sensitive to tree error and can overestimate tree differences. To overcome this limitation, a refined version of the RF distance called the Cluster Affinity (CA) distance was introduced. However, CA distances are symmetric and cannot compare different types of trees. These asymmetric comparisons occur when gene trees are compared with species trees, when disparate datasets are integrated into a supertree, or when tree comparison measures are used to infer a phylogenetic network. In this study, we introduce a relaxation of the original Affinity distance to compare heterogeneous trees called the asymmetric CA cost. We also develop a biologically interpretable cost, the Cluster Support cost that normalizes by cluster size across gene trees. The characteristics of these costs are similar to the symmetric CA cost. We describe efficient algorithms, derive the exact diameters, and use these to standardize the cost to be applicable in practice. These costs provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
Vol. 31, No. 4, pp. 312-327
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057527/pdf/
Citations: 0
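For readers unfamiliar with the Robinson-Foulds distance that the abstract above takes as its starting point, here is a minimal sketch of the standard cluster-based version for rooted trees. It does not implement the paper's Cluster Affinity or Cluster Support costs. The nested-tuple tree encoding and the function names are assumptions made for illustration, and some authors halve the value computed here.

```python
def clusters(tree):
    """Collect the leaf set (cluster) below every internal node of a nested-tuple tree."""
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])  # a leaf label
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def rf_distance(t1, t2):
    # Number of clusters present in exactly one of the two trees.
    return len(clusters(t1) ^ clusters(t2))


t1 = (("A", "B"), ("C", "D"))
t2 = (("A", "C"), ("B", "D"))
print(rf_distance(t1, t2))
```

Because every mismatched cluster counts fully regardless of how close it is to a cluster in the other tree, this cost reacts strongly to small errors, which is the sensitivity the abstract refers to.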
RECOMB Satellite Conference on Comparative Genomics (RECOMB-CG 2023).
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-04-09 DOI: 10.1089/cmb.2024.29113.tv
Tomas Vinar
Abstract: (none provided)
Vol. 31, No. 4, pp. 275-276
Citations: 0
A Fixed-Parameter Tractable Algorithm for Finding Agreement Cherry-Reduced Subnetworks in Level-1 Orchard Networks.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2023-12-20 DOI: 10.1089/cmb.2023.0317
Kaari Landry, Olivier Tremblay-Savard, Manuel Lafond
Abstract: Phylogenetic networks are increasingly being considered better suited to represent the complexity of the evolutionary relationships between species. One class of phylogenetic networks that have received a lot of attention recently is the class of orchard networks, which is composed of networks that can be reduced to a single leaf using cherry reductions. Cherry reductions, also called cherry-picking operations, remove either a leaf of a simple cherry (sibling leaves sharing a parent) or a reticulate edge of a reticulate cherry (two leaves whose parents are connected by a reticulate edge). In this article, we present a fixed-parameter tractable algorithm to solve the problem of finding a maximum agreement cherry-reduced subnetwork (MACRS) between two rooted binary level-1 networks. This is the first exact algorithm proposed to solve the MACRS problem. As proven in an earlier work, there is a direct relationship between finding an MACRS and calculating a distance based on cherry operations. As a result, the proposed algorithm also provides a distance that can be used for the comparison of level-1 networks.
Pages: 360-379
Citations: 0
The k-Robinson-Foulds Dissimilarity Measures for Comparison of Labeled Trees.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-01-25 DOI: 10.1089/cmb.2023.0312
Elahe Khayatian, Gabriel Valiente, Louxin Zhang
Abstract: Understanding the mutational history of tumor cells is a critical endeavor in unraveling the mechanisms that drive the onset and progression of cancer. Modeling tumor cell evolution with labeled trees motivates researchers to develop different measures to compare labeled trees. Although the Robinson-Foulds (RF) distance is widely used for comparing species trees, its applicability to labeled trees reveals certain limitations. This study introduces the k-RF dissimilarity measures, tailored to address the challenges of labeled tree comparison. The RF distance is succinctly expressed as n-RF in the space of labeled trees with n nodes. Like the RF distance, the k-RF is a pseudometric for multiset-labeled trees and becomes a metric in the space of 1-labeled trees. By setting k to a small value, the k-RF dissimilarity can capture analogous local regions in two labeled trees with different size or different labels.
Pages: 328-344
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11057537/pdf/
Citations: 0
Orthology and Paralogy Relationships at Transcript Level.
IF 1.7, Zone 4, Biology
Journal of Computational Biology Pub Date : 2024-04-01 Epub Date: 2024-04-16 DOI: 10.1089/cmb.2023.0400
Wend Yam D D Ouedraogo, Aida Ouangraoua
Abstract: Eukaryotic genes undergo a mechanism called alternative processing, resulting in transcriptome diversity by allowing the production of multiple distinct transcripts from a gene. More than half of human genes are affected, and the resulting transcripts are highly conserved among orthologous genes of distinct species. In this work, we present the definition of orthology and paralogy between transcripts of homologous genes, together with an algorithm to compute clusters of conserved orthologous and paralogous transcripts. Gene-level homology relationships are utilized to define various types of homology relationships between transcripts originating from the same ancestral transcript. A Reciprocal Best Hits approach is employed to infer clusters of isoorthologous and recent paralogous transcripts. We applied this method to transcripts from simulated gene families as well as real gene families from the Ensembl-Compara database. The results are consistent with those from previous studies that compared orthologous gene transcripts. Furthermore, our findings provide evidence that searching for conserved transcripts between homologous genes, beyond the scope of orthologous genes, is likely to yield valuable information.
Vol. 31, No. 4, pp. 277-293
Citations: 0
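The abstract above mentions a Reciprocal Best Hits approach. The generic form of that idea can be sketched as follows; this is not the authors' clustering algorithm. The score dictionary, the transcript identifiers, and the function name are hypothetical, and ties are resolved in favor of the first pair seen.

```python
def reciprocal_best_hits(scores):
    """scores maps (a, b) -> similarity, with a from one gene's transcripts and b from another's."""
    best_for_a, best_for_b = {}, {}
    for (a, b), s in scores.items():
        # Track the highest-scoring partner seen so far in each direction.
        if a not in best_for_a or s > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or s > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep only pairs that are each other's best hit.
    return sorted((a, b) for a, b in best_for_a.items() if best_for_b[b] == a)


# Hypothetical similarity scores between transcripts of two homologous genes.
scores = {
    ("h1", "m1"): 0.9, ("h1", "m2"): 0.4,
    ("h2", "m1"): 0.8, ("h2", "m2"): 0.3,
}
print(reciprocal_best_hits(scores))
```

Requiring the best hit to hold in both directions is what keeps a pair such as h2 and m1 out of the result even though m1 is h2's best match.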