{"title":"A Cross-Order Comparative Genomics Analysis of Insect Innate Immunity Suggests Niche-Specific Adaptation.","authors":"Triveni Shelke, Vanika Gupta, Ishaan Gupta","doi":"10.1177/15578666261426005","DOIUrl":"10.1177/15578666261426005","url":null,"abstract":"<p><p>Insects thrive in highly diverse environmental niches, exposing them to varying pathogens. This complex interaction has led to the development of a variety of defense mechanisms collectively termed innate immunity. Innate immunity is the first line of defense against pathogens in most organisms. Despite the availability of genomic and protein data of insects, we do not understand how innate immunity has evolved. This study reports class-level analysis of innate immunity in <i>Insecta</i> spanning approximately 300 million years of evolution. We used the available data on 27 insect species of five predominant orders to track the evolutionary paths of innate immune proteins. We analyzed orthogroups and gene family dynamics to identify core conserved components and lineage-specific innovations in immune genes. Through orthogroup analysis, we find an asymmetrical or incoherent distribution of orthologs within the immune orthogroups among the orders that present intriguing events such as gene duplications, losses and functional diversification. For example, instances of missing orthologs in corresponding immune orthogroups were noted for specific orders, such as Defensin in <i>Lepidoptera</i>, transferrin in <i>Hemiptera</i>, and Cathepsin B in <i>Hymenoptera</i>. Lineage-specific conservation of antimicrobial peptides was observed in <i>Lepidoptera</i> and <i>Hymenoptera</i>. Interestingly, proteins like Peptidoglycan recognition protein (PGRP)-SA, Spatzle, and others have undergone major family expansion and contraction events in the insect species studied. 
By constructing phylogenies based on immune-related proteins and assessing the signatures of positive selection, we gained insights into diversification and adaptive evolution. Signatures of positive selection were only demonstrated by PGRP-SA and diptericin proteins, with a specific site under selection found in the latter. Our findings present a broad picture of how the five closely placed yet highly divergent orders of the class <i>Insecta</i> have maintained innate immunity in close reference to their ecological niches. These findings can have practical implications in strategizing pest management and insect conservation in the wild.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"439-455"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147512495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Simple and Thorough Detection of Related Sequences with Position-Varying Probabilities of Substitutions, Insertions, and Deletions.","authors":"Martin C Frith, Patrick Boppert, Patrick Styll, Milot Mirdita","doi":"10.1177/15578666261428480","DOIUrl":"10.1177/15578666261428480","url":null,"abstract":"<p><p>One way to understand biology is by finding genetic sequences that are related to each other. Often, a family of related sequences has position-varying probabilities of substitutions, insertions, and deletions: we can use these to find distantly related sequences. There are popular software tools to do this, which all have limitations. They either do not use all probability evidence (e.g., PSI-BLAST, MMseqs2) or have excessive complexity and minor biases (e.g., HMMER). This complexity inhibits fertile development of alternative tools. This study describes a simplest reasonable way to find related sequences, making full use of position-varying probabilities. The algorithms likely use the fewest operations that such algorithms possibly could, so they are fast and simple. This has been implemented in prototype software named DUMMER (Dumb Uncomplicated Match ModelER). Its sensitivity and specificity are competitive with HMMER. 
It finds evidence that the human genome has many more relics of some ancient transposons, including LF-SINE, which was co-opted for various functions in common ancestors of all land vertebrates.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"420-438"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147512471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Branching-Process Modeling of Homology Distribution in Salmonid Genomes.","authors":"Yue Zhang, David Sankoff","doi":"10.1177/15578666261426392","DOIUrl":"10.1177/15578666261426392","url":null,"abstract":"<p><p>Comparative analysis of sequence similarity distributions reveals evolutionary mechanisms shaping gene families. In Salmonidae, whole-genome duplication (WGD) and rapid speciation pose a challenge for modeling retained homologs and sequence divergence. We introduce a stochastic branching-process framework that models sequence similarity decay over evolutionary time and quantifies fractionation rates across successive duplication events. We derive moment-generating functions of pairwise similarity scores and carry out simulation-based validation. Applying our model to multiple salmonid genomes (Atlantic salmon, rainbow trout, Chinook salmon, …), we not only recapitulate observed bimodal similarity distributions, but we also quantify gene retention across evolutionary branches. Results indicate that the estimated fractionation rates for both WGDs (<math><mrow><mrow><msub><mrow><mi>μ</mi></mrow><mn>1</mn></msub></mrow><mo>,</mo><mrow><msub><mrow><mi>μ</mi></mrow><mn>2</mn></msub></mrow><mo>≈</mo><mn>0.0009</mn></mrow></math>-0.0013 per Myr) remain highly consistent across species and insensitive to synteny block size, supporting a conserved post-WGD gene loss dynamic. In contrast, lineage-specific differences in duplicate retention arise primarily in the temporal gap between duplication events rather than differences in instantaneous loss rates. 
These findings underscore the stability of fractionation dynamics and the critical role of structural genome decay in shaping retention patterns in the evolution of salmonids and sucker fish.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"558-572"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147574269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CherryRed: A Software Implementation of Cherry Distance with a New Optimization and Heuristic.","authors":"Kaari Landry, Olivier Tremblay-Savard","doi":"10.1177/15578666261424919","DOIUrl":"10.1177/15578666261424919","url":null,"abstract":"<p><p>Representing complex evolutionary relationships, such as hybridization and horizontal gene transfer, increasingly requires phylogenetic networks (over phylogenetic trees). Methods of construction of such networks rely on a measure of difference (a distance) between them to identify discrepancies between the newly built networks and a reference. Here, we focus on the cherry distance, a newly developed distance based on the number of cherry operations required to transform one input network into the other. Our work takes an existing algorithm design to calculate cherry distance on level-1 orchards and refines it using a preprocessing filter that maps reticulated elements of the input networks. We also present a heuristic strategy, which operates on only the most promising substructures of the input. CherryRed is a new, publicly available Rust package, which includes both of these improvements. Using CherryRed, we experimentally show how effective our refinement to the exact algorithm is (and when it is most effective), and we show how our heuristic maintains a high degree of accuracy while making large runtime efficiency gains. Characteristics of cherry distance are explored as well, with experiments on a real data set from the Rose family. Particularly, we compare cherry distance with a network adaptation of the ubiquitous Robinson-Foulds (RF) distance on trees, the soft RF distance (softwired distance). 
We do so with a common rearrangement operation (rooted nearest-neighbor interchange) and a leaf-moving operation, to show a higher degree of sensitivity in cherry distance, and a natural reflection of the number of taxa that are impacted by changes in the network.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"518-534"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147574233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A REsampling and Visual EvALuation Method to Detect and Map Local Model Violations During Biomolecular Sequence Analysis.","authors":"Meijun Gao, Kevin J Liu","doi":"10.1177/15578666261424921","DOIUrl":"10.1177/15578666261424921","url":null,"abstract":"<p><p>A fundamental assumption in phylogenetics and phylogenomics is that a single, global evolutionary model can adequately characterize the substitution processes operating across all sites in a molecular sequence alignment. However, this assumption is frequently violated in practice due to heterogeneity in evolutionary processes, leading to local model mis-specification and potential bias in downstream inference. While a variety of statistical and machine learning-based approaches have been developed to address this issue, these methods often rely on restrictive model assumptions or are designed for narrowly scoped applications, limiting their generalizability across diverse datasets and evolutionary contexts. Here, we present REVEAL (\"REsampling and Visual EvALuation\"), a general-purpose statistical framework for detecting and localizing model mis-specification in biomolecular sequence data. REVEAL operates without introducing additional assumptions beyond those inherent to standard global model-based analyses. It employs sequence-aware statistical resampling to construct a local support matrix along the sequence alignment, facilitating the identification of site-level model violations. Through extensive simulation experiments, we demonstrate that REVEAL achieves robust control of both type I and type II errors, with precision of <math><mrow><mn>90</mn><mi>%</mi></mrow></math> or greater and recall of <math><mrow><mn>85</mn><mi>%</mi></mrow></math> or greater across diverse evolutionary scenarios involving different sources of model heterogeneity, varying dataset sizes in terms of sequence length and number of taxa, and other experimental factors. 
We further apply REVEAL to genomic data from mouse and mosquito, uncovering localized model violations that are consistent with previously reported biological signals. These results establish REVEAL as a flexible and effective tool for evaluating model adequacy in phylogenetic and phylogenomic analyses.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"482-498"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146258234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact and Asymptotic Counts of Binary Phylogenetic Networks with a Few Reticulation Events.","authors":"Hao Yu, Michael Fuchs, Guan-Ru Yu, Louxin Zhang","doi":"10.1177/15578666251400786","DOIUrl":"10.1177/15578666251400786","url":null,"abstract":"<p><p>Phylogenetic networks are powerful models for representing evolutionary history in studies of genome evolution involving reticulation events. The complex structure of phylogenetic networks makes their inference challenging, especially for large numbers of taxa. To better understand the full space of phylogenetic networks, we derive closed-form formulas for counting binary phylogenetic networks with two and three reticulation events. In addition, we also use our approach to give a simple proof of the known asymptotic result for a fixed number of reticulations.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"386-400"},"PeriodicalIF":1.6,"publicationDate":"2026-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146197671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Hashing of Spaced Seeds with DuoHash.","authors":"Leonardo Gemin, Cinzia Pizzi, Matteo Comin","doi":"10.1177/15578666261423555","DOIUrl":"https://doi.org/10.1177/15578666261423555","url":null,"abstract":"<p><p>Many state-of-the-art tools for sequence analysis are based on alignment-free techniques to manage high-throughput processing. Several routine tasks such as querying, indexing, and similarity search are based on k-mer statistics. In order to accommodate errors or mutations, spaced seeds have been increasingly used instead of <i>k</i>-mers, enhancing sensitivity in various applications. However, spaced seed hashing is computationally intensive, introducing significant slowdown in the processing. This article addresses the challenge of efficient spaced seed hashing, which is instrumental for spaced k-mer counting. We present DuoHash, a framework that enables the efficient computation of hash functions for spaced seeds. DuoHash exploits an efficient spaced seed binary encoding and precomputed tables to speed up the computation of the hash value for both the forward and reverse strands of a DNA sequence. In our experiments, DuoHash substantially outperforms existing algorithms, achieving speedups of up to 11x on short reads with a spaced seed of medium density. Furthermore, we show the applicability of DuoHash to the problem of spaced k-mer counting. 
The code of DuoHash is available at https://github.com/CominLab/DuoHash/.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"15578666261423555"},"PeriodicalIF":1.6,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147574217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Assessment of PHRED Score Characteristics in Illumina MiSeq Amplicon Sequencing.","authors":"Seth Sims, Yury Khudyakov, Alexander Zelikovsky","doi":"10.1177/15578666261436820","DOIUrl":"10.1177/15578666261436820","url":null,"abstract":"<p><p>PHRED scores are confidence values associated with each basecall generated by sequencers. The score is defined as a monotonic function of the probability that the basecall is incorrect. The calibration of PHRED scores has previously been examined by evaluating errors made in reading known sequences. We investigated the calibration of the Illumina MiSeq instrument PHRED model using data from a large dataset. We also derive calibration methods for the PHRED scores in datasets similar to those produced by the Global Hepatitis Outbreak and Surveillance Technology (GHOST). The GHOST protocol uses a short amplicon, resulting in many positions having two base calls, one coming from each of the paired reads. A maximum likelihood model of redundant base calls that match each other was used to estimate corrected probabilities of the PHRED scores. The PHRED scores showed only small absolute deviations from their target values. These differences are statistically significant deviations (<math><mrow><mi>p</mi><mo> </mo><mo>&lt;</mo><mo> </mo><mn>0.0001</mn></mrow></math>) from being calibrated. The accuracy of the scores varied significantly between the MiSeq instrument runs. Recalibration produced quality scores that improved Brier scores for the dataset by an average relative improvement of <math><mrow><mn>2.83</mn><mi>%</mi></mrow></math>. Methods developed to create calibration curves for PHRED scores will be useful in improving error-correction pipelines based on redundant deep sequencing of amplicon data. However, quality scores are relatively uninformative of substitution errors. 
The quality scores assigned are determined more by the global error rate of the sequencing run in the current machine cycle than by the characteristics of the specific base call.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"15578666261436820"},"PeriodicalIF":1.6,"publicationDate":"2026-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13051658/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147574222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FDS-CAP: Modeling Fragmented Disease Subgraphs with Component-Level Attention for Comorbidity Prediction.","authors":"Ashwag Altayyar, Li Liao","doi":"10.1177/15578666261427466","DOIUrl":"https://doi.org/10.1177/15578666261427466","url":null,"abstract":"<p><p>Understanding comorbidity between human diseases is essential for uncovering shared pathophysiological mechanisms and improving diagnostic and therapeutic strategies. Although prior studies have investigated genetic and network-based disease associations, they often overlook the fragmented nature of disease modules within the human interactome. To address this limitation, we introduce Fragmented Disease Subgraphs with Component-Level Attention for Comorbidity Prediction (FDS-CAP), a novel graph-based deep learning framework. FDS-CAP first embeds fragmented disease subgraphs using Subgraph Neural Networks (SUBGNN) with component-level attention, then applies a variational comorbidity predictor built upon a Variational Graph Auto-Encoder that is used to predict comorbid disease associations within the Human Disease Network. SUBGNN encodes disease subgraphs by propagating information at the connected component level across three property-aware channels (capturing positional, neighborhood, and structural roles) and integrates a component-level attention mechanism that weighs each connected component based on its significance to the overall subgraph representation. A core contribution of our method is the attention-based aggregation of connected component embeddings, enabling more accurate and expressive disease representations that reflect the biological complexity in fragmented disease subgraphs for improved comorbidity prediction. FDS-CAP achieves state-of-the-art performance for comorbidity prediction on a benchmark dataset, with an AUROC of 0.966. 
We further illustrate its biological interpretability through a single representative case study on glioma, showing that attention-weighted subgraph components capture meaningful patterns associated with disease mechanisms.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"15578666261427466"},"PeriodicalIF":1.6,"publicationDate":"2026-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"147512551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}