{"title":"An Algorithm to Calculate the <i>p</i>-Value of the Monge-Elkan Distance.","authors":"Petr Ryšavý, Filip Železný","doi":"10.1089/cmb.2024.0854","DOIUrl":"https://doi.org/10.1089/cmb.2024.0854","url":null,"abstract":"<p><p>The Monge-Elkan distance is a straightforward yet popular distance measure used to estimate the mutual similarity of two sets of objects. It was initially proposed in the field of databases, and it found broad usage in other fields. Nowadays, it is especially relevant to the analysis of new-generation sequencing data as it represents a measure of dissimilarity between genomes of two distinct organisms, particularly when applied to unassembled reads. This article provides an algorithm to calculate the <i>p</i>-value associated with the Monge-Elkan distance. Given the object-level null distribution, that is, the distribution of distances between independently and identically sampled objects such as reads, the method yields the null distribution of the Monge-Elkan distance, which in turn allows for calculating the <i>p</i>-value. We also demonstrate an application on sequencing data, where individual reads are compared by the Levenshtein distance.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Spatial-Correlated Multitask Linear Mixed-Effects Model for Imaging Genetics.","authors":"Zhibin Pu, Shufei Ge","doi":"10.1089/cmb.2024.0721","DOIUrl":"https://doi.org/10.1089/cmb.2024.0721","url":null,"abstract":"<p><p>Imaging genetics aims to uncover the hidden relationship between imaging quantitative traits (QTs) and genetic markers [e.g., single nucleotide polymorphism (SNP)] and brings valuable insights into the pathogenesis of complex diseases, such as cancers and cognitive disorders (e.g., Alzheimer's disease). However, most linear models in imaging genetics did not explicitly model the inner relationship among QTs, which might miss some potential efficiency gains from information borrowing across brain regions. In this work, we developed a novel Bayesian regression framework for identifying significant associations between QTs and genetic markers while explicitly modeling spatial dependency between QTs, with the main contributions as follows. First, we developed a spatial-correlated multitask linear mixed-effects model to account for dependencies between QTs. We incorporated a population-level mixed-effects term into the model, taking full advantage of the dependent structure of brain imaging-derived QTs. Second, we implemented the model in the Bayesian framework and derived a Markov chain Monte Carlo (MCMC) algorithm to achieve the model inference. Further, we incorporated the MCMC samples with the Cauchy combination test to examine the association between SNPs and QTs, which avoided computationally intractable multitest issues. The simulation studies indicated improved power of our proposed model compared with classical models where inner dependencies of QTs were not modeled. We also applied the new spatial model to an imaging dataset obtained from the Alzheimer's Disease Neuroimaging Initiative database (https://adni.loni.usc.edu). The implementation of our method is available at https://github.com/ZhibinPU/spatialmultitasklmm.git.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144234309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Janet B Jones-Oliveira, Hans-Joseph B Oliveira, Joseph S Oliveira, David A Dixon
{"title":"Using Partition Information Entropy to Computationally Rank Order Critical Subreactions in a Petri Net Model of a Biochemical Signaling Network.","authors":"Janet B Jones-Oliveira, Hans-Joseph B Oliveira, Joseph S Oliveira, David A Dixon","doi":"10.1089/cmb.2024.0849","DOIUrl":"https://doi.org/10.1089/cmb.2024.0849","url":null,"abstract":"<p><p>Improved computational methods to analyze the mathematical structure and function of biochemical networks are needed when the biomolecular connectivity is known but when a complete set of the equilibrium and rate constants may not be available. We use Petri nets, which are equivalently bipartite digraphs, to analyze the rule-based flow of information through the network. We present several computational improvements to Petri net modeling as an aid to improve this approach, previously limited by the combinatorics of network size and complexity. The generation of Petri nets using equations for three elemental stencils (molecular reaction, synthesis complex formation, and decomposition complex formation) has been automated. A set of finite probability measures is defined in terms of a partition information entropy, where the complete listing of unique minimal cycles (UMCs) of the Petri net provides the natural partitioning. This enables the ranking of the UMC listing that covers all possible information flows in the reaction network; the information entropy measure enables the identification of which UMCs are more significant than others. In terms of the information entropy, forward cycles are less surprising and carry less information entropy, whereas backward cycles carry more information entropy and serve as regulators by providing feedback to control the network. As the systems analyzed increase in size and complexity, the automatic rank ordering of the UMCs provides a mechanism to highlight the globally most important information without the need to make local simplifying modeling choices. The information entropy metric is also used to compute source-to-sink information costs and is related to knockout analyses. The hybrid Petri net approach shows the most important species and where it is easiest to disrupt or otherwise affect the network. As exemplar, the enhanced methodology is applied to a model of the initial subnetwork in the EGFR network.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144248128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Mapper Algorithm with Implicit Intervals and Its Optimization.","authors":"Yuyang Tao, Shufei Ge","doi":"10.1089/cmb.2024.0919","DOIUrl":"https://doi.org/10.1089/cmb.2024.0919","url":null,"abstract":"<p><p>The Mapper algorithm is an essential tool for visualizing complex, high-dimensional data in topological data analysis and has been widely used in biomedical research. It outputs a combinatorial graph whose structure encodes the shape of the data. However, the need for manual parameter tuning and fixed (implicit) intervals, along with fixed overlapping ratios, may impede the performance of the standard Mapper algorithm. Variants of the standard Mapper algorithms have been developed to address these limitations, yet most of them still require manual tuning of parameters. Additionally, many of these variants, including the standard version found in the literature, were built within a deterministic framework and overlooked the uncertainty inherent in the data. To relax these limitations, in this work, we introduce a novel framework that implicitly represents intervals through a hidden assignment matrix, enabling automatic parameter optimization via stochastic gradient descent (SGD). In this work, we develop a soft Mapper framework based on a Gaussian mixture model for flexible and implicit interval construction. We further illustrate the robustness of the soft Mapper algorithm by introducing the Mapper graph mode as a point estimation for the output graph. Moreover, a SGD algorithm with a specific topological loss function is proposed for optimizing parameters in the model. Both simulation and application studies demonstrate its effectiveness in capturing the underlying topological structures. In addition, the application to an RNA expression dataset obtained from the Mount Sinai/JJ Peters VA Medical Center Brain Bank successfully identifies a distinct subgroup of Alzheimer's Disease. The implementation of our method is available at https://github.com/FarmerTao/Implicit-interval-Mapper.git.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144225638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CNB-MAC 2023 Special Issue.","authors":"Anna Ritz","doi":"10.1089/cmb.2025.0141","DOIUrl":"https://doi.org/10.1089/cmb.2025.0141","url":null,"abstract":"","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144208682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FedOpenHAR: Federated Multitask Transfer Learning for Sensor-Based Human Activity Recognition.","authors":"Egemen İşgÜder, Özlem Durmaz İncel","doi":"10.1089/cmb.2024.0631","DOIUrl":"10.1089/cmb.2024.0631","url":null,"abstract":"<p><p>Wearable and mobile devices equipped with motion sensors offer important insights into user behavior. Machine learning and, more recently, deep learning techniques have been applied to analyze sensor data. Typically, the focus is on a single task, such as human activity recognition (HAR), and the data is processed centrally on a server or in the cloud. However, the same sensor data can be leveraged for multiple tasks, and distributed machine learning methods can be employed without the need for transmitting data to a central location. In this study, we introduce the FedOpenHAR framework, which explores federated transfer learning in a multitask setting for both sensor-based HAR and device position identification tasks. This approach utilizes transfer learning by training task-specific and personalized layers in a federated manner. The OpenHAR framework, which includes ten smaller datasets, is used for training the models. The main challenge is developing robust models that are applicable to both tasks across different datasets, which may contain only a subset of label types. Multiple experiments are conducted in the Flower federated learning environment using the DeepConvLSTM architecture. Results are presented for both federated and centralized training under various parameters and constraints. By employing transfer learning and training task-specific and personalized federated models, we achieve a higher accuracy (72.4%) compared to a fully centralized training approach (64.5%), and similar accuracy to a scenario where each client performs individual training in isolation (72.6%). However, the advantage of FedOpenHAR over individual training is that, when a new client joins with a new label type (representing a new task), it can begin training from the already existing common layer. Furthermore, if a new client wants to classify a new class in one of the existing tasks, FedOpenHAR allows training to begin directly from the task-specific layers.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"558-572"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143972829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The 2nd International Workshop on Pattern Recognition in Healthcare Analytics 2023 Preface.","authors":"Inci M Baytas","doi":"10.1089/cmb.2025.0117","DOIUrl":"10.1089/cmb.2025.0117","url":null,"abstract":"","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"557"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143985486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Martí Cortada Garcia, Adrià Diéguez Moscardó, Marta Casanellas
{"title":"Generating Heterogeneous Data on Gene Trees.","authors":"Martí Cortada Garcia, Adrià Diéguez Moscardó, Marta Casanellas","doi":"10.1089/cmb.2024.0843","DOIUrl":"10.1089/cmb.2024.0843","url":null,"abstract":"<p><p>We introduce GenPhylo, a Python module that simulates nucleotide sequence data along a phylogeny avoiding the restriction of continuous-time Markov processes. GenPhylo uses directly a general Markov model and therefore naturally incorporates heterogeneity across lineages. We solve the challenge of generating transition matrices with a pre-given expected number of substitutions (the branch length information) by providing an algorithm that can be incorporated in other simulation software.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"626-630"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143972956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative Adversarial Networks for Neuroimage Translation.","authors":"Cassandra Czobit, Reza Samavi","doi":"10.1089/cmb.2024.0635","DOIUrl":"10.1089/cmb.2024.0635","url":null,"abstract":"<p><p>Image-to-image translation has gained popularity in the medical field to transform images from one domain to another. Medical image synthesis via domain transformation is advantageous in its ability to augment an image dataset where images for a given class are limited. From the learning perspective, this process contributes to the data-oriented robustness of the model by inherently broadening the model's exposure to more diverse visual data and enabling it to learn more generalized features. In the case of generating additional neuroimages, it is advantageous to obtain unidentifiable medical data and augment smaller annotated datasets. This study proposes the development of a cycle-consistent generative adversarial network (CycleGAN) model for translating neuroimages from one field strength to another (e.g., 3 Tesla [T] to 1.5 T). This model was compared with a model based on a deep convolutional GAN model architecture. CycleGAN was able to generate the synthetic and reconstructed images with reasonable accuracy. The mapping function from the source (3 T) to the target domain (1.5 T) performed optimally with an average peak signal-to-noise ratio value of 25.69 ± 2.49 dB and a mean absolute error value of 2106.27 ± 1218.37. The codes for this study have been made publicly available in the following GitHub repository.<sup>a</sup>.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"573-583"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142894857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shunqin Zhang, Wei Kong, Shuaiqun Wang, Kai Wei, Kun Liu, Gen Wen, Yaling Yu
{"title":"Effective Integration of Single-Cell Multi-Omics Data Using Improved Network-Based Integrative Clustering with Multigraph Regularization.","authors":"Shunqin Zhang, Wei Kong, Shuaiqun Wang, Kai Wei, Kun Liu, Gen Wen, Yaling Yu","doi":"10.1089/cmb.2023.0460","DOIUrl":"10.1089/cmb.2023.0460","url":null,"abstract":"<p><p>The purpose of integrating different omics data is to study cellular heterogeneity at the level of transcriptional regulation from different gene levels, which can effectively identify cell types and reveal the pathogenesis of Alzheimer's disease (AD) from two perspectives. However, implementing such algorithms faces challenges such as high data noise levels, increased dimensionality, and computational complexity. In this study, multigraph regularization constraints were introduced in the network-based integrative clustering algorithm (MGR-NIC) to remove redundant features and keep the geometry structures underlying the data by fusing two types of data (snRNA-seq and snATAC-seq) of glial cells from AD samples. The effectiveness of the MGR-NIC algorithm was validated using both simulation datasets and real datasets derived from various tissues. The MGR-NIC algorithm can improve clustering accuracy by selecting features that better represent the dataset's structure. The clustering results obtained with the MGR-NIC algorithm show strong consistency with the clustering results inherent to the published DLPFC dataset, while the classification results generated using the NIC algorithm often lead to cluster overlap when applied to the DLPFC dataset. We will use the same state-of-the-art algorithms for a comprehensive evaluation with our proposed MGR-NIC algorithm, including NIC, scAI, Multi-Omics Factor Analysis v2, and JSNMF. MGR-NIC is the most stable and reliable method, implying its robustness across different datasets and its reliability in yielding consistent and accurate results.</p>","PeriodicalId":15526,"journal":{"name":"Journal of Computational Biology","volume":" ","pages":"601-614"},"PeriodicalIF":1.4,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144119822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}