BMC Bioinformatics: Latest Articles

Cancer detection via one-shot learning: integrating gene expression and genomic mutation analysis.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):239. Pub Date: 2025-10-06. DOI: 10.1186/s12859-025-06257-3
Alessia Petescia, Gerardo Benevento, Anna Falanga, Alessandro Macaro, Delfina Malandrino, Alberto Montefusco, Rosalinda Sorrentino, Rocco Zaccagnino
Abstract
Background: Cancer is a complex disease influenced by numerous concurrent genetic factors that result in diverse tumor microenvironments (TMEs) across different cancer types. Large-scale genomic projects, such as The Cancer Genome Atlas, have underscored the need for molecular classification of cancer to enable more precise therapeutic strategies. Yet, traditional machine learning (ML) approaches currently face several limitations. First, while effective, they predominantly rely on gene expression data and often overlook critical genomic alterations such as copy number alterations, single nucleotide polymorphisms, and other mutational profiles, limiting the scope of biomarker discovery. Most importantly, they are usually limited by the need for large sample sizes.
Results: Building on the hypothesis that type-agnostic representations integrating gene expression with genomic mutations can comprehensively characterize TMEs and capture the similarity or dissimilarity between samples of the same or different types, we propose a novel ML-based method for cancer detection using a one-shot learning framework implemented through Siamese Neural Networks. Our method redefines cancer detection as a similarity-based classification task, allowing the model to generalize to unseen cancer types, a critical advantage in genomics where data scarcity and frequent updates pose significant challenges. To enhance interpretability, we introduce a robust explainability technique founded on SHapley Additive exPlanations (SHAP) values, to provide clear insights into the contributions of gene expression and mutational data, enabling a deeper understanding of the key factors driving cancer detection decisions.
Conclusions: Our experimental results show that integrating mutational profiles with gene expression data allows for more accurate cancer type detection and reveals significant mutation patterns. These findings indicate that the proposed method has the potential to significantly enhance cancer type detection by leveraging a more comprehensive understanding of TMEs. Beyond merely classifying cancer types, the proposed SHAP-based explainability technique enables the identification and the analysis of key biomarkers relevant for immunotherapy success, thereby addressing limitations of existing approaches.
Citations: 0
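The similarity-based classification idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' Siamese network: the `embed` function is a hypothetical placeholder that simply concatenates an expression vector with a mutation vector, where a real system would use a trained network branch, and the sample values are invented for illustration. A query sample is assigned the label of the single reference sample it most resembles, so adding a new cancer type only requires adding one reference.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def embed(expression, mutations):
    # Hypothetical stand-in for a trained Siamese branch:
    # concatenate the expression and mutation views unchanged.
    return list(expression) + list(mutations)


def one_shot_classify(query, support):
    """Return the label whose single reference embedding is most similar.

    support: dict mapping label -> embedded vector (one reference per class).
    """
    return max(support, key=lambda label: cosine_similarity(query, support[label]))


# Invented toy data: three expression values and two binary mutation flags.
support = {
    "type_A": embed([2.0, 0.1, 0.0], [1, 0]),
    "type_B": embed([0.0, 1.5, 2.2], [0, 1]),
}
query = embed([1.8, 0.2, 0.1], [1, 0])
predicted = one_shot_classify(query, support)
```

Because the decision depends only on pairwise similarity, a previously unseen class can be supported by adding one more entry to `support`, without retraining a fixed-output classifier.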
Statistically principled feature selection for single cell transcriptomics.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):238. Pub Date: 2025-10-02. DOI: 10.1186/s12859-025-06240-y
Emmanuel P Dollinger, Kai Silkwood, Scott Atwood, Qing Nie, Arthur D Lander
Abstract
Background: The high dimensionality of data in single cell transcriptomics (scRNAseq) requires investigators to choose subsets of genes ("feature selection") for downstream analysis (e.g., unsupervised cell clustering). The evaluation of different approaches to feature selection is hampered by the fact that, as we show here, the difficulty of feature selection can vary greatly, depending on the dataset being analyzed.
Results: For routine cell type identification, even randomly chosen features can perform well, but for cell type differences that are subtle, both number of features and selection strategy matter strongly. We present a simple feature selection method grounded in an analytical model that allows for interpretable delineation of how many and which features to choose, facilitating identification of biologically meaningful rare cell types. We compare this method to default methods in scanpy and Seurat, as well as SCTransform, showing how greater accuracy can often be achieved with surprisingly few, well-chosen features.
Conclusions: Feature selection is a critical step in scRNAseq for downstream analyses. We explore the pitfalls that can arise from incautious feature selection and present a statistical method to facilitate improved outcomes.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12490061/pdf/
Citations: 0
HEPAD: enhancing hemolytic peptide prediction with adaptive feature engineering and diverse sequence descriptors.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):234. Pub Date: 2025-10-01. DOI: 10.1186/s12859-025-06254-6
Sih-Han Chen, Jen-Chieh Yu, Yi-Hsiang Lin, Shao-Chun Kuo, Kuan Ni, Ching-Tai Chen
Abstract
Background: Peptides have emerged as promising therapeutic agents for drug development against cancer, immune disorders, hypertension, and microbial infections. Peptide drugs have the advantage of high selectivity, low production cost, and fewer side effects compared to traditional small molecule-based drugs. However, one main challenge that hinders the adoption of peptide therapeutics is that some peptides are prone to be hemolytic, leading to the disruption of erythrocyte membranes and decreasing the life span of red blood cells. A computational model for hemolytic peptide identification would be a valuable tool for peptide drug discovery.
Results: In this study, we present HEPAD, a machine learning predictor to identify hemolytic peptides based on adaptive feature engineering and diverse sequence descriptors. Sequence descriptors were applied for feature encoding, generating a feature vector of nearly 4000 numeric values for each peptide. Next, an adaptive feature engineering method was proposed to produce a customized feature subset for a given dataset. The four datasets considered in this study were associated with 250, 350, 90, and 130 selected features. Five machine learning methods of different rationale were employed to perform cross validation and independent tests. HEPAD yields Matthews correlation coefficients (MCCs) of 0.973, 0.643, and 0.609, respectively, for three independent datasets. The improvements in MCC compared to existing approaches range from 1.9 to 13.3% for three independent tests. Moreover, data visualization reveals that the customized feature subsets can effectively separate hemolytic peptides from random peptides.
Conclusions: HEPAD offers efficient identification of potential hemolytic peptides, thereby expediting experimental procedures in drug discovery. The source code, datasets, and machine learning models are available at https://github.com/csh07/HEPAD
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12486866/pdf/
Citations: 0
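The HEPAD abstract above reports its results as Matthews correlation coefficients. For readers unfamiliar with the metric, the following sketch computes MCC from the four cells of a binary confusion matrix using the standard formula. The counts in the example are invented and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Invented counts for illustration only.
mcc_perfect = matthews_corrcoef(tp=50, tn=50, fp=0, fn=0)
mcc_example = matthews_corrcoef(tp=45, tn=40, fp=10, fn=5)
mcc_inverted = matthews_corrcoef(tp=0, tn=0, fp=50, fn=50)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect), and unlike accuracy it stays informative when the two classes are imbalanced.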
HolomiRA: a reproducible pipeline for miRNA binding site prediction in microbial genomes.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):236. Pub Date: 2025-10-01. DOI: 10.1186/s12859-025-06241-x
Jennifer Jessica Bruscadin, Tainã Figueiredo Cardoso, Liliane Costa Conteville, Juliana Virginio da Silva, Adriana Mércia Guaratini Ibelli, Gabriel Alexander Colmenarez Pena, Thanny Porto, Priscila Silva Neubern de Oliveira, Bruno Gabriel Nascimento Andrade, Adhemar Zerlotini, Luciana Correia de Almeida Regitano
Abstract
Background: Small RNAs, such as microRNAs (miRNAs), are candidates for mediating communication between the host and its microbiota, regulating bacterial gene expression and influencing microbiome functions and dynamics. Here, we introduce HolomiRA (Holobiome miRNA Affinity Predictor), a computational pipeline developed to predict target sites for host miRNAs in microbiome genomes. HolomiRA operates within a Snakemake workflow, processes microbial genomic sequences in FASTA format using freely available bioinformatics software and incorporates built-in data processing methods. The pipeline begins by annotating protein-coding sequences from microbial genomes using Prokka. It then identifies candidate regions, evaluates them for potential host miRNA binding sites and the accessibility of these target sites using RNAHybrid and RNAup software. The predicted results that meet the quality filter parameters are further summarized and used to perform a functional analysis of the affected genes using SUPER-FOCUS software.
Results: In this paper, we demonstrate the use of the HolomiRA pipeline by applying it to publicly available metagenome-assembled genomes obtained from human feces, as well as from bovine feces and ruminal content. This approach enables the prediction of bacterial genes and biological pathways within microbiomes that could be influenced by host miRNAs. It also allows for the identification of shared or unique miRNAs, target genes, and taxonomies across phenotypes, environments, or host species.
Conclusions: HolomiRA is a practical and user-friendly pipeline designed as a hypothesis-generating tool to support the prediction of host miRNA binding sites in prokaryotic genomes, providing insights into host-microbiota communication mediated by miRNA regulation. HolomiRA is publicly available on GitHub: https://github.com/JBruscadin/HolomiRA
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12487068/pdf/
Citations: 0
MysteryMaster: scraping the bottom of the barrel of barcoded Oxford nanopore reads.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):235. Pub Date: 2025-10-01. DOI: 10.1186/s12859-025-06266-2
Abdolrahman Khezri, Sverre Branders, Anurag Basavaraj Bellankimath, Jawad Ali, Crystal Chapagain, Fatemeh Asadi, Manfred G Grabherr, Rafi Ahmad
Abstract
Background: The high error rate associated with Oxford Nanopore sequencing technology adversely affects demultiplexing. To improve demultiplexing and reduce unclassified reads from nanopore sequencing data, we developed MysteryMaster, a demultiplexer that utilizes the optimal sequence aligner, Cola.
Results: When compared to Oxford Nanopore's Dorado and Guppy demultiplexing tools across three datasets of 37 diverse samples with established ground truth, we found that MysteryMaster accurately identifies a similar or greater percentage of reads among the different basecalling models: Fast, HAC, and SUP. MysteryMaster performs slightly better than the other tools on data that was basecalled using the Fast basecalling model, while its performance in HAC and SUP data is similar to Dorado's. MysteryMaster has a false positive rate of just 0.41% with default settings.
Conclusions: While MysteryMaster can function as a standalone demultiplexer tool, the sequential application of Dorado and MysteryMaster produced the best overall performance.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12487470/pdf/
Citations: 0
Can geometric combinatorics improve RNA branching predictions?
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):237. Pub Date: 2025-10-01. DOI: 10.1186/s12859-025-06155-8
Svetlana Poznanović, Owen Cardwell, Christine Heitsch
Abstract
Background: Prior results for tRNA and 5S rRNA demonstrated that secondary structure prediction accuracy can be significantly improved by modifying the parameters in the multibranch loop entropic penalty function. However, for reasons not well understood at the time, the scale of improvement possible across both families was well below the level for each family when considered separately.
Results: We resolve this dichotomy here by showing that each family has a characteristic target region geometry, which is distinct from the other and significantly different from their own dinucleotide shuffles. This required a much more efficient approach to computing the necessary information from the branching parameter space, and a new theoretical characterization of the region geometries.
Conclusions: The insights gained point strongly to considering multiple possible secondary structures generated by varying the multiloop parameters. We provide proof-of-principle results that this significantly improves prediction accuracy across all 8 additional families in the Archive II benchmarking dataset.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12487464/pdf/
Citations: 0
Boosting K-nearest neighbor regression performance for longitudinal data through a novel learning approach.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):232. Pub Date: 2025-09-30. DOI: 10.1186/s12859-025-06205-1
Mohammad Sadegh Loeloe, Seyyed Mohammad Tabatabaei, Reyhane Sefidkar, Amir Houshang Mehrparvar, Sara Jambarsang
Abstract
Background: Longitudinal studies often require flexible methodologies for predicting response trajectories based on time-dependent and time-independent covariates. To address the complexities of longitudinal data, this study proposes a novel extension of K-Nearest Neighbor (KNN) regression, referred to as Clustering-based KNN Regression for Longitudinal Data (CKNNRLD).
Methods: In CKNNRLD, data are first clustered using the KML algorithm (K-means for longitudinal data), and the nearest neighbors are then searched within the relevant cluster rather than across the entire dataset. The theoretical framework of CKNNRLD was developed and evaluated through extensive simulation studies. Ultimately, the method was applied to a real longitudinal spirometry dataset.
Result: Compared to the standard KNN, CKNNRLD demonstrated improved prediction accuracy, shorter execution time, and reduced computational burden. According to the simulation findings, CKNNRLD took less time than the KNN implementation (for N > 100). It predicted the longitudinal responses more accurately and precisely than the equivalent algorithm. For instance, CKNNRLD ran approximately 3.7 times faster than standard KNN in the scenario with N = 2000, T = 5, D = 2, C = 4, E = 1, and R = 1. Since the KNN method needs all of the training data to identify the nearest neighbors, it tends to operate slowly as the number of individuals in longitudinal research increases (for N > 500).
Conclusion: The CKNNRLD algorithm significantly improves accuracy and computational efficiency for predicting longitudinal responses compared to traditional KNN methods. These findings highlight its potential as a valuable tool for researchers with large longitudinal datasets.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482645/pdf/
Citations: 0
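The cluster-then-search idea described in the CKNNRLD abstract above can be sketched as follows. This is an illustrative approximation, not the published algorithm: plain k-means on whole trajectories stands in for KML, each subject is reduced to one trajectory vector with a single scalar response, and all function names and data values are hypothetical. The point is only that the neighbor search touches one cluster instead of the full training set.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length trajectories."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(point, centroids):
    return min(range(len(centroids)), key=lambda c: dist(point, centroids[c]))


def kmeans(points, initial_centroids, n_iter=20):
    """Plain Lloyd's k-means (an assumed stand-in for the KML algorithm)."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(n_iter):
        labels = [nearest_centroid(p, centroids) for p in points]
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest_centroid(p, centroids) for p in points]
    return centroids, labels


def cluster_knn_predict(query, points, responses, centroids, labels, k=2):
    """Average the responses of the k nearest neighbours inside the query's cluster."""
    cluster = nearest_centroid(query, centroids)
    candidates = [(dist(query, p), y)
                  for p, y, lab in zip(points, responses, labels) if lab == cluster]
    if not candidates:  # fall back to a global search if the cluster is empty
        candidates = [(dist(query, p), y) for p, y in zip(points, responses)]
    candidates.sort(key=lambda pair: pair[0])
    neighbours = candidates[:k]
    return sum(y for _, y in neighbours) / len(neighbours)


# Invented toy data: six subjects measured at T = 3 visits, one response each.
points = [[1.0, 1.1, 1.2], [0.9, 1.0, 1.1], [1.1, 1.2, 1.3],
          [5.0, 5.2, 5.4], [5.1, 5.3, 5.5], [4.9, 5.1, 5.3]]
responses = [10.0, 11.0, 12.0, 50.0, 52.0, 51.0]
centroids, labels = kmeans(points, [points[0], points[3]])
prediction = cluster_knn_predict([1.0, 1.05, 1.15], points, responses,
                                 centroids, labels, k=2)
```

With C roughly balanced clusters, each query compares against about N/C training subjects rather than N, which is consistent with the direction of the speed-up the abstract reports, although the sketch makes no claim about the exact factor.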
MLGCN-Driver: a cancer driver gene identification method based on multi-layer graph convolutional neural network.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):233. Pub Date: 2025-09-30. DOI: 10.1186/s12859-025-06260-8
Pi-Jing Wei, Jingxin Zhou, Rui-Fen Cao, Yun Ding, Zhenyu Yue, Chun-Hou Zheng
Abstract
Background: The progression of cancer is driven by the accumulation of mutations in driver genes. Many studies aim to identify cancer driver genes. However, most of them ignore the high-order features in the network.
Result: In this study, we propose a novel method, MLGCN-Driver, based on multi-layer graph convolutional neural networks (GCN) to boost driver gene identification. MLGCN-Driver employs multi-layer GCN with initial residual connections and identity mappings to learn biological multi-omics features within biological networks. In addition, the node2vec algorithm is used to extract the topological structure features of the biological network, and then the features are fed into another multi-layer GCN for feature learning. Meanwhile, the initial residual connections and identity mappings mitigate the over-smoothing of features. Finally, the probability of each gene being a driver gene is calculated based on low-dimensional biological features and topological features.
Conclusion: We applied MLGCN-Driver to a pan-cancer dataset and cancer type-specific datasets. Experimental results demonstrate the excellent performance of MLGCN-Driver in terms of the area under the ROC curve (AUC) and the area under the precision-recall curve (AUPRC) when compared with state-of-the-art approaches.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12482577/pdf/
Citations: 0
Structure-sensitive transformer and multi-view graph contrastive learning enhanced prediction of drug-related microbes.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):231. Pub Date: 2025-09-26. DOI: 10.1186/s12859-025-06199-w
Ping Xuan, Rui Wang, Jing Gu, Hui Cui, Tiangang Zhang
Abstract
Background: The human microbiome plays a crucial role in regulating the efficacy and toxicity of drugs as well as in developing the drugs. Therefore, predicting the drug-related microbes is beneficial for analyzing the functional mechanisms of drugs. Recently, the graph learning based methods demonstrated their advantages in extracting the node features from the biological heterogeneous graphs. However, the previous methods failed to completely preserve the intrinsic structures of biological data and did not fully utilize the topological and positional information for predicting the drug-microbe associations.
Results: We propose a new prediction model, structure-sensitive transformer and multi-view graph contrastive learning for microbe-drug association prediction (SMMDA), to encode and integrate the topological structures, semantics, and multiple-view embedding features of the drugs and microbes. Considering the sparsity of the original features of drugs and microbes, the learnable data augmentation strategy is designed to learn their global representations. Since similar drugs are more likely to associate with similar microbes, a structure-sensitive transformer is proposed to integrate the topology structures composed of drugs (microbes) to form the multi-view embedding features. We design two contrastive learning strategies to exploit the complementary semantics across multiple views. As the embedding features from multiple views have various semantics, we design view-level attention to adaptively integrate these features.
Conclusions: The extensive experimental results show that SMMDA outperforms several state-of-the-art methods for predicting the drug-related candidate microbes. The ablation studies show the effectiveness of the major innovations which include the learnable data augmentation, structure-sensitive transformer-based node feature learning, and multi-view contrastive learning. The case studies on three drugs also demonstrate SMMDA's capability in retrieving the potential microbe candidates for the drugs.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465207/pdf/
Citations: 0
Glucostats: an efficient Python library for glucose time series feature extraction and visual analysis.
IF 3.3, Zone 3, Biology
BMC Bioinformatics 26(1):230. Pub Date: 2025-09-24. DOI: 10.1186/s12859-025-06250-w
Pablo Peiro-Corbacho, Francisco J Lara-Abelenda, David Chushig-Muzo, Ana M Wägner, Conceição Granja, Cristina Soguero-Ruiz
Abstract
Background: The advancement of technology and continuous glucose monitoring (CGM) systems has introduced several computational and technical challenges for clinicians and researchers. The growing volume of CGM data necessitates the development of efficient computational tools capable of handling and processing this information effectively. This paper introduces GlucoStats, an open-source and multi-processing Python library designed for efficient computation and visualization of a comprehensive set of glucose metrics derived from CGM. It simplifies the traditionally time-consuming and error-prone process of manual CGM metrics calculation, making it a valuable tool for both clinical and research applications.
Results: Its modular design ensures easy integration into predefined workflows, while its user-friendly interface and extensive documentation make it accessible to a broad audience, including clinicians and researchers. GlucoStats offers several key features: (i) window-based time series analysis, enabling time series division into smaller 'windows' for detailed temporal analysis, particularly beneficial for CGM data; (ii) advanced visualization tools, providing intuitive, high-quality visualizations that facilitate pattern recognition, trend analysis, and anomaly detection in CGM data; (iii) parallelization, leveraging parallel computing to efficiently handle large CGM datasets by distributing computations across multiple processors; and (iv) scikit-learn compatibility, adhering to the standardized interface of scikit-learn to allow an easy integration into machine learning pipelines for end-to-end analysis.
Conclusions: GlucoStats demonstrates high efficiency in processing large-scale medical datasets in minimal time. Its modular design enables easy customization and extension, making it adaptable to diverse research and clinical needs. By offering precise CGM data analysis and user-friendly visualization tools, it serves both technical researchers and non-technical users, such as physicians and patients, with practical and research-driven applications.
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12462135/pdf/
Citations: 0
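The window-based analysis mentioned in the GlucoStats abstract above can be illustrated with a small standalone sketch. It does not use the GlucoStats API, whose function names are not given in the abstract. The helper names are hypothetical, the readings are invented, and the 70 to 180 mg/dL target range is an assumed default taken from common clinical practice for the time-in-range metric.

```python
def split_into_windows(values, window_size):
    """Split a series into consecutive, non-overlapping windows."""
    return [values[i:i + window_size] for i in range(0, len(values), window_size)]


def time_in_range(window, low=70.0, high=180.0):
    """Percentage of readings in [low, high] mg/dL (assumed target range)."""
    in_range = sum(1 for glucose in window if low <= glucose <= high)
    return 100.0 * in_range / len(window)


# Invented CGM readings in mg/dL, evenly spaced in time.
readings = [95, 110, 150, 190, 200, 175, 65, 80]
windows = split_into_windows(readings, window_size=4)
tir_per_window = [time_in_range(w) for w in windows]
```

Computing a metric per window rather than over the whole series shows how glycemic control changes over time: here the first window has three of four readings in range and the second only two.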