BMC Bioinformatics: latest articles

A Dirichlet-multinomial mixed model for determining differential abundance of mutational signatures.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-18 DOI: 10.1186/s12859-025-06055-x
Lena Morrill Gavarró, Dominique-Laurent Couturier, Florian Markowetz
Abstract
Background: Mutational processes of diverse origin leave their imprints in the genome during tumour evolution. These imprints are called mutational signatures and they have been characterised for point mutations, structural variants and copy number changes. Each signature has an exposure, or abundance, per sample, which indicates how much a process has contributed to the overall genomic change. Mutational processes are not static, and a better understanding of their dynamics is key to characterise tumour evolution and identify cancer cell vulnerabilities that can be exploited during treatment. However, the structure of the data typically collected in this context makes it difficult to test whether signature exposures differ between conditions or time-points when comparing groups of samples. In general, the data consists of multivariate count mutational data (e.g. signature exposures) with two observations per patient, each reflecting a group.
Results: We propose a mixed-effects Dirichlet-multinomial model: within-patient correlations are taken into account with random effects, possible correlations between signatures by making such random effects multivariate, and a group-specific dispersion parameter can deal with particularities of the groups. Moreover, the model is flexible in its fixed-effects structure, so that the two-group comparison can be generalised to several groups, or to a regression setting. We apply our approach to characterise differences of mutational processes between clonal and subclonal mutations across 23 cancer types of the PCAWG cohort. We find ubiquitous differential abundance of clonal and subclonal signatures across cancer types, and higher dispersion of signatures in the subclonal group, indicating higher variability between patients at subclonal level, possibly due to the presence of different clones with distinct active mutational processes.
Conclusions: Mutational signature analysis is an expanding field and we envision our framework to be used widely to detect global changes in mutational process activity. Our methodology is available in the R package CompSign and offers an ample toolkit for the analysis and visualisation of differential abundance of compositional data such as, but not restricted to, mutational signatures.
BMC Bioinformatics 26(1): 59. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837616/pdf/
Citations: 0
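For readers unfamiliar with the distribution named in the entry above, here is a minimal sketch of the Dirichlet-multinomial log-probability for one sample's signature counts. It is not the CompSign implementation and leaves out the random effects and group-specific dispersion the authors describe; the function name and example values are illustrative assumptions.

```python
import math


def dirichlet_multinomial_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer counts per signature (hypothetical input).
    alpha:  positive concentration parameters, one per signature.
    """
    n = sum(counts)
    a = sum(alpha)
    # Normalising term: Gamma(A) * Gamma(n + 1) / Gamma(n + A)
    logp = math.lgamma(a) + math.lgamma(n + 1) - math.lgamma(n + a)
    # Per-category term: Gamma(x + alpha) / (Gamma(alpha) * Gamma(x + 1))
    for x, al in zip(counts, alpha):
        logp += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return logp


# Example: three mutations split across two signatures, flat concentrations.
example = math.exp(dirichlet_multinomial_logpmf([1, 2], [1.0, 1.0]))
```

With every concentration parameter set to 1, each possible split of the mutations is equally likely; smaller concentrations give the extra between-sample variability (overdispersion) that a plain multinomial cannot express.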
MFCADTI: improving drug-target interaction prediction by integrating multiple feature through cross attention mechanism.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-18 DOI: 10.1186/s12859-025-06075-7
Na Quan, Shicheng Ma, Kai Zhao, Xuehua Bi, Linlin Zhang
Abstract
Accurately identifying potential drug-target interactions (DTIs) is a critical step in drug discovery. Multiple heterogeneous biological data provide abundant features for DTI prediction. Many computational methods have been proposed based on these data. However, most of these methods extract features either from sequences or from networks, utilizing only one aspect of the characteristics of drugs and targets and neglecting the complementary information between these two types of features. In fact, integrating different types of features will provide more valuable information for DTI prediction. In this article, we propose a novel method to improve the predictive capability for DTIs, named MFCADTI, by integrating multi-source features through cross-attention mechanisms. The method extracts network topological features from the heterogeneous network and attribute features from sequences of drugs and targets. Considering the complementarity and heterogeneity between network and attribute features, cross-attention mechanisms are used to integrate the network and attribute features of drugs and targets. To capture the correlations between drugs and targets, cross-attention is used to learn the interaction features of each drug-target pair. We evaluate MFCADTI on two datasets and experimental results demonstrate a significant improvement in the performance of MFCADTI compared to state-of-the-art methods. Finally, case studies illustrate that MFCADTI is an effective DTI prediction method that provides valuable guidance for drug development. The data and source code used in this study are available at: https://github.com/Dejavun/MFCADTI .
BMC Bioinformatics 26(1): 57. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834641/pdf/
Citations: 0
A novel weighted pseudo-labeling framework based on matrix factorization for adverse drug reaction prediction.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-17 DOI: 10.1186/s12859-025-06053-z
Junheng Chen, Fangfang Han, Mingxiu He, Yiyang Shi, Yongming Cai
Abstract
Adverse drug reactions (ADRs) are among the global public health events that seriously endanger human life and cause high economic burdens. Therefore, predicting the possibility of their occurrence and taking early and effective response measures is of great significance. Constructing a correlation matrix between drugs and their adverse reactions, followed by effective correlation data mining, is one of the current strategies to predict ADRs using accessible public data. Since the number of known ADRs in real-world data is far less than the number of their unknown counterparts, the drug-ADR association matrix is very sparse, which greatly affects the classification performance of machine learning methods. To effectively address the problem of sparsity, we proposed a novel weighted pseudo-labeling framework that mines potential unknown drug-ADR pairs by integrating multiple weighted matrix factorization (MF) models and treating them as pseudo-labeled drug-ADR pairs. Pseudo-labeled data is added to the training set, and the MF model is fine-tuned to improve the classification performance. To prevent overfitting to easily found pseudo-labels and improve the quality of pseudo-labels, a novel weighting approach for pseudo-labels was adopted. This paper reproduces the baselines under the same experimental conditions to evaluate the performance of the proposed method on sparse data from the Side Effect Resource (SIDER) database. Experimental results showed that our method outperformed other baselines in the Area Under Precision-Recall and F1-scores and still maintained the best performance in sparser scenarios. Furthermore, we conducted a case study, and the results showed that our proposed framework efficiently predicted ADRs in the real world.
BMC Bioinformatics 26(1): 54. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11831795/pdf/
Citations: 0
Identifying risk factors for Alzheimer's disease from multivariate longitudinal clinical data using temporal pattern mining.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-17 DOI: 10.1186/s12859-024-06018-8
Annette Spooner, Gelareh Mohammadi, Perminder S Sachdev, Henry Brodaty, Arcot Sowmya
Abstract
Background: Patient data contain a wealth of information that could aid in understanding the onset and progression of disease. However, the task of modelling clinical data, which consist of multiple heterogeneous time series of different lengths, measured at different time intervals, is a complex one. A growing body of research has applied temporal pattern mining to this problem to identify common patterns in clinical attributes over time. However, the vast majority of these algorithms use techniques that are not ideally suited to clinical data. We present an efficient and scalable framework designed specifically for temporal pattern mining of real-world clinical data. Our framework combines temporal abstraction, an extended version of the efficient pattern-growth algorithm, TPMiner, the concepts of relative risk and the odds ratio to identify interesting and high-risk patterns and multiprocessing to improve computational efficiency. A complete set of cut-off values for discretisation and interpretation of the data is provided and is applicable to studies on ageing populations in general. We name this framework Clinical Temporal Pattern Mining or C-TPM.
Results: The framework is applied to data from two real-world studies of Alzheimer's disease (AD). The patterns discovered were predictive of AD in survival analysis models with a Concordance index of up to 0.87 and contain clinically relevant variables. A visualisation module provides a clear picture of the discovered patterns for ease of interpretability.
Conclusions: The framework provides an effective and scalable method of modelling multivariate, longitudinal clinical data and can identify patterns in uncommon diseases and those that progress slowly over time. It is generalisable to clinical data from other medical domains as well as non-clinical data.
BMC Bioinformatics 26(1): 56. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834509/pdf/
Citations: 0
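The entry above mentions relative risk and the odds ratio as the measures used to flag high-risk patterns. As a rough illustration only, the sketch below computes both from a 2x2 table of pattern presence against diagnosis. The function name, the cell layout and the counts are hypothetical and are not taken from C-TPM; all four cells are assumed to be non-zero.

```python
def relative_risk_and_odds_ratio(a, b, c, d):
    """Relative risk and odds ratio from a 2x2 contingency table.

    a: pattern present, disease      b: pattern present, no disease
    c: pattern absent,  disease      d: pattern absent,  no disease
    """
    risk_with_pattern = a / (a + b)
    risk_without_pattern = c / (c + d)
    relative_risk = risk_with_pattern / risk_without_pattern
    odds_ratio = (a * d) / (b * c)
    return relative_risk, odds_ratio


# Made-up counts: 30 of 100 patients with the pattern develop the disease,
# against 10 of 100 patients without it.
rr, odds = relative_risk_and_odds_ratio(30, 70, 10, 90)
```

A relative risk well above 1 marks a pattern that is more common among patients who go on to develop the disease; the odds ratio tells a similar story but diverges from the relative risk as the outcome becomes more frequent.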
Harnessing pre-trained models for accurate prediction of protein-ligand binding affinity.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-17 DOI: 10.1186/s12859-025-06064-w
Jiashan Li, Xinqi Gong
Abstract
Background: The binding between proteins and ligands plays a crucial role in the field of drug discovery. However, this area currently faces numerous challenges. On one hand, existing methods are constrained by the limited availability of labeled data, often performing inadequately when addressing complex protein-ligand interactions. On the other hand, many models struggle to effectively capture the flexible variations and relative spatial relationships between proteins and ligands. These issues not only significantly hinder the advancement of protein-ligand binding research but also adversely affect the accuracy and efficiency of drug discovery. Therefore, in response to these challenges, our study aims to enhance predictive capabilities through innovative approaches, providing more reliable support for drug discovery efforts.
Methods: This study leverages a pre-trained model with spatial awareness to enhance the prediction of protein-ligand binding affinity. By perturbing the structures of small molecules in a manner consistent with physical constraints and employing self-supervised tasks, we improve the representation of small molecule structures, allowing for better adaptation to affinity predictions. Meanwhile, our approach enables the identification of potential binding sites on proteins.
Results: Our model demonstrates a significantly higher correlation coefficient in binding affinity predictions. Extensive evaluation on the PDBBind v2019 refined set, CASF, and Merck FEP benchmarks confirms the model's robustness and strong generalization across diverse datasets. Additionally, the model achieves over 95% in classification ROC for binding site identification, underscoring its high accuracy in pinpointing protein-ligand interaction regions.
Conclusion: This research presents a novel approach that not only enhances the accuracy of binding affinity predictions but also facilitates the identification of binding sites, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://github.com/MIALAB-RUC/SableBind .
BMC Bioinformatics 26(1): 55. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834573/pdf/
Citations: 0
CamlTree: a streamlined software for phylogenetic analysis of viral and mitochondrial genomes.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-14 DOI: 10.1186/s12859-025-06034-2
Peng Sun, Yu Yang, Mengjie Yuan, Qin Tang
Abstract
Background: Over the past decade, the continuous and rapid advances in bioinformatics have led to an increasingly common use of molecular sequence comparison for phylogenetic analysis. However, the use of multi-software and cross-platform strategies has increased the complexity of phylogenetic tree estimation. Therefore, the development and application of streamlined phylogenetic analysis tools are growing in significance in the field of biology. Particularly for genomes with relatively short sequences, there is a lack of simple and integrative tools for phylogenetic analysis.
Results: In this study, we present CamlTree (Concatenated alignments maximum-likelihood tree), a user-friendly desktop software designed to simplify phylogenetic analysis for viral and mitochondrial genomes, ultimately facilitating related research. CamlTree provides a workflow including gene concatenation (or coalescence), sequence alignment, alignment optimization, and the estimation of phylogenetic trees using both maximum-likelihood (ML) and Bayesian inference (BI) methods. CamlTree was written in TypeScript and developed using the Electron framework. It offers a user-friendly interface based on the React framework.
Conclusions: CamlTree has been released for the Windows OS. It integrates several popular analysis tools to optimize and simplify the process of estimating polygenic phylogenetic trees. The software can help researchers reduce their workload and process data more efficiently, enabling them to expedite their research progress. The software, along with a detailed user manual, is available at https://github.com/BioCrossCoder/camltree .
BMC Bioinformatics 26(1): 53. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11829546/pdf/
Citations: 0
Using individual barcodes to increase quantification power of massively parallel reporter assays.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-13 DOI: 10.1186/s12859-025-06065-9
Pia Keukeleire, Jonathan D Rosen, Angelina Göbel-Knapp, Kilian Salomon, Max Schubach, Martin Kircher
Abstract
Background: Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power.
Results: We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. Its built-in plotting functionalities allow for easy interpretation of the results.
Conclusions: With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm .
BMC Bioinformatics 26(1): 52. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11827149/pdf/
Citations: 0
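The entry above contrasts aggregating barcode counts per sequence with keeping individual barcodes. The sketch below illustrates only that distinction, using the RNA-to-DNA ratio the abstract describes; it is not BCalm or mpralm code. The function names, the pseudocount of 1 and the counts are assumptions, and library-size normalisation is omitted.

```python
import math


def log_ratio(rna, dna, pseudocount=1.0):
    """log2 of RNA over DNA counts; the pseudocount (assumed) avoids log(0)."""
    return math.log2((rna + pseudocount) / (dna + pseudocount))


def aggregated_activity(barcodes, pseudocount=1.0):
    """One value per sequence: sum counts over all barcodes, then take the ratio."""
    rna = sum(r for r, _ in barcodes)
    dna = sum(d for _, d in barcodes)
    return log_ratio(rna, dna, pseudocount)


def per_barcode_activity(barcodes, pseudocount=1.0):
    """One value per barcode, kept as replicate observations of the sequence."""
    return [log_ratio(r, d, pseudocount) for r, d in barcodes]


# Made-up (RNA, DNA) counts for three barcodes tagging the same sequence.
barcodes = [(3, 1), (7, 3), (15, 7)]
single_value = aggregated_activity(barcodes)
replicate_values = per_barcode_activity(barcodes)
```

Aggregation yields a single number dominated by the highest-count barcodes, whereas the per-barcode values keep the spread between barcodes available to a downstream statistical model.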
CoMIT: a bioinformatic pipeline for risk-based prediction of COVID-19 test inclusivity.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-12 DOI: 10.1186/s12859-025-06046-y
Diane M Walker, Wendy A Smith, Lia Gale, Jacob T Wolff, Connor P Healy, Hannah F Van Hollebeke, Ashlie Stephenson, Marianne Kim
Abstract
Background: The global Coronavirus Disease 2019 (COVID-19) pandemic highlighted the need to quickly diagnose infections to identify and prevent viral spread in the population. In response to the pandemic, BioFire Defense leveraged its PCR-based "lab-in-a-pouch" technology for expedited development of the BioFire® COVID-19 Test, a novel in vitro diagnostic detecting SARS-CoV-2 nucleic acid in human samples. Following clearance of an in vitro diagnostic device, regulatory bodies such as the U.S. Food and Drug Administration (FDA) require regular post market surveillance to monitor test performance against viral lineages circulating in the field, using predictive in silico inclusivity evaluations. Exponential increases in the number of sequences deposited in bioinformatic repositories such as GISAID, during the pandemic, impeded progress in meeting these post market requirements. In response, BioFire Defense developed a new bioinformatic tool to overcome scalability problems and the loss of accuracy encountered with the standard inclusivity method.
Results: The Coronavirus Monitoring for Inclusivity Tool (CoMIT) uses the Variant Sorter Algorithm to sidestep multiple sequence alignments, a significant barrier inherent in the standard inclusivity method. The implementation of CoMIT and its Variant Sorter Algorithm are described. Automated summary tables and visualizations from a typical inclusivity evaluation are presented. We report our approach to filter and display relevant information in the pipeline outputs using risk factors tied to test performance.
Conclusions: BioFire Defense has developed CoMIT, an automated bioinformatic pipeline for efficient processing and reporting of variant inclusivity from the GISAID EpiCoV™ repository. This tool ensures continuous and comprehensive post market evaluations of BioFire COVID-19 Test performance even from datasets large enough to impede standard inclusivity analyses. CoMIT's low computational space complexity and modular code allow this tool to be generalized for inclusivity monitoring of multianalyte or single analyte tests with complex assay designs and/or highly variable targets. CoMIT's databasing capabilities and metadata handling hold the potential for new investigations to improve readiness for future outbreaks.
BMC Bioinformatics 26(1): 51. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817761/pdf/
Citations: 0
HarmonizR: blocking and singular feature data adjustment improve runtime efficiency and data preservation.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-11 DOI: 10.1186/s12859-025-06073-9
Simon Schlumbohm, Julia E Neumann, Philipp Neumann
Abstract
Background: Data adjustment is an essential tool for increasing statistical power during analysis, for example in case of complex multi-experiment data from (single-cell) RNA, proteomics and other omics data. Despite its benefits, data integration introduces internal biases, so-called batch effects. Due to the inherent presence of missing values by such methods and their additional introduction by means of data integration, renowned algorithms such as ComBat and limma are unable to perform batch effect adjustment. Recently, the HarmonizR framework was presented for these cases, which is a tool for missing value tolerant data adjustment.
Results: In this contribution, we provide significant improvements to the HarmonizR approach. A novel blocking strategy is introduced to severely reduce runtime, while still supporting parallel architectures. Additionally, a "unique removal" strategy has been integrated into HarmonizR to maintain even more features for adjustment in datasets, showing a feature rescue of up to 103.9% for our tested datasets. In this work, we show (1) severely improved runtime for both small and large, real datasets and (2) the ability to retain more features from the integrated dataset during adjustment, showing a feature rescue of up to 103.9% for our tested datasets.
Conclusion: The proposed improvements tackle the previous shortcomings of the published HarmonizR version. Since HarmonizR was mainly developed for dataset integration on rare tumor entities, it did not include runtime improvements beyond parallelization, which has been addressed in this update. An additional welcome update regarding improved feature rescue further enhances the algorithm's ability to quickly and robustly perform batch effect reduction.
BMC Bioinformatics 26(1): 47. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817103/pdf/
Citations: 0
Mammalian piRNA target prediction using a hierarchical attention model.
IF 2.9, Zone 3, Biology
BMC Bioinformatics Pub Date : 2025-02-11 DOI: 10.1186/s12859-025-06068-6
Tianjiao Zhang, Liang Chen, Haibin Zhu, Garry Wong
Abstract
Background: Piwi-interacting RNAs (piRNAs) are well established for monitoring and protecting the genome from transposons in germline cells. Recently, numerous studies provided evidence that piRNAs also play important roles in regulating mRNA transcript levels. Despite their significant role in regulating cellular RNA levels, the piRNA targeting rules are not well defined, especially in mammals, which poses obstacles to the elucidation of piRNA function.
Results: Given the complexity and current limitation in understanding the mammalian piRNA targeting rules, we designed a deep learning model by selecting appropriate deep learning sub-networks based on the targeting patterns of piRNA inferred from previous experiments. Additionally, to alleviate the problem of insufficient data, a transfer learning approach was employed. Our model achieves a good discriminatory power (Accuracy: 98.5%) in predicting an independent test dataset. Finally, this model was utilized to predict the targets of all mouse and human piRNAs available in the piRNA database.
Conclusions: In this research, we developed a deep learning framework that significantly advances the prediction of piRNA targets, overcoming the limitations posed by insufficient data and current incomplete targeting rules. The piRNA target prediction network and results can be downloaded from https://github.com/SofiaTianjiaoZhang/piRNATarget .
BMC Bioinformatics 26(1): 50. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817350/pdf/
Citations: 0