Statistical Applications in Genetics and Molecular Biology: Latest Articles

Empirically adjusted fixed-effects meta-analysis methods in genomic studies.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-09-30 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2023-0041

Wimarsha T Jayanetti, Sinjini Sikdar

Volume 23, Issue 1

Abstract: In recent years, meta-analyzing summary results from multiple studies has become common practice in genomic research, leading to a significant improvement in the power of statistical detection compared to an individual genomic study. Meta-analysis methods that combine statistical estimates across studies are known to be statistically more powerful than those combining statistical significance measures. METAL, an approach that combines effect size estimates under a fixed-effects model, has become extremely popular for the former type of meta-analysis. In this article, we discuss the limitations of METAL due to its dependence on the theoretical null distribution, leading to incorrect significance testing results. Through various simulation studies and a real genomic data application, we show how modifying the z-scores in METAL using an empirical null distribution can significantly improve the results, especially in the presence of hidden confounders. For the estimation of the null distribution, we consider two different approaches, and we highlight the scenarios in which one null estimation approach outperforms the other. This article will give researchers insight into the importance of using an empirical null distribution in fixed-effects meta-analysis, as well as into choosing the appropriate empirical null distribution estimation approach.
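The idea of replacing the theoretical null with an empirical one can be sketched in a few lines of Python. This is a minimal illustration only, not METAL's implementation and not the authors' procedure: `fixed_effects_z` is the textbook inverse-variance weighted estimator, and `empirical_null_adjust` uses the median and scaled MAD of the observed z-scores as a crude stand-in for an empirical null estimate (the abstract does not name the two estimation approaches the paper compares). All function names are hypothetical.

```python
import math
import statistics


def fixed_effects_z(betas, ses):
    """Inverse-variance weighted fixed-effects z-score for one feature.

    betas: per-study effect estimates; ses: their standard errors.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled / pooled_se


def empirical_null_adjust(z_scores):
    """Re-centre and re-scale z-scores using a robust null estimate.

    Assumption: most features are null, so the median and the scaled
    MAD of all z-scores approximate the null mean and standard deviation.
    """
    mu0 = statistics.median(z_scores)
    mad = statistics.median([abs(z - mu0) for z in z_scores])
    sigma0 = 1.4826 * mad  # MAD-to-SD factor for a normal distribution
    return [(z - mu0) / sigma0 for z in z_scores]


def two_sided_p(z):
    """Two-sided p-value of z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))
```

In use, one would compute `fixed_effects_z` per gene or variant, pass the full vector of z-scores through `empirical_null_adjust`, and only then convert to p-values with `two_sided_p`. If hidden confounding shifts or inflates the bulk of the z-scores, the adjustment absorbs that shift before significance is assessed.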

Citations: 0

A CNN-CBAM-BIGRU model for protein function prediction.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-07-01 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2024-0004

Lavkush Sharma, Akshay Deepak, Ashish Ranjan, Gopalakrishnan Krishnasamy

Volume 23, Issue 1

Abstract: Understanding a protein's function based solely on its amino acid sequence is a crucial but intricate task in bioinformatics. Traditionally, this challenge has proven difficult. However, recent years have witnessed the rise of deep learning as a powerful tool, achieving significant success in protein function prediction. Its strength lies in its ability to automatically learn informative features from protein sequences, which can then be used to predict the protein's function. This study builds upon these advancements by proposing a novel model: CNN-CBAM+BiGRU. It incorporates a Convolutional Block Attention Module (CBAM) alongside BiGRUs. CBAM acts as a spotlight, guiding the CNN to focus on the most informative parts of the protein data, leading to more accurate feature extraction. BiGRUs, a type of Recurrent Neural Network (RNN), excel at capturing long-range dependencies within the protein sequence, which are essential for accurate function prediction. The proposed model integrates the strengths of both CNN-CBAM and BiGRU. This study's findings, validated through experimentation, showcase the effectiveness of this combined approach. For the human dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +1.0 % for cellular components, +1.1 % for molecular functions, and +0.5 % for biological processes. For the yeast dataset, the suggested method outperforms the CNN-BIGRU+ATT model by +2.4 % for cellular components, +1.2 % for molecular functions, and +0.6 % for biological processes.

Citations: 0

A heavy-tailed model for analyzing miRNA-seq raw read counts.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-05-29 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2023-0016

Annika Krutto, Therese Haugdahl Nøst, Magne Thoresen

Citations: 0

Flexible model-based non-negative matrix factorization with application to mutational signatures.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-05-16 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2023-0034

Ragnhild Laursen, Lasse Maretty, Asger Hobolth

Volume 23, Issue 1

Abstract: Somatic mutations in cancer can be viewed as a mixture distribution of several mutational signatures, which can be inferred using non-negative matrix factorization (NMF). Mutational signatures have previously been parametrized using either simple mono-nucleotide interaction models or general tri-nucleotide interaction models. We describe a flexible and novel framework for identifying biologically plausible parametrizations of mutational signatures, and in particular for estimating di-nucleotide interaction models. Our novel estimation procedure is based on the expectation-maximization (EM) algorithm and regression in the log-linear quasi-Poisson model. We show that di-nucleotide interaction signatures are statistically stable and sufficiently complex to fit the mutational patterns. Di-nucleotide interaction signatures often strike the right balance between appropriately fitting the data and avoiding over-fitting. They provide a better fit to data and are biologically more plausible than mono-nucleotide interaction signatures, and the parametrization is more stable than the parameter-rich tri-nucleotide interaction signatures. We illustrate our framework in a large simulation study where we compare to state-of-the-art methods, and show results for three data sets of somatic mutation counts from patients with cancer in the breast, liver and urinary tract.
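For readers unfamiliar with NMF on count data, the unparametrized baseline can be sketched as follows. This is not the authors' EM and quasi-Poisson regression procedure; it is a plain-Python version of Lee and Seung's multiplicative updates for the generalized Kullback-Leibler divergence, the objective that corresponds to a Poisson likelihood. The orientation (rows of `V` as samples, columns as mutation types, `W` as exposures, `H` as signatures) and all function names are assumptions made for illustration.

```python
import math
import random


def matmul(W, H):
    """Plain-Python matrix product of W (n x r) and H (r x m)."""
    return [[sum(w_ik * H[k][j] for k, w_ik in enumerate(row))
             for j in range(len(H[0]))] for row in W]


def kl_divergence(V, WH):
    """Generalized Kullback-Leibler divergence D(V || WH)."""
    total = 0.0
    for v_row, m_row in zip(V, WH):
        for v, m in zip(v_row, m_row):
            if v > 0:
                total += v * math.log(v / m)
            total += m - v
    return total


def poisson_nmf(V, rank, n_iter=100, seed=0):
    """Factor V ~ W H with multiplicative updates for the KL objective.

    Assumes V has strictly positive entries so the sketch needs no
    zero-division guards; real count matrices containing zeros would
    need a small epsilon in the denominators.
    """
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    H = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    history = [kl_divergence(V, matmul(W, H))]  # divergence before any update
    for _ in range(n_iter):
        # Update H using the current reconstruction W H.
        WH = matmul(W, H)
        for k in range(rank):
            col_sum = sum(W[i][k] for i in range(n))
            for j in range(m):
                num = sum(W[i][k] * V[i][j] / WH[i][j] for i in range(n))
                H[k][j] *= num / col_sum
        # Update W using the refreshed reconstruction.
        WH = matmul(W, H)
        for i in range(n):
            for k in range(rank):
                row_sum = sum(H[k])
                num = sum(H[k][j] * V[i][j] / WH[i][j] for j in range(m))
                W[i][k] *= num / row_sum
        history.append(kl_divergence(V, matmul(W, H)))
    return W, H, history
```

The returned `history` lets one check that the divergence does not increase across iterations. A parametrized approach such as the one in the abstract would additionally constrain each row of `H` through an interaction model rather than leaving it free.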

Citations: 0

Choice of baseline hazards in joint modeling of longitudinal and time-to-event cancer survival data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-05-13 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2023-0038

Anand Hari, Edakkalathoor George Jinto, Divya Dennis, Kumarapillai Mohanan Nair Jagathnath Krishna, Preethi S George, Sivasevan Roshni, Aleyamma Mathew

Volume 23, Issue 1

Abstract: Longitudinal time-to-event analysis is a statistical method for analyzing data in which covariates are measured repeatedly. In survival studies, the risk of an event is estimated using the Cox proportional hazards model or the extended Cox model for exogenous time-dependent covariates. However, these models are inappropriate for endogenous time-dependent covariates such as longitudinally measured biomarkers, for example carcinoembryonic antigen (CEA). Joint models that can simultaneously model the longitudinal covariates and time-to-event data have been proposed as an alternative. The present study highlights the importance of choosing the baseline hazards to get more accurate risk estimation. The study used colon cancer patient data to illustrate and compare four different joint models that differ in the choice of baseline hazards [piecewise-constant Gauss-Hermite (GH), piecewise-constant pseudo-adaptive GH, Weibull Accelerated Failure time model with GH and B-spline GH]. We conducted a simulation study to assess model consistency with varying sample sizes (N = 100, 250, 500) and censoring proportions (20 %, 50 %, 70 %). In the colon cancer patient data, based on the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), piecewise-constant pseudo-adaptive GH was found to be the best-fitting model. Despite differences in model fit, the hazards obtained from the four models were similar. The study identified composite stage as a prognostic factor for time-to-event, and the longitudinal outcome, CEA, as a dynamic predictor of overall survival in colon cancer patients. Based on the simulation study, Piecewise-PH-aGH was found to be the best model, with the lowest AIC and BIC values and the highest coverage probability (CP). The bias and RMSE of all the models were competitive; however, Piecewise-PH-aGH showed the least bias and RMSE in most of the combinations and took the shortest computation time, which shows its computational efficiency. This study is the first of its kind to discuss the choice of baseline hazards.
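The AIC and BIC comparison mentioned in this abstract rests on two standard formulas, AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the sample size, and ln L the maximized log-likelihood. The sketch below applies them to two placeholder models. The model names, log-likelihoods, and parameter counts are hypothetical and are not results from the study, and which n to use for a joint model (subjects, events, or longitudinal observations) is a modelling choice the abstract does not specify.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln n - 2 ln L (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def best_model(scores):
    """Return the name of the model with the smallest criterion value."""
    return min(scores, key=scores.get)


# Hypothetical fits: name -> (maximized log-likelihood, parameter count).
candidates = {"model_a": (-520.0, 12), "model_b": (-515.5, 14)}
n_obs = 250  # hypothetical sample size

aic_scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
bic_scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
```

With these placeholder numbers the two criteria disagree: AIC favours the larger model, while BIC, which penalizes each parameter by ln n rather than 2, favours the smaller one. This is why reporting both, as the abstract does, is informative.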

Citations: 0

Assessing the feasibility of statistical inference using synthetic antibody-antigen datasets.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-04-03 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2023-0027

Thomas Minotto, Philippe A Robert, Ingrid Hobæk Haff, Geir K Sandve

Volume 23, Issue 1

Abstract: Simulation frameworks are useful to stress-test predictive models when data is scarce, or to assess model sensitivity to specific data distributions. Such frameworks often need to recapitulate several layers of data complexity, including emergent properties that arise implicitly from the interaction between simulation components. Antibody-antigen binding is a complex mechanism by which an antibody sequence wraps itself around an antigen with high affinity. In this study, we use a synthetic simulation framework for antibody-antigen folding and binding on a 3D lattice that includes full details on the spatial conformation of both molecules. We investigate how emergent properties arise in this framework, in particular the physical proximity of amino acids, their presence on the binding interface, or the binding status of a sequence, and relate that to the individual and pairwise contributions of amino acids in statistical models for binding prediction. We show that weights learnt from a simple logistic regression model align with some but not all features of amino acids involved in the binding, and that predictive sequence binding patterns can be enriched. In particular, main effects correlated with the capacity of a sequence to bind any antigen, while statistical interactions were related to sequence specificity.

Citations: 0

A global test of hybrid ancestry from genome-scale data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-02-19 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2022-0061

Md Rejuan Haque, Laura Kubatko

Volume 23, Issue 1

Abstract: Methods based on the multi-species coalescent have been widely used in phylogenetic tree estimation using genome-scale DNA sequence data to understand the underlying evolutionary relationship between the sampled species. Evolutionary processes such as hybridization, which creates new species through interbreeding between two different species, necessitate inferring a species network instead of a species tree. A species tree is strictly bifurcating and thus fails to incorporate hybridization events which require an internal node of degree three. Hence, it is crucial to decide whether a tree or network analysis should be performed given a DNA sequence data set, a decision that is based on the presence of hybrid species in the sampled species. Although many methods have been proposed for hybridization detection, it is rare to find a technique that does so globally while considering a data generation mechanism that allows both hybridization and incomplete lineage sorting. In this paper, we consider hybridization and coalescence in a unified framework and propose a new test that can detect whether there are any hybrid species in a set of species of arbitrary size. Based on this global test of hybridization, one can decide whether a tree or network analysis is appropriate for a given data set.

Citations: 0

Integrative pathway analysis with gene expression, miRNA, methylation and copy number variation for breast cancer subtypes.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-02-19 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2019-0050

Henry Linder, Yuping Zhang, Yunqi Wang, Zhengqing Ouyang

Citations: 0

Bayesian LASSO for population stratification correction in rare haplotype association studies.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2024-01-19 eCollection Date: 2024-01-01 DOI: 10.1515/sagmb-2022-0034

Zilu Liu, Asuman Seda Turkmen, Shili Lin

Volume 23, Issue 1

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10794901/pdf/

Abstract: Population stratification (PS) is one major source of confounding in both single nucleotide polymorphism (SNP) and haplotype association studies. To address PS, principal component regression (PCR) and the linear mixed model (LMM) are the current standards for SNP associations, and they are also commonly borrowed for haplotype studies. However, the underfitting and overfitting problems introduced by PCR and LMM, respectively, have yet to be addressed. Furthermore, there have been only a few theoretical approaches proposed to address PS specifically for haplotypes. In this paper, we propose a new method under the Bayesian LASSO framework, QBLstrat, to account for PS in identifying rare and common haplotypes associated with a continuous trait of interest. QBLstrat utilizes a large number of principal components (PCs) with appropriate priors to sufficiently correct for PS, while shrinking the estimates of unassociated haplotypes and PCs. We compare the performance of QBLstrat with the Bayesian counterparts of PCR and LMM and a current method, haplo.stats. Extensive simulation studies and real data analyses show that QBLstrat is superior in controlling false positives while maintaining competitive power for identifying true positives under PS.

Citations: 0

Mediation analysis method review of high throughput data.

IF 0.9, Zone 4 (Mathematics)

Statistical Applications in Genetics and Molecular Biology Pub Date : 2023-11-29 eCollection Date: 2023-01-01 DOI: 10.1515/sagmb-2023-0031

Qiang Han, Yu Wang, Na Sun, Jiadong Chu, Wei Hu, Yueping Shen

Volume 22, Issue 1

Abstract: High-throughput technologies have made high-dimensional settings increasingly common, providing opportunities for the development of high-dimensional mediation methods. We aimed to provide useful guidance for researchers using high-dimensional mediation analysis and ideas for biostatisticians to develop it by summarizing and discussing recent advances in high-dimensional mediation analysis. Mediation analysis still faces many challenges when single and multiple mediation analyses are extended to high-dimensional settings. The development of high-dimensional mediation methods attempts to address these issues, such as screening true mediators, estimating mediation effects by variable selection, reducing the mediation dimension to resolve correlations between variables, and utilizing composite null hypothesis testing to test them. Although these problems regarding high-dimensional mediation have been solved to some extent, some challenges remain. First, correlations between mediators are rarely considered when variables are selected for mediation. Second, dimension reduction that does not incorporate prior biological knowledge makes the results difficult to interpret. In addition, a method of sensitivity analysis for the strict sequential ignorability assumption in high-dimensional mediation analysis is still lacking. An analyst needs to consider the applicability of each method when using it, while a biostatistician could consider extensions and improvements in the methodology.

Citations: 0