April Shen, Marcos Casado Barbero, Baron Koylass, Kirill Tsukanov, Tim Cezard, Thomas M Keane
{"title":"CMAT: ClinVar Mapping and Annotation Toolkit","authors":"April Shen, Marcos Casado Barbero, Baron Koylass, Kirill Tsukanov, Tim Cezard, Thomas M Keane","doi":"10.1093/bioadv/vbae018","DOIUrl":"https://doi.org/10.1093/bioadv/vbae018","url":null,"abstract":"Semantic ontology mapping of clinical descriptors with disease outcome is essential. ClinVar is a key resource for human variation with known clinical significance. We present CMAT, a software toolkit and curation protocol for accurately enriching ClinVar releases with disease ontology associations and complex functional consequences. The software and ontology mappings can be obtained from: https://github.com/EBIvariation/CMAT. Supplementary data are available at Bioinformatics Advances online.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139854953","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Regression Analysis of Multiple Traits Based on Genetic Relationships","authors":"Ann-Sophie Buchardt, Xiang Zhou, Claus Thorn Ekstrøm","doi":"10.1093/bioadv/vbad192","DOIUrl":"https://doi.org/10.1093/bioadv/vbad192","url":null,"abstract":"Polygenic scores (PGSs) are widely available and employed in genomic data analyses for predicting and understanding genetic architectures. We propose a novel clustering and estimation method using PGSs for inferring a genetic relationship among multiple, simultaneously measured and potentially correlated traits in a multivariate GWAS. Using graphical lasso, we estimate a sparse covariance matrix of the PGSs and obtain clusters of traits sharing genetic characteristics. We use the clusters to specify the structure of the error covariance matrix of a generalised least squares (GLS) model and use the feasible GLS estimator for estimating a linear regression model with a certain unknown degree of correlation between the residuals. The method suits many biology studies well with traits embedded in some genetic functioning groups and facilitates the development of PGS research. We compare the method with fully parametric techniques on simulated data and illustrate the utility of the methods by examining a heterogeneous stock mouse data set from the Wellcome Trust Centre for Human Genetics. We demonstrate that the method successfully identifies clusters of traits and increases precision, power and computational efficiency.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"45 19","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-01-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139386980","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Batch correction of single cell sequencing data via an autoencoder architecture","authors":"Reut Danino, I. Nachman, R. Sharan","doi":"10.1093/bioadv/vbad186","DOIUrl":"https://doi.org/10.1093/bioadv/vbad186","url":null,"abstract":"Technical differences between gene expression sequencing experiments can cause variations in the data in the form of batch effect biases. These do not represent true biological variations between samples and can lead to false conclusions, or hinder the ability to integrate multiple datasets. Since there is a growing need for the joint analysis of single cell sequencing datasets from different sources, there is also a need to correct the resulting batch effects while maintaining the true biological variations in the data. Here we develop a semi-supervised deep learning architecture called Autoencoder-based Batch Correction (ABC) for integrating single cell sequencing datasets. Our method removes batch effects through a guided process of data compression using supervised cell type classifier branches for biological signal retention. It aligns the different batches using an adversarial training approach. We comprehensively evaluate the performance of our method using four single cell sequencing datasets and multiple measures for batch effect removal and biological variation conservation. ABC outperforms ten state-of-the-art methods for this task including Seurat, scGen, ComBat, scanorama, scVI, scANVI, AutoClass, Harmony, scDREAMER and CLEAR, correcting various types of batch effects while preserving intricate biological variations.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"41 8","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139150359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BannMI deciphers potential n-to-1 information transduction in signaling pathways to unravel message of intrinsic apoptosis","authors":"Bettina Schmidt, Christine Sers, Nadja Klein","doi":"10.1093/bioadv/vbad175","DOIUrl":"https://doi.org/10.1093/bioadv/vbad175","url":null,"abstract":"Cell fate decisions, such as apoptosis or proliferation, are communicated via signaling pathways. The pathways are heavily intertwined and often consist of sequential interaction of proteins (kinases). Information integration takes place on the protein level via n-to-1 interactions. A state-of-the-art procedure to quantify information flow (edges) between signaling proteins (nodes) is network inference. However, edge weight calculation typically refers to 1-to-1 interactions only and relies on mean protein phosphorylation levels instead of single cell distributions. Information theoretic measures such as the mutual information (MI) have the potential to overcome these shortcomings but are still rarely used. This work proposes a Bayesian nearest neighbor (NN)-based MI estimator (BannMI) to quantify n-to-1 kinase dependency in signaling pathways. BannMI outperforms the state-of-the-art MI estimator on protein-like data in terms of mean squared error and Pearson correlation. Using BannMI, we analyse apoptotic signaling in phosphoproteomic cancerous and non-cancerous breast cell line data. Our work provides evidence for cooperative signaling of several kinases in programmed cell death and identifies a potential key role of the mitogen-activated protein (MAP) kinase p38.","PeriodicalId":505477,"journal":{"name":"Bioinformatics Advances","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2023-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139212849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}