Bioinformatics Advances: latest articles

RNAelem: an algorithm for discovering sequence-structure motifs in RNA bound by RNA-binding proteins.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae144
Hiroshi Miyake, Risa Karakida Kawaguchi, Hisanori Kiryu
Motivation: RNA-binding proteins (RBPs) play a crucial role in the post-transcriptional regulation of RNA. Given their importance, analyzing the specific RNA patterns recognized by RBPs has become a significant research focus in bioinformatics. Deep Neural Networks have enhanced the accuracy of prediction for RBP-binding sites, yet understanding the structural basis of RBP-binding specificity from these models is challenging due to their limited interpretability. To address this, we developed RNAelem, which combines profile context-free grammar and the Turner energy model for RNA secondary structure to predict sequence-structure motifs in RBP-binding regions.
Results: RNAelem exhibited superior detection accuracy compared to existing tools for RNA sequences with structural motifs. Upon applying RNAelem to the eCLIP database, we were not only able to reproduce many known primary sequence motifs in the absence of secondary structures, but also discovered many secondary structural motifs that contained sequence-nonspecific insertion regions. Furthermore, the high interpretability of RNAelem yielded insightful findings such as long-range base-pairing interactions in the binding region of the U2AF protein.
Availability and implementation: The code is available at https://github.com/iyak/RNAelem.
Volume 4, issue 1, article vbae144. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471262/pdf/
Citations: 0
FAVOR-GPT: a generative natural language interface to whole genome variant functional annotations.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-28 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae143
Thomas Cheng Li, Hufeng Zhou, Vineet Verma, Xiangru Tang, Yanjun Shao, Eric Van Buren, Zhiping Weng, Mark Gerstein, Benjamin Neale, Shamil R Sunyaev, Xihong Lin
Motivation: Functional Annotation of genomic Variants Online Resources (FAVOR) offers multi-faceted, whole genome variant functional annotations, which are essential for Whole Genome and Exome Sequencing (WGS/WES) analysis and the functional prioritization of disease-associated variants. A versatile chatbot designed to facilitate informative interpretation and interactive, user-centric summary of the whole genome variant functional annotation data in the FAVOR database is needed.
Results: We have developed FAVOR-GPT, a generative natural language interface powered by integrating large language models (LLMs) and FAVOR. It is developed based on the Retrieval Augmented Generation (RAG) approach, and complements the original FAVOR portal, enhancing usability for users, especially those without specialized expertise. FAVOR-GPT simplifies raw annotations by providing interpretable explanations and result summaries in response to the user's prompt. It shows high accuracy when cross-referencing with the FAVOR database, underscoring the robustness of the retrieval framework.
Availability and implementation: Researchers can access FAVOR-GPT at FAVOR's main website (https://favor.genohub.org).
Volume 4, issue 1, article vbae143. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461909/pdf/
Citations: 0
Meeting the challenge of genomic analysis: a collaboratively developed workshop for pangenomics and topological data analysis.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae139
Haydeé Contreras-Peruyero, Shaday Guerrero-Flores, Claudia Zirión-Martínez, Paulina M Mejía-Ponce, Marisol Navarro-Miranda, J Abel Lovaco-Flores, José M Ibarra-Rodríguez, Anton Pashkov, Cuauhtémoc Licona-Cassani, Nelly Sélem-Mojica
Motivation: As genomics data analysis becomes increasingly intricate, researchers face the challenge of mastering various software tools. The rise of Pangenomics analysis, which examines the complete set of genes in a group of genomes, is particularly transformative in understanding genetic diversity. Our interdisciplinary team of biologists and mathematicians developed a short Pangenomics Workshop covering Bash, Python scripting, Pangenome, and Topological Data Analysis. These skills provide deeper insights into genetic variations and their implications in Evolutionary Biology. The workshop uses a Conda environment for reproducibility and accessibility. Developed in The Carpentries Incubator infrastructure, the workshop aims to equip researchers with essential skills for Pangenomics research. By emphasizing the role of a community of practice, this work underscores its significance in empowering multidisciplinary professionals to collaboratively develop training that adheres to best practices.
Results: Our workshop delivers tangible outcomes by enhancing the skill sets of Computational Biology professionals. Participants gain hands-on experience using real data from the first described pangenome. We share our paths toward creating an open-source, multidisciplinary, and public resource where learners can develop expertise in Pangenomic Analysis. This initiative goes beyond advancing individual capabilities, aligning with the broader mission of addressing educational needs in Computational Biology.
Availability and implementation: https://carpentries-incubator.github.io/pangenomics-workshop/.
Volume 4, issue 1, article vbae139. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11525208/pdf/
Citations: 0
How pairs of insertion mutations impact protein structure: an exhaustive computational study.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae138
Changrui Li, Yang Zheng, Filip Jagodzinski
Summary: Understanding how amino acid insertion mutations affect protein structure can inform pharmaceutical efforts targeting diseases that are caused by protein mutants. In silico simulation of mutations complements experiments performed on physical proteins, which are time- and cost-prohibitive. We have computationally generated the exhaustive sets of two amino acid insertion mutations for five protein structures in the Protein Data Bank. To probe and identify how pairs of insertions affect structural stability and flexibility, we tally the count of hydrogen bonds and analyze a variety of metrics of each mutant. We identify hotspots where pairs of insertions have a pronounced effect, and study how amino acid properties such as size and type, and insertion into alpha helices, affect a protein's structure. The findings show that although there are some residues, Proline and Tryptophan specifically, which if inserted have a significant impact on the protein's structure, there is also a great deal of variance in the effects of the exhaustive insertions both for any single protein, and across the five proteins. That suggests that computational or otherwise quantitative efforts should consider large representative sample sizes, especially when training models to make predictions about the effects of insertions.
Availability and implementation: The data underlying this article is available at https://multimute.cs.wwu.edu.
Volume 4, issue 1, article vbae138. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11639182/pdf/
Citations: 0
Chronogram: an R package for data curation and analysis of infection and vaccination cohort studies.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-27 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae146
David Greenwood, Marianne Shawe-Taylor, Hermaleigh Townsley, Joshua Gahir, Nikita Sahadeo, Yakubu Alhassan, Charlotte Chaloner, Oliver Galgut, Gavin Kelly, David L V Bauer, Emma C Wall, Mary Y Wu, Edward J Carr
Motivation: Observational cohort studies that track vaccine and infection responses offer real-world data to inform pandemic policy. Translating biological hypotheses, such as whether different patterns of accumulated antigenic exposures confer differing antibody responses, into analysis code can be onerous, particularly when source data is dis-aggregated.
Results: The R package chronogram introduces the class chronogram, where metadata is seamlessly aggregated with sparse infection episode, clinical and laboratory data. Each experimental modality is added sequentially, allowing the incorporation of new data, such as specialized time-consuming research assays, or their downstream analyses. Source data can be any rectangular data format, including database tables (such as structured query language databases). This supports annotations that aggregate data types/sources, for example, combining symptoms, molecular testing, and sequencing of one or more infectious episodes in a pathogen-agnostic manner. Chronogram arranges observational data to allow the translation of biological hypotheses into their corresponding code via a shared vocabulary.
Availability and implementation: Chronogram is implemented in R and available under an MIT licence at: https://www.github.com/FrancisCrickInstitute/chronogram; a user manual is available at: https://franciscrickinstitute.github.io/chronogram/.
Volume 4, issue 1, article vbae146. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11470235/pdf/
Citations: 0
VCAb: a web-tool for structure-guided exploration of antibodies.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-20 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae137
Dongjun Guo, Joseph Chi-Fung Ng, Deborah K Dunn-Walters, Franca Fraternali
Motivation: Effective responses against immune challenges require antibodies of different isotypes performing specific effector functions. Structural information on these isotypes is essential to engineer antibodies with desired physico-chemical features of their antigen-binding properties, and optimal developability as potential therapeutics. In silico mutational scanning profiles on antibody structures would further pinpoint candidate mutations for enhancing antibody stability and function. Current antibody structure databases lack consistent annotations of isotypes and structural coverage of 3D antibody structures, as well as computed deep mutation profiles.
Results: The V and C region bearing antibody (VCAb) web-tool is established to clarify these annotations and provides an accessible resource to facilitate antibody engineering and design. VCAb currently provides data on 7,166 experimentally determined antibody structures including both V and C regions from different species. Additionally, VCAb provides annotations of species and isotypes with numbering schemes applied. This information can be interactively queried or downloaded in batch.
Availability and implementation: VCAb is implemented as an R Shiny application to enable interactive data interrogation. The online application is freely accessible at https://fraternalilab.cs.ucl.ac.uk/VCAb/. The source code to generate the database and the online application is available open-source at https://github.com/Fraternalilab/VCAb.
Volume 4, issue 1, article vbae137. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11471263/pdf/
Citations: 0
DECOMICS, a shiny application for unsupervised cell type deconvolution and biological interpretation of bulk omic data.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-20 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae136
Slim Karkar, Ashwini Sharma, Carl Herrmann, Yuna Blum, Magali Richard
Summary: Unsupervised deconvolution algorithms are often used to estimate cell composition from bulk tissue samples. However, applying cell-type deconvolution and interpreting the results remain a challenge, even more so without prior training in bioinformatics. Here, we propose a tool for estimating and identifying cell type composition from bulk transcriptomes or methylomes. DECOMICS is a shiny-web application dedicated to unsupervised deconvolution approaches of bulk omic data. It provides (i) a variety of existing algorithms to perform deconvolution on the gene expression or methylation-level matrix, (ii) an enrichment analysis module to aid biological interpretation of the deconvolved components, and (iii) some visualization tools. Input data can be downloaded in csv format and preprocessed in the web application (normalization, transformation, and feature selection). The results of the deconvolution, enrichment, and visualization processes can be downloaded.
Availability and implementation: DECOMICS is an R-shiny web application that can be launched (i) directly from a local R session using the R package available here: https://gitlab.in2p3.fr/Magali.Richard/decomics (either by installing it locally or via a virtual machine and a Docker image that we provide); or (ii) in the Biosphere-IFB Clouds Federation for Life Science, a multi-cloud environment scalable for high-performance computing: https://biosphere.france-bioinformatique.fr/catalogue/appliance/193/.
Volume 4, issue 1, article vbae136. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479579/pdf/
Citations: 0
Investigation of protein family relationships with deep learning.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-18 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae132
Irina Ponamareva, Antonina Andreeva, Maxwell L Bileschi, Lucy Colwell, Alex Bateman
Motivation: In this article, we propose a method for finding similarities between Pfam families based on the pre-trained neural network ProtENN2. We use the per-residue embeddings of the ProtENN2 model to produce new high-dimensional per-family embeddings, develop an approach for calculating inter-family similarity scores based on these embeddings, and evaluate its predictions using structure comparison.
Results: We apply our method to Pfam annotation by refining clan membership for Pfam families, suggesting both new members of existing clans and potential new clans for future Pfam releases. We investigate some of the failure modes of our approach, which suggests directions for future improvements. Our method is relatively simple with few parameters and could be applied to other protein family classification models. Overall, our work suggests potential benefits of employing deep learning for improving our understanding of protein family relationships and functions of previously uncharacterized families.
Availability and implementation: github.com/iponamareva/ProtCNNSim, 10.5281/zenodo.10091909.
Volume 4, issue 1, article vbae132. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467057/pdf/
Citations: 0
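The entry above describes pooling per-residue embeddings into per-family embeddings and scoring similarity between families. The abstract does not say how the pooling or the scoring is done, so the following is only a minimal sketch under two stated assumptions: mean pooling over residues and cosine similarity between family vectors. The function names `family_embedding` and `cosine_similarity` are hypothetical and do not come from ProtENN2 or the ProtCNNSim repository.

```python
import math


def family_embedding(per_residue_embeddings):
    """Mean-pool a list of per-residue vectors into one family vector.

    Assumption: mean pooling. The paper's actual aggregation may differ.
    """
    dim = len(per_residue_embeddings[0])
    total = [0.0] * dim
    for vec in per_residue_embeddings:
        for i, value in enumerate(vec):
            total[i] += value
    return [t / len(per_residue_embeddings) for t in total]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm.

    Assumption: cosine similarity stands in for the paper's similarity score.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 2-dimensional "embeddings" for three hypothetical families.
fam_a = family_embedding([[1.0, 0.0], [1.0, 0.0]])
fam_b = family_embedding([[0.0, 1.0]])
fam_c = family_embedding([[2.0, 0.0], [4.0, 0.0]])
```

With these toy inputs, `fam_a` and `fam_c` point in the same direction and `fam_a` and `fam_b` are orthogonal, which is the kind of contrast a clan-membership check would rely on.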
EmbedGEM: a framework to evaluate the utility of embeddings for genetic discovery.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-17 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae135
Sumit Mukherjee, Zachary R McCaw, Jingwen Pei, Anna Merkoulovitch, Tom Soare, Raghav Tandon, David Amar, Hari Somineni, Christoph Klein, Santhosh Satapati, David Lloyd, Christopher Probert, Daphne Koller, Colm O'Dushlaine, Theofanis Karaletsos
Summary: Machine learning-derived embeddings are a compressed representation of high content data modalities. Embeddings can capture detailed information about disease states and have been qualitatively shown to be useful in genetic discovery. Despite their promise, embeddings have a major limitation: it is unclear if genetic variants associated with embeddings are relevant to the disease or trait of interest. In this work, we describe EmbedGEM (Embedding Genetic Evaluation Methods), a framework to systematically evaluate the utility of embeddings in genetic discovery. EmbedGEM focuses on comparing embeddings along two axes: heritability and disease relevance. As measures of heritability, we consider the number of genome-wide significant associations and the mean χ² statistic at significant loci. For disease relevance, we compute polygenic risk scores for each embedding principal component, then evaluate their association with high-confidence disease or trait labels in a held-out evaluation patient set. While our development of EmbedGEM is motivated by embeddings, the approach is generally applicable to multivariate traits and can readily be extended to accommodate additional metrics along the evaluation axes. We demonstrate EmbedGEM's utility by evaluating embeddings and multivariate traits in two separate datasets: (i) a synthetic dataset simulated to demonstrate the ability of the framework to correctly rank traits based on their heritability and disease relevance and (ii) real data from the UK Biobank, including metabolic and liver-related traits. Importantly, we show that greater disease relevance does not automatically follow from greater heritability.
Availability and implementation: https://github.com/insitro/EmbedGEM.
Volume 4, issue 1, article vbae135. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11632179/pdf/
Citations: 0
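The two heritability measures named in the entry above (the number of genome-wide significant associations and the mean χ² statistic at significant loci) can be illustrated with a short sketch. This is not the EmbedGEM implementation. It assumes one χ² statistic with 1 degree of freedom per locus and the conventional genome-wide threshold of p < 5e-8, neither of which is stated in the abstract, and the function names are hypothetical.

```python
import math

# Assumption: the conventional genome-wide significance threshold.
GENOME_WIDE_P = 5e-8


def chi2_1df_pvalue(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def heritability_summary(chi2_stats, threshold=GENOME_WIDE_P):
    """Return (count of significant loci, mean chi-square among them).

    Assumption: each entry of chi2_stats is the statistic of one locus.
    The mean is NaN when no locus passes the threshold.
    """
    significant = [x for x in chi2_stats if chi2_1df_pvalue(x) < threshold]
    if not significant:
        return 0, float("nan")
    return len(significant), sum(significant) / len(significant)


# Toy statistics: only 35.0 and 45.0 pass the threshold.
n_significant, mean_chi2 = heritability_summary([1.0, 35.0, 45.0, 10.0])
```

Reporting both numbers separates how many loci an embedding picks up from how strong the signal is at those loci, which is one plausible reading of why the abstract lists them as distinct measures.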
Batch-effect correction in single-cell RNA sequencing data using JIVE.
IF 2.4
Bioinformatics advances Pub Date : 2024-09-13 eCollection Date: 2024-01-01 DOI: 10.1093/bioadv/vbae134
Joseph Hastings, Donghyung Lee, Michael J O'Connell
Motivation: In single-cell RNA sequencing analysis, addressing batch effects (technical artifacts stemming from factors such as varying sequencing technologies, equipment, and capture times) is crucial. These factors can cause unwanted variation and obfuscate the underlying biological signal of interest. The joint and individual variation explained (JIVE) method can be used to extract shared biological patterns from multi-source sequencing data while adjusting for individual non-biological variations (i.e. batch effect). However, its current implementation was originally designed for bulk sequencing data, making it computationally infeasible for large-scale single-cell sequencing datasets.
Results: In this study, we enhance JIVE for large-scale single-cell data by boosting its computational efficiency. Additionally, we introduce a novel application of JIVE for batch-effect correction on multiple single-cell sequencing datasets. Our enhanced method aims to decompose single-cell sequencing datasets into a joint structure capturing the true biological variability and individual structures, which capture technical variability within each batch. This joint structure is then suitable for use in downstream analyses. We benchmarked the results against four popular tools, Seurat v5, Harmony, LIGER, and ComBat-seq, which were developed for this purpose. JIVE performed best in terms of preserving cell-type effects and in scenarios in which the batch sizes are balanced.
Availability and implementation: The JIVE implementation used for this analysis can be found at https://github.com/oconnell-statistics-lab/scJIVE.
Volume 4, issue 1, article vbae134. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11461915/pdf/
Citations: 0