NAR Genomics and Bioinformatics: latest articles

scATAcat: cell-type annotation for scATAC-seq data.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-10-08 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae135
Aybuge Altay, Martin Vingron
Abstract: Cells whose accessibility landscape has been profiled with scATAC-seq cannot readily be annotated to a particular cell type. In fact, annotating cell types in scATAC-seq data is a challenging task since, unlike in scRNA-seq data, we lack knowledge of 'marker regions' which could be used for cell-type annotation. Current annotation methods typically translate accessibility to expression space and rely on gene expression patterns. We propose a novel approach, scATAcat, that leverages characterized bulk ATAC-seq data as prototypes to annotate scATAC-seq data. To mitigate the inherent sparsity of single-cell data, we aggregate cells that belong to the same cluster and create pseudobulk profiles. To demonstrate the feasibility of our approach we collected a number of datasets with respective annotations to quantify the results and evaluate performance for scATAcat. scATAcat is available as a Python package at https://github.com/aybugealtay/scATAcat.
Volume 6, Issue 4, lqae135. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459382/pdf/
Citations: 0
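The pseudobulk step mentioned in the scATAcat abstract (aggregating cells that belong to the same cluster) can be illustrated with a minimal sketch. This is not the scATAcat implementation: the function name `pseudobulk`, the list-of-lists count layout, and the choice to aggregate by summing are all assumptions made for illustration.

```python
def pseudobulk(counts, clusters):
    """Sum per-cell accessibility counts within each cluster.

    counts:   list of per-cell count vectors (one value per region)
    clusters: list of cluster labels, one per cell
    Both the data layout and the use of a plain sum are illustrative
    assumptions, not the package's API.
    """
    if len(counts) != len(clusters):
        raise ValueError("need exactly one cluster label per cell")
    profiles = {}
    for cell, label in zip(counts, clusters):
        if label not in profiles:
            # Start each cluster with an all-zero profile.
            profiles[label] = [0] * len(cell)
        # Add this cell's counts to its cluster's running total.
        profiles[label] = [a + b for a, b in zip(profiles[label], cell)]
    return profiles


# Three sparse toy cells over four regions, grouped into two clusters.
cells = [[1, 0, 0, 1], [0, 0, 1, 1], [0, 1, 0, 0]]
labels = ["c0", "c0", "c1"]
bulk = pseudobulk(cells, labels)
```

Summing makes each cluster profile denser than any single cell, which is the motivation the abstract gives for working at the cluster level.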
TrajectoryGeometry suggests cell fate decisions can involve branches rather than bifurcations.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-10-08 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae139
Anna Laddach, Vassilis Pachnis, Michael Shapiro
Abstract: Differentiation of multipotential progenitor cells is a key process in the development of any multi-cellular organism and often continues throughout its life. It is often assumed that a bi-potential progenitor develops along a (relatively) straight trajectory until it reaches a decision point where the trajectory bifurcates. At this point one of two directions is chosen, each direction representing the unfolding of a new transcriptional programme. However, we have lacked quantitative means for testing this model. Accordingly, we have developed the R package TrajectoryGeometry. Applying this to published data we find several examples where, rather than bifurcate, developmental pathways branch. That is, the bipotential progenitor develops along a relatively straight trajectory leading to one of its potential fates. A second relatively straight trajectory branches off from this towards the other potential fate. In this sense only cells that branch off to follow the second trajectory make a 'decision'. Our methods give precise descriptions of the genes and cellular pathways involved in these trajectories. We speculate that branching may be the more common behaviour and may have advantages from a control-theoretic viewpoint.
Volume 6, Issue 4, lqae139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11459380/pdf/
Citations: 0
Genome wide clustering on integrated chromatin states and Micro-C contacts reveals chromatin interaction signatures.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-10-03 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae136
Corinne E Sexton, Sylvia Victor Paul, Dylan Barth, Mira V Han
Abstract: We can now analyze 3D physical interactions of chromatin regions with chromatin conformation capture technologies, in addition to the 1D chromatin state annotations, but methods to integrate this information are lacking. We propose a method to integrate the chromatin state of interacting regions into a vector representation through the contact-weighted sum of chromatin states. Unsupervised clustering on integrated chromatin states and Micro-C contacts reveals common patterns of chromatin interaction signatures. This provides an integrated view of the complex dynamics of concurrent change occurring in chromatin state and in chromatin interaction, adding another layer of annotation beyond chromatin state or Hi-C contact separately.
Volume 6, Issue 4, lqae136. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447530/pdf/
Citations: 0
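The "contact-weighted sum of chromatin states" named in the abstract above can be read as follows: for region i, entry s of its vector is the total contact strength between i and all regions annotated with state s. The sketch below encodes that reading only. The function name, the integer state encoding, and the absence of any normalization are assumptions; the published method may differ in all of these.

```python
def contact_weighted_states(contacts, states, n_states):
    """Build one vector per region from its contacts.

    contacts: square matrix, contacts[i][j] = contact strength of i and j
    states:   states[j] = integer chromatin-state label of region j
    Returns vectors where vectors[i][s] is the sum of contacts[i][j]
    over all regions j whose state is s (hypothetical layout).
    """
    n = len(states)
    vectors = []
    for i in range(n):
        v = [0.0] * n_states
        for j in range(n):
            # Accumulate the i-j contact into the slot of j's state.
            v[states[j]] += contacts[i][j]
        vectors.append(v)
    return vectors


# Toy example: three regions, two chromatin states.
contacts = [[0.0, 2.0, 1.0],
            [2.0, 0.0, 0.5],
            [1.0, 0.5, 0.0]]
states = [0, 1, 1]
vectors = contact_weighted_states(contacts, states, n_states=2)
```

The resulting per-region vectors are the kind of input an unsupervised clustering step could then group into interaction signatures.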
Exploring public cancer gene expression signatures across bulk, single-cell and spatial transcriptomics data with signifinder Bioconductor package.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-10-03 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae138
Stefania Pirrotta, Laura Masatti, Anna Bortolato, Anna Corrà, Fabiola Pedrini, Martina Aere, Giovanni Esposito, Paolo Martini, Davide Risso, Chiara Romualdi, Enrica Calura
Abstract: Understanding cancer mechanisms, defining subtypes, predicting prognosis and assessing therapy efficacy are crucial aspects of cancer research. Gene-expression signatures derived from bulk gene expression data have played a significant role in these endeavors over the past decade. However, recent advancements in high-resolution transcriptomic technologies, such as single-cell RNA sequencing and spatial transcriptomics, have revealed the complex cellular heterogeneity within tumors, necessitating the development of computational tools to characterize tumor mass heterogeneity accurately. Thus we implemented signifinder, a novel R Bioconductor package designed to streamline the collection and use of cancer transcriptional signatures across bulk, single-cell, and spatial transcriptomics data. Leveraging publicly available signatures curated by signifinder, users can assess a wide range of tumor characteristics, including hallmark processes, therapy responses, and tumor microenvironment peculiarities. Through three case studies, we demonstrate the utility of transcriptional signatures in bulk, single-cell, and spatial transcriptomic data analyses, providing insights into cell-resolution transcriptional signatures in oncology. Signifinder represents a significant advancement in cancer transcriptomic data analysis, offering a comprehensive framework for interpreting high-resolution data and addressing tumor complexity.
Volume 6, Issue 4, lqae138. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11447528/pdf/
Citations: 0
StableMate: a statistical method to select stable predictors in omics data.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-28 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae130
Yidi Deng, Jiadong Mao, Jarny Choi, Kim-Anh Lê Cao
Abstract: Identifying statistical associations between biological variables is crucial to understanding molecular mechanisms. Most association studies are based on correlation or linear regression analyses, but the identified associations often lack reproducibility and interpretability due to the complexity and variability of omics datasets, making it difficult to translate associations into meaningful biological hypotheses. We developed StableMate, a regression framework, to address these challenges through a process of variable selection across heterogeneous datasets. Given datasets from different environments, such as experimental batches, StableMate selects environment-agnostic (stable) and environment-specific predictors in predicting the response of interest. Stable predictors represent robust functional dependencies with the response, and can be used to build regression models that make generalizable predictions in unseen environments. We applied StableMate to (i) RNA sequencing data of breast cancer to discover genes that consistently predict estrogen receptor expression across disease status; (ii) metagenomics data to identify microbial signatures that show persistent association with colon cancer across study cohorts; and (iii) single-cell RNA sequencing data of glioblastoma to discern signature genes associated with the development of pro-tumour microglia regardless of cell location. Our case studies demonstrate that StableMate is adaptable to regression and classification analyses and achieves comprehensive characterization of biological systems for different omics data types.
Volume 6, Issue 4, lqae130. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437361/pdf/
Citations: 0
scDAPP: a comprehensive single-cell transcriptomics analysis pipeline optimized for cross-group comparison.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-28 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae134
Alexander Ferrena, Xiang Yu Zheng, Kevyn Jackson, Bang Hoang, Bernice E Morrow, Deyou Zheng
Abstract: Single-cell transcriptomics profiling has increasingly been used to evaluate cross-group (or condition) differences in cell population and cell-type gene expression. This often leads to large datasets with complex experimental designs that need advanced comparative analysis. Concurrently, bioinformatics software and analytic approaches have also become more diverse and constantly undergo improvement. Thus, there is an increased need for automated and standardized data processing and analysis pipelines, which should also be efficient and flexible. To address these needs, we develop the single-cell Differential Analysis and Processing Pipeline (scDAPP), an R-based workflow for comparative analysis of single cell (or nucleus) transcriptomic data between two or more groups and at the levels of single cells or 'pseudobulking' samples. The pipeline automates many steps of pre-processing using data-learnt parameters, uses previously benchmarked software, and generates comprehensive intermediate data and final results that are valuable for both beginners and experts of scRNA-seq analysis. Moreover, the analytic reports, augmented by extensive data visualization, increase the transparency of computational analysis and parameter choices, while helping users move seamlessly from raw data to biological interpretation. scDAPP is freely available under the MIT license, with source code, documentation and sample data on GitHub (https://github.com/bioinfoDZ/scDAPP).
Volume 6, Issue 4, lqae134. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11437360/pdf/
Citations: 0
RNAMotifProfile: a graph-based approach to build RNA structural motif profiles.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-26 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae128
Md Mahfuzur Rahaman, Shaojie Zhang
Abstract: RNA structural motifs are the recurrent segments in RNA three-dimensional structures that play a crucial role in the functional diversity of RNAs. Understanding the similarities and variations within these recurrent motif groups is essential for gaining insights into RNA structure and function. While recurrent structural motifs are generally assumed to be composed of the same isosteric base interactions, this consistent pattern is not observed across all examples of these motifs. Existing methods for analyzing and comparing RNA structural motifs may overlook variations in base interactions and associated nucleotides. RNAMotifProfile is a novel profile-to-profile alignment algorithm that generates a comprehensive profile from a group of structural motifs, incorporating all base interactions and associated nucleotides at each position. By structurally aligning input motif instances using a guide-tree-based approach, RNAMotifProfile captures the similarities and variations within recurrent motif groups. Additionally, RNAMotifProfile can function as a motif search tool, enabling the identification of instances of a specific motif family by searching with the corresponding profile. The ability to generate accurate and comprehensive profiles for RNA structural motif families, and to search for these motifs, facilitates a deeper understanding of RNA structure-function relationships and potential applications in RNA engineering and therapeutic design.
Volume 6, Issue 3, lqae128. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426329/pdf/
Citations: 0
Prognostic importance of splicing-triggered aberrations of protein complex interfaces in cancer.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-26 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae133
Khalique Newaz, Christoph Schaefers, Katja Weisel, Jan Baumbach, Dmitrij Frishman
Abstract: Aberrant alternative splicing (AS) is a prominent hallmark of cancer. AS can perturb protein-protein interactions (PPIs) by adding or removing interface regions encoded by individual exons. Identifying prognostic exon-exon interactions (EEIs) from PPI interfaces can help discover AS-affected cancer-driving PPIs that can serve as potential drug targets. Here, we assessed the prognostic significance of EEIs across 15 cancer types by integrating RNA-seq data with three-dimensional (3D) structures of protein complexes. By analyzing the resulting EEI network we identified patient-specific perturbed EEIs (i.e., EEIs present in healthy samples but absent from the paired cancer samples or vice versa) that were significantly associated with survival. We provide the first evidence that EEIs can be used as prognostic biomarkers for cancer patient survival. Our findings provide mechanistic insights into AS-affected PPI interfaces. Given the ongoing expansion of available RNA-seq data and the number of 3D structurally-resolved (or confidently predicted) protein complexes, our computational framework will help accelerate the discovery of clinically important cancer-promoting AS events.
Volume 6, Issue 3, lqae133. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11426328/pdf/
Citations: 0
Cooperative binding of bivalent ligands yields new insights into the guanidine-II riboswitch.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-25 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae132
Jakob Steuer, Malte Sinn, Franziska Eble, Sina Rütschlin, Thomas Böttcher, Jörg S Hartig, Christine Peter
Abstract: Riboswitches are involved in regulating gene expression in bacteria. They are located within the untranslated regions of bacterial messenger RNA and function as switches by adjusting their shape, depending on the presence or absence of specific ligands. To decipher the fundamental aspects of bacterial gene control, it is therefore important to understand the mechanisms that underlie these conformational switches. To this end, a combination of an experimental binding study, molecular simulations and machine learning has been employed to obtain insights into the conformational changes and structural dynamics of the guanidine-II riboswitch. By exploiting the design of a bivalent ligand, we were able to study ligand binding in the aptamer dimer at the molecular level. Spontaneous ligand-binding events, which are usually difficult to simulate, were observed and the contributing factors are described. These findings were further confirmed by in vivo experiments, where the cooperative binding effects of the bivalent ligands resulted in increased binding affinity compared to the native guanidinium ligand. Beyond ligand binding itself, the simulations revealed a novel, ligand-dependent base-stacking interaction outside of the binding pocket that stabilizes the riboswitch.
Volume 6, Issue 3, lqae132. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11423145/pdf/
Citations: 0
Exploring the impact of sequence context on errors in SNP genotype calling with whole genome sequencing data using AI-based autoencoder approach.
IF 4
NAR Genomics and Bioinformatics Pub Date : 2024-09-24 eCollection Date: 2024-09-01 DOI: 10.1093/nargab/lqae131
Krzysztof Kotlarz, Magda Mielczarek, Przemysław Biecek, Bernt Guldbrandtsen, Joanna Szyda
Abstract: A critical step in the analysis of whole genome sequencing data is variant calling. Despite its importance, variant calling is prone to errors. Our study investigated the association between incorrect single nucleotide polymorphism (SNP) calls and variant quality metrics and nucleotide context. In our study, incorrect SNPs were defined in 20 Holstein-Friesian cows by comparing their SNP genotypes identified by whole genome sequencing with the Illumina NovaSeq 6000 and the EuroGMD50K genotyping microarray. The dataset was divided into the correct SNP set (666 333 SNPs) and the incorrect SNP set (4 557 SNPs). The training dataset consisted of only the correct SNPs, while the test dataset contained a balanced mix of all the incorrectly and correctly called SNPs. An autoencoder was constructed to identify systematically incorrect SNPs that were marked as outliers by a one-class support vector machine and isolation forest algorithms. The results showed that 59.53% (±0.39%) of the incorrect SNPs had systematic patterns, with the remainder being random errors. The frequent occurrence of the CGC 3-mer was due to mislabelling a call for C. An incorrect call of T instead of A was associated with the presence of T in the neighbouring downstream position. These errors may arise due to the fluorescence patterns of nucleotide labelling.
Volume 6, Issue 3, lqae131. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11420682/pdf/
Citations: 0
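The nucleotide-context finding in the abstract above (an enrichment of the CGC 3-mer among incorrect calls) rests on tallying the sequence around each SNP. A minimal sketch of such a tally follows. It assumes the SNP sits in the middle of the 3-mer and that positions are 0-based, neither of which the abstract states; the function name and toy sequence are hypothetical, and the autoencoder and outlier-detection steps are not reproduced here.

```python
from collections import Counter


def context_3mers(sequence, positions):
    """Count the 3-mers centred on each 0-based SNP position.

    Positions at either end of the sequence are skipped because
    they have no complete 3-mer around them.
    """
    counts = Counter()
    for pos in positions:
        if 1 <= pos < len(sequence) - 1:
            # One base upstream, the SNP itself, one base downstream.
            counts[sequence[pos - 1:pos + 2]] += 1
    return counts


# Toy reference and hypothetical miscalled positions.
reference = "ACGCTTACGCA"
miscalled = [2, 8, 5]
kmer_counts = context_3mers(reference, miscalled)
```

Comparing such counts between the correct and incorrect SNP sets is one simple way to see whether a context is over-represented among errors.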