Genome Biology, Pub Date: 2021-03-05, DOI: 10.1186/s13059-021-02294-2
Steffen Albrecht, Maximilian Sprang, Miguel A Andrade-Navarro, Jean-Fred Fontaine
{"title":"seqQscorer: automated quality control of next-generation sequencing data using machine learning.","authors":"Steffen Albrecht, Maximilian Sprang, Miguel A Andrade-Navarro, Jean-Fred Fontaine","doi":"10.1186/s13059-021-02294-2","DOIUrl":"https://doi.org/10.1186/s13059-021-02294-2","url":null,"abstract":"<p><p>Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer .</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"75"},"PeriodicalIF":12.3,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02294-2","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25447787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-03-05, DOI: 10.1186/s13059-021-02293-3
Drew Neavin, Quan Nguyen, Maciej S Daniszewski, Helena H Liang, Han Sheng Chiu, Yong Kiat Wee, Anne Senabouth, Samuel W Lukowski, Duncan E Crombie, Grace E Lidgerwood, Damián Hernández, James C Vickers, Anthony L Cook, Nathan J Palpant, Alice Pébay, Alex W Hewitt, Joseph E Powell
{"title":"Single cell eQTL analysis identifies cell type-specific genetic control of gene expression in fibroblasts and reprogrammed induced pluripotent stem cells.","authors":"Drew Neavin, Quan Nguyen, Maciej S Daniszewski, Helena H Liang, Han Sheng Chiu, Yong Kiat Wee, Anne Senabouth, Samuel W Lukowski, Duncan E Crombie, Grace E Lidgerwood, Damián Hernández, James C Vickers, Anthony L Cook, Nathan J Palpant, Alice Pébay, Alex W Hewitt, Joseph E Powell","doi":"10.1186/s13059-021-02293-3","DOIUrl":"10.1186/s13059-021-02293-3","url":null,"abstract":"<p><strong>Background: </strong>The discovery that somatic cells can be reprogrammed to induced pluripotent stem cells (iPSCs) has provided a foundation for in vitro human disease modelling, drug development and population genetics studies. Gene expression plays a critical role in complex disease risk and therapeutic response. However, while the genetic background of reprogrammed cell lines has been shown to strongly influence gene expression, the effect has not been evaluated at the level of individual cells which would provide significant resolution. By integrating single cell RNA-sequencing (scRNA-seq) and population genetics, we apply a framework in which to evaluate cell type-specific effects of genetic variation on gene expression.</p><p><strong>Results: </strong>Here, we perform scRNA-seq on 64,018 fibroblasts from 79 donors and map expression quantitative trait loci (eQTLs) at the level of individual cell types. We demonstrate that the majority of eQTLs detected in fibroblasts are specific to an individual cell subtype. To address if the allelic effects on gene expression are maintained following cell reprogramming, we generate scRNA-seq data in 19,967 iPSCs from 31 reprogrammed donor lines. We again identify highly cell type-specific eQTLs in iPSCs and show that the eQTLs in fibroblasts almost entirely disappear during reprogramming.</p><p><strong>Conclusions: </strong>This work provides an atlas of how genetic variation influences gene expression across cell subtypes and provides evidence for patterns of genetic architecture that lead to cell type-specific eQTL effects.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"76"},"PeriodicalIF":12.3,"publicationDate":"2021-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02293-3","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25441071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-03-04, DOI: 10.1186/s13059-021-02270-w
Zeinab Navidi, Lin Zhang, Bo Wang
{"title":"simATAC: a single-cell ATAC-seq simulation framework.","authors":"Zeinab Navidi, Lin Zhang, Bo Wang","doi":"10.1186/s13059-021-02270-w","DOIUrl":"https://doi.org/10.1186/s13059-021-02270-w","url":null,"abstract":"<p><p>Single-cell assay for transposase-accessible chromatin sequencing (scATAC-seq) identifies regulated chromatin accessibility modules at the single-cell resolution. Robust evaluation is critical to the development of scATAC-seq pipelines, which calls for reproducible datasets for benchmarking. We hereby present the simATAC framework, an R package that generates scATAC-seq count matrices that highly resemble real scATAC-seq datasets in library size, sparsity, and chromatin accessibility signals. simATAC deploys statistical models derived from analyzing 90 real scATAC-seq cell groups. simATAC provides a robust and systematic approach to generate in silico scATAC-seq samples with known cell labels for assessing analytical pipelines.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"74"},"PeriodicalIF":12.3,"publicationDate":"2021-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02270-w","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25430881","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-03-01, DOI: 10.1186/s13059-021-02296-0
Matthew T Parker, Katarzyna Knop, Geoffrey J Barton, Gordon G Simpson
{"title":"2passtools: two-pass alignment using machine-learning-filtered splice junctions increases the accuracy of intron detection in long-read RNA sequencing.","authors":"Matthew T Parker, Katarzyna Knop, Geoffrey J Barton, Gordon G Simpson","doi":"10.1186/s13059-021-02296-0","DOIUrl":"https://doi.org/10.1186/s13059-021-02296-0","url":null,"abstract":"<p><p>Transcription of eukaryotic genomes involves complex alternative processing of RNAs. Sequencing of full-length RNAs using long reads reveals the true complexity of processing. However, the relatively high error rates of long-read sequencing technologies can reduce the accuracy of intron identification. Here we apply alignment metrics and machine-learning-derived sequence information to filter spurious splice junctions from long-read alignments and use the remaining junctions to guide realignment in a two-pass approach. This method, available in the software package 2passtools ( https://github.com/bartongroup/2passtools ), improves the accuracy of spliced alignment and transcriptome assembly for species both with and without existing high-quality annotations.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"72"},"PeriodicalIF":12.3,"publicationDate":"2021-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02296-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25418311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-24, DOI: 10.1186/s13059-021-02292-4
Mohieddin Jafari, Yuanfang Guan, David C Wedge, Naser Ansari-Pour
{"title":"Re-evaluating experimental validation in the Big Data Era: a conceptual argument.","authors":"Mohieddin Jafari, Yuanfang Guan, David C Wedge, Naser Ansari-Pour","doi":"10.1186/s13059-021-02292-4","DOIUrl":"https://doi.org/10.1186/s13059-021-02292-4","url":null,"abstract":"","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"71"},"PeriodicalIF":12.3,"publicationDate":"2021-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02292-4","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25400441","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-23, DOI: 10.1186/s13059-021-02291-5
Fang Wang, Qihan Wang, Vakul Mohanty, Shaoheng Liang, Jinzhuang Dou, Jincheng Han, Darlan Conterno Minussi, Ruli Gao, Li Ding, Nicholas Navin, Ken Chen
{"title":"MEDALT: single-cell copy number lineage tracing enabling gene discovery.","authors":"Fang Wang, Qihan Wang, Vakul Mohanty, Shaoheng Liang, Jinzhuang Dou, Jincheng Han, Darlan Conterno Minussi, Ruli Gao, Li Ding, Nicholas Navin, Ken Chen","doi":"10.1186/s13059-021-02291-5","DOIUrl":"10.1186/s13059-021-02291-5","url":null,"abstract":"<p><p>We present a Minimal Event Distance Aneuploidy Lineage Tree (MEDALT) algorithm that infers the evolution history of a cell population based on single-cell copy number (SCCN) profiles, and a statistical routine named lineage speciation analysis (LSA), which facilitates discovery of fitness-associated alterations and genes from SCCN lineage trees. MEDALT appears more accurate than phylogenetics approaches in reconstructing copy number lineage. From data from 20 triple-negative breast cancer patients, our approaches effectively prioritize genes that are essential for breast cancer cell fitness and predict patient survival, including those implicating convergent evolution. The source code of our study is available at https://github.com/KChen-lab/MEDALT .</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"70"},"PeriodicalIF":12.3,"publicationDate":"2021-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7901082/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25403623","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-22, DOI: 10.1186/s13059-021-02283-5
Vahid Akbari, Jean-Michel Garant, Kieran O'Neill, Pawan Pandoh, Richard Moore, Marco A Marra, Martin Hirst, Steven J M Jones
{"title":"Megabase-scale methylation phasing using nanopore long reads and NanoMethPhase.","authors":"Vahid Akbari, Jean-Michel Garant, Kieran O'Neill, Pawan Pandoh, Richard Moore, Marco A Marra, Martin Hirst, Steven J M Jones","doi":"10.1186/s13059-021-02283-5","DOIUrl":"https://doi.org/10.1186/s13059-021-02283-5","url":null,"abstract":"<p><p>The ability of nanopore sequencing to simultaneously detect modified nucleotides while producing long reads makes it ideal for detecting and phasing allele-specific methylation. However, there is currently no complete software for detecting SNPs, phasing haplotypes, and mapping methylation to these from nanopore sequence data. Here, we present NanoMethPhase, a software tool to phase 5-methylcytosine from nanopore sequencing. We also present SNVoter, which can post-process nanopore SNV calls to improve accuracy in low coverage regions. Together, these tools can accurately detect allele-specific methylation genome-wide using nanopore sequence data with low coverage of about ten-fold redundancy.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"68"},"PeriodicalIF":12.3,"publicationDate":"2021-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02283-5","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25395196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-22, DOI: 10.1186/s13059-021-02281-7
Hongyu Guo, Jun Li
{"title":"scSorter: assigning cells to known cell types according to marker genes.","authors":"Hongyu Guo, Jun Li","doi":"10.1186/s13059-021-02281-7","DOIUrl":"10.1186/s13059-021-02281-7","url":null,"abstract":"<p><p>On single-cell RNA-sequencing data, we consider the problem of assigning cells to known cell types, assuming that the identities of cell-type-specific marker genes are given but their exact expression levels are unavailable, that is, without using a reference dataset. Based on an observation that the expected over-expression of marker genes is often absent in a nonnegligible proportion of cells, we develop a method called scSorter. scSorter allows marker genes to express at a low level and borrows information from the expression of non-marker genes. On both simulated and real data, scSorter shows much higher power compared to existing methods.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"69"},"PeriodicalIF":12.3,"publicationDate":"2021-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7898451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25395193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-19, DOI: 10.1186/s13059-021-02288-0
Yanping Long, Zhijian Liu, Jinbu Jia, Weipeng Mo, Liang Fang, Dongdong Lu, Bo Liu, Hong Zhang, Wei Chen, Jixian Zhai
{"title":"FlsnRNA-seq: protoplasting-free full-length single-nucleus RNA profiling in plants.","authors":"Yanping Long, Zhijian Liu, Jinbu Jia, Weipeng Mo, Liang Fang, Dongdong Lu, Bo Liu, Hong Zhang, Wei Chen, Jixian Zhai","doi":"10.1186/s13059-021-02288-0","DOIUrl":"10.1186/s13059-021-02288-0","url":null,"abstract":"<p><p>The broad application of single-cell RNA profiling in plants has been hindered by the prerequisite of protoplasting that requires digesting the cell walls from different types of plant tissues. Here, we present a protoplasting-free approach, flsnRNA-seq, for large-scale full-length RNA profiling at a single-nucleus level in plants using isolated nuclei. Combined with 10x Genomics and Nanopore long-read sequencing, we validate the robustness of this approach in Arabidopsis root cells and the developing endosperm. Sequencing results demonstrate that it allows for uncovering alternative splicing and polyadenylation-related RNA isoform information at the single-cell level, which facilitates characterizing cell identities.</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"66"},"PeriodicalIF":12.3,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02288-0","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25386338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Genome Biology, Pub Date: 2021-02-19, DOI: 10.1186/s13059-021-02265-7
Stephan Schmeing, Mark D Robinson
{"title":"ReSeq simulates realistic Illumina high-throughput sequencing data.","authors":"Stephan Schmeing, Mark D Robinson","doi":"10.1186/s13059-021-02265-7","DOIUrl":"https://doi.org/10.1186/s13059-021-02265-7","url":null,"abstract":"<p><p>In high-throughput sequencing data, performance comparisons between computational tools are essential for making informed decisions at each step of a project. Simulations are a critical part of method comparisons, but for standard Illumina sequencing of genomic DNA, they are often oversimplified, which leads to optimistic results for most tools. ReSeq improves the authenticity of synthetic data by extracting and reproducing key components from real data. Major advancements are the inclusion of systematic errors, a fragment-based coverage model and sampling-matrix estimates based on two-dimensional margins. These improvements lead to more faithful performance evaluations. ReSeq is available at https://github.com/schmeing/ReSeq .</p>","PeriodicalId":48922,"journal":{"name":"Genome Biology","volume":"22 1","pages":"67"},"PeriodicalIF":12.3,"publicationDate":"2021-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1186/s13059-021-02265-7","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25386439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}