Ya-Chi Lin, Joseph T. Tseng, Shuen-Lin Jeng, H. Sunny Sun
Comprehensive analysis of common coding sequence variants in Taiwanese Han population
Biomarkers and Genomic Medicine, 6(4), pages 133-143. Published 2014-12-01. DOI: 10.1016/j.bgm.2014.05.001
https://www.sciencedirect.com/science/article/pii/S2214024714000355
Citation count: 1
Abstract
Genomic variation differs among ethnic populations, and information on population-specific variants provides important insight into the links between genotypes and phenotypes. To facilitate genomic medicine research, this study aims to detect and characterize sequence variations in the coding regions of the genome in the Chinese population residing in Taiwan. DNA samples from 11 unrelated Taiwanese individuals were enriched for coding regions (i.e., the exome) and then deep sequenced. Approximately 30 Gb of high-quality data were obtained from massively parallel sequencing. On average, ∼60% of the total reads mapped uniquely to the human reference genome, and overall 97% of the target regions were covered by sequence reads, giving an average enrichment of ∼50-fold relative to target size. Comprehensive variant detection and analysis were performed with several bioinformatics pipelines established in-house, and information was collected on different types of variation, including single nucleotide variants, short insertions and deletions, and copy number variations. The sequence variations were cross-referenced against variants in public databases to identify ethnic-specific variants. To study the impact of sequence variations that are enriched in the Taiwanese Han population, variants present in at least two exomes (i.e., minor allele frequency >9%) were further annotated. Overall, we detected 308 loss-of-function variants in 291 genes in the Taiwanese Han Exome Sequencing dataset. Functional annotation indicated that the genes carrying these loss-of-function variants have a significant pathological influence on the risk of various human diseases, including lung cancer. This is the first dataset generated by next-generation sequencing (NGS) to comprehensively report coding sequence variants in the Taiwanese Han population.
The Taiwanese Han population, that is, Han Chinese residing in Taiwan, is typically underrepresented in population-genetics studies. We believe this study contributes valuable information that will have an impact on both medical and population genetics.
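
The ">9%" threshold in the abstract follows from simple counting: two carriers among 11 diploid individuals contribute at least 2 of 22 alleles, or about 9.1%. The sketch below works through that arithmetic and applies the "present in at least two exomes" filter to a toy genotype table. It is an illustration only, not the study's pipeline: the function names, the diploid autosomal assumption, and the toy genotypes are all hypothetical, and the ALT allele is simply treated as the minor allele.

```python
from fractions import Fraction

N_EXOMES = 11   # number of individuals stated in the abstract
PLOIDY = 2      # assumption: autosomal, diploid sites


def min_allele_frequency(carriers: int, n_samples: int = N_EXOMES) -> Fraction:
    """Lowest possible allele frequency when `carriers` samples each
    carry at least one copy of the variant allele."""
    return Fraction(carriers, n_samples * PLOIDY)


def recurrent_variants(genotypes: dict[str, list[int]],
                       min_carriers: int = 2) -> dict[str, Fraction]:
    """Keep variants carried by at least `min_carriers` samples and
    return their observed ALT allele frequency."""
    kept = {}
    for variant, alt_counts in genotypes.items():
        carriers = sum(1 for count in alt_counts if count > 0)
        if carriers >= min_carriers:
            kept[variant] = Fraction(sum(alt_counts), len(alt_counts) * PLOIDY)
    return kept


# Hypothetical toy data: ALT allele count (0, 1 or 2) per sample.
# These are NOT genotypes from the study.
toy = {
    "var_A": [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # two heterozygous carriers
    "var_B": [2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # one homozygous carrier
    "var_C": [1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0],  # three carriers
}

threshold = min_allele_frequency(2)
print(f"minimum frequency with 2 carriers: {float(threshold):.4f}")
for name, freq in recurrent_variants(toy).items():
    print(f"{name}: {float(freq):.4f}")
```

In this toy table, `var_B` is dropped even though it has two ALT alleles, because the filter counts carrier exomes rather than allele copies; that matches the abstract's wording, but whether the original pipeline counted it the same way is an assumption.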