Haplotype Classification Using Copy Number Variation and Principal Components Analysis

Q3 Computer Science

Open Bioinformatics Journal Pub Date : 2013-11-29 DOI:10.2174/1875036201307010019

K. Blighe

Open Bioinformatics Journal, volume 7, pages 19-24.

Citations: 2

Abstract

Elaborate downstream methods are required to analyze large microarray datasets. When the end goal is to look for relationships between (or patterns within) different subgroups, or even just individual samples, large datasets must first be filtered using statistical thresholds to reduce their overall volume. For example, in anthropological microarray studies, such 'dimension reduction' techniques are essential to elucidate any links between polymorphisms and phenotypes for given populations. In such large datasets, a subset can first be taken to represent the whole, much as polling results taken during elections are used to infer the opinions of the population at large. However, what is the best and easiest method of capturing a subset of variation in a dataset that can represent the overall portrait of variation? In this article, principal components analysis (PCA) is discussed in detail, including its history, the mathematics behind the process, and the ways in which it can be applied to modern large-scale biological datasets. New methods of analysis using PCA are also suggested, with tentative results outlined.
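To make the idea of "capturing a subset of variation" concrete, the following is a minimal sketch of PCA on a toy matrix, not the article's own method. It assumes a hypothetical 6-sample by 3-probe data matrix and uses power iteration on the covariance matrix to find only the first principal component; the function names (`center`, `covariance`, `first_component`) and all values are illustrative assumptions.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (features x features) of centered data."""
    n = len(centered)
    p = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, iterations=500, seed=0):
    """Power iteration: unit eigenvector and eigenvalue of the largest eigenvalue."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]  # strictly positive starting vector
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along v (v has unit length).
    eigenvalue = sum(
        vi * sum(c * x for c, x in zip(row, v)) for vi, row in zip(v, cov)
    )
    return v, eigenvalue


# Hypothetical toy matrix: 6 samples (rows) x 3 probes (columns).
# Probes 1 and 2 vary together; probe 3 is nearly constant.
samples = [
    [2.0, 4.1, 1.0],
    [3.0, 6.0, 1.2],
    [4.0, 7.9, 0.9],
    [5.0, 10.2, 1.1],
    [6.0, 11.8, 1.0],
    [7.0, 14.1, 0.8],
]

centered = center(samples)
cov = covariance(centered)
pc1, variance_pc1 = first_component(cov)

# Fraction of total variance captured by the first component.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = variance_pc1 / total_variance

# One score per sample: its coordinate along the first component.
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]

print("PC1 loadings:", [round(w, 3) for w in pc1])
print("Explained variance ratio:", round(explained, 4))
```

In this toy case a single component summarizes almost all of the variation, because two of the three probes move together. Real microarray data would normally be handled with a numerical library and a full eigendecomposition or singular value decomposition rather than a hand-written power iteration.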




Source journal

Open Bioinformatics Journal Computer Science-Computer Science (miscellaneous)

CiteScore: 2.40

Self-citation rate: 0.00%


About the journal: The Open Bioinformatics Journal is an open access, peer-reviewed online journal that publishes research articles, reviews and mini-reviews, letters, clinical trial studies, and guest-edited single-topic issues in all areas of bioinformatics and computational biology. Coverage includes biomedicine, with a focus on large-scale data acquisition, analysis, and curation; computational and statistical methods for modeling and analyzing biological data; and descriptions of new algorithms and databases. The journal aims to be a reliable source of current information on developments in the field, with an emphasis on publishing quality articles rapidly and making them freely available worldwide.