Data Mining of Rare Alleles to Assess Biogeographic Ancestry

Colleen M. Callahan, Holden Bridge
Published in: 2021 Systems and Information Engineering Design Symposium (SIEDS)
Publication date: 2021-04-30
DOI: 10.1109/SIEDS52267.2021.9483709 (https://doi.org/10.1109/SIEDS52267.2021.9483709)
Citations: 0

Abstract

The United States Department of Defense (DoD) routinely seeks more efficient ways to examine genetic data in cases of foreign or domestic crime. Identification of biogeographic ancestry groups from forensic DNA data, used to provide investigative leads, is currently performed on Single Nucleotide Polymorphisms (SNPs). The motivation for this project was to determine whether SNP-based assessment of biogeographic ancestry can be replicated using autosomal Short Tandem Repeats (STRs) while preserving predictive accuracy. Replacing SNP analysis with STR analysis is theoretically more efficient: STR data can be generated from a significantly smaller amount of DNA, readily available genetic data can be analyzed well after collection, and STR analysis is more cost-effective per sample than SNP analysis. Two questions were considered: 1) whether STR profiles at 24 loci can be separated into distinct clusters using microvariants and off-ladder alleles; and 2) given identifiable clustering, whether these clusters can be probabilistically identified as biogeographic ancestry groups. STR profiles consisting of 24 loci from N=2,348 subjects were analyzed. The analysis employed multidimensional scaling (MDS), which works from a measure of dissimilarity between STR profiles and reduces the tabular profiles to two latent dimensions. Using the scaled MDS coordinates, a Gaussian Mixture Model (GMM) was constructed that provides, for every data point, the probability of membership in each cluster. Results from the model indicated separation between certain biogeographic ancestry groups, with the GMM probabilities providing a posteriori confidence levels for group membership. Such analyses may benefit future crime investigations in which biogeographic ancestry identification is needed.
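The MDS-then-GMM pipeline described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy data are synthetic, each profile is encoded as a 0/1 indicator of a rare (microvariant or off-ladder) allele per locus, dissimilarity is the fraction of differing loci, the MDS variant is classical (Torgerson) MDS, and the GMM has two components. The abstract does not state which encoding, dissimilarity measure, MDS variant, or number of components the authors used.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's data): 60 profiles x 24 loci.
# An entry is 1.0 if the profile carries a rare allele at that locus.
n_loci = 24
freq_a = np.r_[np.full(12, 0.60), np.full(12, 0.05)]  # hypothetical group A frequencies
freq_b = freq_a[::-1]                                  # hypothetical group B frequencies
profiles = np.vstack([
    rng.random((30, n_loci)) < freq_a,
    rng.random((30, n_loci)) < freq_b,
]).astype(float)

# Assumed dissimilarity: fraction of loci at which two profiles differ.
diss = (profiles[:, None, :] != profiles[None, :, :]).mean(axis=2)


def classical_mds(d, n_components=2):
    """Classical (Torgerson) MDS: embed points from a dissimilarity matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering  # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)         # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]  # indices of the largest ones
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Reduce the tabular profiles to two latent dimensions, then standardise them.
coords = classical_mds(diss, n_components=2)
scaled = (coords - coords.mean(axis=0)) / coords.std(axis=0)

# Fit the mixture model and read off per-profile cluster membership probabilities.
gmm = GaussianMixture(n_components=2, random_state=0).fit(scaled)
probs = gmm.predict_proba(scaled)  # shape (60, 2); each row sums to 1

print(probs[:3].round(3))
```

Each row of `probs` plays the role of the per-cluster membership probability mentioned in the abstract. Mapping a cluster to a named ancestry group would still require labeled reference profiles, which this sketch does not include.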