From genetic data to kinship clarity: employing machine learning for detecting incestuous relations.

Dejan Šorgić, Aleksandra Stefanović, Mladen Popović, Dušan Keckarević
Frontiers in Genetics, volume 16, article 1578581. Pub Date: 2025-06-02. eCollection Date: 2025-01-01. DOI: 10.3389/fgene.2025.1578581
Impact factor: 2.8. Zone 3 (Biology). JCR Q2 (Genetics & Heredity).
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12171372/pdf/
Citations: 0

Abstract

Introduction: The aim of the study was to develop a predictive model based on STR profiles of mothers and children for the detection of incestuous conception.

Methods: Based on allele frequency data from the USA and Saudi Arabia, STR profiles were generated and used to simulate offspring profiles corresponding to father-child and brother-sister incest scenarios. Model training and evaluation were performed using the STR profiles of the mother and child. In addition to the baseline model, we examined its performance under a one-step mutation model, as well as its ability to detect incestuous relationships based solely on the child's STR profile. Several machine learning algorithms and neural networks were tested for classification accuracy.
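
The simulation described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the locus names and allele frequencies are made up, the function names are hypothetical, and the father-child scenario is interpreted here as the child's father also being the mother's father. The one-step mutation is modelled as a shift of one repeat unit in either direction with a fixed probability, which is an assumption about how such a model is usually set up.

```python
import random

# Hypothetical allele frequencies for two made-up loci (illustration only,
# not the USA or Saudi Arabia data used in the study).
FREQS = {
    "LOCUS_A": {10: 0.2, 11: 0.3, 12: 0.3, 13: 0.2},
    "LOCUS_B": {15: 0.25, 16: 0.25, 17: 0.25, 18: 0.25},
}


def random_profile(freqs, rng):
    """Draw an unrelated individual's genotype: two alleles per locus."""
    profile = {}
    for locus, table in freqs.items():
        alleles = list(table)
        weights = list(table.values())
        profile[locus] = tuple(rng.choices(alleles, weights=weights, k=2))
    return profile


def mutate(allele, rng, rate):
    """One-step mutation: with probability `rate`, shift by +/- 1 repeat."""
    if rng.random() < rate:
        return allele + rng.choice((-1, 1))
    return allele


def make_child(mother, father, rng, rate=0.0):
    """Mendelian transmission: one randomly chosen allele from each parent."""
    child = {}
    for locus in mother:
        from_mother = mutate(rng.choice(mother[locus]), rng, rate)
        from_father = mutate(rng.choice(father[locus]), rng, rate)
        child[locus] = (from_mother, from_father)
    return child


def simulate_case(scenario, freqs, rng, rate=0.0):
    """Return (mother, child) profiles for one simulated case."""
    if scenario == "normal":
        mother = random_profile(freqs, rng)
        father = random_profile(freqs, rng)
    elif scenario == "father_child":
        # Assumed reading: the child's father is also the mother's father.
        father = random_profile(freqs, rng)
        grandmother = random_profile(freqs, rng)
        mother = make_child(grandmother, father, rng, rate)
    elif scenario == "brother_sister":
        # Mother and father are full siblings.
        grandmother = random_profile(freqs, rng)
        grandfather = random_profile(freqs, rng)
        mother = make_child(grandmother, grandfather, rng, rate)
        father = make_child(grandmother, grandfather, rng, rate)
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return mother, make_child(mother, father, rng, rate)


rng = random.Random(0)
mother, child = simulate_case("father_child", FREQS, rng)
```

Only the mother and child profiles are returned, mirroring the statement that training and evaluation used those two profiles; the father's profile exists only inside the simulation.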

Results: The CatBoost algorithm performed best in the binary classification of Normal Paternity vs. Incest Kinship. For the USA population, accuracy was 96.94% with 29 markers and 95% with 21 markers. The same accuracy was obtained under the one-step mutation model, while prediction based exclusively on child profiles yielded an accuracy of 90.37% in the USA population. When analysing profiles based on Saudi Arabian and modified Saudi frequencies, an accuracy of 94% was achieved.
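
To make the binary classification and the accuracy figures concrete, here is a small sketch of how mother and child profiles might be turned into features and scored. The abstract does not state which features the study used, so both features below are assumptions, chosen because offspring of close relatives tend to show excess homozygosity. The threshold rule is a stand-in for the CatBoost model so that the example needs no third-party packages; all names and the toy genotypes are hypothetical.

```python
def features(mother, child):
    """Return [fraction of homozygous child loci,
               fraction of child allele slots whose distinct alleles
               also appear in the mother's genotype]."""
    loci = sorted(child)
    homozygous = sum(child[l][0] == child[l][1] for l in loci)
    shared = sum(len(set(child[l]) & set(mother[l])) for l in loci)
    return [homozygous / len(loci), shared / (2 * len(loci))]


def predict(feature_rows, cutoff):
    """Stand-in classifier: label 1 (incest) if homozygosity >= cutoff."""
    return [1 if row[0] >= cutoff else 0 for row in feature_rows]


def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy genotypes (made up): one mother, two possible children.
mother = {"LOCUS_A": (10, 11), "LOCUS_B": (15, 16)}
child_1 = {"LOCUS_A": (10, 10), "LOCUS_B": (15, 16)}  # labelled incest (1)
child_2 = {"LOCUS_A": (10, 12), "LOCUS_B": (16, 18)}  # labelled normal (0)

rows = [features(mother, child_1), features(mother, child_2)]
labels = [1, 0]
print(rows)                                          # [[0.5, 0.75], [0.0, 0.5]]
print(accuracy(labels, predict(rows, cutoff=0.25)))  # 1.0
```

With real data the feature rows would be passed to a trained classifier instead of `predict`, and accuracy would be measured on held-out cases rather than on the two toy examples.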

Discussion: It was established that population structure does not affect the model's accuracy and that it can be applied even in isolated populations.

Source journal
Frontiers in Genetics (Biochemistry, Genetics and Molecular Biology: Molecular Medicine)
CiteScore: 5.50
Self-citation rate: 8.10%
Articles published: 3491
Review time: 14 weeks
Journal description: Frontiers in Genetics publishes rigorously peer-reviewed research on genes and genomes relating to all the domains of life, from humans to plants to livestock and other model organisms. Led by an outstanding Editorial Board of the world’s leading experts, this multidisciplinary, open-access journal is at the forefront of communicating cutting-edge research to researchers, academics, clinicians, policy makers and the public. The study of inheritance and the impact of the genome on various biological processes is well documented. However, the majority of discoveries are still to come. A new era is seeing major developments in the function and variability of the genome, the use of genetic and genomic tools and the analysis of the genetic basis of various biological phenomena.