Machine Learning based models for examining differences between modern and ancient DNA in dental calculus

Maria-Iuliana Bocicor, Iulia-Monica Szuhai, Emilia-Loredana Pop, Ioan-Gabriel Mircea
2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC), September 2020. DOI: 10.1109/SYNASC51798.2020.00036 (https://doi.org/10.1109/SYNASC51798.2020.00036)
Citations: 1

Abstract

DNA, or deoxyribonucleic acid, carries the complete genetic information of every living organism. Studying bacterial DNA extracted from human bones excavated at archaeological and anthropological sites makes it possible to analyse the evolution of the microorganisms inhabiting the human body and to gain new insight into the health, diet and even migration of our ancestors. This paper aims to offer a solution for discriminating between ancient and modern bacterial DNA in dental calculus. We propose three internal representations of the considered DNA sequences in order to determine which one captures the most information relevant to classification models. Two of these are text-based, while the third exploits several physical and chemical properties of the nucleotides. Using a data set containing both ancient and modern bacterial DNA from dental calculus, we apply two supervised models, namely artificial neural networks and support vector machines, to distinguish between the two types of sequences. The results point to two main conclusions: first, the representation based on physical and chemical properties seems to best capture the information relevant to the task at hand; second, for the considered data set and DNA encodings, support vector machines outperform artificial neural networks, although the results obtained by both models are promising.
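The abstract does not spell out the three representations, so the following is only a minimal sketch of what a text-based and a property-based encoding of a DNA read might look like, not the authors' method. It assumes two common choices: relative k-mer frequencies as the text-based representation, and a per-nucleotide binary triple (is purine, has amino group, forms a weak two-hydrogen-bond pair, following the IUPAC groupings R = A/G, M = A/C, W = A/T) as the physicochemical one. The names `CHEM_PROPS`, `chemical_encoding` and `kmer_frequencies`, the fixed `length` parameter and the zero-padding convention are all hypothetical.

```python
from itertools import product

# Hypothetical physicochemical encoding (an assumption, not the paper's):
# base -> (is_purine, has_amino_group, forms_weak_pair)
# Groupings follow IUPAC classes R = A/G, M = A/C, W = A/T.
CHEM_PROPS = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "T": (0, 0, 1),
}


def chemical_encoding(seq: str, length: int) -> list[int]:
    """Encode a read as a flat vector of 3 * length binary properties.

    Reads longer than `length` are truncated. Ambiguous bases (e.g. N) and
    padding positions of shorter reads become (0, 0, 0), which no real base uses.
    """
    vector: list[int] = []
    for base in seq.upper()[:length]:
        vector.extend(CHEM_PROPS.get(base, (0, 0, 0)))
    # Zero-pad so every read yields a vector of the same size.
    vector.extend([0] * (3 * length - len(vector)))
    return vector


def kmer_frequencies(seq: str, k: int = 3) -> list[float]:
    """Hypothetical text-based encoding: relative frequency of each of the 4**k k-mers."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # windows containing N or other symbols are skipped
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]
```

For example, `chemical_encoding("ACGTN", length=6)` yields 18 values, the last six being zeros for the unknown base and the padding position. Both functions produce fixed-length vectors regardless of read length, which is what classifiers such as scikit-learn's `sklearn.svm.SVC` or a feed-forward neural network expect as input.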