Carter Norton, Chad Pollard, Kelaney Stalker, Kenneth Aston, Timothy Jenkins
Systems Biology in Reproductive Medicine, 70(1): 174-182. Published 2024-12-01 (Epub 2024-06-22). DOI: 10.1080/19396368.2024.2368716 (https://doi.org/10.1080/19396368.2024.2368716)
Novel bioinformatic analyses of somatic cell contamination in sperm samples.
The assessment of epigenetic profiles in sperm is sensitive to somatic cell contamination, which can influence methylation signals at gene promoters. This contamination is particularly problematic in the assessment of DNA methylation in samples with low sperm counts, where fractional amounts of somatic cell DNA can lead to significant shifts in measured methylation state. In this study, a new method of detecting possible somatic cell contamination is proposed through two multi-region bioinformatic models: a traditional differential methylation analysis and a machine learning logistic regression model. These models were trained on publicly available sperm (n = 489) and blood (n = 1029) DNA methylation array data and tested on a contamination set, wherein the sperm of four donors with normal sperm counts were run on a 450k methylation array with four permutations each, including pure blood, half blood and half sperm by DNA concentration, half blood and half sperm by cell count, and pure sperm (n = 16). The DMR and logistic regression model classified the contamination testing set with 100% and 94% accuracy, respectively. These new methods of detecting the effects of somatic cell contamination allow for more accurate differentiation between epigenetic profiles that contain a biological somatic-like shift and those that have somatic-like signatures because of contamination.
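The abstract's point that fractional amounts of somatic DNA can shift measured methylation can be illustrated with a simple linear mixing model. The sketch below is not the authors' DMR or logistic regression method. It assumes that the measured beta value at a region is the DNA-weighted average of the sperm and somatic values, that a diploid somatic cell carries about twice the DNA of a haploid sperm (ignoring differences in extraction efficiency), and that the reference beta values are hypothetical numbers chosen only for illustration. All function names are likewise hypothetical.

```python
def mixed_beta(beta_sperm, beta_somatic, somatic_dna_fraction):
    """Expected beta value of a mixture under a linear mixing assumption."""
    if not 0.0 <= somatic_dna_fraction <= 1.0:
        raise ValueError("somatic_dna_fraction must be between 0 and 1")
    return (somatic_dna_fraction * beta_somatic
            + (1.0 - somatic_dna_fraction) * beta_sperm)


def dna_fraction_from_cell_fraction(somatic_cell_fraction):
    """Convert a somatic fraction by cell count into a fraction by DNA.

    Assumption: a diploid somatic cell contributes twice the DNA of a
    haploid sperm cell.
    """
    if not 0.0 <= somatic_cell_fraction <= 1.0:
        raise ValueError("somatic_cell_fraction must be between 0 and 1")
    somatic_dna = 2.0 * somatic_cell_fraction
    sperm_dna = 1.0 - somatic_cell_fraction
    return somatic_dna / (somatic_dna + sperm_dna)


def estimate_somatic_fraction(sample_betas, sperm_betas, somatic_betas):
    """Least-squares estimate of the somatic DNA fraction across regions.

    Fits f in: sample = sperm + f * (somatic - sperm), then clips to [0, 1].
    """
    numerator = 0.0
    denominator = 0.0
    for sample, sperm, somatic in zip(sample_betas, sperm_betas, somatic_betas):
        diff = somatic - sperm
        numerator += (sample - sperm) * diff
        denominator += diff * diff
    if denominator == 0.0:
        raise ValueError("reference profiles are identical; cannot estimate")
    return min(1.0, max(0.0, numerator / denominator))


# Hypothetical reference beta values at three regions (illustration only).
SPERM_REFERENCE = [0.05, 0.10, 0.90]
SOMATIC_REFERENCE = [0.85, 0.80, 0.10]

if __name__ == "__main__":
    # Half blood, half sperm by cell count is more than half by DNA.
    dna_fraction = dna_fraction_from_cell_fraction(0.5)
    mixture = [
        mixed_beta(sperm, somatic, dna_fraction)
        for sperm, somatic in zip(SPERM_REFERENCE, SOMATIC_REFERENCE)
    ]
    estimate = estimate_somatic_fraction(
        mixture, SPERM_REFERENCE, SOMATIC_REFERENCE
    )
    print(f"somatic DNA fraction: {dna_fraction:.3f}")
    print(f"mixture beta values: {[round(b, 3) for b in mixture]}")
    print(f"recovered fraction: {estimate:.3f}")
```

Under these assumptions, a mixture that is half blood and half sperm by cell count corresponds to roughly two thirds somatic DNA, which may be one reason the contamination set distinguishes mixing by DNA concentration from mixing by cell count. Using several regions at once, as the multi-region models in the abstract do, makes the estimate less dependent on any single locus.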
About the journal:
Systems Biology in Reproductive Medicine (SBiRM) publishes Research Articles, Communications, Applications Notes that include protocols, a Clinical Corner that includes case reports, Review Articles and Hypotheses, and Letters to the Editor on human and animal reproduction. The journal highlights the use of systems approaches, including genomic, cellular, proteomic, metabolomic, bioinformatic, molecular, and biochemical approaches, to address fundamental questions in reproductive biology, reproductive medicine, and translational research. The journal publishes research involving human and animal gametes, stem cells, developmental biology and toxicology, and clinical care in reproductive medicine. Specific areas of interest to the journal include male factor infertility and germ cell biology, reproductive technologies (gamete micro-manipulation and cryopreservation, in vitro fertilization/embryo transfer (IVF/ET)), and contraception. Research directed towards developing new or enhanced technologies for clinical medicine or scientific research in reproduction is of significant interest to the journal.