A framework for the application of decision trees to the analysis of SNPs data

L. Fiaschi, J. Garibaldi, N. Krasnogor
2009 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology. Published 2009-03-30. DOI: 10.1109/CIBCB.2009.4925715 (https://doi.org/10.1109/CIBCB.2009.4925715)
Citations: 3

Abstract

Data mining is the analysis of experimental datasets to extract trends and relationships that are meaningful to the user. In genetic studies, these techniques have yielded notable findings, especially concerning heritable predisposition to specific diseases. One such disease still under extensive analysis is pre-eclampsia, a progressive disorder that occurs during pregnancy and soon after birth and affects both mothers and their babies. Applying the various data mining techniques available for studying general genotype-phenotype associations involves many choices. The aim of this paper is to describe the general framework that we adopted in applying decision tree algorithms to the analysis of SNP data related to cases of pre-eclampsia. The results show that this methodology is valid for detecting a subset of attributes associated with the predicted variable, thereby reducing the size of the dataset. Moreover, from the clinical point of view, it confirmed the medical interpretation of a ‘corrected birth-weight centile’ (CBC) value of 10 as a meaningful cut-off, and confirmed an association between an infant's CBC and the ‘week of delivery’ parameter. We hope that the generic framework described here will be of use to other researchers analysing such data.
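The abstract does not say which decision tree algorithms or split criteria were used. As a rough illustration of how a tree-style criterion can pick out a subset of attributes associated with a class variable, the sketch below ranks SNP attributes by information gain, the split measure used by ID3-style trees. The data, the attribute names (`snp_a`, `snp_b`), the genotype coding, and the class labels are all made up for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting `labels` on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels):
    """Return (attribute, gain) pairs sorted by decreasing information gain."""
    names = rows[0].keys()
    gains = {a: information_gain([r[a] for r in rows], labels) for a in names}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data: genotypes coded as minor-allele counts (0, 1, 2).
rows = [
    {"snp_a": 0, "snp_b": 1},
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 2, "snp_b": 1},
    {"snp_a": 2, "snp_b": 2},
]
# Hypothetical class labels: CBC below 10 versus CBC of 10 or above.
labels = ["low_cbc", "low_cbc", "normal", "normal"]

ranking = rank_attributes(rows, labels)
# Keep only attributes that carry some information about the class.
selected = [name for name, gain in ranking if gain > 0]
```

In this toy data `snp_a` separates the two classes perfectly and `snp_b` carries no information, so only `snp_a` is retained. Real SNP datasets would need handling of missing genotypes and some guard against overfitting, which this sketch leaves out.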