Constraint Governed Association Rule Mining for Identification of Strong SNPs to Classify Autism Data

U. M, P. Rajashekar
Published in: 2020 International Conference on Communication, Computing and Industry 4.0 (C2I4)
Publication date: 2020-12-17
DOI: 10.1109/C2I451079.2020.9368948 (https://doi.org/10.1109/C2I451079.2020.9368948)
Citations: 0

Abstract

Autism is a heterogeneous neurodevelopmental disorder found in all age groups. More patients are now being diagnosed with autism, yet public awareness of the condition remains low. This has motivated many researchers to study autism and its characteristics in depth. Studying the behavior and characteristics of autistic patients is important for diagnosing the level of autism. Classifying the associations among different characteristics of autistic patients at the gene level with machine learning techniques can give doctors and caretakers important insight. Research is under way to identify the genes responsible for autism. Changes in gene sequence may lead to different characteristics in different people, so genotypic research can reveal well-defined insight into the characteristics of autistic patients and their associations with genes. Single Nucleotide Polymorphism (SNP) data, which carry a large number of features, indicate human genome variability and are associated with the identification of traits for many human diseases, including autism. The main aim of the proposed work is to identify the SNP sequences responsible for carrying autistic traits. This paper explores the application of the Constraint Governed Association Rule Mining (CGARM) technique to SNP data for dimensionality reduction, thereby selecting the strong, predominant SNP features that are relevant enough to support classification with high accuracy. The work is carried out in two stages. In the first stage, CGARM is used to choose significant SNP features, resulting in dimensionality reduction. In the second stage, classification is carried out by feeding the selected features to an Artificial Neural Network (ANN). The main advantage of the proposed work is its ability to reduce the dimensions without compromising quality: CGARM selects strong SNPs by applying syntactical, semantical, and dimensionality constraints, resulting in higher accuracy. CGARM is applied to autism data collected from the National Center for Biotechnology Information (NCBI) repository. The data comprise 118 features, from which CGARM identified 22 predominant SNPs. Forward selection then picked the top 17 features, which were given as input to the ANN. 10-fold cross-validation yielded 76.9% accuracy, reported as 50% higher than that obtained with the original features. The proposed work reduced the dimensionality by 85% and achieved 76.9% accuracy using only 15% of the features.
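The first stage described above, selecting SNP features through constrained association rules, can be illustrated with a small Python sketch. It is not the authors' CGARM implementation: the abstract does not define the three constraint types, so the readings used here are assumptions. The syntactic constraint is taken to mean that only rules of the form "SNP = genotype implies class label" are considered, the semantic constraint is modelled as minimum support and confidence thresholds, and the dimensionality constraint as a cap on the number of features kept. The function name `select_snps`, the SNP identifiers, the genotypes, and all threshold values are hypothetical.

```python
from collections import Counter


def select_snps(rows, labels, min_support=0.2, min_confidence=0.7, max_features=2):
    """Rank SNP columns by their strongest rule (snp = genotype) -> class label.

    Assumed constraint readings (not taken from the paper):
      - syntactic: the antecedent is one SNP item, the consequent is a class label
      - semantic: the rule must reach min_support and min_confidence
      - dimensionality: keep at most max_features SNPs
    """
    n = len(rows)
    item_counts = Counter()  # (snp, genotype) -> number of samples
    rule_counts = Counter()  # (snp, genotype, label) -> number of samples
    for row, label in zip(rows, labels):
        for snp, genotype in row.items():
            item_counts[(snp, genotype)] += 1
            rule_counts[(snp, genotype, label)] += 1

    best = {}  # snp -> highest confidence among its rules that pass both thresholds
    for (snp, genotype, label), both in rule_counts.items():
        support = both / n
        confidence = both / item_counts[(snp, genotype)]
        if support >= min_support and confidence >= min_confidence:
            best[snp] = max(best.get(snp, 0.0), confidence)

    # Highest confidence first; ties are broken by SNP name for determinism.
    ranked = sorted(best, key=lambda s: (-best[s], s))
    return ranked[:max_features]


# Toy genotype table (hypothetical SNP ids and genotypes, not NCBI data).
rows = [
    {"rs1": "AA", "rs2": "AG", "rs3": "CC"},
    {"rs1": "AA", "rs2": "GG", "rs3": "CT"},
    {"rs1": "AA", "rs2": "AG", "rs3": "CC"},
    {"rs1": "AG", "rs2": "GG", "rs3": "CT"},
    {"rs1": "AG", "rs2": "AG", "rs3": "CC"},
    {"rs1": "AG", "rs2": "GG", "rs3": "CT"},
]
labels = ["case", "case", "case", "control", "control", "control"]

selected = select_snps(rows, labels)
print(selected)
```

In this toy table only `rs1` separates the two classes perfectly, so it is the only SNP whose rules pass the default confidence threshold. In a two-stage pipeline like the one in the abstract, the selected columns would then be encoded numerically and passed to a classifier evaluated with cross-validation.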