Prediction of Autism-Related Genes Using a New Clustering-Based Under-Sampling Method

Xuan Tho Dang, Duong Hung Bui, Thi Hong Nguyen, T. Nguyen, D. Tran
2019 11th International Conference on Knowledge and Systems Engineering (KSE), published 2019-10-01. DOI: 10.1109/KSE.2019.8919377 (https://doi.org/10.1109/KSE.2019.8919377)
Citations: 1

Abstract

Autism is a neurological disorder that occurs in children. It has many causes, one of which is genetic. To find effective treatments, we therefore need to identify the genes related to autism. In this paper, we use a computational approach to train a model that can predict new autism-related candidate genes. The methodology combines different data sources, such as protein-protein interaction networks, a microRNA (miRNA)-target network, and known autism-related genes, into an integrated network. The structural properties of this network are represented as a vector dataset, and a binary classification problem is formulated. However, because the number of known autism-related genes is very small, we face an imbalanced data classification problem. To address this issue, we propose a clustering-based under-sampling algorithm for data balancing. Training classifiers with machine learning models such as support vector machines (SVMs), k-nearest neighbors (k-NN), and random forests (RFs), we obtained G-mean scores 1-3% higher than those obtained without any data balancing strategy. These results suggest that our proposed model may contribute to finding new autism-related gene candidates.
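
The abstract does not spell out the balancing algorithm or the evaluation formula, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes one common form of clustering-based under-sampling: cluster the majority class (genes not known to be autism-related) into as many clusters as there are minority samples, then keep the majority sample nearest each centroid. It also shows the usual definition of G-mean as the square root of sensitivity times specificity. The function names (`g_mean`, `kmeans`, `cluster_undersample`) and the choice of plain k-means are illustrative assumptions.

```python
import math
import random


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def _sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster went empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Reduce the majority class to at most len(minority) representatives."""
    k = min(len(minority), len(majority))
    centroids = kmeans(majority, k, seed=seed)
    kept = []
    for c in centroids:
        # Keep the real majority sample closest to each centroid.
        idx = min(range(len(majority)), key=lambda i: _sq_dist(majority[i], c))
        if idx not in kept:
            kept.append(idx)
    return [majority[i] for i in kept]


if __name__ == "__main__":
    # Toy feature vectors standing in for network-derived gene features.
    majority = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    minority = [(5, 5), (6, 6)]
    print(cluster_undersample(majority, minority))
    print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
```

G-mean is a natural metric here because a classifier that labels every gene as unrelated scores zero sensitivity, and therefore a G-mean of zero, even though its plain accuracy on a heavily imbalanced dataset would look high.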