On the Utility of Parents' Historical Data to Investigate the Causes of Autism Spectrum Disorder: A Data Mining-Based Framework

IF 4.2, Zone 4 (Medicine), Q1 ENGINEERING, BIOMEDICAL

IRBM. Pub Date: 2023-08-01. DOI: 10.1016/j.irbm.2023.100780

Zahid Halim, Gohar Khan, Babar Shah, Rabia Naseer, Sajid Anwar, Ahsan Shah

IRBM, volume 44, issue 4, Article 100780. Journal Article. https://www.sciencedirect.com/science/article/pii/S1959031823000295

Citations: 1

Abstract

Objective

Autism Spectrum Disorder (ASD) is recognized as a condition that affects the learning ability of adolescents and negatively impacts their families. Autism may be caused by environmental exposure or by a genetically inherited disorder; however, no definitive or universally accepted causes are known. This makes the problem challenging.

Material and methods

This work focuses on identifying the causes of ASD using computational methods. To this end, data on parental history are collected to find the triggering features by reviewing antenatal, perinatal, and infant risk factors of ASD. Machine learning (ML) techniques are then applied to the collected instances to develop a predictive model and identify the causes of ASD. Samples are collected from both ASD and non-ASD individuals. A total of 115 features are obtained from each subject. In the collected dataset, 47% of the samples come from subjects with ASD. Dimensionality reduction and four feature selection methods are applied to the data to eliminate noise and low-value features. The data are verified using two clustering techniques, namely k-means and k-medoid. Five clustering validation indices are used to validate the clustering results. Next, three classifiers, namely k-nearest neighbor (k-NN), Support Vector Machine (SVM), and Artificial Neural Network (ANN), are trained to predict ASD cases. Frequent item mining and descriptive analysis of the clustered data are used to identify the factors that may cause ASD.

Results

The proposed framework makes it possible to identify the features that may contribute to ASD. For the classification task, the SVM classifier outperforms the others, with an average accuracy of 98.34% in predicting ASD cases.

Conclusion

The results identify stress as the dominant feature, along with environmental factors, such as frequent use of canned food and plastic/steel bottles during the fertilization period, that may contribute to ASD.


Source journal

IRBM (ENGINEERING, BIOMEDICAL)

CiteScore

10.30

Self-citation rate

4.20%

Articles published

Review time

57 days

About the journal: IRBM is the journal of the AGBM (Alliance for engineering in Biology and Medicine / Alliance pour le génie biologique et médical), the SFGBM (BioMedical Engineering French Society / Société française de génie biologique médical), and the AFIB (French Association of Biomedical Engineers / Association française des ingénieurs biomédicaux). As a vehicle of information and knowledge in the field of biomedical technologies, IRBM is devoted to fundamental as well as clinical research. Biomedical engineering and the use of new technologies are the cornerstones of IRBM, providing authors and users with the latest information. Its six issues per year offer reviews (state of the art and current knowledge), original articles directed at fundamental research, and articles focusing on biomedical engineering. All articles are submitted to peer reviewers, who act as guarantors of IRBM's scientific and medical content. The field covered by IRBM includes all disciplines of biomedical engineering. The papers published therefore include those covering technological and methodological developments in:
- Physiological and biological signal processing (EEG, MEG, ECG…)
- Medical image processing
- Biomechanics
- Biomaterials
- Medical physics
- Biophysics
- Physiological and biological sensors
- Information technologies in healthcare
- Disability research
- Computational physiology
- …