Theory-based Model Validation in the Generalized Multifactor Dimensionality Reduction Algorithm for Ordinal Phenotypes  

Mohammed Othman, Zaid Al-Khaledi
Journal: IRAQI JOURNAL OF STATISTICAL SCIENCES, volume 299
Publication date: 2023-12-01
DOI: 10.33899/iqjoss.2023.181255 (https://doi.org/10.33899/iqjoss.2023.181255)

Abstract

Clinical studies indicate a close relationship between some diseases and specific interactions between genetic factors. Identifying the genetic interactions that significantly affect the emergence of genetic diseases requires extensive statistical analysis, and the enormous volume of human genetic data has made it necessary to develop statistical methods suited to high-dimensional data. Multifactor Dimensionality Reduction (MDR) is one of the leading nonparametric algorithms in this field. It reduces the dimensionality of genetic data in order to identify the interaction that most strongly increases the likelihood of a genetic disease. The algorithm relies on a set of nonparametric procedures to detect the highest-impact genetic interaction, and it applies only to binary response variables. Like any statistical method, MDR has weaknesses and limits on its application, and it has had to be extended to overcome them. One weakness is that it cannot handle data sets with an ordinal response variable. Some researchers have developed a generalization of MDR that works with ordinal data, but the generalized algorithm is more complex than the original. We therefore propose a simple extension of the original algorithm that uses ordinal logistic regression to classify the individuals in the sample while keeping the remaining steps of the original algorithm unchanged. In addition, MDR uses a nonparametric method to assess the significance of the candidate interactions it selects. This procedure is based on permutation tests and takes far longer than parametric procedures that rely on theoretical distributions.
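As a rough illustration of the reduction step in the original, binary-response MDR, the sketch below pools each multilocus genotype cell into a "high" or "low" risk group by comparing its case:control ratio with a threshold, then scores the resulting one-dimensional classifier. It is not the ordinal variant proposed in this paper, which classifies individuals with ordinal logistic regression. The function names, the default threshold (the overall case:control ratio), and the handling of cells without controls are assumptions made for the example.

```python
from collections import Counter


def mdr_pool(genotype_cells, is_case, threshold=None):
    """Label each multilocus genotype cell 'high' or 'low' risk (binary MDR).

    genotype_cells: one hashable cell per individual, e.g. (snp1, snp2).
    is_case: one truthy/falsy flag per individual (case vs. control).
    Assumes the sample contains at least one control.
    """
    cases, controls = Counter(), Counter()
    for cell, case in zip(genotype_cells, is_case):
        (cases if case else controls)[cell] += 1
    if threshold is None:
        # Assumed default: the overall case:control ratio in the sample.
        threshold = sum(cases.values()) / sum(controls.values())
    labels = {}
    for cell in set(cases) | set(controls):
        n_case, n_ctrl = cases[cell], controls[cell]
        # Assumed convention: a cell with no controls counts as high risk.
        ratio = float("inf") if n_ctrl == 0 else n_case / n_ctrl
        labels[cell] = "high" if ratio >= threshold else "low"
    return labels


def classification_accuracy(genotype_cells, is_case, labels):
    """Fraction of individuals whose pooled risk label matches their status."""
    correct = sum(
        (labels[cell] == "high") == bool(case)
        for cell, case in zip(genotype_cells, is_case)
    )
    return correct / len(is_case)
```

In a full MDR run this pooling would be repeated for every candidate combination of factors inside cross-validation, and the best-scoring combination would then be tested for significance.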
Some researchers have suggested using the generalized extreme value distribution to assess the statistical significance of candidate interactions, but this approach has been applied only to continuous and binary dependent variables. In this research, we use the theoretical method based on the generalized extreme value distribution in place of the algorithm's permutation tests when the response variable is ordinal.
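The contrast between the two significance procedures can be sketched as follows. A permutation test compares the observed best-model statistic with the best statistics obtained from many shuffled data sets, so its smallest attainable p-value is limited by the number of permutations. A distribution-based approach fits an extreme value distribution to a small set of null maxima and reads the p-value from the fitted tail. For simplicity this sketch swaps in the Gumbel distribution (the generalized extreme value distribution with shape parameter zero) fitted by the method of moments, rather than a full three-parameter fit. All function names are hypothetical and this is not the authors' implementation.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def empirical_p_value(observed, null_maxima):
    """Permutation p-value with the usual +1 correction."""
    exceed = sum(1 for m in null_maxima if m >= observed)
    return (exceed + 1) / (len(null_maxima) + 1)


def fit_gumbel_moments(null_maxima):
    """Method-of-moments Gumbel fit: returns (location, scale).

    Uses mean = loc + gamma * scale and variance = (pi**2 / 6) * scale**2.
    Needs at least two null maxima that are not all identical.
    """
    mean = statistics.fmean(null_maxima)
    sd = statistics.stdev(null_maxima)
    scale = sd * math.sqrt(6) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_p_value(observed, loc, scale):
    """Upper-tail probability P(X >= observed) under Gumbel(loc, scale)."""
    z = (observed - loc) / scale
    # 1 - exp(-exp(-z)), written with expm1 to stay accurate in the far tail.
    return -math.expm1(-math.exp(-z))
```

With a few dozen null maxima from permuted data, `fit_gumbel_moments` followed by `gumbel_p_value` gives a tail estimate that is not bounded below by one over the number of permutations, which is the source of the time saving described above. Whether the Gumbel special case is adequate for a given data set is an assumption that would need checking.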