Theory-based Model Validation in the Generalized Multifactor Dimensionality Reduction Algorithm for Ordinal Phenotypes

IRAQI JOURNAL OF STATISTICAL SCIENCES. Pub Date: 2023-12-01. DOI: 10.33899/iqjoss.2023.181255

Mohammed Othman, Zaid Al-Khaledi

Abstract

Clinical studies indicate a close relationship between some diseases and specific interactions among genetic factors. Identifying the genetic interactions that significantly affect the emergence of genetic diseases requires extensive statistical analysis, and the enormous volume of human genetic data has made it necessary to develop statistical methods suited to high-dimensional data. Multifactor Dimensionality Reduction (MDR) is one of the leading nonparametric algorithms in this field. It reduces the dimensionality of genetic data to identify the interaction that most directly increases the likelihood of a genetic disease. The algorithm relies on a set of nonparametric procedures to identify the highest-impact genetic interaction, and it applies only to binary response variables. Like any statistical method, MDR has weaknesses and limits on its application, and it has had to be extended to overcome them. One weakness is that it cannot handle data sets with an ordinal response variable. Some researchers have developed a generalization of MDR that works with ordinal data, but the generalized algorithm is more complex than the original. We therefore propose a simple extension of the original algorithm that uses ordinal logistic regression to classify the individuals in the sample while keeping all other steps of the original algorithm unchanged.

In addition, the MDR algorithm uses a nonparametric procedure to test the significance of the interactions it nominates. This procedure is based on permutation tests and takes far longer than parametric procedures that rely on theoretical approaches. Some researchers have suggested using the generalized extreme value distribution to verify the statistical significance of candidate interactions, but this method has been used only with continuous and binary dependent variables. In this research, the theoretical method based on the generalized extreme value distribution is used in place of the algorithm's permutation tests when the response variable is ordinal.
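Replacing a full permutation test with a generalized extreme value (GEV) fit can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the GEV parameters are obtained, so the sketch assumes they are estimated by maximum likelihood from a small set of null maxima (the best model score in each of a few permuted data sets). The function names, the simulated scores, and the observed value 0.60 are all hypothetical. Note that SciPy's shape parameter `c` has the opposite sign to the shape parameter commonly written as ξ.

```python
import numpy as np
from scipy.stats import genextreme


def gev_p_value(observed, null_maxima):
    """Upper-tail p-value of `observed` under a GEV fitted to null maxima."""
    # Maximum-likelihood fit; SciPy returns (shape c, location, scale).
    shape, loc, scale = genextreme.fit(null_maxima)
    return float(genextreme.sf(observed, shape, loc=loc, scale=scale))


def empirical_p_value(observed, null_maxima):
    """Permutation p-value with the usual +1 correction, for comparison."""
    null_maxima = np.asarray(null_maxima)
    return float((1 + np.sum(null_maxima >= observed)) / (1 + null_maxima.size))


# Hypothetical null distribution: for each of 50 permuted data sets, keep the
# best score among 200 candidate interaction models (simulated here).
rng = np.random.default_rng(0)
null_maxima = rng.normal(0.5, 0.02, size=(50, 200)).max(axis=1)

observed = 0.60  # hypothetical score of the best model on the real data
p_gev = gev_p_value(observed, null_maxima)
p_emp = empirical_p_value(observed, null_maxima)
print(f"GEV p-value: {p_gev:.3g}, empirical p-value: {p_emp:.3g}")
```

With only 50 permutations the empirical p-value cannot fall below 1/51, whereas the fitted GEV tail can return smaller values. This is why a theoretical approximation can reduce the number of permutations needed, under the assumption that the GEV describes the null maxima well.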
