Robust correlation feature selection based support vector machine approach for high dimensional datasets

Ishaq Abdullahi Baba, Mohammed Bappah Mohammed, Kamal Bakari Jillahi, Aliyu Umar, Hasan Talib Hendi

Results in Control and Optimization, Volume 21, Article 100609. Published 2025-09-11. DOI: 10.1016/j.rico.2025.100609. URL: https://www.sciencedirect.com/science/article/pii/S2666720725000943
Citations: 0
Abstract
Correlation-based feature selection methods are popular tools for selecting the most important variables to include in the true model when analyzing sparse, high-dimensional data. In practice, the presence of anomalous observations in both the predictors and the response can seriously degrade the prediction accuracy of the model, which in turn leads to misleading interpretations and conclusions if not correctly addressed. Furthermore, the curse of dimensionality is another serious difficulty facing many existing feature selection algorithms. To achieve more reliable feature selection and prediction accuracy, a weighted sure independence screening-based support vector machine for high-dimensional datasets is proposed. The key contribution of our proposed method is that it minimizes the influence of outliers when differentiating between significant and insignificant features, improving both predictability and interpretability. Our method consists of three basic steps. In the first step, observation weights are computed from a modified reweighted fast, consistent, and high breakdown-point estimator. The second step uses the weight estimates from the first step to select the most important variables for the model. The third step employs the support vector machine algorithm to compute predictions. To demonstrate the effectiveness of the developed procedure, we used both simulation and real-life data examples. Our results show that the proposed method outperforms the competing procedures by a clear margin.
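The three-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `MinCovDet` is used here as an accessible stand-in for the modified reweighted fast, consistent, high-breakdown-point estimator, the hard 0/1 weighting rule and the random projection are illustrative assumptions, and the screening size `n / log(n)` is a common sure-independence-screening choice rather than a detail taken from the paper.

```python
import numpy as np
from sklearn.covariance import MinCovDet  # stand-in for the robust weight estimator
from sklearn.svm import SVR
from sklearn.datasets import make_regression

# Toy high-dimensional data (p >> n) with a few contaminated rows.
rng = np.random.default_rng(0)
X, y = make_regression(n_samples=100, n_features=500, n_informative=5,
                       noise=1.0, random_state=0)
X[:5] += rng.normal(0, 20, size=(5, X.shape[1]))  # inject outlying observations

# Step 1: robust observation weights. MCD needs n > p, so the rows are first
# compressed with a random projection (an illustrative shortcut); rows with a
# large robust Mahalanobis distance receive weight 0.
proj = X @ rng.standard_normal((X.shape[1], 10)) / np.sqrt(X.shape[1])
mcd = MinCovDet(random_state=0).fit(proj)
d2 = mcd.mahalanobis(proj)
w = np.where(d2 <= np.quantile(d2, 0.9), 1.0, 0.0)  # hard 0/1 weights

# Step 2: weighted sure independence screening — rank features by the absolute
# weighted marginal correlation with the response and keep the top d of them.
Xw = X * w[:, None]
yw = y * w
corr = np.abs((Xw - Xw.mean(0)).T @ (yw - yw.mean()))
corr /= (Xw.std(0) * yw.std() * len(y) + 1e-12)
d = int(len(y) / np.log(len(y)))  # common SIS screening size: n / log(n)
keep = np.argsort(corr)[-d:]

# Step 3: fit a support vector machine on the screened features and predict.
svm = SVR().fit(X[:, keep], y)
pred = svm.predict(X[:, keep])
print(len(keep), pred.shape)
```

Because the weights zero out rows flagged as outlying before the marginal correlations are computed, contaminated observations cannot dominate the feature ranking, which is the intuition behind the robustness claim in the abstract.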