Feature Sorting Algorithm Based on XGBoost and MIC Combination Model

International Journal of Advanced Network, Monitoring and Controls, Pub Date: 2021-01-01, DOI: 10.21307/ijanmc-2021-037

Gao Xiang, Yu Jun, Huo Zhiyi, Huang Yuzhe



Abstract

Feature ranking helps a data analysis system run more efficiently and reduces the interference of redundant and irrelevant features with its results. Ranking the features of massive data sets remains an important and difficult problem. To address it, this paper analyzes existing algorithm models and proposes a feature importance ranking algorithm based on a combination of the XGBoost and MIC models. First, an XGBoost model and an MIC model are established separately. Then the results of the two models are weighted and combined using the error reciprocal method. The XGBoost model offers high efficiency, flexibility, and portability, while the MIC model offers generality and easy parameter adjustment; the resulting XGBoost-MIC combination model has the advantages of both. Finally, a data set of anticancer drug candidates is used as the sample set. After the data set is preprocessed, the XGBoost-MIC combination model is applied to the case, the single models are evaluated on the same data for comparison, and the model is optimized by tuning its parameters. The results show that the error of the combination model is clearly lower than that of a single model, and that the accuracy of the XGBoost-MIC model is 0.75, which is 0.02 higher than that of the single model.
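The abstract does not give the weighting formula, so the following is only a minimal sketch of one common form of error-reciprocal weighting: each model's weight is the reciprocal of its error divided by the sum of all reciprocals, so the model with the smaller error contributes more. The function names, the normalization of each model's scores to sum to 1, the feature names, and every number below are illustrative assumptions rather than the paper's implementation or data.

```python
def reciprocal_error_weights(errors):
    """Return one weight per model: (1 / e_i) / sum_j (1 / e_j)."""
    if any(e <= 0 for e in errors):
        raise ValueError("errors must be positive")
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [v / total for v in inverse]


def normalize(scores):
    """Scale non-negative scores so they sum to 1 (assumed here to make
    importance scores from different models comparable)."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must have a positive sum")
    return {name: value / total for name, value in scores.items()}


def combine_rankings(score_sets, errors):
    """Weight each model's normalized scores by its reciprocal-error
    weight, sum them per feature, and sort from most to least important."""
    weights = reciprocal_error_weights(errors)
    normalized = [normalize(s) for s in score_sets]
    combined = {
        feature: sum(w * s[feature] for w, s in zip(weights, normalized))
        for feature in normalized[0]
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical importance scores standing in for the two models' outputs.
xgb_scores = {"f1": 0.50, "f2": 0.30, "f3": 0.20}
mic_scores = {"f1": 0.20, "f2": 0.60, "f3": 0.20}

# Hypothetical per-model errors; the lower error gets the larger weight.
ranking = combine_rankings([xgb_scores, mic_scores], errors=[0.2, 0.3])
for feature, score in ranking:
    print(feature, round(score, 3))
```

With these made-up inputs the weights come out near 0.6 and 0.4, so the combined ranking leans toward the lower-error model while still letting the other model reorder close features.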


