基于相对节点图的稳健自训练算法

IF 3.4 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Intelligence Pub Date : 2024-11-16 DOI:10.1007/s10489-024-06062-0

Jikui Wang, Huiyu Duan, Cuihong Zhang, Feiping Nie

{"title":"基于相对节点图的稳健自训练算法","authors":"Jikui Wang, Huiyu Duan, Cuihong Zhang, Feiping Nie","doi":"10.1007/s10489-024-06062-0","DOIUrl":null,"url":null,"abstract":"<div><p>Self-training algorithm is a well-known framework of semi-supervised learning. How to select high-confidence samples is the key step for self-training algorithm. If high-confidence examples with incorrect labels are employed to train the classifier, the error will get worse during iterations. To improve the quality of high-confidence samples, a novel data editing technique termed Relative Node Graph Editing (RNGE) is put forward. Say concretely, mass estimation is used to calculate the density and peak of each sample to build a prototype tree to reveal the underlying spatial structure of the data. Then, we define the Relative Node Graph (RNG) for each sample. Finally, the mislabeled samples in the candidate high-confidence sample set are identified by hypothesis test based on RNG. Combined above, we propose a Robust Self-training Algorithm based on Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. The experimental results show that the proposed algorithm can improve the performance of the self-training algorithm.</p></div>","PeriodicalId":8041,"journal":{"name":"Applied Intelligence","volume":"55 1","pages":""},"PeriodicalIF":3.4000,"publicationDate":"2024-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A robust self-training algorithm based on relative node graph\",\"authors\":\"Jikui Wang, Huiyu Duan, Cuihong Zhang, Feiping Nie\",\"doi\":\"10.1007/s10489-024-06062-0\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Self-training algorithm is a well-known framework of semi-supervised learning. How to select high-confidence samples is the key step for self-training algorithm. If high-confidence examples with incorrect labels are employed to train the classifier, the error will get worse during iterations. To improve the quality of high-confidence samples, a novel data editing technique termed Relative Node Graph Editing (RNGE) is put forward. Say concretely, mass estimation is used to calculate the density and peak of each sample to build a prototype tree to reveal the underlying spatial structure of the data. Then, we define the Relative Node Graph (RNG) for each sample. Finally, the mislabeled samples in the candidate high-confidence sample set are identified by hypothesis test based on RNG. Combined above, we propose a Robust Self-training Algorithm based on Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. The experimental results show that the proposed algorithm can improve the performance of the self-training algorithm.</p></div>\",\"PeriodicalId\":8041,\"journal\":{\"name\":\"Applied Intelligence\",\"volume\":\"55 1\",\"pages\":\"\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Applied Intelligence\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://link.springer.com/article/10.1007/s10489-024-06062-0\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Applied Intelligence","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10489-024-06062-0","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

摘要

自训练算法是一种著名的半监督学习框架。如何选择高置信度样本是自训练算法的关键步骤。如果采用标签不正确的高置信度样本来训练分类器，误差会在迭代过程中越来越大。为了提高高置信度样本的质量，我们提出了一种新的数据编辑技术，即相对节点图编辑（RNGE）。具体来说，通过质量估计来计算每个样本的密度和峰值，从而建立一棵原型树，揭示数据的潜在空间结构。然后，我们为每个样本定义相对节点图（RNG）。最后，通过基于 RNG 的假设检验来识别候选高置信度样本集中的错误标记样本。综合上述方法，我们提出了一种基于相对节点图的鲁棒自训练算法（STRNG），该算法利用相对节点图来识别误标注样本并对其进行编辑。实验结果表明，所提出的算法可以提高自训练算法的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

A robust self-training algorithm based on relative node graph

查看原文本刊更多论文

A robust self-training algorithm based on relative node graph

Self-training algorithm is a well-known framework of semi-supervised learning. How to select high-confidence samples is the key step for self-training algorithm. If high-confidence examples with incorrect labels are employed to train the classifier, the error will get worse during iterations. To improve the quality of high-confidence samples, a novel data editing technique termed Relative Node Graph Editing (RNGE) is put forward. Say concretely, mass estimation is used to calculate the density and peak of each sample to build a prototype tree to reveal the underlying spatial structure of the data. Then, we define the Relative Node Graph (RNG) for each sample. Finally, the mislabeled samples in the candidate high-confidence sample set are identified by hypothesis test based on RNG. Combined above, we propose a Robust Self-training Algorithm based on Relative Node Graph (STRNG), which uses RNGE to identify mislabeled samples and edit them. The experimental results show that the proposed algorithm can improve the performance of the self-training algorithm.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Applied Intelligence 工程技术-计算机：人工智能

CiteScore

6.60

自引率

20.80%

发文量

1361

审稿时长

5.9 months

期刊介绍： With a focus on research in artificial intelligence and neural networks, this journal addresses issues involving solutions of real-life manufacturing, defense, management, government and industrial problems which are too complex to be solved through conventional approaches and require the simulation of intelligent thought processes, heuristics, applications of knowledge, and distributed and parallel processing. The integration of these multiple approaches in solving complex problems is of particular importance. The journal presents new and original research and technological developments, addressing real and complex issues applicable to difficult problems. It provides a medium for exchanging scientific research and technological achievements accomplished by the international community.