Automatically Design Distance Functions for Graph-Based Semi-Supervised Learning

2017 IEEE Trustcom/BigDataSE/ICESS. Pub Date: 2017-08-01. DOI: 10.1109/Trustcom/BigDataSE/ICESS.2017.333

Patricia Miquilini, R. G. Rossi, M. G. Quiles, V. V. D. Melo, M. Basgalupp



Abstract

Automatic data classification is often performed by supervised learning algorithms, which produce a model to classify new instances. Because labeled instances are expensive to obtain, semi-supervised learning (SSL) methods offer an alternative for data classification, since they require only a few labeled instances. Among the many SSL algorithms, graph-based ones have notable properties. In particular, graph-based models can identify classes with different distributions without prior knowledge of statistical model parameters. However, a drawback that may affect their classification performance lies in the construction of the graph, which requires measuring distances (or similarities) between instances. Since a particular distance function can improve performance on some data sets and degrade it on others, we introduce a novel approach called GEAD, a Grammatical Evolution approach for Automatically designing Distance functions for graph-based semi-supervised learning. We perform extensive experiments on 100 public data sets to assess the performance of our approach and compare it with traditional distance functions from the literature. Results show that GEAD is capable of designing distance functions that significantly outperform the manually designed baseline functions on different predictive measures, such as Micro-F1 and Macro-F1.
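
To make the role of the distance function concrete, the following is a minimal sketch of graph-based semi-supervised classification in which the distance is a pluggable argument. It is not the GEAD method and does not reproduce the paper's experimental setup: the k-nearest-neighbour graph, the simple neighbour-averaging propagation rule, and all names (`knn_graph`, `propagate_labels`, `euclidean`, `manhattan`) are assumptions chosen for illustration only.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_graph(points: List[Vector], distance: Distance, k: int) -> List[Set[int]]:
    """Build a symmetric k-nearest-neighbour graph as adjacency sets.

    The distance function is the only place where instance similarity
    enters, so swapping it changes which edges exist.
    """
    n = len(points)
    neighbours: List[Set[int]] = [set() for _ in range(n)]
    for i in range(n):
        ranked = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: distance(points[i], points[j]),
        )
        for j in ranked[:k]:
            neighbours[i].add(j)
            neighbours[j].add(i)
    return neighbours


def propagate_labels(
    neighbours: List[Set[int]],
    seeds: Dict[int, str],
    classes: List[str],
    iterations: int = 50,
) -> List[str]:
    """Spread class scores over the graph by repeated neighbour averaging.

    Labeled (seed) nodes are clamped to their known class on every pass.
    This is a simplified illustrative scheme, not the paper's algorithm.
    """
    n = len(neighbours)
    scores = [{c: 0.0 for c in classes} for _ in range(n)]
    for i, c in seeds.items():
        scores[i][c] = 1.0
    for _ in range(iterations):
        updated = []
        for i in range(n):
            if i in seeds or not neighbours[i]:
                updated.append(scores[i])
                continue
            nbrs = neighbours[i]
            updated.append(
                {c: sum(scores[j][c] for j in nbrs) / len(nbrs) for c in classes}
            )
        scores = updated
    return [max(classes, key=lambda c: scores[i][c]) for i in range(n)]


# Toy data (hypothetical): two well-separated groups, one labeled point each.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
seeds = {0: "a", 3: "b"}
graph = knn_graph(points, euclidean, k=2)
labels = propagate_labels(graph, seeds, classes=["a", "b"])
print(labels)
```

Passing `manhattan` instead of `euclidean` to `knn_graph` is enough to change the graph and, on less clean data than this toy example, the resulting labels. That sensitivity is the motivation the abstract gives for searching over distance functions rather than fixing one by hand.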
