Automatically Design Distance Functions for Graph-Based Semi-Supervised Learning

Patricia Miquilini, R. G. Rossi, M. G. Quiles, V. V. D. Melo, M. Basgalupp
Published in: 2017 IEEE Trustcom/BigDataSE/ICESS
Publication date: 2017-08-01
DOI: 10.1109/Trustcom/BigDataSE/ICESS.2017.333

Abstract

Automatic data classification is often performed by supervised learning algorithms, which produce a model to classify new instances. Because labeled instances are expensive to obtain, semi-supervised learning (SSL) methods offer an alternative for data classification, since learning requires only a few labeled instances. Among the many SSL algorithms, graph-based ones have notable advantages. In particular, graph-based models make it possible to identify classes with different distributions without prior knowledge of statistical model parameters. However, a drawback that may affect their classification performance lies in the construction of the graph, which requires measuring distances (or similarities) between instances. Since a particular distance function can improve performance on some data sets and degrade it on others, we introduce a novel approach called GEAD, a Grammatical Evolution for Automatically designing Distance functions for Graph-based semi-supervised learning. We perform extensive experiments with 100 public data sets to assess the performance of our approach, and we compare it with traditional distance functions from the literature. Results show that GEAD is capable of designing distance functions that significantly outperform the manually designed baseline functions on different predictive measures, such as Micro-F1 and Macro-F1.
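To make the role of the distance function concrete, the following is a minimal sketch of graph-based SSL with a pluggable distance. It is not GEAD and not the authors' implementation: the helper names (`knn_graph`, `propagate_labels`), the symmetric kNN construction, and the majority-vote propagation are all assumptions chosen for illustration. The point is only that the graph, and therefore the predicted labels, depend on the `dist` argument.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length tuples."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_graph(points, k, dist):
    """Build a symmetric kNN graph (hypothetical helper).

    An edge i-j exists if j is among the k nearest neighbors of i
    or i is among the k nearest neighbors of j, under `dist`.
    """
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def propagate_labels(adj, seeds, max_iters=100):
    """Spread seed labels over the graph by repeated majority vote.

    Seed nodes keep their labels; every other node takes the most
    common label among its already-labeled neighbors. Ties are broken
    by the smaller label so the result is deterministic.
    """
    labels = dict(seeds)
    for _ in range(max_iters):
        changed = False
        for node in sorted(adj):
            if node in seeds:
                continue
            votes = Counter(labels[m] for m in adj[node] if m in labels)
            if not votes:
                continue
            best = min(votes, key=lambda c: (-votes[c], c))
            if labels.get(node) != best:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Toy data (made up for illustration): two small groups of points.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    seeds = {0: "a", 3: "b"}  # only two labeled instances
    for dist in (euclidean, manhattan):
        graph = knn_graph(points, k=2, dist=dist)
        print(dist.__name__, propagate_labels(graph, seeds))
```

On this toy data both distances yield the same graph; on real data sets they generally do not, which is the sensitivity the abstract describes. An approach such as GEAD would, in effect, search for the `dist` function itself rather than pick one from a fixed list.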