Domain Adaptation with Good Edit Similarities: A Sparse Way to Deal with Scaling and Rotation Problems in Image Classification

2011 IEEE 23rd International Conference on Tools with Artificial Intelligence Pub Date : 2011-11-07 DOI:10.1109/ICTAI.2011.35

Amaury Habrard, Jean-Philippe Peyrache, M. Sebban


Citations: 2

Abstract

In many real-life applications, the available source training information is either too small or not representative enough of the underlying target test problem. In the past few years, a new line of machine learning research, called Domain Adaptation (DA), has been developed to overcome such situations, giving rise to many adaptation algorithms and theoretical results in the form of generalization bounds. In this paper, we propose a DA algorithm for string-structured data, inspired by the DA support vector machine (SVM) technique introduced in [Bruzzone et al., PAMI 2010]. To ensure the convergence of SVM-based learning, the similarity functions involved in the process must be valid kernels, i.e. positive semi-definite (PSD) and symmetric. However, in the string-based context considered in this paper, this condition is often not satisfied. Indeed, it has been proven that most string similarity functions based on the edit distance are not PSD. To overcome this drawback, we use the theory of learning with good similarity functions introduced by Balcan et al., which (i) does not require a valid kernel to learn well and (ii) allows us to induce sparser models. We take advantage of this theoretical framework to propose a new DA algorithm using good edit similarity functions. Using a suitable string representation of handwritten digits, we show that our new algorithm deals very efficiently with the scaling and rotation problems usually encountered in image classification.
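The idea of learning from an edit-based similarity without requiring a valid kernel can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the normalization used in `edit_similarity`, the helper `similarity_features`, and the digit-like example strings are all assumptions made for illustration. The sketch only shows the general pattern in which each string is mapped to its vector of similarities to a set of landmark strings, so that a linear separator can then be learned on these vectors.

```python
from typing import List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical similarity in [-1, 1] derived from the edit distance.

    Assumption for illustration only: this normalization is not the
    similarity function defined in the paper.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - 2.0 * levenshtein(a, b) / longest


def similarity_features(x: str, landmarks: Sequence[str]) -> List[float]:
    """Represent x by its similarities to a set of landmark strings."""
    return [edit_similarity(x, landmark) for landmark in landmarks]


if __name__ == "__main__":
    # Hypothetical strings standing in for string-coded digit contours.
    landmarks = ["00112233", "22334455", "66770011"]
    print(similarity_features("00112234", landmarks))
```

No PSD condition is checked anywhere in this sketch, since the similarity is used only to build explicit feature vectors. A linear model with an L1 penalty trained on these vectors would tend to keep only a few landmarks, which is one way to read the sparsity mentioned in the abstract.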
