Distance-based loss function for deep feature space learning of convolutional neural networks

IF 4.3 · CAS Tier 3 (Computer Science) · JCR Q2 · COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Eduardo S. Ribeiro, Lourenço R.G. Araújo, Gabriel T.L. Chaves, Antônio P. Braga
{"title":"基于距离的卷积神经网络深度特征空间学习损失函数","authors":"Eduardo S. Ribeiro ,&nbsp;Lourenço R.G. Araújo ,&nbsp;Gabriel T.L. Chaves ,&nbsp;Antônio P. Braga","doi":"10.1016/j.cviu.2024.104184","DOIUrl":null,"url":null,"abstract":"<div><div>Convolutional Neural Networks (CNNs) have been on the forefront of neural network research in recent years. Their breakthrough performance in fields such as image classification has gathered efforts in the development of new CNN-based architectures, but recently more attention has been directed to the study of new loss functions. Softmax loss remains the most popular loss function due mainly to its efficiency in class separation, but the function is unsatisfactory in terms of intra-class compactness. While some studies have addressed this problem, most solutions attempt to refine softmax loss or combine it with other approaches. We present a novel loss function based on distance matrices (LDMAT), softmax independent, that maximizes interclass distance and minimizes intraclass distance. The loss function operates directly on deep features, allowing their use on arbitrary classifiers. LDMAT minimizes the distance between two distance matrices, one constructed with the model’s deep features and the other calculated from the labels. The use of a distance matrix in the loss function allows a two-dimensional representation of features and imposes a fixed distance between classes, while improving intra-class compactness. A regularization method applied to the distance matrix of labels is also presented, that allows a degree of relaxation of the solution and leads to a better spreading of features in the separation space. Efficient feature extraction was observed on datasets such as MNIST, CIFAR10 and CIFAR100.</div></div>","PeriodicalId":50633,"journal":{"name":"Computer Vision and Image Understanding","volume":"249 ","pages":"Article 104184"},"PeriodicalIF":4.3000,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Distance-based loss function for deep feature space learning of convolutional neural networks\",\"authors\":\"Eduardo S. Ribeiro ,&nbsp;Lourenço R.G. Araújo ,&nbsp;Gabriel T.L. Chaves ,&nbsp;Antônio P. Braga\",\"doi\":\"10.1016/j.cviu.2024.104184\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Convolutional Neural Networks (CNNs) have been on the forefront of neural network research in recent years. Their breakthrough performance in fields such as image classification has gathered efforts in the development of new CNN-based architectures, but recently more attention has been directed to the study of new loss functions. Softmax loss remains the most popular loss function due mainly to its efficiency in class separation, but the function is unsatisfactory in terms of intra-class compactness. While some studies have addressed this problem, most solutions attempt to refine softmax loss or combine it with other approaches. We present a novel loss function based on distance matrices (LDMAT), softmax independent, that maximizes interclass distance and minimizes intraclass distance. The loss function operates directly on deep features, allowing their use on arbitrary classifiers. LDMAT minimizes the distance between two distance matrices, one constructed with the model’s deep features and the other calculated from the labels. 
The use of a distance matrix in the loss function allows a two-dimensional representation of features and imposes a fixed distance between classes, while improving intra-class compactness. A regularization method applied to the distance matrix of labels is also presented, that allows a degree of relaxation of the solution and leads to a better spreading of features in the separation space. Efficient feature extraction was observed on datasets such as MNIST, CIFAR10 and CIFAR100.</div></div>\",\"PeriodicalId\":50633,\"journal\":{\"name\":\"Computer Vision and Image Understanding\",\"volume\":\"249 \",\"pages\":\"Article 104184\"},\"PeriodicalIF\":4.3000,\"publicationDate\":\"2024-09-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computer Vision and Image Understanding\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1077314224002650\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer Vision and Image Understanding","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1077314224002650","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Convolutional Neural Networks (CNNs) have been at the forefront of neural network research in recent years. Their breakthrough performance in fields such as image classification has spurred the development of new CNN-based architectures, but recently more attention has been directed to the study of new loss functions. Softmax loss remains the most popular loss function, mainly because of its efficiency in class separation, but it is unsatisfactory in terms of intra-class compactness. While some studies have addressed this problem, most solutions attempt to refine softmax loss or combine it with other approaches. We present a novel loss function based on distance matrices (LDMAT), independent of softmax, that maximizes inter-class distance and minimizes intra-class distance. The loss function operates directly on deep features, allowing them to be used with arbitrary classifiers. LDMAT minimizes the distance between two distance matrices, one constructed from the model's deep features and the other calculated from the labels. Using a distance matrix in the loss function yields a two-dimensional representation of the features and imposes a fixed distance between classes while improving intra-class compactness. A regularization method applied to the label distance matrix is also presented; it allows a degree of relaxation in the solution and leads to a better spreading of features in the separation space. Efficient feature extraction was observed on datasets such as MNIST, CIFAR10, and CIFAR100.
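
To make the mechanism concrete, below is a minimal PyTorch sketch of a distance-matrix loss in the spirit of LDMAT: it builds a Euclidean distance matrix from a batch of deep features, builds a target matrix from the labels (zero for same-class pairs, a fixed margin for different-class pairs), and penalizes the mean squared difference between the two. The function names, the `margin` parameter, and the `relax` knob are illustrative assumptions rather than the authors' exact formulation; in particular, `relax` only loosely mimics the relaxation of the label distance matrix described in the abstract.

```python
# Hypothetical sketch of a distance-matrix-based loss; not the authors' exact method.
import torch


def pairwise_distances(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Euclidean distance matrix between all rows of x (batch_size x dim)."""
    sq_norms = (x ** 2).sum(dim=1, keepdim=True)              # (B, 1)
    sq_dists = sq_norms + sq_norms.t() - 2.0 * x @ x.t()      # (B, B) squared distances
    # eps keeps the sqrt gradient finite on the (zero-distance) diagonal
    return torch.sqrt(torch.clamp(sq_dists, min=0.0) + eps)


def label_distance_matrix(labels: torch.Tensor, margin: float = 1.0,
                          relax: float = 0.0) -> torch.Tensor:
    """Target matrix: 0 for same-class pairs, `margin` for different-class pairs.

    `relax` is a hypothetical knob that softens the hard targets toward their mean,
    loosely mirroring the label-matrix regularization mentioned in the abstract.
    """
    same = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()  # (B, B) same-class mask
    target = margin * (1.0 - same)
    if relax > 0.0:
        target = (1.0 - relax) * target + relax * target.mean()
    return target


def ldmat_like_loss(features: torch.Tensor, labels: torch.Tensor,
                    margin: float = 1.0, relax: float = 0.0) -> torch.Tensor:
    """Mean squared gap between the feature distance matrix and the label target."""
    d_feat = pairwise_distances(features)
    d_label = label_distance_matrix(labels, margin=margin, relax=relax)
    return torch.mean((d_feat - d_label) ** 2)


if __name__ == "__main__":
    feats = torch.randn(8, 64, requires_grad=True)   # stand-in for CNN deep features
    labs = torch.randint(0, 3, (8,))                 # class labels for the batch
    loss = ldmat_like_loss(feats, labs, margin=1.0, relax=0.1)
    loss.backward()                                  # gradients flow back into the features
    print(float(loss))
```

Because the loss depends only on the deep features and the labels, it can be attached to any backbone and the resulting feature space can then be fed to an arbitrary downstream classifier, as the abstract describes.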
Source journal
Computer Vision and Image Understanding
Category: Engineering & Technology (Engineering: Electronic & Electrical)
CiteScore: 7.80
Self-citation rate: 4.40%
Articles per year: 112
Review time: 79 days
Aims and scope: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis, from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views.
Research areas include:
• Theory
• Early vision
• Data structures and representations
• Shape
• Range
• Motion
• Matching and recognition
• Architecture and languages
• Vision systems