Future Feature-Based Supervised Contrastive Learning for Streaming Perception

IF 8.3 | Zone 1, Engineering & Technology | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Tongbo Wang;Hua Huang
Journal: IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 12, pp. 13611-13625
Published: 2024-08-07
DOI: 10.1109/TCSVT.2024.3439692
URL: https://ieeexplore.ieee.org/document/10630573/
Publication type: Journal Article
Open access: no
Citations: 0

Abstract

Streaming perception, a critical task in computer vision, involves the real-time prediction of object locations within video sequences based on prior frames. While current methods like StreamYOLO mainly rely on coordinate information, they often fall short of delivering precise predictions due to feature misalignment between input data and supervisory labels. In this paper, a novel method, Future Feature-based Supervised Contrastive Learning (FFSCL), is introduced to address this challenge by incorporating appearance features from future frames and leveraging supervised contrastive learning techniques. FFSCL establishes a robust correspondence between the appearance of an object in current and past frames and its location in the subsequent frame. This integrated method significantly improves the accuracy of object position prediction in streaming perception tasks. In addition, the FFSCL method includes a sample pair construction module (SPC) for the efficient creation of positive and negative samples based on future frame labels and a feature consistency loss (FCL) to enhance the effectiveness of supervised contrastive learning by linking appearance features from future frames with those from past frames. The efficacy of FFSCL is demonstrated through extensive experiments on two large-scale benchmark datasets, where FFSCL consistently outperforms state-of-the-art methods in streaming perception tasks. This study represents a significant advancement in the incorporation of supervised contrastive learning techniques and future frame information into the realm of streaming perception, paving the way for more accurate and efficient object prediction within video streams.
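The abstract names three ingredients (positive and negative pairs built from future-frame labels, a supervised contrastive objective, and a feature consistency loss) but gives no formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses the widely used supervised contrastive (SupCon) formulation as a stand-in, treats integer labels as if they came from future-frame annotations, and uses a mean cosine distance as a hypothetical placeholder for the consistency term. The function names `build_pair_mask`, `supcon_loss`, and `consistency_loss` are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def build_pair_mask(labels: Sequence[int]) -> List[List[bool]]:
    """Assumed pair construction: i and j are a positive pair when i != j
    and their labels match; every other pair counts as negative."""
    n = len(labels)
    return [[i != j and labels[i] == labels[j] for j in range(n)] for i in range(n)]


def supcon_loss(features: Sequence[Vector], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Generic SupCon loss, averaged over anchors that have at least one positive."""
    z = [l2_normalize(f) for f in features]
    n = len(z)
    positives = build_pair_mask(labels)
    total, anchors = 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if positives[i][j]]
        if not pos:
            continue  # an anchor without positives contributes nothing
        logits = {j: dot(z[i], z[j]) / temperature for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        total += -sum(logits[j] - log_denom for j in pos) / len(pos)
        anchors += 1
    return total / anchors if anchors else 0.0


def consistency_loss(past: Sequence[Vector], future: Sequence[Vector]) -> float:
    """Hypothetical consistency term: mean (1 - cosine similarity) between
    matched past-frame and future-frame feature vectors."""
    if len(past) != len(future) or not past:
        raise ValueError("past and future must be non-empty and equally long")
    sims = [dot(l2_normalize(p), l2_normalize(f)) for p, f in zip(past, future)]
    return sum(1.0 - s for s in sims) / len(sims)


# Toy usage: two tight clusters of 2-D features.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supcon_loss(feats, [0, 0, 1, 1])    # labels agree with clusters
loss_mismatched = supcon_loss(feats, [0, 1, 0, 1])  # labels cut across clusters
```

In this toy setup the loss is lower when labels agree with the feature clusters than when they cut across them, which is the behavior a supervised contrastive objective is meant to reward. How FFSCL actually weights or combines the two terms is not stated in the abstract.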
Source journal
CiteScore: 13.80
Self-citation rate: 27.40%
Articles published: 660
Review time: 5 months
Journal description: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.