{"title":"Future Feature-Based Supervised Contrastive Learning for Streaming Perception","authors":"Tongbo Wang;Hua Huang","doi":"10.1109/TCSVT.2024.3439692","DOIUrl":null,"url":null,"abstract":"Streaming perception, a critical task in computer vision, involves the real-time prediction of object locations within video sequences based on prior frames. While current methods like StreamYOLO mainly rely on coordinate information, they often fall short of delivering precise predictions due to feature misalignment between input data and supervisory labels. In this paper, a novel method, Future Feature-based Supervised Contrastive Learning (FFSCL), is introduced to address this challenge by incorporating appearance features from future frames and leveraging supervised contrastive learning techniques. FFSCL establishes a robust correspondence between the appearance of an object in current and past frames and its location in the subsequent frame. This integrated method significantly improves the accuracy of object position prediction in streaming perception tasks. In addition, the FFSCL method includes a sample pair construction module (SPC) for the efficient creation of positive and negative samples based on future frame labels and a feature consistency loss (FCL) to enhance the effectiveness of supervised contrastive learning by linking appearance features from future frames with those from past frames. The efficacy of FFSCL is demonstrated through extensive experiments on two large-scale benchmark datasets, where FFSCL consistently outperforms state-of-the-art methods in streaming perception tasks. This study represents a significant advancement in the incorporation of supervised contrastive learning techniques and future frame information into the realm of streaming perception, paving the way for more accurate and efficient object prediction within video streams.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"13611-13625"},"PeriodicalIF":8.3000,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10630573/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Streaming perception, a critical task in computer vision, involves the real-time prediction of object locations within video sequences based on prior frames. While current methods like StreamYOLO mainly rely on coordinate information, they often fall short of delivering precise predictions due to feature misalignment between input data and supervisory labels. In this paper, a novel method, Future Feature-based Supervised Contrastive Learning (FFSCL), is introduced to address this challenge by incorporating appearance features from future frames and leveraging supervised contrastive learning techniques. FFSCL establishes a robust correspondence between the appearance of an object in current and past frames and its location in the subsequent frame. This integrated method significantly improves the accuracy of object position prediction in streaming perception tasks. In addition, the FFSCL method includes a sample pair construction module (SPC) for the efficient creation of positive and negative samples based on future frame labels and a feature consistency loss (FCL) to enhance the effectiveness of supervised contrastive learning by linking appearance features from future frames with those from past frames. The efficacy of FFSCL is demonstrated through extensive experiments on two large-scale benchmark datasets, where FFSCL consistently outperforms state-of-the-art methods in streaming perception tasks. This study represents a significant advancement in the incorporation of supervised contrastive learning techniques and future frame information into the realm of streaming perception, paving the way for more accurate and efficient object prediction within video streams.
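The abstract names two components without giving their formulas: a sample pair construction module (SPC) that builds positive/negative pairs from future-frame labels, and a feature consistency loss (FCL) that ties future-frame appearance features to past-frame ones. As a rough illustration of the general idea only, the sketch below implements a SupCon-style contrastive loss in which positives are defined by future-frame object identities, plus a cosine-based consistency term. The function names, tensor shapes, temperature, and the cosine form of the consistency loss are all assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features, future_labels, temperature=0.1):
    """SupCon-style loss with positives defined by future-frame labels.

    features:      (N, D) embeddings of objects from current/past frames
    future_labels: (N,)   object identities taken from the *next* frame,
                          standing in for the SPC module's pairing rule
    """
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature          # (N, N) similarities
    # Positive pairs share a future-frame label; exclude self-pairs.
    labels = future_labels.unsqueeze(0)
    self_mask = torch.eye(len(features), device=features.device)
    pos_mask = (labels == labels.T).float() * (1.0 - self_mask)
    # Softmax denominator over all non-self pairs.
    logits = sim - 1e9 * self_mask
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    # Average log-probability over each anchor's positives; anchors with
    # no positive (a unique label) contribute zero loss.
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss.mean()


def feature_consistency_loss(past_feats, future_feats):
    """Hypothetical FCL stand-in: pull each object's past/current embedding
    toward the embedding of the same object in the future frame."""
    past = F.normalize(past_feats, dim=1)
    future = F.normalize(future_feats, dim=1)
    return (1.0 - (past * future).sum(dim=1)).mean()


# Toy usage: six detections embedded in 8-D, grouped into three objects
# by their future-frame identities.
feats = torch.randn(6, 8)
past_feats, future_feats = torch.randn(6, 8), torch.randn(6, 8)
ids = torch.tensor([0, 0, 1, 1, 2, 2])
total = supervised_contrastive_loss(feats, ids) + feature_consistency_loss(past_feats, future_feats)
```

In this reading, the contrastive term supplies the correspondence between an object's past appearance and its future location label, while the consistency term keeps the two feature views aligned; how the paper weights or combines the terms is not specified in the abstract.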
Journal Description
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.