Tingting Han, Kai Wang, Jun Yu, Sicheng Zhao, Jianping Fan
Adversarial temporal sentence grounding by learning from external data
Pattern Recognition, Volume 165, Article 111621. Published 2025-03-28.
DOI: 10.1016/j.patcog.2025.111621
https://www.sciencedirect.com/science/article/pii/S003132032500281X
Citations: 0
Abstract
Temporal sentence grounding (TSG) aims to localize the temporal moment in an untrimmed video that semantically corresponds to a given natural language query. Considerable effort has gone into solving the problem in both fully supervised and weakly supervised settings. However, fully supervised methods rely heavily on manually annotated start and end timestamps, which are arduous to obtain, while weakly supervised methods perform worse because they lack that supervision. In this paper, we propose to address temporal sentence grounding by exploiting external data. Specifically, we design an Adversarial Temporal Sentence Grounding (ATSG) framework comprising a proposal generator and a semantic discriminator that is first pre-trained on external data. Thanks to this pre-training, the semantic discriminator can distinguish cross-modal semantic similarities and encourages the proposal generator to produce more accurate candidates. In addition, we use an adversarial training process in the joint optimization stage, in which the proposal generator and the semantic discriminator compete alternately, ultimately improving TSG performance. We conduct extensive experiments on two public benchmarks, ActivityNet Captions and Charades-STA, and the results demonstrate that the proposed ATSG network achieves state-of-the-art performance.
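The abstract does not say how the semantic discriminator scores a candidate moment, so the following is only a toy sketch of the general idea of ranking proposals by cross-modal similarity. Everything in it is an assumption made for illustration: the mean-pooled segment feature, the cosine similarity score, the function names (`mean_pool`, `cosine`, `rank_proposals`), and the hand-made 2-D features. It is not the ATSG model and involves no adversarial training.

```python
import math


def mean_pool(frames, start, end):
    """Average the per-frame feature vectors in the half-open range [start, end)."""
    segment = frames[start:end]
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_proposals(frames, query, proposals):
    """Score each (start, end) proposal against the query, best first.

    Hypothetical stand-in for a semantic discriminator: the score is the
    cosine similarity between the pooled segment feature and the query feature.
    """
    scored = [(cosine(mean_pool(frames, s, e), query), (s, e)) for s, e in proposals]
    return sorted(scored, reverse=True)


# Toy data (assumed): five frames with 2-D features; frames 2 and 3 match the query.
frames = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
query = [1.0, 0.0]
proposals = [(0, 2), (2, 4), (1, 4), (0, 5)]

ranking = rank_proposals(frames, query, proposals)
best_score, best_segment = ranking[0]
print(best_segment, round(best_score, 3))
```

In this toy setting the segment covering only the matching frames ranks first, and segments that include unrelated frames score lower. A learned discriminator would replace the fixed cosine score with a trained cross-modal model.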
About the journal:
The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.