Title: A Survey of Temporal Activity Localization via Language in Untrimmed Videos
Authors: Yulan Yang, Z. Li, Gangyan Zeng
Venue: 2020 International Conference on Culture-oriented Science & Technology (ICCST)
Publication date: 2020-10-01
DOI: 10.1109/ICCST50977.2020.00123
Citations: 8
Abstract
Video is one of the most informative media, combining visual, textual, and audio content. As the number of videos on the Internet grows explosively, it is increasingly necessary for machines to understand the semantic information in videos accurately. Temporal activity localization in a video is the task of localizing the video moment that is most semantically similar to a given natural-language query. This task is quite challenging because it requires not only a deep understanding of the sentences and videos, but also of the fine-grained interactions between the two modalities. In this paper, we report a comprehensive survey of existing temporal sentence localization techniques. First, we present a detailed classification and analysis of these methods. Then we discuss the experimental results and performance of existing approaches. Finally, we offer some insights into future research directions.
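The task described above is commonly formulated as ranking candidate video segments by their semantic similarity to the query. A minimal sketch of that formulation, assuming precomputed embedding vectors for the query and for each candidate segment (the function names and vectors here are hypothetical illustrations, not the survey's notation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def localize(query_vec, segment_vecs):
    """Return (index, score) of the candidate segment whose embedding
    is most similar to the query embedding."""
    best = max(range(len(segment_vecs)),
               key=lambda i: cosine(query_vec, segment_vecs[i]))
    return best, cosine(query_vec, segment_vecs[best])

# Toy example: three 2-D segment embeddings, one query embedding.
segments = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [0.1, 0.9]
index, score = localize(query, segments)
```

Real methods surveyed in the paper differ mainly in how these embeddings and the cross-modal interaction are learned, not in this final ranking step.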