T5-based anomaly-behavior video captioning using semantic relation mining

IF 6.6 · CAS Zone 1 (Computer Science) · JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing · Published: 2025-09-25 · DOI: 10.1016/j.asoc.2025.113923

Min-Jeong Kim, Kyungyong Chung

Volume 185, Article 113923 · https://www.sciencedirect.com/science/article/pii/S1568494625012360

Citations: 0

Abstract

Video data consist of a series of images that change over time. The sequence of frames in a video conveys important information about motion and continuity, so this dynamic information can be used to analyze the movement and behavior patterns of objects. Video captioning describes the content of video data and can provide subtitles or descriptions in various languages. It can also summarize the main points of a video with complex content, making that information easier for users to access. In captioning, semantic analysis is used to identify the overall context of the data and generate correct captions. However, captions are usually generated by focusing on the major objects and actions, which makes it difficult to capture details. In this paper, we propose text-to-text transfer transformer (T5)-based abnormal behavior video captioning using semantic relation mining. The proposed method generates captions with semantic features from video data based on environmental factors, and it improves the accuracy of video description by identifying the similarity of each caption, which is used to classify similar videos and captions. This enables the classification and search of video data and is useful in video analysis systems such as video monitoring and media analysis.
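The abstract does not say how caption similarity is measured or how similar captions are grouped. As a minimal sketch only, assuming term-frequency cosine similarity over word tokens, a small hypothetical stop-word list, and a greedy threshold grouping (none of which are confirmed as the authors' choices; their pipeline is T5-based), the similarity-then-classify step could look like this. The function names `tokenize`, `cosine_similarity`, and `group_similar` and the `threshold` value are illustrative assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; an assumption for illustration only.
STOPWORDS = {"a", "an", "the", "is", "are", "in", "on", "of", "and", "to", "with"}


def tokenize(caption: str) -> list[str]:
    """Lowercase a caption, split it into alphabetic words, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", caption.lower()) if t not in STOPWORDS]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two captions."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    # Captions with no content words get similarity 0 instead of dividing by 0.
    return dot / norm if norm else 0.0


def group_similar(captions: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy grouping: each caption joins the first group whose first
    (seed) caption is at least `threshold` similar; otherwise it starts
    a new group. Returns lists of caption indices."""
    groups: list[list[int]] = []
    for i, cap in enumerate(captions):
        for g in groups:
            if cosine_similarity(captions[g[0]], cap) >= threshold:
                g.append(i)
                break
        else:  # no existing group was similar enough
            groups.append([i])
    return groups


if __name__ == "__main__":
    # Hand-written captions standing in for generated ones.
    captions = [
        "a person falls down on the street",
        "a person suddenly falls down on the street",
        "people are walking in the park",
    ]
    print(group_similar(captions))  # → [[0, 1], [2]]
```

Here the two "falls down" captions share four content words (similarity 4/√20 ≈ 0.89) and are grouped together, while the park caption shares none and forms its own group. A learned sentence embedding would likely be a stronger similarity measure in practice; term-frequency vectors are used only because they keep the sketch dependency-free.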

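Source journal

Applied Soft Computing · Engineering & Technology: Computer Science, Interdisciplinary Applications

CiteScore: 15.80
Self-citation rate: 6.90%
Articles published: 874
Review time: 10.9 months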

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research in the application and convergence of the areas of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them. Therefore, the website is continuously updated with new articles, and publication times are short.