ResFNN: Residual Structure-Based Feedforward Neural Network for Action Quality Assessment in Sports Consumer Electronics
Authors: Honghao Gao; Si Yu; Muddesar Iqbal; Mohsen Guizani
Journal: IEEE Transactions on Consumer Electronics, vol. 70, no. 4, pp. 6653-6663
Published: 2024-10-17
DOI: 10.1109/TCE.2024.3482560
URL: https://ieeexplore.ieee.org/document/10720818/
Impact factor: 4.3 (JCR Q1, Engineering, Electrical & Electronic)
Citations: 0
Abstract
With the development of artificial intelligence (AI) and sports consumer electronics, AI-empowered Olympic sport technologies are being deployed more widely. Action quality assessment (AQA), a sport action recognition and video refereeing technology, aims to automatically score action performance in videos captured by sports consumer electronics deployed in arenas. It has attracted considerable attention because of its wide range of applications, such as sports event scoring, specific skill assessment, and rehabilitation medicine. Existing methods generally score action performance by directly regressing the initial video features to a score, which neglects the possibility that the initial features are not sufficiently effective. To address this issue, we propose a residual structure-based feedforward neural network (ResFNN) that enables efficient action feature learning and thereby improves score assessment performance. First, the input videos are downsampled into clips and passed through inflated 3D convolutional networks (ConvNets) to obtain initial action video features. These features contain spatiotemporal information about the human actions occurring in the videos. Second, these features are aggregated and learned through our ResFNN. The ResFNN is composed of feedforward neural network residual blocks, which have strong function fitting and feature conversion capabilities. As a result, the network refines the initial features into more effective ones. Third, a score distribution regression method is applied to obtain the underlying score distribution. This step establishes a more accurate mapping between the videos and scores. Finally, experiments on the AQA-7, MTL-AQA, and JIGSAWS datasets show that our method outperforms the majority of existing methods.
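The three steps in the abstract (clip features, residual feedforward blocks, score distribution) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give layer sizes, the activation, the aggregation rule, or the form of the score distribution. The sketch assumes mean aggregation over clips, a two-layer ReLU block with a skip connection, and a softmax over discrete score bins. Random vectors stand in for the inflated 3D ConvNet features, and all function names are hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map; `weights` holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def residual_block(x, params):
    """y = x + W2 * relu(W1 * x + b1) + b2: a skip connection around a 2-layer FNN."""
    w1, b1, w2, b2 = params
    update = linear(relu(linear(x, w1, b1)), w2, b2)
    return [xi + ui for xi, ui in zip(x, update)]

def init_block(dim, hidden_dim, rng):
    """Small random weights and zero biases for one block (illustrative only)."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (mat(hidden_dim, dim), [0.0] * hidden_dim, mat(dim, hidden_dim), [0.0] * dim)

def aggregate(clip_features):
    """Assumption: average the per-clip vectors into one video-level vector."""
    n = len(clip_features)
    return [sum(col) / n for col in zip(*clip_features)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_score(clip_features, blocks, head, score_bins):
    """Return the predicted distribution over score bins and its expected value."""
    x = aggregate(clip_features)
    for params in blocks:
        x = residual_block(x, params)
    w, b = head
    probs = softmax(linear(x, w, b))
    expected = sum(p * s for p, s in zip(probs, score_bins))
    return probs, expected

# Toy run: 4 clips with 8-dimensional stand-in features, 2 residual blocks, 5 score bins.
rng = random.Random(0)
dim, hidden_dim = 8, 16
score_bins = [0.0, 25.0, 50.0, 75.0, 100.0]  # assumed score range
clips = [[rng.random() for _ in range(dim)] for _ in range(4)]
blocks = [init_block(dim, hidden_dim, rng) for _ in range(2)]
head = ([[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in score_bins], [0.0] * len(score_bins))
probs, expected = predict_score(clips, blocks, head, score_bins)
```

Because each block adds its output to its input, a block whose weights are all zero passes the features through unchanged. That makes the initial features a baseline the network can refine rather than replace.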
About the journal:
The main focus for the IEEE Transactions on Consumer Electronics is the engineering and research aspects of the theory, design, construction, manufacture or end use of mass market electronics, systems, software and services for consumers.