{"title":"A Lightweight Video Summarization Method Considering the Subjective Transition Degree for Online Educational Screen Content Videos","authors":"Qinqin Meng, Kaifang Yang, Yanchao Gong","doi":"10.1145/3532342.3532354","DOIUrl":"https://doi.org/10.1145/3532342.3532354","url":null,"abstract":"With the popularization of online education, the number of educational videos is increasing, and some shoddy and dangerous videos also become potential threat to the mental health of students and the public security. Therefore, efficient video summarization technology is critical for the analysis, retrieval, and management of online educational videos. Online education videos usually belong to the screen content videos (SCV), and is typical real-time communication system, which urgently needs lightweight technologies with fast speed and low hardware requirements. Therefore, the traditional video summarization method for nature videos or with high complexity cannot be effectively applied for the online educational videos. SCV generated by screen recording the play of the Power Point (PPT-SCV) has also been widely applied in education field. Therefore, taking the PPT-SCV as an example, this paper proposed a lightweight video summarization method for the educational videos. First, the average standard deviation among frames and the variance of frame difference in a video clip which can effectively reflect the content characteristics of PPT-SCV were used to obtain the category of transitions and the key frames. Then, the subjective transition degree in video is used for marking the knowledge importance of each key frames, and a lightweight video summarization method was finally proposed. Experimental results demonstrated that the proposed method can efficiently locate the key frames which is in line with subjective perception.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121471255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Irregularly Dropped Garbage Detection Method Based on Improved YOLOv5s","authors":"Yi Zhan, Yuanping Xu, Chaolong Zhang, Zhijie Xu, Benjun Guo","doi":"10.1145/3532342.3532344","DOIUrl":"https://doi.org/10.1145/3532342.3532344","url":null,"abstract":"Waste sorting and recycling play a significant role in carbon neutrality, and the government has promoted waste sorting stations in various cities while the stations have limited efficiency due to the absence of intelligent surveillance systems to monitor and analyze the scene in waste stations, especially to detect the irregularly dropped garbage. To take the most advantage of these stations, this study proposes an improved YOLO (You Only Look Once) v5s detector named YOLOv5s-Garbage to monitor waste sorting stations in real-time. This study enhances its ability to detect garbage by introducing CBAM (Convolutional Block Attention Module) and using EIoU (Efficient Intersection over Union) to accelerate the convergence of the bonding box loss. According to experiments, the mAP of YOLOv5s-Garbage on the waste sorting dataset reaches 89.7%, which is 3.3% higher than the classical YOLOv5s. This study then combines the DeepSort tracking algorithm and re-filter process to filter the target garbage to distinguish the irregularly dropped garbage and normal one, which reduces the false alarm significantly.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124116439","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Uni-Dimensional Autoencoder Reinforced Multilayer Perceptron Network for Individual Behavior Detection","authors":"Lingzhe Wang, Yuefan Hao, Ying Liu","doi":"10.1145/3532342.3532353","DOIUrl":"https://doi.org/10.1145/3532342.3532353","url":null,"abstract":"In recent years, due to the increasing number of public security incidents, the field of individual behavior detection has made great progress. Among them, MLP method is representative, but its defects are also very obvious. this paper proposes an autoencoder fusion MLP based novel network structure for behavior detection, which can significantly improve the recognition accuracy. The proposed network extracts the color-based features from the video, outputs and compressed the features as a one-dimensional vector with autoencoder, and finally input the parameters into the fully connected layer for the classification of abnormal behaviors. The proposed network achieved the accuracy of 67% on the UCF-Crime data set, and significantly enhanced the accuracy on simple data sets. The experimental results indicate the autoencoder achieves promising performance on individual behavior recognition and potentially on the crowd behaviors.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129163824","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Bilateral Pose Transformer for Human Pose Estimation","authors":"Chia-Chen Yen, Tao Pin, Hongmin Xu","doi":"10.1145/3532342.3532346","DOIUrl":"https://doi.org/10.1145/3532342.3532346","url":null,"abstract":"Human Pose is a well-defined fundamental task researched by the computer vision community for years. Previous Convolutional Neural Network (CNN) based works have achieved significant success in the human pose. Recently, Vision Transformer (VT) has shown superior performance on computer vision tasks. However, current VT methods emphasize local information less and often focus on only a single scale feature that may not be suitable for dense image prediction tasks, which essentially requires multi-scale representations. In this paper, we propose a novel Bilateral Pose Transformer (BPT) framework to handle the human pose. Specifically, BPT consists of an innovated bilateral branch encoder and a multi-scale integrating decoder. The bilateral branch encoder contains a Context Branch (CB) and Spatial Branch (SB). The CB involves a VT-based backbone to capture the context clues and produce multi-scale context features. The CNN-based SB maintains high-resolution representations containing rich spatial information to introduce the local spatial information that supplements the CB explicitly. About the decoder, a Mixed Feature Module consisting of local attention CNN is proposed to integrate the various-scale context and spatial features effectively. Experiments demonstrate that our approach achieves competitive performances in human pose estimation. Specifically, compared to the HRNet [1], the BPT saves 43% GFLOPs and drops only 0.1 points AP, achieving 75.7% AP with 9.0 GFLOPs, on the COCO keypoints dataset.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"24 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120993604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust and Efficient Shoe Print Image Retrieval Using Spatial Transformer Network and Deep Hashing","authors":"Wei Liu, Dawei Xu","doi":"10.1145/3532342.3532356","DOIUrl":"https://doi.org/10.1145/3532342.3532356","url":null,"abstract":"In recent years, great progress has been made on the topic of shoe print image retrieval. However, it still remains a big challenge to accurately retrieve well-matched shoe print images from a huge database in real time. Deep hashing method has been proved to be effective in large-scale image retrieval in many applications. Output of deep hash network can be represented as a binary bit hash code, which helps to reduce storage space and retrieval time. Existing shoe print image retrieval methods have poor performance in instance retrieval because of shortage of classification information, sufficient sample information and feature quantization information. In order to overcome the problem, we put forward an end-to-end network to learn short deep hash codes. The learned hash code preserves the classification and small sample information very well. Moreover, due to the fact that STN (Spatial Transformer Network) block is simultaneously embedded into the hash network to enhance the retrieval ability for rotated shoe print images, the problem of rotation misalignment can be solved and the retrieval accuracy is improved. Furthermore, in order to make better use of class label information, we presented a new joint loss function. This loss function helps the network map both images’ classification information and similarities into hash codes and reduce quantitative loss. In addition, we used triple labels to alleviate sample imbalance problem. Experiments on database including 10,500 shoe print images show that our proposed method can improve the retrieval performance. The proposed approach can yield a mAP (mean Average Precision) of 0.83 and a recall of 0.35, which demonstrates the discriminatory power of the learned hash codes in shoe print image retrieval application.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125443763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Suppress Target Height Induced False Alarm in CSAR Moving Target Detection","authors":"Fan Ding, Wenjie Shen, Y. Li, Yun Lin, Yanping Wang","doi":"10.1145/3532342.3532350","DOIUrl":"https://doi.org/10.1145/3532342.3532350","url":null,"abstract":"Circular Synthetic Aperture Radar(CSAR)is a new SAR imaging mode, which has the advantages of long-time observation and multi-aspect observation. Based on these advantages, our team have proposed a single channel moving target detection method entitled logarithmic background subtraction(LBS). This method utilizes the target signal motion in sub-aperture image sequence. However, in-depth study shows that target with height will project a circular defocus, which also have motion when in sub-aperture image sequence. Thus, causing misdetection and false-alarm rate increasing. In this paper, the formula between defocus radius and point target height is derived and analyzed. Based on analysis, a false-alarm suppression method for moving target detection is proposed. It uses trajectory difference between moving target and defocus signal to remove the false-alarm caused by defocus signal and reduce the false-alarm rate. The proposed false-alarm suppression method is verified by W-band video SAR data.","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121514550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Proceedings of the 4th International Symposium on Signal Processing Systems","authors":"","doi":"10.1145/3532342","DOIUrl":"https://doi.org/10.1145/3532342","url":null,"abstract":"","PeriodicalId":398859,"journal":{"name":"Proceedings of the 4th International Symposium on Signal Processing Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123125864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}