{"title":"Box-driven coarse-grained segmentation for stroke rehabilitation scenarios","authors":"Yiming Fan, Yunjia Liu, Xiaofeng Lu","doi":"10.1117/12.3014426","DOIUrl":"https://doi.org/10.1117/12.3014426","url":null,"abstract":"For complex stroke rehabilitation scenarios, visual algorithms such as motion recognition or video understanding find it challenging to focus on patient areas with small motion amplitude and instead pay more attention to targets with drastic changes in optical flow. Therefore, using a semantic segmentation algorithm to capture the patient's area from the captured image can provide critical perspectives and adequate information for these visual tasks. Currently, weakly supervised segmentation algorithms based on bounding boxes tend to utilize existing image classification methods, performing secondary processing on the images inside the boxes to obtain larger areas of pseudo-label information. To avoid the redundancy caused by algorithm concatenation, this paper proposes an end-to-end weakly supervised segmentation algorithm. In this method, a U-shaped residual module with variable depth is designed to capture the deep semantic features of images, and its output is integrated into the target matrix of the NCut problem in the form of blocks. Then, the region of the target is indicated by solving for the eigenvector of the second-smallest eigenvalue of the generalized eigensystem, and the segmentation is realized. We conducted experiments on the PASCAL VOC 2012 dataset, and the proposed method achieved 67.7% mIoU. On the private dataset, compared with similar algorithms, the proposed method can segment the target area more intensively.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 3","pages":"129692D - 129692D-7"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on automatic scoring algorithm for English composition based on machine learning","authors":"Hui Li","doi":"10.1117/12.3014482","DOIUrl":"https://doi.org/10.1117/12.3014482","url":null,"abstract":"English composition scoring methods based on handcrafted features struggle to extract deep semantic features, while methods based on neural networks struggle to extract shallow features such as the number of words, so each kind of scoring method has its limitations. Based on existing research results, this paper proposes an English composition scoring method that combines handcrafted feature extraction with deep learning. The method uses manually designed features to extract shallow features at the word and sentence levels of the composition, draws on existing methods to extract semantic features of the composition, and performs regression on the deep and shallow features to obtain the total score of the composition. The experiment uses the Pearson correlation coefficient to measure the correlation between the predicted total score of the essay and the true total score under the combined method. The experiment shows that, compared with the average results of 0.747 and 0.645 of baseline models such as BiLSTM and RNN, the proposed algorithm achieves improvements of 0.068 and 0.17, respectively, which demonstrates the effectiveness of the proposed method.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"20 6","pages":"129690T - 129690T-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing audio perception in augmented reality: a dynamic vocal information processing framework","authors":"Danqing Zhao, Shuyi Xin, Lechen Liu, Yihan Sun, Anqi Du","doi":"10.1117/12.3014440","DOIUrl":"https://doi.org/10.1117/12.3014440","url":null,"abstract":"The development of the Metaverse has sparked widespread interest among researchers, and correspondingly, many technologies have been derived to improve the human sense of reality in the Metaverse. In particular, Extended Reality (XR), an indispensable technology and research direction in the study of the Metaverse, aims to bring a seamless, immersive transition between the virtual world and the real world to the user experience. However, what is currently lacking is the ability to simultaneously separate, classify, and locate dynamic human sound information to enhance human sound perception in complex noise environments. This article proposes a framework that uses an FCNN for separation, algebraic models for positioning to obtain estimated distances, and an SVM for classification. The dataset is built to simulate distance-related changes with accurate ground truth labels. The results show that our method can effectively separate, classify, and locate mixed sound data, providing users with comprehensive information about the content, gender, and distance of the speaker in complex sound environments and enhancing their immersive experience and perception ability. Our innovation lies in the combination of three audio processing technologies, and the proposed framework may well inspire future work on related topics.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 22","pages":"129691Z - 129691Z-9"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collaborative filtering recommendation method based on graph convolutional neural networks","authors":"Zhengwu Yuan, Xiling Zhan, Yatao Zhou, Hao Yang","doi":"10.1117/12.3014407","DOIUrl":"https://doi.org/10.1117/12.3014407","url":null,"abstract":"In the rapidly advancing information technology era, information overload poses a significant challenge. Recommender systems offer a partial solution, yet traditional methods grapple with issues such as data sparsity and limited accuracy. For this reason, this paper introduces a novel approach: a high-order graph convolutional collaborative filtering model. This model employs a subgraph generation module to enhance the importance of neighbor nodes during high-order graph convolutions. Our approach yields enhanced embeddings by embedding user-item interaction information using graph techniques, stacking multi-layer graph convolutional networks to capture complex interactions, and leveraging both the initial and the convolved embeddings. This paper also introduces a constraint loss function to address over-smoothing in graph-based recommendations. Our method's effectiveness is confirmed through extensive experiments on three real-world datasets.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":" 56","pages":"129691U - 129691U-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139640385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Three-dimensional target detection algorithm for dangerous goods in CT security inspection","authors":"Jingze He, Yao Guo, qing song","doi":"10.1117/12.3014353","DOIUrl":"https://doi.org/10.1117/12.3014353","url":null,"abstract":"In this paper, a 3D dangerous goods detection method based on RetinaNet is proposed. This method uses the bidirectional feature pyramid network structure of RetinaNet to extract multi-scale features from point cloud data and trains the system using the Focal Loss function to achieve fast and accurate detection of dangerous goods. In addition, to improve detection accuracy, this paper also introduces a 3D region proposal network (3D RPN) and the non-maximum suppression (NMS) algorithm. The experimental results show that the proposed method performs well on our self-built CT dataset, with high accuracy and a low false positive rate, and is suitable for dangerous goods detection tasks in practical scenarios.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"54 5","pages":"1296902 - 1296902-6"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rapid identification of adulterated rice using fusion of near-infrared spectroscopy and machine vision data: the combination of feature optimization and nonlinear modeling","authors":"Chenxuan Song, Jinming Liu, Chunqi Wang, Zhijiang Li","doi":"10.1117/12.3014380","DOIUrl":"https://doi.org/10.1117/12.3014380","url":null,"abstract":"Rice is susceptible to mold and mildew during storage, and metabolites such as aflatoxin produced during mildew can do great harm to consumers. To meet the need for rapid detection of normal rice adulterated with moldy rice, a rapid identification method for adulterated rice was established based on data fusion of near-infrared spectroscopy and machine vision. Competitive adaptive reweighted sampling (CARS), genetic algorithm (GA), and least angle regression (LARS) were used for spectral and image feature extraction and combined with support vector classification (SVC), random forest (RF), and gradient boosting tree (GBT) nonlinear discriminant models, with Bayesian search used to optimize the modeling parameters. The results show that the GBT fusion data model built on LARS-optimized spectral and image feature variables has the highest discrimination accuracy, with recognition accuracy rates of 100.00% and 98.11% for its training and testing sets, respectively. The discrimination performance is significantly improved compared with using near-infrared spectroscopy or machine vision alone. The results indicate that rapid identification of adulterated rice based on near-infrared spectroscopy and machine vision data fusion technology is feasible, providing theoretical support for the development of online identification equipment for adulterated rice.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"63 2","pages":"129692J - 129692J-16"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast and high quality neural radiance fields reconstruction based on depth regularization","authors":"Bin Zhu, Gaoxiang He, Bo Xie, Yi Chen, Yaoxuan Zhu, Liuying Chen","doi":"10.1117/12.3014528","DOIUrl":"https://doi.org/10.1117/12.3014528","url":null,"abstract":"Although Neural Radiance Fields (NeRF) have been shown to achieve high-quality novel view synthesis, existing models still perform poorly in some scenarios, particularly unbounded scenes. These models either require excessively long training times or produce suboptimal synthesis results. Consequently, we propose SD-NeRF, which consists of a compact neural radiance field model and self-supervised depth regularization. Experimental results demonstrate that SD-NeRF can shorten training time by a factor of more than 20 compared to Mip-NeRF360 without compromising reconstruction accuracy.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"43 3","pages":"129692F - 129692F-9"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511339","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Research on collaborative positioning of intelligent vehicle aided navigation based on computer vision technology","authors":"Shun Zhang","doi":"10.1117/12.3014415","DOIUrl":"https://doi.org/10.1117/12.3014415","url":null,"abstract":"Because vehicle position information is collected with low accuracy, the error in the positioning stage is relatively large. Therefore, a collaborative positioning method for intelligent vehicle aided navigation based on computer vision technology is proposed. The VOF/VOF-S smart cameras are used as the data acquisition device, and the parameters of the data acquisition stage are set according to the running state of the vehicle, so as to acquire vehicle position information accurately. In the positioning stage, the plane where the wheels are located is taken as the road plane, and the coordinate parameters collected by the VOF/VOF-S device at several road ground points are integrated to transform the vehicle position information into real space. In the test results, the positioning error of the vehicle position under different driving conditions remains within 1.50 m, indicating high accuracy.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"4 1","pages":"129692P - 129692P-5"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511958","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Image segmentation of rail surface defects based on fractional order particle swarm optimization 2D-Otsu algorithm","authors":"Na Geng, Hu Sheng, Weizhi Sun, Yifeng Wang, Tan Yu, Zihan Liu","doi":"10.1117/12.3014444","DOIUrl":"https://doi.org/10.1117/12.3014444","url":null,"abstract":"Under the influence of high-density operation and the natural environment, the rail surface develops abrasion damage, which affects the safety and comfort of trains. Rail surface defect detection is an important part of ensuring the safe and efficient operation of the railway system. To determine whether there are defects on the rail surface, a rail surface defect image segmentation method based on the FPSO 2D-Otsu algorithm is proposed. The rail image is denoised and enhanced by adaptive fractional calculus, and then segmented by the FPSO 2D-Otsu algorithm. To verify the accuracy of the algorithm, the proposed algorithm is compared with the PSO 2D-Otsu image segmentation algorithm. The experimental results show that the FPSO 2D-Otsu algorithm improves rail image segmentation accuracy from 48.76% with the PSO 2D-Otsu algorithm to 83.59%.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"226 1","pages":"129690A - 129690A-4"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140511969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Microexpression recognition algorithm based on multi feature fusion","authors":"BaiYang Xiang, BoKai Li, Huaijuan Zang, Zeliang Zhao, Shu Zhan","doi":"10.1117/12.3014469","DOIUrl":"https://doi.org/10.1117/12.3014469","url":null,"abstract":"Features are difficult to extract in video facial micro-expression recognition because of the short duration and small action amplitude of micro-expressions. To better combine the temporal and spatial information of the video, the model is divided into a local attention module, a global attention module, and a temporal module. First, the local attention module crops the key areas and, after processing, sends them to a network with channel attention. Then, the global attention module sends the data to a network with spatial attention after random erasing that avoids the key areas. Next, the temporal module sends the micro-expression occurrence frames, after processing, to a network with a temporal shift module and spatial attention. Finally, the classification results are obtained through three fully connected layers after feature fusion. Experiments are conducted on the CASME II dataset. After five-fold cross-validation, the average accuracy is 76.15 and the unweighted F1 value is 0.691. Compared with mainstream algorithms, this method shows an improvement.","PeriodicalId":516634,"journal":{"name":"International Conference on Algorithm, Imaging Processing and Machine Vision (AIPMV 2023)","volume":"12 6","pages":"1296908 - 1296908-10"},"PeriodicalIF":0.0,"publicationDate":"2024-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140512112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}