{"title":"Learning attention characterization based on head pose sight estimation","authors":"Jianwen Mo, Haochang Liang, Hua Yuan, Zhaoyu Shou, Huibing Zhang","doi":"10.1007/s11042-024-20204-z","DOIUrl":"https://doi.org/10.1007/s11042-024-20204-z","url":null,"abstract":"<p>The degree of students’ attentiveness in the classroom is known as learning attention and is the main indicator used to portray students’ learning status in the classroom. Studying smart classroom time-series image data and analyzing students’ attention to learning are important tools for improving student learning effects. To this end, this paper proposes a learning attention analysis algorithm based on the head pose sight estimation.The algorithm first employs multi-scale hourglass attention to enable the head pose estimation model to capture more spatial pose features.It is also proposed that the multi-classification multi-regression losses guide the model to learn different granularity of pose features, making the model more sensitive to subtle inter-class distinction of the data;Second, a sight estimation algorithm on 3D space is innovatively adopted to compute the coordinates of the student’s sight landing point through the head pose; Finally, a model of sight analysis over the duration of a knowledge point is constructed to characterize students’ attention to learning. Experiments show that the algorithm in this paper can effectively reduce the head pose estimation error, accurately characterize students’ learning attention, and provide strong technical support for the analysis of students’ learning effect. 
These results demonstrate the algorithm’s practical value for deployment in school smart classrooms.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
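The sight-landing-point computation described above can be illustrated with a small geometric sketch: given a head position and head pose angles, intersect the gaze ray with a target plane (e.g. a desk or board plane). The angle convention, coordinate frame, and plane choice below are hypothetical illustrations, not the paper's actual 3D formulation.

```python
import math

def sight_landing_point(head_pos, yaw_deg, pitch_deg, plane_z=0.0):
    """Intersect a gaze ray, derived from head pose angles, with the
    horizontal plane z = plane_z.

    head_pos: (x, y, z) head position.
    yaw_deg / pitch_deg: head pose angles in degrees (assumed convention:
    yaw about the vertical axis, positive pitch looking downward).
    Returns the (x, y) landing point, or None if the ray never hits the plane.
    """
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Unit gaze direction under the assumed angle convention.
    dx = math.cos(pitch) * math.sin(yaw)
    dy = math.cos(pitch) * math.cos(yaw)
    dz = -math.sin(pitch)               # positive pitch looks downward
    if abs(dz) < 1e-9:
        return None                     # gaze parallel to the plane
    t = (plane_z - head_pos[2]) / dz
    if t <= 0:
        return None                     # plane is behind the viewer
    return (head_pos[0] + t * dx, head_pos[1] + t * dy)
```

For example, a head 1 m above the desk looking 45° downward and straight ahead lands 1 m in front of the head.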
{"title":"Ambient-NeRF: light train enhancing neural radiance fields in low-light conditions with ambient-illumination","authors":"Peng Zhang, Gengsheng Hu, Mei Chen, Mahmoud Emam","doi":"10.1007/s11042-024-19699-3","DOIUrl":"https://doi.org/10.1007/s11042-024-19699-3","url":null,"abstract":"<p>NeRF can render photorealistic 3D scenes. It is widely used in virtual reality, autonomous driving, game development and other fields, and quickly becomes one of the most popular technologies in the field of 3D reconstruction. NeRF generates a realistic 3D scene by emitting light from the camera’s spatial coordinates and viewpoint, passing through the scene and calculating the view seen from the viewpoint. However, when the brightness of the original input image is low, it is difficult to recover the scene. Inspired by the ambient illumination in the Phong model of computer graphics, it is assumed that the final rendered image is the product of scene color and ambient illumination. In this paper, we employ Multi-Layer Perceptron (MLP) network to train the ambient illumination tensor <span>(textbf{I})</span>, which is multiplied by the color predicted by NeRF to render images with normal illumination. Furthermore, we use tiny-cuda-nn as a backbone network to simplify the proposed network structure and greatly improve the training speed. Additionally, a new loss function is introduced to achieve a better image quality under low illumination conditions. 
The experimental results demonstrate the efficiency of the proposed method in enhancing low-light scene images compared with other state-of-the-art methods, with an overall average of PSNR: 20.53, SSIM: 0.785, and LPIPS: 0.258 on the LOM dataset.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203685","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
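The multiplicative Phong-inspired model in this abstract (observed image = ambient illumination × scene color) can be sketched in miniature. The paper learns a full ambient tensor I with an MLP; the sketch below reduces it to a single global scalar fitted in closed form by least squares, purely to illustrate the multiplicative assumption and its inversion for enhancement.

```python
def fit_ambient_scalar(scene_colors, observed):
    """Least-squares fit of one global ambient factor a under the assumed
    model observed = a * scene_color (a scalar stand-in for the learned
    ambient tensor). Closed form: a = sum(c*o) / sum(c*c)."""
    num = sum(c * o for c, o in zip(scene_colors, observed))
    den = sum(c * c for c in scene_colors)
    return num / den

def enhance(observed, a):
    """Invert the ambient model to recover normally-lit colors, clamped to [0, 1]."""
    return [min(1.0, o / a) for o in observed]
```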
{"title":"Discrete ripplet-II transform feature extraction and metaheuristic-optimized feature selection for enhanced glaucoma detection in fundus images using least square-support vector machine","authors":"Santosh Kumar Sharma, Debendra Muduli, Adyasha Rath, Sujata Dash, Ganapati Panda, Achyut Shankar, Dinesh Chandra Dobhal","doi":"10.1007/s11042-024-19974-3","DOIUrl":"https://doi.org/10.1007/s11042-024-19974-3","url":null,"abstract":"<p>Recently, significant progress has been made in developing computer-aided diagnosis (CAD) systems for identifying glaucoma abnormalities using fundus images. Despite their drawbacks, methods for extracting features such as wavelets and their variations, along with classifier like support vector machines (SVM), are frequently employed in such systems. This paper introduces a practical and enhanced system for detecting glaucoma in fundus images. The proposed model adresses the chanallages encountered by other existing models in recent litrature. Initially, we have employed contrast limited adaputive histogram equalization (CLAHE) to enhanced the visualization of input fundus inmages. Then, the discrete ripplet-II transform (DR2T) employing a degree of 2 for feature extraction. Afterwards, we have utilized a golden jackal optimization algorithm (GJO) employed to select the optimal features to reduce the dimension of the extracted feature vector. For classification purposes, we have employed a least square support vector machine (LS-SVM) equipped with three kernels: linear, polynomial, and radial basis function (RBF). This setup has been utilized to classify fundus images as either indicative of glaucoma or healthy. The proposed method is validated with the current state-of-the-art models on two standard datasets, namely, G1020 and ORIGA. 
Our experimental results demonstrate that the best-performing configuration, DR2T+GJO+LS-SVM-RBF, achieves classification accuracies of 93.38% and 97.31% on the G1020 and ORIGA datasets, respectively, with fewer features. It also establishes a more streamlined network layout compared to conventional classifiers.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
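A minimal LS-SVM with an RBF kernel, of the kind used as the final classifier above, can be written as a single linear solve: unlike a standard SVM's quadratic program, the LS-SVM dual reduces to the system [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y]. The hyperparameters below are placeholders, not the paper's tuned configuration.

```python
import numpy as np

def rbf_kernel(A, B, gamma_k=1.0):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma_k * d2)

class LSSVM:
    """Minimal least-squares SVM classifier for labels in {-1, +1}.
    Training solves one dense linear system (a generic LS-SVM sketch)."""

    def __init__(self, gamma=10.0, gamma_k=1.0):
        self.gamma, self.gamma_k = gamma, gamma_k

    def fit(self, X, y):
        n = len(y)
        K = rbf_kernel(X, X, self.gamma_k)
        A = np.zeros((n + 1, n + 1))
        A[0, 1:] = 1.0                      # sum(alpha) = 0 constraint
        A[1:, 0] = 1.0                      # bias column
        A[1:, 1:] = K + np.eye(n) / self.gamma
        rhs = np.concatenate(([0.0], y))
        sol = np.linalg.solve(A, rhs)
        self.b, self.alpha, self.X = sol[0], sol[1:], X
        return self

    def predict(self, X):
        K = rbf_kernel(X, self.X, self.gamma_k)
        return np.sign(K @ self.alpha + self.b)
```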
{"title":"Predicting eye-tracking assisted web page segmentation","authors":"Abdullah Sulayfani, Sukru Eraslan, Yeliz Yesilada","doi":"10.1007/s11042-024-20202-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20202-1","url":null,"abstract":"<p>Different kinds of algorithms have been proposed to identify the visual elements of web pages for different purposes, such as improving web accessibility, measuring web page visual quality and aesthetics etc. One group of these algorithms identifies the elements by analyzing the source code and visual representation of web pages, whereas another group discovers the attractive elements by analyzing the eye movements of users. A previous approach proposes combining these two approaches to consider both the source code and visual representation of web pages and users’ eye movements on those pages. The result of the proposed approach can be considered eye-tracking-assisted web page segmentation. However, since the eye-tracking data collection procedure is elaborate, time-consuming, and expensive, and it is not feasible to collect eye-tracking data for each page, we aim to develop a model to predict such segmentation without requiring eye-tracking data. In this paper, we present our experiments with different Machine and Deep Learning algorithms and show that the K-Nearest Neighbour (KNN) model yields the best results in prediction. We present a KNN model that predicts eye-tracking-assisted web page segmentation with an F1-score of 78.74%. 
This work shows how a machine learning algorithm can automate web page segmentation driven by eye-tracking data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
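A KNN classifier like the one reported above can be sketched in a few lines: each page element, described by a feature vector, receives the majority label among its k nearest training neighbours. The features and labels here are hypothetical stand-ins for the paper's actual element descriptors and segment classes.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.
    `train` is a list of (feature_vector, label) pairs; features might be
    element descriptors such as position, size, or fixation counts."""
    # Sort all training points by Euclidean distance to the query.
    dists = sorted((math.dist(x, query), label) for x, label in train)
    top = [label for _, label in dists[:k]]
    return Counter(top).most_common(1)[0][0]
```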
{"title":"An efficient iterative pseudo point elimination technique to represent the shape of the digital image boundary","authors":"Mangayarkarasi Ramaiah, Vinayakumar Ravi, Vanmathi Chandrasekaran, Vanitha Mohanraj, Deepa Mani, Angulakshmi Maruthamuthu","doi":"10.1007/s11042-024-20183-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20183-1","url":null,"abstract":"<p>Visually, the environment is made up of a chaotic of irregular polygons. It is an important and intriguing issue in many fields of study to represent and comprehend the irregular polygon. However, approximating the polygon presents significant difficulties from a variety of perspectives. The method provided in this research eliminates the pseudo-redundant points that are not contributing to shape retention and then makes the polygonal approximation with the remaining high-curvature points, as opposed to searching for the real points on the digital image boundary curve. The proposed method uses chain code assignment to obtain initial segmentation points. Using integer arithmetic, the presented method calculates the curvature at each initial pseudo point using sum of squares of deviation. For every initial segmented pseudo point, the difference incurred by all the boundary points lies between its earlier pseudo point and its next initial pseudo point was taken into account. Then, this new proposal removes the redundant point from the subset of initial segmentation points whose curvature deviation is the lowest with each iteration. The method then recalculates the deviation information for the next and previous close pseudo points. Experiments are done with MPEG datasets and synthetic contours to show how well the proposed method works in both quantitative and qualitative ways. 
The experimental results show the effectiveness of the proposed method in creating polygons with fewer points.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
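The iterative elimination idea can be sketched as follows: repeatedly drop the pseudo point whose removal adds the least sum-of-squared deviation between the boundary and the chord joining its neighbours. This is a floating-point illustration of the principle only; the paper's formulation works in integer arithmetic on chain-code-derived initial points.

```python
def _sq_dev(points, i0, i1):
    """Sum of squared perpendicular deviations of the boundary points
    strictly between indices i0 and i1 from the chord joining them."""
    (x0, y0), (x1, y1) = points[i0], points[i1]
    dx, dy = x1 - x0, y1 - y0
    L2 = dx * dx + dy * dy or 1e-12
    s = 0.0
    for x, y in points[i0 + 1:i1]:
        cross = dx * (y - y0) - dy * (x - x0)
        s += cross * cross / L2
    return s

def approximate(points, n_keep):
    """Iteratively remove the interior point whose elimination incurs the
    smallest squared deviation, until n_keep points remain (endpoints kept)."""
    keep = list(range(len(points)))
    while len(keep) > n_keep:
        best_j, best_dev = None, None
        for j in range(1, len(keep) - 1):
            d = _sq_dev(points, keep[j - 1], keep[j + 1])
            if best_dev is None or d < best_dev:
                best_j, best_dev = j, d
        del keep[best_j]
    return [points[i] for i in keep]
```

On a boundary with one dominant corner, the corner survives while near-collinear points are eliminated first.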
{"title":"Efficient compressed storage and fast reconstruction of large binary images using chain codes","authors":"Damjan Strnad, Danijel Žlaus, Andrej Nerat, Borut Žalik","doi":"10.1007/s11042-024-20199-7","DOIUrl":"https://doi.org/10.1007/s11042-024-20199-7","url":null,"abstract":"<p>Large binary images are used in many modern applications of image processing. For instance, they serve as inputs or target masks for training machine learning (ML) models in computer vision and image segmentation. Storing large binary images in limited memory and loading them repeatedly on demand, which is common in ML, calls for efficient image encoding and decoding mechanisms. In the paper, we propose an encoding scheme for efficient compressed storage of large binary images based on chain codes, and introduce a new single-pass algorithm for fast parallel reconstruction of raster images from the encoded representation. We use three large real-life binary masks to test the efficiency of the proposed method, which were derived from vector layers of single-class objects – a building cadaster, a woody vegetation landscape feature map, and a road network map. We show that the masks encoded by the proposed method require significantly less storage space than standard lossless compression formats. 
We further compared the proposed method for mask reconstruction from chain codes with a recent state-of-the-art algorithm, and achieved between <span>(12%)</span> and <span>(33%)</span> faster reconstruction on test data.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
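Chain-code storage of the kind described above can be illustrated with classic Freeman codes: a boundary is stored as one start pixel plus a 3-bit direction symbol per step instead of full coordinates. The sketch below handles a single 8-connected pixel path; the paper's encoding scheme and its single-pass parallel reconstruction are considerably more elaborate.

```python
# 8-connected Freeman directions: index 0 = east, counted counter-clockwise.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]

def encode(path):
    """Chain-code a pixel path: store the start pixel plus one direction
    symbol (0-7) per step. Each step must be 8-connected."""
    codes = []
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        codes.append(DIRS.index((x1 - x0, y1 - y0)))
    return path[0], codes

def decode(start, codes):
    """Reconstruct the pixel path from its chain code."""
    path, (x, y) = [start], start
    for c in codes:
        dx, dy = DIRS[c]
        x, y = x + dx, y + dy
        path.append((x, y))
    return path
```

Each symbol needs only 3 bits, so a path of n pixels costs one coordinate pair plus 3(n-1) bits, which is the storage advantage chain codes exploit.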
{"title":"An optimized cluster validity index for identification of cancer mediating genes","authors":"Subir Hazra, Anupam Ghosh","doi":"10.1007/s11042-024-20105-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20105-1","url":null,"abstract":"<p>One of the major challenges in bioinformatics lies in identification of modified gene expressions of an affected person due to medical ailments. Focused research has been observed till date in such identification, leading to multiple proposals pivoting in clustering of gene expressions. Moreover, while clustering proves to be an effective way to demarcate the affected gene expression vectors, there has been global research on the cluster count that optimizes the gene expression variations among the clusters. This study proposes a new index called mean-max index (MMI) to determine the cluster count which divides the data collection into ideal number of clusters depending on gene expression variations. MMI works on the principle of minimization of the intra cluster variations among the members and maximization of inter cluster variations. In this regard, the study has been conducted on publicly available dataset comprising of gene expressions for three diseases, namely lung disease, leukaemia, and colon cancer. The data count for normal as well as diseased patients lie at 10 and 86 for lung disease patients, 43 and 13 for patients observed with leukaemia, and 18 and 18 for patients with colon cancer respectively. The gene expression vectors for the three diseases comprise of 7129,22283, and 6600 respectively. Three clustering models have been used for this study, namely k-means, partition around medoid, and fuzzy c-means, all using the proposed MMI technique for finalizing the cluster count. 
The comparative analysis shows that the proposed MMI index recognizes many more true-positive (biologically enriched) cancer-mediating genes than other cluster validity indices, and it can be considered superior to the others with an enhanced accuracy of 85%.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
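The stated MMI principle (minimize intra-cluster variation, maximize inter-cluster variation) can be sketched as a simple ratio score over a candidate partition; to pick the cluster count, one would evaluate partitions at several k and keep the best-scoring one. The exact MMI formula is defined in the paper, so the index below is only an illustration of the principle.

```python
def mmi_like_index(clusters):
    """Score a partition as (mean pairwise inter-centroid squared distance) /
    (mean intra-cluster squared deviation). Higher is better: tight,
    well-separated clusters score high. An illustrative index, not MMI itself."""
    def centroid(pts):
        n = len(pts)
        return [sum(p[d] for p in pts) / n for d in range(len(pts[0]))]

    cents = [centroid(c) for c in clusters]
    # Mean intra-cluster squared deviation from each cluster's centroid.
    intra, total = 0.0, 0
    for c, ct in zip(clusters, cents):
        for p in c:
            intra += sum((a - b) ** 2 for a, b in zip(p, ct))
        total += len(c)
    intra /= total
    # Mean pairwise squared distance between cluster centroids.
    pairs = [(i, j) for i in range(len(cents)) for j in range(i + 1, len(cents))]
    inter = sum(
        sum((a - b) ** 2 for a, b in zip(cents[i], cents[j])) for i, j in pairs
    ) / len(pairs)
    return inter / (intra or 1e-12)
```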
{"title":"A GAN based method for cross-scene classification of hyperspectral scenes captured by different sensors","authors":"Amir Mahmoudi, Alireza Ahmadyfard","doi":"10.1007/s11042-024-19969-0","DOIUrl":"https://doi.org/10.1007/s11042-024-19969-0","url":null,"abstract":"<p>Labeling samples in hyperspectral images is time-consuming and labor-intensive. Domain adaptation methods seek to address this challenge by transferring the knowledge from a labeled source domain to an unlabeled target domain, enabling classification with minimal or no labeled samples in the target domain. This is achieved by mitigating the domain shift caused by differences in sensing conditions. However, most of the existing works implement domain adaptation techniques on homogeneous hyperspectral data where both source and target are acquired by the same sensor and contain an equal number of spectral bands. The present paper proposes an end-to-end network, Generative Adversarial Network for Heterogeneous Domain Adaptation (GANHDA), capable of handling domain adaptation between target and source scenes captured by different sensors with varying spectral and spatial resolutions, resulting in non-equivalent data representations across domains. GANHDA leverages adversarial training, a bi-classifier, variational autoencoders, and graph regularization to transfer high-level conceptual knowledge from the source to the target domain, aiming for improved classification performance. This approach is applied to two heterogeneous hyperspectral datasets, namely RPaviaU-DPaviaC and EHangzhou-RPaviaHR. All source labels are used for training, while only 5 pixels per class from the target are used for training. The results are promising and we achieved an overall accuracy of 90.16% for RPaviaU-DPaviaC and 99.12% for EHangzhou-RPaviaHR, outperforming previous methods. 
Our code implementation is available at https://github.com/amirmah/HSI_GANHDA.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Auto-proctoring using computer vision in MOOCs system","authors":"Tuan Linh Dang, Nguyen Minh Nhat Hoang, The Vu Nguyen, Hoang Vu Nguyen, Quang Minh Dang, Quang Hai Tran, Huy Hoang Pham","doi":"10.1007/s11042-024-20099-w","DOIUrl":"https://doi.org/10.1007/s11042-024-20099-w","url":null,"abstract":"<p>The COVID-19 outbreak has caused a significant shift towards virtual education, where Massive Open Online Courses (MOOCs), such as EdX and Coursera, have become prevalent distance learning mediums. Online exams are also gaining popularity, but they pose a risk of cheating without proper supervision. Online proctoring can significantly improve the quality of education, and with the addition of extended modules on MOOCs, the incorporation of artificial intelligence in the proctoring process has become more accessible. Despite the advancements in machine learning-based cheating detection in third-party proctoring tools, there is still a need for optimization and adaptability of such systems for massive simultaneous user requirements of MOOCs. Therefore, we have developed an examination monitoring system based on advanced artificial intelligence technology. This system is highly scalable and can be easily integrated with our existing MOOCs platform, daotao.ai. 
Experimental results demonstrated that our proposed system achieved a 95.66% accuracy rate in detecting cheating behaviors, processed video inputs with an average response time of 0.517 seconds, and successfully handled concurrent user demands, thereby validating its effectiveness and reliability for large-scale online examination monitoring.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A survey on blockchain security for electronic health record","authors":"Chandini A G, P. I Basarkod","doi":"10.1007/s11042-024-19883-5","DOIUrl":"https://doi.org/10.1007/s11042-024-19883-5","url":null,"abstract":"<p>Numerous healthcare organizations maintain track of the patients’ medical information with an Electronic Health Record (EHR). Nowadays, patients demand instant access to their medical records. Hence, Deep Learning (DL) methods are employed in electronic healthcare sectors for medical image processing and smart supply chain management. Various approaches are presented for the protection of healthcare data of patients using blockchain however, there are concerns regarding the security and privacy of patient medical records in the health industry, where data can be accessed instantly. The blockchain-based security with DL approaches helps to solve this problem and there is a need for improvements on the DL-based blockchain methods for privacy and security of patient data and access control strategies with developments in the supply chain. The survey provides a clear idea of DL-based strategies used in electronic healthcare data storage and security along with the integrity verification approaches. Also, it provides a comparative analysis to demonstrate the effectiveness of various blockchain-based EHR handling techniques. 
Moreover, future directions are provided to overcome the existing limitations of various blockchain security techniques for EHRs.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":null,"pages":null},"PeriodicalIF":3.6,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203549","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}