arXiv - CS - Multimedia: Latest Publications

Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection
arXiv - CS - Multimedia Pub Date : 2024-09-12 DOI: arxiv-2409.07901
Jiehui Jia, Huan Zhang, Jinhua Liang
{"title":"Bridging Discrete and Continuous: A Multimodal Strategy for Complex Emotion Detection","authors":"Jiehui Jia, Huan Zhang, Jinhua Liang","doi":"arxiv-2409.07901","DOIUrl":"https://doi.org/arxiv-2409.07901","url":null,"abstract":"In the domain of human-computer interaction, accurately recognizing and\u0000interpreting human emotions is crucial yet challenging due to the complexity\u0000and subtlety of emotional expressions. This study explores the potential for\u0000detecting a rich and flexible range of emotions through a multimodal approach\u0000which integrates facial expressions, voice tones, and transcript from video\u0000clips. We propose a novel framework that maps variety of emotions in a\u0000three-dimensional Valence-Arousal-Dominance (VAD) space, which could reflect\u0000the fluctuations and positivity/negativity of emotions to enable a more variety\u0000and comprehensive representation of emotional states. We employed K-means\u0000clustering to transit emotions from traditional discrete categorization to a\u0000continuous labeling system and built a classifier for emotion recognition upon\u0000this system. The effectiveness of the proposed model is evaluated using the\u0000MER2024 dataset, which contains culturally consistent video clips from Chinese\u0000movies and TV series, annotated with both discrete and open-vocabulary emotion\u0000labels. Our experiment successfully achieved the transformation between\u0000discrete and continuous models, and the proposed model generated a more diverse\u0000and comprehensive set of emotion vocabulary while maintaining strong accuracy.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"67 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
arXiv - CS - Multimedia Pub Date : 2024-09-12 DOI: arxiv-2409.08270
Qiuhong Shen, Xingyi Yang, Xinchao Wang
{"title":"FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally","authors":"Qiuhong Shen, Xingyi Yang, Xinchao Wang","doi":"arxiv-2409.08270","DOIUrl":"https://doi.org/arxiv-2409.08270","url":null,"abstract":"This study addresses the challenge of accurately segmenting 3D Gaussian\u0000Splatting from 2D masks. Conventional methods often rely on iterative gradient\u0000descent to assign each Gaussian a unique label, leading to lengthy optimization\u0000and sub-optimal solutions. Instead, we propose a straightforward yet globally\u0000optimal solver for 3D-GS segmentation. The core insight of our method is that,\u0000with a reconstructed 3D-GS scene, the rendering of the 2D masks is essentially\u0000a linear function with respect to the labels of each Gaussian. As such, the\u0000optimal label assignment can be solved via linear programming in closed form.\u0000This solution capitalizes on the alpha blending characteristic of the splatting\u0000process for single step optimization. By incorporating the background bias in\u0000our objective function, our method shows superior robustness in 3D segmentation\u0000against noises. Remarkably, our optimization completes within 30 seconds, about\u000050$times$ faster than the best existing methods. Extensive experiments\u0000demonstrate the efficiency and robustness of our method in segmenting various\u0000scenes, and its superior performance in downstream tasks such as object removal\u0000and inpainting. Demos and code will be available at\u0000https://github.com/florinshen/FlashSplat.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"1 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length
arXiv - CS - Multimedia Pub Date : 2024-09-12 DOI: arxiv-2409.07759
Bangya Liu, Suman Banerjee
{"title":"SwinGS: Sliding Window Gaussian Splatting for Volumetric Video Streaming with Arbitrary Length","authors":"Bangya Liu, Suman Banerjee","doi":"arxiv-2409.07759","DOIUrl":"https://doi.org/arxiv-2409.07759","url":null,"abstract":"Recent advances in 3D Gaussian Splatting (3DGS) have garnered significant\u0000attention in computer vision and computer graphics due to its high rendering\u0000speed and remarkable quality. While extant research has endeavored to extend\u0000the application of 3DGS from static to dynamic scenes, such efforts have been\u0000consistently impeded by excessive model sizes, constraints on video duration,\u0000and content deviation. These limitations significantly compromise the\u0000streamability of dynamic 3D Gaussian models, thereby restricting their utility\u0000in downstream applications, including volumetric video, autonomous vehicle, and\u0000immersive technologies such as virtual, augmented, and mixed reality. This paper introduces SwinGS, a novel framework for training, delivering, and\u0000rendering volumetric video in a real-time streaming fashion. To address the\u0000aforementioned challenges and enhance streamability, SwinGS integrates\u0000spacetime Gaussian with Markov Chain Monte Carlo (MCMC) to adapt the model to\u0000fit various 3D scenes across frames, in the meantime employing a sliding window\u0000captures Gaussian snapshots for each frame in an accumulative way. We implement\u0000a prototype of SwinGS and demonstrate its streamability across various datasets\u0000and scenes. Additionally, we develop an interactive WebGL viewer enabling\u0000real-time volumetric video playback on most devices with modern browsers,\u0000including smartphones and tablets. Experimental results show that SwinGS\u0000reduces transmission costs by 83.6% compared to previous work with ignorable\u0000compromise in PSNR. Moreover, SwinGS easily scales to long video sequences\u0000without compromising quality.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"35 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
TMFNet: Two-Stream Multi-Channels Fusion Networks for Color Image Operation Chain Detection
arXiv - CS - Multimedia Pub Date : 2024-09-12 DOI: arxiv-2409.07701
Yakun Niu, Lei Tan, Lei Zhang, Xianyu Zuo
{"title":"TMFNet: Two-Stream Multi-Channels Fusion Networks for Color Image Operation Chain Detection","authors":"Yakun Niu, Lei Tan, Lei Zhang, Xianyu Zuo","doi":"arxiv-2409.07701","DOIUrl":"https://doi.org/arxiv-2409.07701","url":null,"abstract":"Image operation chain detection techniques have gained increasing attention\u0000recently in the field of multimedia forensics. However, existing detection\u0000methods suffer from the generalization problem. Moreover, the channel\u0000correlation of color images that provides additional forensic evidence is often\u0000ignored. To solve these issues, in this article, we propose a novel two-stream\u0000multi-channels fusion networks for color image operation chain detection in\u0000which the spatial artifact stream and the noise residual stream are explored in\u0000a complementary manner. Specifically, we first propose a novel deep residual\u0000architecture without pooling in the spatial artifact stream for learning the\u0000global features representation of multi-channel correlation. Then, a set of\u0000filters is designed to aggregate the correlation information of multi-channels\u0000while capturing the low-level features in the noise residual stream.\u0000Subsequently, the high-level features are extracted by the deep residual model.\u0000Finally, features from the two streams are fed into a fusion module, to\u0000effectively learn richer discriminative representations of the operation chain.\u0000Extensive experiments show that the proposed method achieves state-of-the-art\u0000generalization ability while maintaining robustness to JPEG compression. The\u0000source code used in these experiments will be released at\u0000https://github.com/LeiTan-98/TMFNet.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"44 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models
arXiv - CS - Multimedia Pub Date : 2024-09-11 DOI: arxiv-2409.07452
Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei
{"title":"Hi3D: Pursuing High-Resolution Image-to-3D Generation with Video Diffusion Models","authors":"Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Chong-Wah Ngo, Tao Mei","doi":"arxiv-2409.07452","DOIUrl":"https://doi.org/arxiv-2409.07452","url":null,"abstract":"Despite having tremendous progress in image-to-3D generation, existing\u0000methods still struggle to produce multi-view consistent images with\u0000high-resolution textures in detail, especially in the paradigm of 2D diffusion\u0000that lacks 3D awareness. In this work, we present High-resolution Image-to-3D\u0000model (Hi3D), a new video diffusion based paradigm that redefines a single\u0000image to multi-view images as 3D-aware sequential image generation (i.e.,\u0000orbital video generation). This methodology delves into the underlying temporal\u0000consistency knowledge in video diffusion model that generalizes well to\u0000geometry consistency across multiple views in 3D generation. Technically, Hi3D\u0000first empowers the pre-trained video diffusion model with 3D-aware prior\u0000(camera pose condition), yielding multi-view images with low-resolution texture\u0000details. A 3D-aware video-to-video refiner is learnt to further scale up the\u0000multi-view images with high-resolution texture details. Such high-resolution\u0000multi-view images are further augmented with novel views through 3D Gaussian\u0000Splatting, which are finally leveraged to obtain high-fidelity meshes via 3D\u0000reconstruction. Extensive experiments on both novel view synthesis and single\u0000view reconstruction demonstrate that our Hi3D manages to produce superior\u0000multi-view consistency images with highly-detailed textures. Source code and\u0000data are available at url{https://github.com/yanghb22-fdu/Hi3D-Official}.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"5 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
arXiv - CS - Multimedia Pub Date : 2024-09-11 DOI: arxiv-2409.07454
Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei
{"title":"DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation","authors":"Haibo Yang, Yang Chen, Yingwei Pan, Ting Yao, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Tao Mei","doi":"arxiv-2409.07454","DOIUrl":"https://doi.org/arxiv-2409.07454","url":null,"abstract":"Learning radiance fields (NeRF) with powerful 2D diffusion models has\u0000garnered popularity for text-to-3D generation. Nevertheless, the implicit 3D\u0000representations of NeRF lack explicit modeling of meshes and textures over\u0000surfaces, and such surface-undefined way may suffer from the issues, e.g.,\u0000noisy surfaces with ambiguous texture details or cross-view inconsistency. To\u0000alleviate this, we present DreamMesh, a novel text-to-3D architecture that\u0000pivots on well-defined surfaces (triangle meshes) to generate high-fidelity\u0000explicit 3D model. Technically, DreamMesh capitalizes on a distinctive\u0000coarse-to-fine scheme. In the coarse stage, the mesh is first deformed by\u0000text-guided Jacobians and then DreamMesh textures the mesh with an interlaced\u0000use of 2D diffusion models in a tuning free manner from multiple viewpoints. In\u0000the fine stage, DreamMesh jointly manipulates the mesh and refines the texture\u0000map, leading to high-quality triangle meshes with high-fidelity textured\u0000materials. Extensive experiments demonstrate that DreamMesh significantly\u0000outperforms state-of-the-art text-to-3D methods in faithfully generating 3D\u0000content with richer textual details and enhanced geometry. Our project page is\u0000available at https://dreammesh.github.io.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process
arXiv - CS - Multimedia Pub Date : 2024-09-11 DOI: arxiv-2409.07451
Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei
{"title":"FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process","authors":"Yang Luo, Yiheng Zhang, Zhaofan Qiu, Ting Yao, Zhineng Chen, Yu-Gang Jiang, Tao Mei","doi":"arxiv-2409.07451","DOIUrl":"https://doi.org/arxiv-2409.07451","url":null,"abstract":"The emergence of text-to-image generation models has led to the recognition\u0000that image enhancement, performed as post-processing, would significantly\u0000improve the visual quality of the generated images. Exploring diffusion models\u0000to enhance the generated images nevertheless is not trivial and necessitates to\u0000delicately enrich plentiful details while preserving the visual appearance of\u0000key content in the original image. In this paper, we propose a novel framework,\u0000namely FreeEnhance, for content-consistent image enhancement using the\u0000off-the-shelf image diffusion models. Technically, FreeEnhance is a two-stage\u0000process that firstly adds random noise to the input image and then capitalizes\u0000on a pre-trained image diffusion model (i.e., Latent Diffusion Models) to\u0000denoise and enhance the image details. In the noising stage, FreeEnhance is\u0000devised to add lighter noise to the region with higher frequency to preserve\u0000the high-frequent patterns (e.g., edge, corner) in the original image. In the\u0000denoising stage, we present three target properties as constraints to\u0000regularize the predicted noise, enhancing images with high acutance and high\u0000visual quality. Extensive experiments conducted on the HPDv2 dataset\u0000demonstrate that our FreeEnhance outperforms the state-of-the-art image\u0000enhancement models in terms of quantitative metrics and human preference. More\u0000remarkably, FreeEnhance also shows higher human preference compared to the\u0000commercial image enhancement solution of Magnific AI.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding
arXiv - CS - Multimedia Pub Date : 2024-09-10 DOI: arxiv-2409.06224
Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon
{"title":"MIP-GAF: A MLLM-annotated Benchmark for Most Important Person Localization and Group Context Understanding","authors":"Surbhi Madan, Shreya Ghosh, Lownish Rai Sookha, M. A. Ganaie, Ramanathan Subramanian, Abhinav Dhall, Tom Gedeon","doi":"arxiv-2409.06224","DOIUrl":"https://doi.org/arxiv-2409.06224","url":null,"abstract":"Estimating the Most Important Person (MIP) in any social event setup is a\u0000challenging problem mainly due to contextual complexity and scarcity of labeled\u0000data. Moreover, the causality aspects of MIP estimation are quite subjective\u0000and diverse. To this end, we aim to address the problem by annotating a\u0000large-scale `in-the-wild' dataset for identifying human perceptions about the\u0000`Most Important Person (MIP)' in an image. The paper provides a thorough\u0000description of our proposed Multimodal Large Language Model (MLLM) based data\u0000annotation strategy, and a thorough data quality analysis. Further, we perform\u0000a comprehensive benchmarking of the proposed dataset utilizing state-of-the-art\u0000MIP localization methods, indicating a significant drop in performance compared\u0000to existing datasets. The performance drop shows that the existing MIP\u0000localization algorithms must be more robust with respect to `in-the-wild'\u0000situations. We believe the proposed dataset will play a vital role in building\u0000the next-generation social situation understanding methods. The code and data\u0000is available at https://github.com/surbhimadan92/MIP-GAF.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"26 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition
arXiv - CS - Multimedia Pub Date : 2024-09-10 DOI: arxiv-2409.06371
Junzheng Zhang, Weijia Guo, Bochao Liu, Ruixin Shi, Yong Li, Shiming Ge
{"title":"Distilling Generative-Discriminative Representations for Very Low-Resolution Face Recognition","authors":"Junzheng Zhang, Weijia Guo, Bochao Liu, Ruixin Shi, Yong Li, Shiming Ge","doi":"arxiv-2409.06371","DOIUrl":"https://doi.org/arxiv-2409.06371","url":null,"abstract":"Very low-resolution face recognition is challenging due to the serious loss\u0000of informative facial details in resolution degradation. In this paper, we\u0000propose a generative-discriminative representation distillation approach that\u0000combines generative representation with cross-resolution aligned knowledge\u0000distillation. This approach facilitates very low-resolution face recognition by\u0000jointly distilling generative and discriminative models via two distillation\u0000modules. Firstly, the generative representation distillation takes the encoder\u0000of a diffusion model pretrained for face super-resolution as the generative\u0000teacher to supervise the learning of the student backbone via feature\u0000regression, and then freezes the student backbone. After that, the\u0000discriminative representation distillation further considers a pretrained face\u0000recognizer as the discriminative teacher to supervise the learning of the\u0000student head via cross-resolution relational contrastive distillation. In this\u0000way, the general backbone representation can be transformed into discriminative\u0000head representation, leading to a robust and discriminative student model for\u0000very low-resolution face recognition. Our approach improves the recovery of the\u0000missing details in very low-resolution faces and achieves better knowledge\u0000transfer. Extensive experiments on face datasets demonstrate that our approach\u0000enhances the recognition accuracy of very low-resolution faces, showcasing its\u0000effectiveness and adaptability.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Adaptive Offloading and Enhancement for Low-Light Video Analytics on Mobile Devices
arXiv - CS - Multimedia Pub Date : 2024-09-09 DOI: arxiv-2409.05297
Yuanyi He, Peng Yang, Tian Qin, Jiawei Hou, Ning Zhang
{"title":"Adaptive Offloading and Enhancement for Low-Light Video Analytics on Mobile Devices","authors":"Yuanyi He, Peng Yang, Tian Qin, Jiawei Hou, Ning Zhang","doi":"arxiv-2409.05297","DOIUrl":"https://doi.org/arxiv-2409.05297","url":null,"abstract":"In this paper, we explore adaptive offloading and enhancement strategies for\u0000video analytics tasks on computing-constrained mobile devices in low-light\u0000conditions. We observe that the accuracy of low-light video analytics varies\u0000from different enhancement algorithms. The root cause could be the disparities\u0000in the effectiveness of enhancement algorithms for feature extraction in\u0000analytic models. Specifically, the difference in class activation maps (CAMs)\u0000between enhanced and low-light frames demonstrates a positive correlation with\u0000video analytics accuracy. Motivated by such observations, a novel enhancement\u0000quality assessment method is proposed on CAMs to evaluate the effectiveness of\u0000different enhancement algorithms for low-light videos. Then, we design a\u0000multi-edge system, which adaptively offloads and enhances low-light video\u0000analytics tasks from mobile devices. To achieve the trade-off between the\u0000enhancement quality and the latency for all system-served mobile devices, we\u0000propose a genetic-based scheduling algorithm, which can find a near-optimal\u0000solution in a reasonable time to meet the latency requirement. Thereby, the\u0000offloading strategies and the enhancement algorithms are properly selected\u0000under the condition of limited end-edge bandwidth and edge computation\u0000resources. Simulation experiments demonstrate the superiority of the proposed\u0000system, improving accuracy up to 20.83% compared to existing benchmarks.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"75 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142187557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0