Displays: Latest Articles

HyperTuneFaaS: A serverless framework for hyperparameter tuning in image processing models
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-08 · DOI: 10.1016/j.displa.2025.102990
Jiantao Zhang, Bojun Ren, Yicheng Fu, Rongbo Ma, Zinuo Cai, Weishan Zhang, Ruhui Ma, Jinshan Sun
{"title":"HyperTuneFaaS: A serverless framework for hyperparameter tuning in image processing models","authors":"Jiantao Zhang ,&nbsp;Bojun Ren ,&nbsp;Yicheng Fu ,&nbsp;Rongbo Ma ,&nbsp;Zinuo Cai ,&nbsp;Weishan Zhang ,&nbsp;Ruhui Ma ,&nbsp;Jinshan Sun","doi":"10.1016/j.displa.2025.102990","DOIUrl":"10.1016/j.displa.2025.102990","url":null,"abstract":"<div><div>Deep learning has achieved remarkable success across various fields, especially in image processing tasks like denoising, sharpening, and contrast enhancement. However, the performance of these models heavily relies on the careful selection of hyperparameters, which can be a computationally intensive and time-consuming task. Cloud-based hyperparameter search methods have gained popularity due to their ability to address the inefficiencies of single-machine training and the underutilization of computing resources. Nevertheless, these methods still encounters substantial challenges, including high computational demands, parallelism requirements, and prolonged search time.</div><div>In this study, we propose <span>HyperTuneFaaS</span>, a Function as a Service (FaaS)-based hyperparameter search framework that leverages distributed computing and asynchronous processing to tackle the issues encountered in hyperparameter search. By fully exploiting the parallelism offered by serverless computing, <span>HyperTuneFaaS</span> minimizes the overhead typically associated with model training on serverless platforms. Additionally, we enhance the traditional genetic algorithm, a powerful metaheuristic method, to improve its efficiency and integrate it with the framework to enhance the efficiency of hyperparameter tuning. Experimental results demonstrate significant improvements in efficiency and cost savings with the combination of the FaaS-based hyperparameter tuning framework and the optimized genetic algorithm, making <span>HyperTuneFaaS</span> a powerful tool for optimizing image processing models and achieving superior image quality.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102990"},"PeriodicalIF":3.7,"publicationDate":"2025-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143395684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
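As a rough illustration of the idea described in the HyperTuneFaaS abstract above, here is a minimal Python sketch of a genetic hyperparameter search that evaluates candidates in parallel, the way a FaaS backend would fan out training jobs. It is not the authors' implementation: the search space, the thread pool standing in for serverless workers, and the `invoke_training_function` stub are all assumptions.

```python
# Minimal sketch of a FaaS-style genetic hyperparameter search.
# `invoke_training_function` is a hypothetical stand-in for an asynchronous
# serverless call that trains a model and returns a validation score.
import random
from concurrent.futures import ThreadPoolExecutor

SEARCH_SPACE = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "filters": [32, 64, 128],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def invoke_training_function(params):
    # Placeholder: in a real deployment this would POST `params` to a
    # serverless endpoint and return the reported validation accuracy.
    return -abs(params["lr"] - 1e-3) + params["filters"] / 1000.0

def mutate(ind):
    key = random.choice(list(SEARCH_SPACE))
    return {**ind, key: random.choice(SEARCH_SPACE[key])}

def evolve(generations=5, pop_size=8):
    population = [random_individual() for _ in range(pop_size)]
    with ThreadPoolExecutor(max_workers=pop_size) as pool:  # parallel "functions"
        for _ in range(generations):
            scores = list(pool.map(invoke_training_function, population))
            ranked = [p for _, p in sorted(zip(scores, population),
                                           key=lambda t: t[0], reverse=True)]
            parents = ranked[: pop_size // 2]
            population = parents + [mutate(random.choice(parents))
                                    for _ in range(pop_size - len(parents))]
    return ranked[0]

print(evolve())
```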
SAMR: Symmetric masked multimodal modeling for general multi-modal 3D motion retrieval
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-07 · DOI: 10.1016/j.displa.2025.102987
Yunhao Li, Sijing Wu, Yucheng Zhu, Wei Sun, Zhichao Zhang, Song Song, Guangtao Zhai
{"title":"SAMR: Symmetric masked multimodal modeling for general multi-modal 3D motion retrieval","authors":"Yunhao Li ,&nbsp;Sijing Wu ,&nbsp;Yucheng Zhu ,&nbsp;Wei Sun ,&nbsp;Zhichao Zhang ,&nbsp;Song Song ,&nbsp;Guangtao Zhai","doi":"10.1016/j.displa.2025.102987","DOIUrl":"10.1016/j.displa.2025.102987","url":null,"abstract":"<div><div>Recently, text to 3d human motion retrieval has been a hot topic in computer vision. However, current existing methods utilize contrastive learning and motion reconstruction as the main proxy task. Although these methods achieve great performance, such simple strategies may cause the network to lose temporal motion information and distort the text feature, which may injury motion retrieval results. Meanwhile, current motion retrieval methods ignore the post processing for predicted similarity matrices. Considering these two problems, in this work, we present <strong>SAMR</strong>, an encoder–decoder based transformer framework with symmetric masked multi-modal information modeling. Concretely, we remove the KL divergence loss and reconstruct the motion and text inputs jointly. To enhance the robustness of our retrieval model, we also propose a mask modeling strategy. Our SAMR performs joint masking on both image and text inputs, during training, for each modality, we simultaneously reconstruct the original input modality and masked modality to stabilize the training. After training, we also utilize the dual softmax optimization method to improve the final performance. We conduct extensive experiments on both text-to-motion dataset and speech-to-motion dataset. The experimental results demonstrate that SAMR achieves the state-of-the-art performance in various cross-modal motion retrieval tasks including speech to motion and text to motion, showing great potential to serve as a general foundation motion retrieval framework.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102987"},"PeriodicalIF":3.7,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143421118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
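The SAMR abstract names dual-softmax optimization of the predicted similarity matrix as a post-processing step. Below is a minimal sketch of that standard operation on a toy text-motion similarity matrix; the temperature and matrix values are illustrative, not taken from the paper.

```python
# Sketch of dual-softmax post-processing on a text-motion similarity matrix,
# used to sharpen retrieval scores; values here are illustrative only.
import torch

def dual_softmax(sim, temperature=100.0):
    # sim: (num_texts, num_motions) cosine-similarity matrix
    probs_t2m = torch.softmax(sim * temperature, dim=1)   # text -> motion
    probs_m2t = torch.softmax(sim * temperature, dim=0)   # motion -> text
    return probs_t2m * probs_m2t                          # element-wise fusion

sim = torch.randn(4, 4)
refined = dual_softmax(sim)
print(refined.argmax(dim=1))  # retrieved motion index for each text query
```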
Refactored Maskformer: Refactor localization and classification for improved universal image segmentation
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-07 · DOI: 10.1016/j.displa.2025.102981
Xingliang Zhu, Xiaoyu Dong, Weiwei Yu, Huawei Liang, Bin Kong
{"title":"Refactored Maskformer: Refactor localization and classification for improved universal image segmentation","authors":"Xingliang Zhu ,&nbsp;Xiaoyu Dong ,&nbsp;Weiwei Yu ,&nbsp;Huawei Liang ,&nbsp;Bin Kong","doi":"10.1016/j.displa.2025.102981","DOIUrl":"10.1016/j.displa.2025.102981","url":null,"abstract":"<div><div>The introduction of DEtection TRansformers (DETR) has marked a new era for universal image segmentation in computer vision. However, methods that use shared queries and attention layers for simultaneous localization and classification often encounter inter-task optimization conflicts. In this paper, we propose a novel architecture called <strong>Refactored Maskformer</strong>, which builds upon the Mask2Former through two key modifications: Decoupler and Reconciler. The Decoupler separates decoding pathways for localization and classification, enabling task-specific query and attention layer learning. Additionally, it employs a unified masked attention to confine the regions of interest for both tasks within the same object, along with a query Interactive-Attention layer to enhance task interaction. In the Reconciler module, we mitigate the optimization conflict issue by introducing localization supervised matching cost and task alignment learning loss functions. These functions aim to encourage high localization accuracy samples, while reducing the impact of high classification confidence samples with low localization accuracy on network optimization. Extensive experimental results demonstrate that our Refactored Maskformer achieves performance comparable to existing state-of-the-art models across all unified tasks, surpassing our baseline network, Mask2former, with 1.2% PQ on COCO, 6.8% AP on ADE20k, and 1.1% mIoU on Cityscapes. The code is available at <span><span>https://github.com/leonzx7/Refactored-Maskformer</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102981"},"PeriodicalIF":3.7,"publicationDate":"2025-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143376929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Chinese sign language recognition and translation with virtual digital human dataset
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-05 · DOI: 10.1016/j.displa.2025.102989
Hao Zhang, Zenghui Liu, Zhihang Yan, Songrui Guo, ChunMing Gao, Xiyao Liu
{"title":"Chinese sign language recognition and translation with virtual digital human dataset","authors":"Hao Zhang ,&nbsp;Zenghui Liu ,&nbsp;Zhihang Yan ,&nbsp;Songrui Guo ,&nbsp;ChunMing Gao ,&nbsp;Xiyao Liu","doi":"10.1016/j.displa.2025.102989","DOIUrl":"10.1016/j.displa.2025.102989","url":null,"abstract":"<div><div>Sign language recognition and translation are crucial for communication among individuals who are deaf or mute. Deep learning methods have advanced sign language tasks, surpassing traditional techniques in accuracy through autonomous data learning. However, the scarcity of annotated sign language datasets limits the potential of these methods in practical applications. To address this, we propose using digital twin technology to build a virtual human system at the word level, which can automatically generate sign language sentences, eliminating human input, and creating numerous sign language data pairs for efficient virtual-to-real transfer. To enhance the generalization of virtual sign language data and mitigate the bias between virtual and real data, we designed novel embedding representations and augmentation methods based on skeletal information. We also established a multi-task learning framework and a pose attention module for sign language recognition and translation. Our experiments confirm the efficacy of our approach, yielding state-of-the-art results in recognition and translation.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102989"},"PeriodicalIF":3.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143372668","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
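To make the "augmentation methods based on skeletal information" mentioned in the entry above concrete, here is a hedged toy example of skeleton-level augmentation (random global rotation and scaling of 3D joint coordinates). The specific augmentations and embedding representations used in the paper may differ.

```python
# Illustrative skeleton-based augmentation of the sort used to bridge virtual
# and real sign-language data: random global rotation and scaling of joints.
# The paper's exact augmentations are not reproduced here.
import numpy as np

def augment_skeleton(joints, max_angle=0.2, scale_range=(0.9, 1.1)):
    # joints: (num_frames, num_joints, 3) array of 3D keypoints
    theta = np.random.uniform(-max_angle, max_angle)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])  # rotate about z
    scale = np.random.uniform(*scale_range)
    return joints @ rot.T * scale

clip = np.random.rand(64, 21, 3)   # 64 frames, 21 joints
print(augment_skeleton(clip).shape)
```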
UnSP: Improving event-to-image reconstruction with uncertainty guided self-paced learning
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-05 · DOI: 10.1016/j.displa.2025.102985
Jianye Yang, Xiaolin Zhang, Shaofan Wang, Yanfeng Sun, Baocai Yin
{"title":"UnSP: Improving event-to-image reconstruction with uncertainty guided self-paced learning","authors":"Jianye Yang,&nbsp;Xiaolin Zhang,&nbsp;Shaofan Wang,&nbsp;Yanfeng Sun,&nbsp;Baocai Yin","doi":"10.1016/j.displa.2025.102985","DOIUrl":"10.1016/j.displa.2025.102985","url":null,"abstract":"<div><div>Asynchronous events, produced by event cameras, possess several advantages against traditional cameras: high temporal resolution, dynamic range, etc. Traditional event-to-image reconstruction methods adopt computer vision techniques and establish a correspondence between event streams and the reconstruction image. Despite great successes, those methods ignore filtering the non-confident event frames, and hence produce unsatisfactory reconstruction results. In this paper, we propose a plug-and-play model by using uncertainty guided self-paced learning (dubbed UnSP) for finetuning the event-to-image reconstruction process. The key observation of UnSP is that, different event streams, though corresponding to a common reconstruction image, serve as different functions during the training process of event-to-image reconstruction networks (e.g., shape, intensity, details are extracted in different training phases of networks). Typically, UnSP proposes an uncertainty modeling for each event frame based on its reconstruction errors induced by three metrics, and then filters confident event frames in a self-paced learning fashion. Experiments on the six subsets of the Event Camera Dataset shows that UnSP can be incorporated with any event-to-image reconstruction networks seamlessly and achieve significant improvement in both quantitative and qualitative results. In summary, the uncertainty-driven adaptive sampling and self-learning mechanisms of UnSP, coupled with its plug-and-play capability, enhance the robustness, efficiency, and versatility for event-to-image reconstruction. Code is available at <span><span>https://github.com/wangsfan/UnSP</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102985"},"PeriodicalIF":3.7,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143350303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
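A hedged sketch of the uncertainty-guided self-paced idea from the UnSP abstract: each event frame receives an uncertainty score built from several reconstruction-error metrics, and only the most confident frames contribute to the loss early in training. The three metrics and the pacing schedule shown here are assumptions, not the paper's exact choices.

```python
# Toy uncertainty-guided self-paced filtering of event frames.
# Metrics and schedule are illustrative assumptions.
import torch
import torch.nn.functional as F

def frame_uncertainty(pred, target):
    mse = F.mse_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))
    l1 = F.l1_loss(pred, target, reduction="none").mean(dim=(1, 2, 3))
    grad = (pred.diff(dim=-1).abs().mean(dim=(1, 2, 3))
            - target.diff(dim=-1).abs().mean(dim=(1, 2, 3))).abs()
    return mse + l1 + grad  # higher value = less confident frame

def self_paced_mask(uncertainty, epoch, max_epochs):
    keep_ratio = 0.5 + 0.5 * epoch / max_epochs      # admit harder frames over time
    k = max(1, int(keep_ratio * uncertainty.numel()))
    threshold = uncertainty.sort().values[k - 1]
    return uncertainty <= threshold                   # boolean mask over the batch

pred = torch.rand(8, 1, 32, 32)
target = torch.rand(8, 1, 32, 32)
u = frame_uncertainty(pred, target)
mask = self_paced_mask(u, epoch=2, max_epochs=10)
loss = F.mse_loss(pred[mask], target[mask])  # only confident frames drive the loss
```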
Exploring user engagement by diagnosing visual guides in onboarding screens with linear regression and XGBoost
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-04 · DOI: 10.1016/j.displa.2025.102975
Lik-Hang Lee, Kit-Yung Lam, Pan Hui
{"title":"Exploring user engagement by diagnosing visual guides in onboarding screens with linear regression and XGBoost","authors":"Lik-Hang Lee ,&nbsp;Kit-Yung Lam ,&nbsp;Pan Hui","doi":"10.1016/j.displa.2025.102975","DOIUrl":"10.1016/j.displa.2025.102975","url":null,"abstract":"<div><div>Onboarding screens are regarded as the first service point when a user experiences a new application, which presents the key functions and features of such an application. The User Interface (UI) walkthroughs, product tours, and tooltips are three common categories of visual guides (VGs) in the onboarding screens for users to get familiar with the app. It is important to offer first-time users appropriate VG to explain the key functions in the app interface. In this paper, we study the effective VG elements that help users adapt to the app UI. We first crowd-sourced user engagement (UE) assessments, and collected 7,080 responses reflecting user cognitive preferences to 114 collected apps containing 1,194 visual guides. Our analytics of the responses shows the improvement of VG following the analysis in three perspectives (types of UI elements, semantic, and spatial analysis). Accordingly, the proposed Parallel Boosted Regression Trees resulted in a highly accurate rating (85%) of the VGs into a three-level UE score, providing app designers useful hints on designing VGs for high levels of user retention and user engagement.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102975"},"PeriodicalIF":3.7,"publicationDate":"2025-02-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143376537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
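For readers unfamiliar with the modeling step in the entry above, the following sketch rates visual guides into a three-level engagement score with gradient-boosted trees. The synthetic features and labels are placeholders for the paper's UI-element, semantic, and spatial features; only the overall train-and-score workflow is illustrated.

```python
# Illustrative three-level engagement rating with gradient-boosted trees.
# Features and labels are synthetic stand-ins, not the paper's dataset.
import numpy as np
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.random((1194, 6))          # e.g. element type, text length, area, position...
y = rng.integers(0, 3, size=1194)  # low / medium / high engagement label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```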
High-precision 3D teeth reconstruction based on five-view intra-oral photos
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-03 · DOI: 10.1016/j.displa.2025.102988
Yanqi Wang, Xinyue Sun, Jun Jia, Zuolin Jin, Yanning Ma
{"title":"High-precision 3D teeth reconstruction based on five-view intra-oral photos","authors":"Yanqi Wang ,&nbsp;Xinyue Sun ,&nbsp;Jun Jia ,&nbsp;Zuolin Jin ,&nbsp;Yanning Ma","doi":"10.1016/j.displa.2025.102988","DOIUrl":"10.1016/j.displa.2025.102988","url":null,"abstract":"<div><div>Reconstructing 3D dental model from multi-view intra-oral photos plays an important role in the process of orthodontic treatment. Compared with cone-beam computed tomography (CBCT) or intra-oral scanner (IOS), 3D reconstruction provides a low-cost solution to monitor teeth, which does not require professional devices and operations. This paper introduces an enhanced fully automated framework for 3D tooth reconstruction using five-view intraoral photos, capable of automatically generating the shapes, alignments, and occlusal relationships of both upper and lower teeth. The proposed framework includes three phases. Initially, a parametric dental model based on a statistical shape is built to represent the shape and position of each tooth. Next, in the feature extraction stage, the segment anything model (SAM) is used to accurately detect the tooth boundaries from intra-oral photos, and the single-view depth estimation approach known as Depth Anything is used to obtain depth information. And grayscale conversion and normalization processing are performed on the photos to extract luminance information separately in order to deal with the problem of tooth surface reflection. Finally, an iterative reconstruction process in two stages is implemented: the first stage involves alternating between searching for point correspondences and optimizing a composite loss function to align the parameterized tooth model with the predicted contours of teeth; in the second stage, image depth and lightness information are utilized for additional refinement. Extensive experiments are conducted to validate the proposed methods. Compared with existing methods, the proposed method not only qualitatively outperforms in misaligned, missing, or complex occlusion cases, but also quantificationally achieve good RMSD and Dice.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102988"},"PeriodicalIF":3.7,"publicationDate":"2025-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
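The reconstruction loop in the entry above alternates between searching for point correspondences and optimizing an alignment objective. The toy 2D registration below illustrates that alternating pattern (an ICP-style scheme); it is not the dental pipeline itself, and the paper's composite loss is replaced here by a closed-form rigid alignment.

```python
# Toy "find correspondences, then optimize" loop (ICP-style), for illustration only.
import numpy as np

def icp_step(source, target):
    # 1) correspondence search: nearest target point for every source point
    d = np.linalg.norm(source[:, None] - target[None], axis=-1)
    matched = target[d.argmin(axis=1)]
    # 2) optimize: closed-form rigid alignment of the matched pairs (Kabsch)
    mu_s, mu_t = source.mean(0), matched.mean(0)
    U, _, Vt = np.linalg.svd((source - mu_s).T @ (matched - mu_t))
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_t - R @ mu_s
    return source @ R.T + t

src = np.random.rand(50, 2)
tgt = src @ np.array([[0.98, -0.17], [0.17, 0.98]]).T + 0.1  # rotated, shifted copy
for _ in range(10):
    src = icp_step(src, tgt)
```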
Stacked neural filtering network for reliable NEV monitoring
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-02 · DOI: 10.1016/j.displa.2025.102976
Yingzi Wang, Ce Yu, Xianglei Zhu, Hongcan Gao, Jie Shang
{"title":"Stacked neural filtering network for reliable NEV monitoring","authors":"Yingzi Wang ,&nbsp;Ce Yu ,&nbsp;Xianglei Zhu ,&nbsp;Hongcan Gao ,&nbsp;Jie Shang","doi":"10.1016/j.displa.2025.102976","DOIUrl":"10.1016/j.displa.2025.102976","url":null,"abstract":"<div><div>Reliable monitoring of new energy vehicles (NEVs) is crucial for ensuring traffic safety and energy efficiency. However, traditional Transformer-based methods struggle with quadratic computational complexity and sensitivity to noise due to the self-attention mechanism, leading to efficiency and accuracy limitations in real-time applications. To address these issues, we propose the Stacked Neural Filtering Network (SNFN), which replaces self-attention with a learnable filter block that operates in the frequency domain, reducing complexity to logarithmic-linear levels. This novel design improves computational efficiency, mitigates overfitting, and enhances noise robustness. Experimental evaluations on two real-world NEV datasets demonstrate that SNFN consistently achieves superior accuracy and efficiency compared to traditional methods, making it a reliable solution for real-time NEV monitoring.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102976"},"PeriodicalIF":3.7,"publicationDate":"2025-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162415","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
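The SNFN abstract describes replacing self-attention with a learnable filter block in the frequency domain. The sketch below shows one common way such a block can be written (a GFNet-style global filter: FFT along the sequence, element-wise learnable complex weights, inverse FFT); the actual SNFN block may differ in its details.

```python
# Minimal frequency-domain filter block as a self-attention replacement (sketch).
import torch
import torch.nn as nn

class FrequencyFilterBlock(nn.Module):
    def __init__(self, seq_len, dim):
        super().__init__()
        # one complex weight per retained frequency bin and channel
        self.filter = nn.Parameter(torch.randn(seq_len // 2 + 1, dim, 2) * 0.02)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (batch, seq_len, dim)
        f = torch.fft.rfft(x, dim=1)           # FFT along the time axis
        f = f * torch.view_as_complex(self.filter)
        x = torch.fft.irfft(f, n=x.size(1), dim=1)
        return self.norm(x)                    # O(n log n) instead of O(n^2)

block = FrequencyFilterBlock(seq_len=128, dim=64)
out = block(torch.randn(2, 128, 64))
print(out.shape)
```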
Semi-supervised pre-training based multi-task network for thyroid-associated ophthalmopathy classification
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-02-01 · DOI: 10.1016/j.displa.2025.102974
MingFei Yang, TianFeng Zhang, XueFei Song, YuZhong Zhang, Lei Zhou
{"title":"Semi-supervised pre-training based multi-task network for thyroid-associated ophthalmopathy classification","authors":"MingFei Yang ,&nbsp;TianFeng Zhang ,&nbsp;XueFei Song ,&nbsp;YuZhong Zhang ,&nbsp;Lei Zhou","doi":"10.1016/j.displa.2025.102974","DOIUrl":"10.1016/j.displa.2025.102974","url":null,"abstract":"<div><div>Thyroid-associated ophthalmopathy (TAO) is a blinding autoimmune disorder, and early diagnosis is crucial in preventing vision loss. Orbital CT imaging has emerged as a valuable tool for diagnosing and screening TAO. Radiomic is currently the most dominant technique for TAO diagnosis, however it is costly due to the need for manual image labeling by medical professionals. Convolutional Neural Network (CNN) is another promising technique for TAO diagnosis. However, the performance of CNN based classification may degrade due to the limited size of collected data or the complexity of designed model. Utilizing pretraining model is a crucial technique for boosting the performance of CNN based TAO classification. Therefore, a novel semi-supervised pretraining based multi-task network for TAO classification is proposed in this paper. Firstly, a multi-task network is designed, which consists of an encoder, a classification branch and two segmentation decoder. Then, the multi-task network is pretrained by minimizing the prediction difference between two segmentation decoders through a semi-supervised way. In this way, the pseudo voxel-level supervision can be generated for the unlabeled images. Finally, the encoder and one light-weighted decoder can be initialized by the pretrained weights, and then they are jointly optimized for TAO classification with the classification branch through multi-task learning. Our proposed network model was comprehensively evaluated on a private dataset which consists of 982 orbital CT scans for TAO diagnosis. We also tested the classification generalization performance using an external dataset. The experimental results demonstrate that our model significantly improves the classification performance when compared with current SOTA methods. The source code is publically available at <span><span>https://github.com/VLAD-KONATA/TAO_CT</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102974"},"PeriodicalIF":3.7,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143162410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
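A minimal sketch of the semi-supervised pretraining signal described in the entry above: two segmentation decoders share one encoder, and on unlabeled CT volumes their predictions are pushed to agree. The tiny 3D network, the shapes, and the MSE consistency loss are assumptions made only for illustration.

```python
# Toy consistency objective between two segmentation decoders on unlabeled 3D data.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU())
decoder_a = nn.Conv3d(8, 2, 3, padding=1)   # two-class segmentation head
decoder_b = nn.Conv3d(8, 2, 3, padding=1)

unlabeled_ct = torch.randn(2, 1, 16, 64, 64)   # batch of unlabeled CT patches
feat = encoder(unlabeled_ct)
p_a = torch.softmax(decoder_a(feat), dim=1)
p_b = torch.softmax(decoder_b(feat), dim=1)

# minimize the prediction difference between the two decoders
consistency_loss = F.mse_loss(p_a, p_b)
consistency_loss.backward()
```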
Dual discriminator GANs with multi-focus label matching for image-aware layout generation
IF 3.7 · CAS Q2 · Engineering & Technology
Displays · Pub Date: 2025-01-31 · DOI: 10.1016/j.displa.2025.102970
Chenchen Xu, Kaixin Han, Min Zhou, Weiwei Xu
{"title":"Dual discriminator GANs with multi-focus label matching for image-aware layout generation","authors":"Chenchen Xu ,&nbsp;Kaixin Han ,&nbsp;Min Zhou ,&nbsp;Weiwei Xu","doi":"10.1016/j.displa.2025.102970","DOIUrl":"10.1016/j.displa.2025.102970","url":null,"abstract":"<div><div>Image-aware layout generation involves arranging graphic elements, including logo, text, underlay, and embellishment, at the appropriate position on the canvas, constituting a fundamental step in poster design. This task requires considering both the relationships among elements and the interaction between elements and images. However, existing layout generation models struggle to simultaneously satisfy explicit aesthetic principles like alignment and non-overlapping, along with implicit aesthetic principles related to the harmonious composition of images and elements. To overcome these challenges, this paper designs a GAN with dual discriminators, called DD-GAN, to generate graphic layouts according to image contents. In addition, we introduce a multi-focus label matching method to provide richer supervision and optimize model training. The incorporation of multi-focus label matching not only accelerates convergence during training but also enables the model to better capture both explicit and implicit aesthetic principles in image-aware layout generation. Quantitative and qualitative evaluations consistently demonstrate that DD-GAN, coupled with multi-focus label matching, achieves state-of-the-art performance, producing high-quality image-aware graphic layouts for advertising posters.</div></div>","PeriodicalId":50570,"journal":{"name":"Displays","volume":"87 ","pages":"Article 102970"},"PeriodicalIF":3.7,"publicationDate":"2025-01-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143281163","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
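To illustrate the dual-discriminator setup named in the entry above, the sketch below trains a layout generator against two critics: one scoring the element layout alone and one scoring the layout together with pooled image features. The network sizes, losses, and box parameterization are placeholders, not the paper's architecture.

```python
# Hedged sketch of a generator trained against two discriminators.
import torch
import torch.nn as nn

gen = nn.Sequential(nn.Linear(128 + 64, 256), nn.ReLU(), nn.Linear(256, 4 * 8))  # 8 boxes
d_layout = nn.Sequential(nn.Linear(4 * 8, 128), nn.ReLU(), nn.Linear(128, 1))
d_image = nn.Sequential(nn.Linear(4 * 8 + 64, 128), nn.ReLU(), nn.Linear(128, 1))
bce = nn.BCEWithLogitsLoss()

img_feat = torch.randn(16, 64)                 # pooled canvas-image features
noise = torch.randn(16, 128)
fake_layout = gen(torch.cat([noise, img_feat], dim=1))
real_layout = torch.rand(16, 4 * 8)

# generator tries to fool both discriminators simultaneously
g_loss = bce(d_layout(fake_layout), torch.ones(16, 1)) + \
         bce(d_image(torch.cat([fake_layout, img_feat], dim=1)), torch.ones(16, 1))
g_loss.backward()

# discriminator step (shown for the layout critic only)
d_loss = bce(d_layout(real_layout), torch.ones(16, 1)) + \
         bce(d_layout(fake_layout.detach()), torch.zeros(16, 1))
```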