Advancing Traditional Dunhuang Regional Pattern Design with Diffusion Adapter Networks and Cross-Entropy.

IF 2.1 · Zone 3 (Physics and Astrophysics) · Q2 (PHYSICS, MULTIDISCIPLINARY)

Entropy Pub Date : 2025-05-21 DOI:10.3390/e27050546

Yihuan Tian, Tao Yu, Zuling Cheng, Sunjung Lee


Citations: 0

Abstract

A variety of machine learning and deep learning methods have been introduced to promote the inheritance of traditional culture. Dunhuang patterns, an important part of traditional Chinese culture, are difficult to collect in large numbers because of their limited availability. Moreover, existing text-to-image methods are computationally intensive and struggle to capture fine details and complex semantic relationships in text and images. To address these challenges, this paper proposes the Diffusion Adapter Network (DANet). It employs a lightweight adapter module to extract visual structural information, enabling the diffusion model to generate Dunhuang patterns with high accuracy while eliminating the need for expensive fine-tuning of the original model. The attention adapter incorporates a multihead attention module (MHAM) to enhance image modality cues, allowing the model to focus more effectively on key information. A multiscale attention module (MSAM) captures features at different scales, thereby providing more precise generative guidance. An adaptive control mechanism (ACM) dynamically adjusts the guidance coefficients across feature layers to further enhance generation quality. In addition, a cross-entropy loss function enhances the model's semantic understanding and classification of Dunhuang patterns. DANet achieves state-of-the-art (SOTA) performance on the proposed Diversified Dunhuang Patterns Dataset (DDHP). Specifically, it attains a perceptual similarity score (LPIPS) of 0.498, an image-text matching score (CLIP score) of 0.533, and a feature similarity score (CLIP-I) of 0.772.
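The abstract names two mechanisms that are easy to illustrate in isolation: a cross-entropy loss over pattern classes, and per-layer guidance coefficients that scale adapter features before they are combined with the diffusion model's own features. The sketch below is a minimal illustration in plain Python, not the authors' implementation. The function names (`cross_entropy`, `guided_features`), the additive form of the guidance, and the use of fixed coefficients in place of a learned ACM are all assumptions made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Cross-entropy between softmax(logits) and a one-hot class label.

    Equals -log(p[target_index]); it shrinks as the model assigns more
    probability to the correct pattern class.
    """
    probs = softmax(logits)
    return -math.log(probs[target_index])


def guided_features(base_layers, adapter_layers, coefficients):
    """Add adapter features to base features, scaled per layer.

    ASSUMPTION: additive guidance with one scalar coefficient per
    feature layer. In the paper the coefficients are adjusted
    dynamically by the ACM; here they are plain inputs.
    """
    if not (len(base_layers) == len(adapter_layers) == len(coefficients)):
        raise ValueError("one coefficient is required per feature layer")
    return [
        [b + w * a for b, a in zip(base, adapter)]
        for base, adapter, w in zip(base_layers, adapter_layers, coefficients)
    ]


if __name__ == "__main__":
    # Hypothetical logits for three pattern classes; class 0 is correct.
    print(cross_entropy([2.0, 0.5, -1.0], 0))
    # Two hypothetical feature layers with different guidance strengths.
    print(guided_features([[1.0, 2.0], [3.0]], [[10.0, 20.0], [4.0]], [0.5, 0.0]))
```

A coefficient of 0 leaves a layer's base features untouched, which is one way to read "adjusting guidance across feature layers": layers where the adapter signal helps receive a larger weight, and the others a smaller one.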


Source journal: Entropy (PHYSICS, MULTIDISCIPLINARY)

CiteScore: 4.90

Self-citation rate: 11.10%

Articles published: 1580

Review time: 21.05 days

Journal description: Entropy (ISSN 1099-4300), an international and interdisciplinary journal of entropy and information studies, publishes reviews, regular research papers and short notes. The journal's aim is to encourage scientists to publish their theoretical and experimental details as fully as possible. There is no restriction on the length of papers. Where a paper involves computation or experiment, the details must be provided so that the results can be reproduced.