UI2HTML: utilizing LLM agents with chain of thought to convert UI into HTML code

Impact factor: 2.0; Zone 2 (Computer Science); JCR Q3 (Computer Science, Software Engineering)

Automated Software Engineering; Pub Date: 2025-04-17; DOI: 10.1007/s10515-025-00509-5

Dawei Yuan, Guocang Yang, Tao Zhang

Automated Software Engineering, volume 32 (2). Article page: https://link.springer.com/article/10.1007/s10515-025-00509-5

Citations: 0

Abstract

The exponential growth of the internet has led to the creation of over 1.11 billion active websites, with approximately 252,000 new sites emerging daily. This growth underscores a pressing need for rapid and diverse website development, particularly to support advanced functionality such as Web3 interfaces and AI-generated content platforms. Traditional methods that manually convert visual designs into functional code are time-consuming and error-prone, and they are especially challenging for non-experts. In this paper, we introduce UI2HTML, a system that uses Web Real-Time Communication (WebRTC) and Large Language Models (LLMs) to convert website layout designs into functional user interface (UI) code. UI2HTML employs a divide-and-conquer approach, augmented by Chain of Thought reasoning, to improve the processing and analysis of UI designs. It captures real-time video and audio input from product managers via mobile devices and uses image processing tools such as OpenCV to extract and categorize UI elements. This data, complemented by audio descriptions of UI components, is processed by backend cloud services that employ Multimodal Large Language Models (MLLMs). These AI agents interpret the multimodal data to generate requirement documents and initial software architecture drafts, automating the translation of webpage designs into executable code. Our evaluation, which spans real-world datasets and various MLLM configurations, shows that UI2HTML significantly outperforms existing methods in visual similarity and functional accuracy. By offering a robust solution for the automated generation of UI code from screenshots, UI2HTML sets a new benchmark in the field, one that is particularly beneficial in today's fast-evolving digital environment.
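The abstract names a divide-and-conquer approach but does not spell out how a design is partitioned. As a rough illustration only, the sketch below assumes the simplest possible split: a screenshot is cut into horizontal bands, each band is turned into an HTML fragment, and the fragments are joined in order. The names `Region`, `split_region`, `generate_html`, and `stub_describe` are hypothetical and do not come from the paper; `stub_describe` stands in for an MLLM call and returns placeholder markup.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Region:
    """A rectangular area of a screenshot, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def split_region(region: Region, max_height: int) -> List[Region]:
    """Divide: cut a region into horizontal bands no taller than max_height."""
    if max_height <= 0:
        raise ValueError("max_height must be positive")
    bands: List[Region] = []
    top = region.y
    bottom = region.y + region.height
    while top < bottom:
        height = min(max_height, bottom - top)
        bands.append(Region(region.x, top, region.width, height))
        top += height
    return bands


def generate_html(region: Region,
                  describe: Callable[[Region], str],
                  max_height: int = 400) -> str:
    """Conquer each band with `describe`, then combine the fragments in order."""
    fragments = [describe(band) for band in split_region(region, max_height)]
    body = "\n".join(f"<section>{fragment}</section>" for fragment in fragments)
    return f"<main>\n{body}\n</main>"


def stub_describe(band: Region) -> str:
    """Stand-in for an MLLM call; a real system would send the cropped image."""
    return f"<p>band at y={band.y}, height={band.height}</p>"


if __name__ == "__main__":
    page = Region(x=0, y=0, width=1280, height=900)
    print(generate_html(page, stub_describe))
```

Passing the fragment generator in as a function keeps the splitting and merging logic testable without a model; a fixed band height is an assumption made for brevity, and a real system would more plausibly split along detected UI element boundaries.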


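Source journal

Automated Software Engineering (Engineering and Technology; Computer Science: Software Engineering)

CiteScore: 4.80

Self-citation rate: 11.80%

Articles published: not listed

Review time: >12 weeks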

About the journal: Automated Software Engineering publishes research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes. The journal covers both automatic and collaborative systems, as well as computational models of human software engineering activities. It also presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, along with formal techniques that support it or provide theoretical foundations. The journal additionally includes reviews of books, software, conferences, and workshops.