From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence

IF 7.1 | CAS Tier 1 (Earth Sciences) | Q1 ENVIRONMENTAL STUDIES
Yonggai Zhuang, Yuhao Kang, Teng Fei, Meng Bian, Yunyan Du
{"title":"从听觉到视觉:用声景图生成人工智能将听觉和视觉场所感知联系起来","authors":"Yonggai Zhuang ,&nbsp;Yuhao Kang ,&nbsp;Teng Fei ,&nbsp;Meng Bian ,&nbsp;Yunyan Du","doi":"10.1016/j.compenvurbsys.2024.102122","DOIUrl":null,"url":null,"abstract":"<div><p>People experience the world through multiple senses simultaneously, contributing to our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting human auditory perceptions at place due to the challenges in characterizing the acoustic environment vividly. Also, few studies have synthesized the two-dimensional (auditory and visual) perceptions in understanding human sense of place. To bridge these gaps, we propose a Soundscape-to-Image Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), aiming to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. Our proposed Soundscape-to-Image Diffusion model, which contains a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, can then translate those semantic audio vectors into visual representations of place effectively. We evaluated our proposed model by using both machine-based and human-centered approaches. We proved that the generated street view images align with our common perceptions, and accurately create several key street elements of the original soundscapes. It also demonstrates that soundscapes provide sufficient visual information places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. It has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, as well as advancing our knowledge of human-environment relationships.</p></div>","PeriodicalId":48241,"journal":{"name":"Computers Environment and Urban Systems","volume":"110 ","pages":"Article 102122"},"PeriodicalIF":7.1000,"publicationDate":"2024-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"From hearing to seeing: Linking auditory and visual place perceptions with soundscape-to-image generative artificial intelligence\",\"authors\":\"Yonggai Zhuang ,&nbsp;Yuhao Kang ,&nbsp;Teng Fei ,&nbsp;Meng Bian ,&nbsp;Yunyan Du\",\"doi\":\"10.1016/j.compenvurbsys.2024.102122\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>People experience the world through multiple senses simultaneously, contributing to our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting human auditory perceptions at place due to the challenges in characterizing the acoustic environment vividly. Also, few studies have synthesized the two-dimensional (auditory and visual) perceptions in understanding human sense of place. To bridge these gaps, we propose a Soundscape-to-Image Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), aiming to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. 
Our proposed Soundscape-to-Image Diffusion model, which contains a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, can then translate those semantic audio vectors into visual representations of place effectively. We evaluated our proposed model by using both machine-based and human-centered approaches. We proved that the generated street view images align with our common perceptions, and accurately create several key street elements of the original soundscapes. It also demonstrates that soundscapes provide sufficient visual information places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. It has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, as well as advancing our knowledge of human-environment relationships.</p></div>\",\"PeriodicalId\":48241,\"journal\":{\"name\":\"Computers Environment and Urban Systems\",\"volume\":\"110 \",\"pages\":\"Article 102122\"},\"PeriodicalIF\":7.1000,\"publicationDate\":\"2024-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers Environment and Urban Systems\",\"FirstCategoryId\":\"89\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0198971524000516\",\"RegionNum\":1,\"RegionCategory\":\"地球科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENVIRONMENTAL STUDIES\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers Environment and Urban Systems","FirstCategoryId":"89","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0198971524000516","RegionNum":1,"RegionCategory":"地球科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENVIRONMENTAL STUDIES","Score":null,"Total":0}
Citations: 0

Abstract


People experience the world through multiple senses simultaneously, contributing to our sense of place. Prior quantitative geography studies have mostly emphasized human visual perceptions, neglecting human auditory perceptions of place due to the challenges in characterizing the acoustic environment vividly. Also, few studies have synthesized the two dimensions of perception (auditory and visual) in understanding the human sense of place. To bridge these gaps, we propose a Soundscape-to-Image Diffusion model, a generative Artificial Intelligence (AI) model supported by Large Language Models (LLMs), aiming to visualize soundscapes through the generation of street view images. By creating audio-image pairs, acoustic environments are first represented as high-dimensional semantic audio vectors. Our proposed Soundscape-to-Image Diffusion model, which contains a Low-Resolution Diffusion Model and a Super-Resolution Diffusion Model, can then translate those semantic audio vectors into visual representations of place effectively. We evaluated our proposed model using both machine-based and human-centered approaches. We show that the generated street view images align with our common perceptions and accurately recreate several key street elements of the original soundscapes. This also demonstrates that soundscapes provide sufficient visual information about places. This study stands at the forefront of the intersection between generative AI and human geography, demonstrating how human multi-sensory experiences can be linked. We aim to enrich geospatial data science and AI studies with human experiences. The work has the potential to inform multiple domains such as human geography, environmental psychology, and urban design and planning, as well as advancing our knowledge of human-environment relationships.
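
The abstract describes a two-stage pipeline: a soundscape recording is encoded into a high-dimensional semantic audio vector, a Low-Resolution Diffusion Model generates a small street-view image conditioned on that vector, and a Super-Resolution Diffusion Model refines it. The following PyTorch sketch only illustrates that flow; every module name, layer size, spectrogram shape, and the simplified sampling loop are hypothetical placeholders, not the authors' implementation.

```python
# Minimal, illustrative sketch of a soundscape-to-image diffusion pipeline.
# All architectures and hyperparameters here are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioEncoder(nn.Module):
    """Maps a log-mel spectrogram to a semantic audio vector."""

    def __init__(self, n_mels=64, embed_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mels, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, spec):          # spec: (batch, n_mels, time)
        return self.net(spec)         # (batch, embed_dim)


class CondDenoiser(nn.Module):
    """Toy noise predictor conditioned on the audio embedding (stand-in for a U-Net)."""

    def __init__(self, channels=3, embed_dim=512):
        super().__init__()
        self.cond = nn.Linear(embed_dim, channels)
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x, audio_emb):
        bias = self.cond(audio_emb)[:, :, None, None]   # inject the audio condition
        return self.body(x + bias)


@torch.no_grad()
def sample(denoiser, audio_emb, size, steps=50):
    """Heavily simplified reverse-diffusion loop; real noise schedules differ."""
    x = torch.randn(audio_emb.shape[0], 3, size, size)
    for _ in range(steps):
        x = x - denoiser(x, audio_emb) / steps          # crude denoising update
    return x


# Pipeline: soundscape -> semantic audio vector -> low-res image -> super-resolution.
encoder, low_res, super_res = AudioEncoder(), CondDenoiser(), CondDenoiser()
spectrogram = torch.randn(1, 64, 400)                   # placeholder log-mel input
emb = encoder(spectrogram)
img_64 = sample(low_res, emb, size=64)
# In the described design the super-resolution stage would also be conditioned on
# the low-resolution output; here we simply upsample it and add a small refinement.
upsampled = F.interpolate(img_64, size=256, mode="bilinear", align_corners=False)
img_256 = upsampled + 0.1 * sample(super_res, emb, size=256)
```

In this sketch the audio embedding plays the role of the "semantic audio vector" in the abstract, and the two `CondDenoiser` instances stand in for the Low-Resolution and Super-Resolution Diffusion Models; a faithful implementation would use full U-Net denoisers, a learned noise schedule, and cross-attention conditioning.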

Source journal: Computers, Environment and Urban Systems
CiteScore: 13.30
Self-citation rate: 7.40%
Articles published: 111
Review time: 32 days
Journal introduction: Computers, Environment and Urban Systems is an interdisciplinary journal publishing cutting-edge and innovative computer-based research on environmental and urban systems that privileges the geospatial perspective. The journal welcomes original, high-quality scholarship of a theoretical, applied, or technological nature, and provides a stimulating presentation of perspectives, research developments, overviews of important new technologies, and uses of major computational, information-based, and visualization innovations. Applied and theoretical contributions demonstrate the scope of computer-based analysis fostering a better understanding of environmental and urban systems, their spatial scope, and their dynamics.