Fragmented Layers of Design Thinking: Limitations and Opportunities of Neural Language Model-assisted processes for Design Creativity

E. Vermisso
Design Computation Input/Output 2022. Published 2022-10-20.
DOI: 10.47330/dcio.2022.mmlw2640 (https://doi.org/10.47330/dcio.2022.mmlw2640)

Abstract

This paper offers insights into NLM-driven methodologies, which are otherwise limited, in support of an examination of design creativity following the ‘process’ approach [Abraham 2018]. AI models that rely on natural language processing (semantic references) are increasingly popular because of their directness and ease of use. Neural Language Models (NLMs) such as VQGAN+CLIP, DALL-E and MidJourney offer promising results [Rodrigues, et al. 2021], seemingly bypassing the need for expensive datasets and technical expertise. Naturally, such models are limited because they cannot capture the multimodal complexity of architectural thinking and of human cognition in general [Penrose 1989]. Alternative approaches propose combining NLMs with other artificial neural networks (ANNs), e.g. StyleGAN and CycleGAN, which are custom-trained on domain-specific data [Bolojan, Vermisso and Yousif 2022]. Architects seek to expand their agency within such AI-assisted processes by controlling the input encoding, so that they can subsequently convert the generated outcomes to 3D models fairly directly. Still, computer-vision AI models like NLMs and GANs produce 2-dimensional output, which requires extensive decoding into a 3-dimensional format. While this may seem severely constraining, it presents a silver lining for furthering design creativity. Designers are asked to scrutinize their methods from a cognitive standpoint, because these methodologies not only encourage but demand thorough interrogation of design intentionality, design decision-making factors and qualification criteria. The text-to-image correlation on which NLMs rely, together with their 2-dimensional output, ensures that certain important considerations are not circumvented. Instead of a single 3D model being obtained, multiple possible, fragmented versions of it are separately implied. Often, the ‘fake’ images generated by the ANNs promote contradictory inferences of space, which require further examination.
The hidden opportunity within the limited format of AI models echoes Neil Spiller’s comments, made twenty years ago, about the advantage of drawing over animation techniques: “Enigma is a creative tool that allows designers to see bifurcated outcomes in their sketches and drawings; it plays on the inability of drawings to faithfully record the distinct placement and extent of architectural elements” [Spiller 2001]. Comparing animations to static drawings, Spiller praised the drawing’s ability to hold “…an imagined past and an imagined future”. ‘Reading’ these results involves the (human) disentanglement of high- and low-level features and the conscious allocation of their corresponding qualities for curation. The process of evaluating ‘parts-to-whole’ visual relationships is noteworthy because it depends on shifting our attention away from certain features and on an unconscious binding of visual elements [Dehaene 2014]. The philosopher Alain wrote that “The art of paying attention, the great art,…supposes the art of not paying attention…the royal art” [Dehaene 2021]. According to neuroscientists, the brain uses attention as an amplifier and selective filter, as part of one of the three major attention systems (Alerting, Orienting and Executive Attention) [Dehaene 2021]. Orienting our attention determines what we focus on and what we do not. Suppressing unwanted information through interfering electrical waves is useful for processing the object of attention. Considering the ANNs’ results at the ‘Gestalt’ level, we can structure the AI-assisted process to ensure that low-level features (composition) are retained while high-level features (detail) are enhanced (Fig. 1a).
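
The text-to-image correlation on which the abstract says NLMs rely can be illustrated with a minimal sketch. Models such as CLIP compare a text embedding with image embeddings, typically by cosine similarity, and a generator is then steered toward images that score highly for the prompt. The sketch below is not the author's pipeline: the three-dimensional vectors, the candidate names and the helper `rank_images` are hypothetical stand-ins for embeddings that a real model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_images(text_embedding, image_embeddings):
    """Return image names ordered from best to worst match with the prompt.

    Hypothetical helper: `image_embeddings` maps a name to its vector.
    """
    scores = {
        name: cosine_similarity(text_embedding, embedding)
        for name, embedding in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Assumed toy embeddings; real models use hundreds of dimensions.
prompt = [1.0, 0.0, 1.0]
candidates = {
    "candidate_a": [0.9, 0.1, 0.8],  # points almost the same way as the prompt
    "candidate_b": [0.0, 1.0, 0.0],  # orthogonal to the prompt
    "candidate_c": [0.5, 0.5, 0.5],  # partial overlap with the prompt
}

ranking = rank_images(prompt, candidates)
print(ranking)
```

The score says only how well a flat image matches a phrase; it carries no information about depth or spatial consistency, which is why reading the results as three-dimensional space remains a task for the designer.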