Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts

Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. Pub Date: 2023-04-19. DOI: 10.1145/3544548.3581388

J. Zamfirescu-Pereira, Richmond Y. Wong, Bjoern Hartmann, Qiang Yang


Citations: 83

Abstract

Pre-trained large language models (“LLMs”) like GPT-3 can engage in fluent, multi-turn instruction-taking out-of-the-box, making them attractive materials for designing natural language interactions. Using natural language to steer LLM outputs (“prompting”) has emerged as an important design technique potentially accessible to non-AI-experts. Crafting effective prompts can be challenging, however, and prompt-based interactions are brittle. Here, we explore whether non-AI-experts can successfully engage in “end-user prompt engineering” using a design probe, a prototype LLM-based chatbot design tool supporting development and systematic evaluation of prompting strategies. Ultimately, our probe participants explored prompt designs opportunistically, not systematically, and struggled in ways echoing end-user programming systems and interactive machine learning systems. Expectations stemming from human-to-human instructional experiences, and a tendency to overgeneralize, were barriers to effective prompt design. These findings have implications for non-AI-expert-facing LLM-based tool design and for improving LLM-and-prompt literacy among programmers and the public, and present opportunities for further research.

