WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild

Yuntian Deng, Wenting Zhao, Jack Hessel, Xiang Ren, Claire Cardie, Yejin Choi
arXiv - CS - Information Retrieval, published 2024-09-05. DOI: arxiv-2409.03753 (https://doi.org/arxiv-2409.03753)
Citations: 0

Abstract

The increasing availability of real-world conversation data offers exciting opportunities for researchers to study user-chatbot interactions. However, the sheer volume of this data makes manually examining individual conversations impractical. To overcome this challenge, we introduce WildVis, an interactive tool that enables fast, versatile, and large-scale conversation analysis. WildVis provides search and visualization capabilities in the text and embedding spaces based on a list of criteria. To manage million-scale datasets, we implemented optimizations including search index construction, embedding precomputation and compression, and caching to ensure responsive user interactions within seconds. We demonstrate WildVis's utility through three case studies: facilitating chatbot misuse research, visualizing and comparing topic distributions across datasets, and characterizing user-specific conversation patterns. WildVis is open-source and designed to be extendable, supporting additional datasets and customized search and visualization functionalities.
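
The abstract names three optimizations (search index construction, embedding compression, and caching) without giving details. The following is a minimal toy sketch of what each idea can look like, not WildVis's actual implementation: the corpus, the function names (`build_index`, `search`, `compress_embedding`, `decompress_embedding`), the in-memory inverted index, and the 16-bit float packing are all assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import lru_cache
import struct

# Toy corpus standing in for chat logs (hypothetical data, not from WildVis).
CONVERSATIONS = {
    0: "how do I sort a list in python",
    1: "write a poem about the sea",
    2: "python code to read a csv file",
}


def build_index(docs):
    """Build an inverted index mapping each token to the ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


# Built once up front so that queries never scan the raw text.
INDEX = build_index(CONVERSATIONS)


@lru_cache(maxsize=1024)
def search(query):
    """Return ids of conversations containing every query token.

    Results are memoized, so a repeated query is answered from the cache.
    """
    tokens = query.lower().split()
    if not tokens:
        return ()
    hits = set.intersection(*(INDEX.get(t, set()) for t in tokens))
    return tuple(sorted(hits))


def compress_embedding(vec):
    """Pack a vector as little-endian 16-bit floats (2 bytes per value)."""
    return struct.pack(f"<{len(vec)}e", *vec)


def decompress_embedding(blob):
    """Unpack a blob produced by compress_embedding back into floats."""
    return list(struct.unpack(f"<{len(blob) // 2}e", blob))


if __name__ == "__main__":
    print(search("python"))          # ids of conversations mentioning "python"
    print(search.cache_info())       # cache statistics for repeated queries
    print(len(compress_embedding([0.5, -1.25, 2.0])), "bytes")
```

Half-precision packing trades accuracy for size, which is usually acceptable for plotting coordinates; a real deployment at million scale would more likely rely on a dedicated search engine and an array library than on these in-memory structures.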