Conversation Disentanglement As-a-Service

E. Riggio, Marco Raglianti, Michele Lanza
{"title":"Conversation Disentanglement As-a-Service","authors":"E. Riggio, Marco Raglianti, Michele Lanza","doi":"10.1109/ICPC58990.2023.00018","DOIUrl":null,"url":null,"abstract":"Modern instant messaging applications (e.g., Gitter, Slack, Discord) provide users with real-time communication means. Developers use them for collaborative development, to ask for code reviews, and to have software-related discussions. In short, a (potential) treasure trove for program comprehension. However, as with any high-throughput \"chat application\", messages interleave, leading to concurrent conversations. Associating messages to conversations is called conversation disentanglement, a useful and necessary pre-processing step to analyze datasets of instant messages. Although various conversation disentanglement algorithms have been proposed, it is cumbersome to set up proper execution environments and hard to ensure input data format consistency, calling for better practices and tool support.We present CODI, a RESTful API micro-service and web interface for conversation disentanglement. It provides an easy way to disentangle conversation transcripts with pre-trained models or to train new ones on custom datasets, features, and hyper-parameters. CODI achieves state-of-the-art performances on transcripts of IRC, Slack, and Discord conversations. We show how CODI can provide a significant improvement to reusability (and replicability) of research results, while reducing the efforts and potential mistakes due to configuration, setup, and execution.CODI’s source code: https://github.com/USIREVEAL/CODI","PeriodicalId":376593,"journal":{"name":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPC58990.2023.00018","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Modern instant messaging applications (e.g., Gitter, Slack, Discord) provide users with real-time communication means. Developers use them for collaborative development, to ask for code reviews, and to have software-related discussions. In short, a (potential) treasure trove for program comprehension. However, as with any high-throughput "chat application", messages interleave, leading to concurrent conversations. Associating messages to conversations is called conversation disentanglement, a useful and necessary pre-processing step to analyze datasets of instant messages. Although various conversation disentanglement algorithms have been proposed, it is cumbersome to set up proper execution environments and hard to ensure input data format consistency, calling for better practices and tool support.We present CODI, a RESTful API micro-service and web interface for conversation disentanglement. It provides an easy way to disentangle conversation transcripts with pre-trained models or to train new ones on custom datasets, features, and hyper-parameters. CODI achieves state-of-the-art performances on transcripts of IRC, Slack, and Discord conversations. We show how CODI can provide a significant improvement to reusability (and replicability) of research results, while reducing the efforts and potential mistakes due to configuration, setup, and execution.CODI’s source code: https://github.com/USIREVEAL/CODI
会话解开即服务
现代即时通讯应用程序(例如,Gitter, Slack, Discord)为用户提供实时通信手段。开发人员使用它们进行协作开发,要求代码审查,并进行与软件相关的讨论。简而言之,这是程序理解的(潜在的)宝库。然而,与任何高吞吐量的“聊天应用程序”一样,消息交错,导致并发对话。将消息与会话关联称为会话解纠缠,这是分析即时消息数据集的有用且必要的预处理步骤。尽管已经提出了各种会话解纠错算法,但设置适当的执行环境很麻烦,并且难以确保输入数据格式的一致性,需要更好的实践和工具支持。我们提出了CODI,一个RESTful API微服务和用于会话解除的web接口。它提供了一种简单的方法,可以用预先训练好的模型来解开对话记录,或者在自定义数据集、特征和超参数上训练新的模型。CODI在IRC, Slack和Discord对话的文本上实现了最先进的性能。我们将展示CODI如何显著提高研究结果的可重用性(和可复制性),同时减少由于配置、设置和执行而导致的工作量和潜在错误。CODI源代码:https://github.com/USIREVEAL/CODI
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信