diaLogic: Interaction-Focused Speaker Diarization

R. Duke, A. Doboli
Published in: 2021 IEEE International Systems Conference (SysCon), 2021-04-15
DOI: 10.1109/SysCon48628.2021.9447101 (https://doi.org/10.1109/SysCon48628.2021.9447101)
Citation count: 1

Abstract

diaLogic is a user-friendly Python program that performs social interaction classification through speaker diarization. Its main dependencies are the PyQt5 and Keras APIs, Matplotlib, and the R language for statistical computing. Speaker diarization is achieved with high consistency using a simple four-layer convolutional neural network (CNN) trained on the LibriSpeech ASR corpus. Speaker interactions are modeled by a custom R script. The data generated by the program allow speaker traits to be characterized within social experiments: group leaders, followers, and each speaker's level of contribution. These traits can be used to assess overall group performance as well as the performance of individuals. The interface is designed to be simple and intuitive, so that non-engineers, such as users in the social sciences, can operate the program with minimal training. The program has a modular backend that is invisible to the user and that allows easy expansion through additional algorithm modules. In future iterations of the program, speaker interaction data collection will be fully automated through machine learning and/or logical constructs. The integration of voice-based emotion recognition will be the next phase of development. Overall, the diaLogic program is intended as a central workspace for social interaction characterization.
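
The abstract does not describe how speaker contribution or leader and follower roles are computed, and the authors' interaction model is an R script that is not reproduced here. The Python sketch below is only a hypothetical illustration of the kind of statistics that diarization output makes possible: it assumes the diarizer yields (speaker, start, end) segments in seconds, and the names `segments`, `contribution_shares`, and `turn_transitions` are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical diarization output: (speaker_label, start_seconds, end_seconds).
# These values are made up for illustration; they are not data from the paper.
segments = [
    ("A", 0.0, 4.0),
    ("B", 4.0, 6.0),
    ("A", 6.0, 9.0),
    ("C", 9.0, 10.0),
    ("A", 10.0, 12.0),
]


def contribution_shares(segments):
    """Return the fraction of total speaking time attributed to each speaker."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += end - start
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {speaker: t / grand_total for speaker, t in totals.items()}


def turn_transitions(segments):
    """Count how often one speaker's turn is directly followed by another's."""
    ordered = sorted(segments, key=lambda seg: seg[1])  # order by start time
    counts = Counter()
    for (prev, _, _), (nxt, _, _) in zip(ordered, ordered[1:]):
        if prev != nxt:
            counts[(prev, nxt)] += 1
    return counts


shares = contribution_shares(segments)
transitions = turn_transitions(segments)
```

Under these assumptions, a speaker with a large share of speaking time and many incoming transitions might be a candidate group leader, but that interpretation is an assumption of this sketch rather than a criterion stated in the abstract.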