A semi-supervised high-quality pseudo labels algorithm based on multi-constraint optimization for speech deception detection

Impact factor 3.1; Region 3, Computer Science; JCR Q2, Computer Science, Artificial Intelligence
Huawei Tao, Hang Yu, Man Liu, Hongliang Fu, Chunhua Zhu, Yue Xie
DOI: 10.1016/j.csl.2023.101586
Journal: Computer Speech and Language
Publication date: 2023-11-22
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S0885230823001055
Citations: 0

Abstract

Deep learning-based speech deception detection relies on large amounts of labeled data. However, labeling speech as truthful or deceptive during data collection requires specialized professional knowledge, which greatly limits the number of annotated samples. This study therefore focuses on improving lie-detection accuracy when annotated data is insufficient. We propose a semi-supervised high-quality pseudo-label algorithm based on multi-constraint optimization (HQPL-MC) for speech deception detection. First, the algorithm uses deep auto-encoder networks to exploit the potential feature information in unlabeled data. Second, it applies pseudo-labeling to achieve entropy minimization, reducing the overlap between the class distributions of truthful and deceptive data. Finally, it improves pseudo-label quality by optimizing the unlabeled loss and the reconstruction loss, further enhancing classification performance when labeled data is insufficient. We recorded our own interview-style corpus and used it, together with the Columbia/SRI/Colorado (CSC) corpus, to evaluate the algorithm experimentally. The proposed algorithm's detection performance is better than that of most state-of-the-art algorithms.
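
The abstract names three ingredients (a supervised loss on labeled data, a pseudo-label loss on unlabeled data, and an auto-encoder reconstruction loss) but does not give their exact form. The following is only a generic sketch of how such terms are commonly combined, not the authors' HQPL-MC implementation. The function names, the weights `alpha` and `beta`, and the confidence `threshold` are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the given integer labels.
    n = probs.shape[0]
    return float(-np.mean(np.log(probs[np.arange(n), labels] + 1e-12)))

def combined_loss(logits_l, y_l, logits_u, x_u, x_u_recon,
                  alpha=0.5, beta=0.1, threshold=0.9):
    """Generic semi-supervised objective (illustrative assumption only).

    alpha, beta and threshold are hypothetical hyperparameters; the
    abstract does not specify how the paper weights or filters its terms.
    """
    # Supervised term on the labeled batch.
    sup = cross_entropy(softmax(logits_l), y_l)

    # Pseudo-label term: treat confident predictions on unlabeled data as
    # hard labels, which pushes predictions toward low entropy.
    p_u = softmax(logits_u)
    pseudo = p_u.argmax(axis=1)
    mask = p_u.max(axis=1) >= threshold
    unsup = cross_entropy(p_u[mask], pseudo[mask]) if mask.any() else 0.0

    # Reconstruction term: mean squared error of the auto-encoder output.
    recon = float(np.mean((x_u - x_u_recon) ** 2))

    total = sup + alpha * unsup + beta * recon
    parts = {"supervised": sup, "pseudo": unsup,
             "reconstruction": recon, "kept": int(mask.sum())}
    return total, parts

# Toy usage with two classes (truth = 0, deception = 1).
logits_l = np.array([[2.0, 0.0], [0.0, 2.0]])
y_l = np.array([0, 1])
logits_u = np.array([[5.0, 0.0], [0.1, 0.0]])
x_u = np.zeros((2, 3))
total, parts = combined_loss(logits_l, y_l, logits_u, x_u, x_u.copy())
print(total, parts)
```

In this sketch only unlabeled samples whose top class probability reaches the threshold contribute to the pseudo-label term, which is one common way to keep low-quality pseudo labels out of training; the paper's own quality criterion may differ.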

Source journal: Computer Speech and Language
Category: Engineering and Technology, Computer Science: Artificial Intelligence
CiteScore: 11.30
Self-citation rate: 4.70%
Articles published: 80
Review time: 22.9 weeks
Journal description: Computer Speech & Language publishes reports of original research related to the recognition, understanding, production, coding and mining of speech and language. The speech and language sciences have a long history, but it is only relatively recently that large-scale implementation of and experimentation with complex models of speech and language processing has become feasible. Such research is often carried out somewhat separately by practitioners of artificial intelligence, computer science, electronic engineering, information retrieval, linguistics, phonetics, or psychology.