Can earnings conference calls tell more lies? A contrastive multimodal dialogue network for advanced financial statement fraud detection

IF 6.8 | Zone 1 (Computer Science) | JCR Q1: Computer Science, Artificial Intelligence

Decision Support Systems Pub Date : 2025-02-01 DOI:10.1016/j.dss.2024.114381

Qi Lu, Wei Du, Shaochen Yang, Wei Xu, J. Leon Zhao

Decision Support Systems, Volume 189, Article 114381. Full text: https://www.sciencedirect.com/science/article/pii/S0167923624002148

Citations: 0

Abstract

Financial statement fraud by listed firms poses significant challenges to public investors and jeopardizes the stability of financial markets. Previous studies have identified deceptive verbal and vocal cues in earnings conference calls as indicators of financial statement fraud. However, these studies extracted managers' verbal and vocal cues separately over the entire call, neglecting both the utterance-level fusion of verbal and vocal cues and the multi-turn interaction between analysts and managers. To fill this gap, we develop a novel end-to-end contrastive multimodal dialogue network (CMMD) that considers both verbal-vocal fusion and multi-role interactions to uncover hidden deceptive cues in earnings conference calls. The proposed model comprises two core modules: the Multimodal Fusion Learning module and the Dialogue Interaction Learning module. Building on Vrij's verbal-nonverbal complementary mechanisms in deception detection, the Multimodal Fusion Learning module employs contrastive learning to align verbal and vocal cues and a co-attention mechanism to learn cross-modal interaction. Inspired by Interpersonal Deception Theory, which emphasizes the dynamic interaction process between deceivers and targets, the Dialogue Interaction Learning module uses a dialogue-aware co-attention mechanism to model multi-turn analyst-manager interaction and contrastive learning to improve dialogue representations. Our extensive empirical results show that CMMD achieves an 8.64% improvement in detecting fraudulent cases compared to the best baseline model. As such, our study advances the research frontier in fraud detection and contributes an innovative IT artifact to practice.
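The abstract does not specify the contrastive objective used to align verbal and vocal cues. As a rough illustration only, the sketch below uses a symmetric InfoNCE loss, a common choice for cross-modal alignment, and should not be read as the authors' implementation. The function names, the temperature value, and the toy embeddings are all assumptions made for this example.

```python
import math


def normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_z for x in row]


def symmetric_info_nce(text_emb, audio_emb, temperature=0.1):
    """Hypothetical alignment loss: row i of each input is the same utterance.

    Matching (text, audio) pairs are pulled together, and every other
    pairing in the batch is treated as a negative.
    """
    t = [normalize(v) for v in text_emb]
    a = [normalize(v) for v in audio_emb]
    n = len(t)
    # logits[i][j] = cosine similarity of text i and audio j, scaled by temperature
    logits = [
        [sum(x * y for x, y in zip(t[i], a[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text-to-audio direction: each row should peak on its diagonal entry.
    text_to_audio = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Audio-to-text direction: same idea over the columns.
    columns = [list(col) for col in zip(*logits)]
    audio_to_text = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return 0.5 * (text_to_audio + audio_to_text)


# Toy utterance embeddings (made up for illustration, not data from the paper).
text = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
audio_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
audio_shuffled = audio_aligned[1:] + audio_aligned[:1]  # every pair mismatched

aligned_loss = symmetric_info_nce(text, audio_aligned)
shuffled_loss = symmetric_info_nce(text, audio_shuffled)
print(aligned_loss < shuffled_loss)
```

When the audio embedding of each utterance sits close to its text embedding, the loss is small; shuffling the pairing raises it. That gradient signal is what pushes the two modalities into a shared space before a fusion step such as co-attention.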


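Source journal

Decision Support Systems (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 14.70

Self-citation rate: 6.70%

Articles published: 119

Review time: 13 months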

Journal description: The common thread of articles published in Decision Support Systems is their relevance to theoretical and technical issues in the support of enhanced decision making. The areas addressed may include foundations, functionality, interfaces, implementation, impacts, and evaluation of decision support systems (DSSs).