MF-Saudi: A multimodal framework for bridging the gap between audio and textual data for Saudi dialect detection

Raed Alharbi
Journal of King Saud University - Computer and Information Sciences, published 14 June 2024
DOI: 10.1016/j.jksuci.2024.102084

Abstract

Detecting dialect variation within a language is challenging, particularly in regions with rich linguistic diversity such as Saudi Arabia. To our knowledge, no prior work has developed a multimodal, audio–textual framework for Saudi dialect detection. Current approaches typically detect dialects from either audio or text alone, and therefore fail to capture the complex relationship between the two modalities. In this paper, we propose a novel Multimodal Framework, called MF-Saudi, for Saudi dialect detection. The framework consists of three main components: (1) a pretrained BERT encoder that extracts and encodes textual information; (2) an acoustic model that represents audio signals and fuses them with the textual information through a fusion layer; and (3) an alignment learning module that learns meaningful representations capturing the complexities of audio–text relationships, resulting in improved dialect detection. We conduct empirical evaluations on a real-world dataset, demonstrating that our solution outperforms some state-of-the-art baseline methods. The experiment code is available at https://github.com/raed19/MF-Saudi.
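The abstract names the components but not their internals. The sketch below is a minimal illustration of how a fusion step and an audio–text alignment objective can fit together. It is not the MF-Saudi implementation (the linked repository holds that). It assumes, hypothetically, that the BERT encoder and the acoustic model have already produced fixed-length embeddings projected into a shared d-dimensional space. The fusion step here is plain concatenation, and the alignment objective is a symmetric InfoNCE (CLIP-style) contrastive loss, a commonly used technique swapped in for illustration. The names `fuse` and `alignment_loss` and the `temperature` default are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fuse(text_emb: Vector, audio_emb: Vector) -> Vector:
    """Hypothetical fusion layer: concatenate the unit-normalized text and
    audio embeddings into one vector for a downstream dialect classifier."""
    return l2_normalize(text_emb) + l2_normalize(audio_emb)


def alignment_loss(text_batch: List[Vector], audio_batch: List[Vector],
                   temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss (assumed alignment objective).

    The i-th transcript and the i-th audio clip form a positive pair; every
    other pairing in the batch is treated as a negative. Both modalities are
    assumed to live in the same shared embedding space.
    """
    if not text_batch or len(text_batch) != len(audio_batch):
        raise ValueError("batches must be non-empty and of equal size")
    t = [l2_normalize(v) for v in text_batch]
    a = [l2_normalize(v) for v in audio_batch]
    n = len(t)
    # Cosine-similarity logits, sharpened by the temperature.
    logits = [[dot(t[i], a[j]) / temperature for j in range(n)] for i in range(n)]

    def cross_entropy_rows(matrix: List[List[float]]) -> float:
        """Mean cross-entropy where the correct class of row i is column i."""
        total = 0.0
        for i, row in enumerate(matrix):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum_exp - row[i]
        return total / len(matrix)

    # Text-to-audio direction uses rows; audio-to-text uses the transpose.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(transposed))


text = [[1.0, 0.0], [0.0, 1.0]]
audio_aligned = [[0.9, 0.1], [0.1, 0.9]]
audio_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(alignment_loss(text, audio_aligned) < alignment_loss(text, audio_swapped))  # True
```

The loss is low when each transcript is most similar to its own recording and high when pairs are mismatched, which is the property an alignment module needs before the fused representation is passed to a classifier. In a real system the embeddings would come from trained encoders and the loss would be minimized by gradient descent; this pure-Python version only shows the arithmetic.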
