A Novel Hybrid Attention-Based Dilated Network for Depression Classification Model from Multimodal Data Using Improved Heuristic Approach

IF 0.8 Q4 COMPUTER SCIENCE, SOFTWARE ENGINEERING
B. Manjulatha, Suresh Pabboju
DOI: 10.1142/s0219467826500105
Journal: International Journal of Image and Graphics
Publication date: 2024-07-10
Publication type: Journal Article
Citations: 0

Abstract

Automatic depression classification from multimodal input data is a challenging task. Modern methods rely on paralinguistic information such as audio and video signals, while exploiting linguistic information such as speech signals and text data remains complicated for deep learning models. Robust audio and video features must be constructed to produce a dependable depression classification system, and textual signals related to depression are analyzed from text-based content data. Moreover, to improve the performance of the classification system, audio, visual, and text descriptors are combined. Therefore, a deep learning-based depression classification model is developed to detect depressed individuals from multimodal data. EEG signals, speech signals, video, and text are gathered from standard databases, and feature extraction proceeds in four stages. In the first stage, the EEG signals are decomposed by the empirical mode decomposition (EMD) method, and linear and nonlinear features are extracted from the decomposed signals. In the second stage, spectral features of the speech signals are extracted as Mel-frequency cepstral coefficients (MFCC). In the third stage, facial texture features are extracted from the input video. In the fourth stage, the input text data are pre-processed, and textual features are extracted from the pre-processed data using the Transformer Net. All four feature sets are optimally selected and combined with optimal weights into weighted fused features using the enhanced mountaineering team-based optimization algorithm (EMTOA). The optimal weighted fused features are finally passed to the hybrid attention-based dilated network (HADN), which combines a temporal convolutional network (TCN) with bidirectional long short-term memory (Bi-LSTM).
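The weighted-fusion step described above can be sketched as follows. The feature dimensions and fusion weights here are illustrative placeholders standing in for the values the EMTOA search would produce; the optimization algorithm itself is not reproduced.

```python
import numpy as np

# Illustrative feature vectors for the four modalities
# (dimensions are arbitrary placeholders for this sketch).
rng = np.random.default_rng(0)
eeg_feats = rng.random(16)    # EMD-based linear/nonlinear EEG features
mfcc_feats = rng.random(13)   # MFCC spectral features from speech
face_feats = rng.random(32)   # facial texture features from video
text_feats = rng.random(24)   # Transformer-derived textual features

# Placeholder fusion weights; in the paper these are tuned by EMTOA.
weights = np.array([0.3, 0.2, 0.25, 0.25])

def weighted_fuse(feature_sets, weights):
    """Scale each modality's feature vector by its weight, then concatenate."""
    return np.concatenate([w * f for w, f in zip(weights, feature_sets)])

fused = weighted_fuse([eeg_feats, mfcc_feats, face_feats, text_feats], weights)
print(fused.shape)  # one fused vector of length 16 + 13 + 32 + 24 = 85
```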
The parameters of the HADN are optimized with the assistance of the developed EMTOA algorithm, and the final depression classification is obtained from the HADN. The efficiency of the developed deep learning HADN is validated by comparing it with various traditional classification models.
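The TCN branch of the network relies on dilated causal convolutions, where each output timestep sees only the current and earlier inputs spaced by the dilation factor. A minimal NumPy sketch of that core operation (not the paper's full network) is:

```python
import numpy as np

def dilated_causal_conv(x, kernel, dilation):
    """1-D causal convolution with dilation: output[t] depends only on
    x[t], x[t - d], x[t - 2d], ... (sequence is left-padded with zeros)."""
    k = len(kernel)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(kernel[i] * xp[t + pad - i * dilation] for i in range(k))
        for t in range(len(x))
    ])

x = np.arange(8, dtype=float)  # toy input sequence 0..7
y = dilated_causal_conv(x, kernel=np.array([1.0, 1.0]), dilation=2)
# With this kernel, y[t] = x[t] + x[t-2]; e.g. y[4] = 4 + 2 = 6
print(y)  # [0. 1. 2. 4. 6. 8. 10. 12.]
```

Stacking such layers with exponentially increasing dilation is what gives a TCN its long effective receptive field at modest depth.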
Source journal
International Journal of Image and Graphics
CiteScore: 2.40
Self-citation rate: 18.80%
Articles published: 67