Exploring Deep Transfer Learning Techniques for Alzheimer's Dementia Detection.

Impact factor: 2.7 | JCR: Q3 | Category: Computer Science, Interdisciplinary Applications

Frontiers in Computer Science | Pub Date: 2021-05-01 | Epub Date: 2021-05-12 | DOI: 10.3389/fcomp.2021.624683

Youxiang Zhu, Xiaohui Liang, John A Batsis, Robert M Roth

Citations: 0

Abstract

Examination of speech datasets for detecting dementia, collected via various speech tasks, has revealed links between speech and cognitive abilities. However, the speech data available for this research are extremely limited because collecting speech and baseline data from patients with dementia in clinical settings is expensive. In this paper, we study the spontaneous speech dataset from the recent ADReSS challenge, a Cookie Theft Picture (CTP) dataset whose participant groups are balanced in age, gender, and cognitive status. We explore state-of-the-art deep transfer learning techniques from the image, audio, speech, and language domains. One advantage of transfer learning, as we see it, is that it removes the need to design handcrafted features for specific tasks and datasets. Transfer learning further mitigates the shortage of dementia-relevant speech data by inheriting knowledge from similar but much larger datasets. Specifically, we built a variety of transfer learning models using the commonly employed MobileNet (image), YAMNet (audio), Mockingjay (speech), and BERT (text) models. Results indicated that the transfer learning models of text data performed significantly better than those of audio data. The performance gains of the text models may be due to the high similarity between the pre-training text dataset and the CTP text dataset. Our multi-modal transfer learning introduced a slight improvement in accuracy, demonstrating that audio and text data provide limited complementary information. Multi-task transfer learning resulted in limited improvements in classification and a negative impact on regression. By analyzing the meaning behind the AD/non-AD labels and Mini-Mental State Examination (MMSE) scores, we observed that the inconsistency between labels and scores could limit the performance of multi-task learning, especially when the outputs of the single-task models are highly consistent with the corresponding labels/scores. In sum, we conducted a large comparative analysis of varying transfer learning models, focusing less on model customization and more on pre-trained models and pre-training datasets. We revealed insightful relations among models, data types, and data labels in this research area.
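
The abstract does not say how the classification (AD/non-AD) and regression (MMSE score) objectives were combined in the multi-task models. A common formulation, assumed here purely for illustration, is a weighted sum of a cross-entropy term and a squared-error term computed on outputs of a shared model. The function names, the `reg_weight` parameter, and all numbers below are hypothetical and are not taken from the paper.

```python
import math


def binary_cross_entropy(prob, label):
    """Cross-entropy for one predicted AD probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(probs, labels, score_preds, scores, reg_weight=0.01):
    """Return (total, classification, regression) losses.

    total = mean cross-entropy + reg_weight * mean squared error.
    reg_weight is an assumed hyperparameter balancing the two tasks.
    """
    n = len(labels)
    cls = sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / n
    reg = sum((s - t) ** 2 for s, t in zip(score_preds, scores)) / n
    return cls + reg_weight * reg, cls, reg


# Made-up example: two speakers, label 1 = AD, label 0 = non-AD,
# with MMSE scores on the usual 0-30 scale.
total, cls, reg = multitask_loss(
    probs=[0.8, 0.3],
    labels=[1, 0],
    score_preds=[19.0, 27.5],
    scores=[21.0, 29.0],
)
print(f"classification={cls:.4f} regression={reg:.4f} total={total:.4f}")
```

With a single shared model, the two terms pull on the same parameters. If a label and a score for the same speaker disagree about severity, the two gradients can conflict, which is one way to read the abstract's remark that label/score inconsistency may limit multi-task learning.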

Source journal