Early Diagnosis of Alzheimer's Disease Using Hybrid Word Embedding and Linguistic Characteristics
Yangyang Li
Proceedings of the 2020 3rd International Conference on Algorithms, Computing and Artificial Intelligence, published 2020-12-24
DOI: https://doi.org/10.1145/3446132.3446197
Early detection of Alzheimer's Disease (AD) matters greatly to patients: it can lessen symptoms and ease the financial burden of health care. Changes in language ability are among the leading signs of AD, so they can potentially be used for early diagnosis. In this paper, I develop an automatic and accurate diagnostic model that uses the linguistic characteristics of the subjects together with a hybrid word embedding. I extract linguistic features such as pauses, unintelligible words, and repetitions from transcripts of interviews. I then create a text embedding by combining word vectors from Doc2vec and ELMo. By tuning the hyperparameters of the machine learning pipeline (e.g., the model regularization parameter, the learning rate and vector size of Doc2vec, and the vector size of ELMo), I achieve 91% classification accuracy and an Area Under the Curve (AUC) of 97% for distinguishing early AD from healthy subjects. Compared with a method that uses only word count, this improves detection accuracy by 10 percentage points and AUC by 9 percentage points. I also study the stability of the model by repeating the experiment, and find that its performance remains stable even though the training data is split randomly. This model could be used as a large-scale screening method for AD, as well as a complement to doctors’ detection of AD.
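The feature extraction and embedding combination described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The annotation markers `(.)` and `xxx`, the function names, and the choice to combine the vectors by concatenation are all assumptions made for illustration, since the abstract does not specify them. The two embedding vectors are placeholders standing in for Doc2vec and ELMo output.

```python
# Hypothetical annotation markers, assumed for illustration only;
# the abstract does not say how the transcripts are annotated.
PAUSE = "(.)"
UNINTELLIGIBLE = "xxx"
FEATURE_ORDER = ("pauses", "unintelligible", "repetitions", "word_count")


def linguistic_features(transcript: str) -> dict:
    """Count pauses, unintelligible words, and immediate word repetitions."""
    tokens = transcript.lower().split()
    # Drop the markers so that only spoken words remain.
    words = [t for t in tokens if t not in (PAUSE, UNINTELLIGIBLE)]
    # A repetition here means the same word occurring twice in a row.
    repetitions = sum(1 for a, b in zip(words, words[1:]) if a == b)
    return {
        "pauses": tokens.count(PAUSE),
        "unintelligible": tokens.count(UNINTELLIGIBLE),
        "repetitions": repetitions,
        "word_count": len(words),
    }


def hybrid_vector(doc_vec, elmo_vec, features: dict) -> list:
    """Concatenate two embedding vectors with the feature counts.

    Concatenation is an assumed way of combining the embeddings.
    """
    counts = [float(features[name]) for name in FEATURE_ORDER]
    return list(doc_vec) + list(elmo_vec) + counts


# Placeholder vectors stand in for real Doc2vec and ELMo embeddings.
sample = "the the boy (.) is xxx taking (.) the cookie cookie"
features = linguistic_features(sample)
vector = hybrid_vector([0.1, 0.2], [0.3], features)
```

The resulting vector could then be passed to any regularized classifier. The abstract mentions a model regularization parameter but does not name the classifier, so that step is left out here.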