Querying Depression Vlogs
M. J. Correia, B. Raj, I. Trancoso
2018 IEEE Spoken Language Technology Workshop (SLT), published 2018-12-01
DOI: 10.1109/SLT.2018.8639555 (https://doi.org/10.1109/SLT.2018.8639555)
Speech-based diagnosis-aid tools for depression typically depend on a few small datasets that are expensive to collect. This limited availability of training data caps the quality that these systems can achieve. An unexplored alternative source of large-scale data is vlogs collected from online multimedia repositories. Once the mining process is automated, the labeling process must be automated as well. In this work, we propose a framework to automatically label a corpus of in-the-wild vlogs of possibly depressed subjects, and we estimate the quality of the predicted labels without ever having access to ground truth for the majority of the corpus. The framework uses a small labeled subset to train a model and estimate the labels for the remainder of the corpus. Then, using the predicted labels, we train a noisy model and attempt to reconstruct the labels of the original labeled subset. We hypothesize that the quality of the estimated labels for the unlabeled subset of the corpus is correlated with the quality of the label reconstruction on the labeled subset. The results of the bi-modal experiment using in-the-wild data are compared with those obtained using controlled data.
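The label-then-reconstruct loop described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not say which models or features the authors use, so a nearest-centroid classifier stands in for both the initial model and the noisy model, the data is a synthetic two-cluster toy corpus, and the names `fit_centroids`, `predict`, `make_toy_corpus`, and `label_and_reconstruct` are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Stand-in classifier (assumption): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, X):
    """Assign each sample to the class with the closest centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda c: dist2(centroids[c], x)) for x in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def make_toy_corpus(n, seed):
    """Synthetic two-class data; a placeholder for real vlog features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def label_and_reconstruct(X_lab, y_lab, X_unlab):
    # Step 1: train a model on the small labeled subset.
    model = fit_centroids(X_lab, y_lab)
    # Step 2: estimate labels for the remainder of the corpus.
    y_est = predict(model, X_unlab)
    # Step 3: train a "noisy" model on the estimated labels.
    noisy_model = fit_centroids(X_unlab, y_est)
    # Step 4: reconstruct the labels of the original labeled subset
    # and score the reconstruction against the known labels.
    y_rec = predict(noisy_model, X_lab)
    return y_est, accuracy(y_lab, y_rec)


X_all, y_all = make_toy_corpus(220, seed=0)
X_lab, y_lab = X_all[:20], y_all[:20]      # small labeled subset
X_unlab, y_hidden = X_all[20:], y_all[20:]  # labels hidden from the framework

y_est, reconstruction_acc = label_and_reconstruct(X_lab, y_lab, X_unlab)
print("reconstruction accuracy on labeled subset:", reconstruction_acc)
# Only possible on toy data, where the hidden labels are known:
print("true accuracy of estimated labels:", accuracy(y_hidden, y_est))
```

On real in-the-wild data the second printed quantity is unavailable, which is why the reconstruction accuracy is proposed as its proxy. The toy run only shows how the two quantities are computed; it does not test the hypothesized correlation between them.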