Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. Tröster
{"title":"结合大众生成的媒体和个人数据:半监督学习的语境识别","authors":"Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. Tröster","doi":"10.1145/2509352.2509396","DOIUrl":null,"url":null,"abstract":"The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for free annotated training data has been proposed to build context models. A crowd-generated dataset can capture a large variety both in terms of class number and in intra-class diversity, but may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of both crowd-generated audio dataset available in the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data. Hereby we refine generic knowledge with data from the user to train a personalized model. This technique has been tested on 7 users on mobile phones with a total data of 14 days and up to 9 context classes. Preliminary results show that a semi-supervised model can improve the recognition accuracy up to 21%.","PeriodicalId":173211,"journal":{"name":"PDM '13","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Combining crowd-generated media and personal data: semi-supervised learning for context recognition\",\"authors\":\"Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. 
Tröster\",\"doi\":\"10.1145/2509352.2509396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for free annotated training data has been proposed to build context models. A crowd-generated dataset can capture a large variety both in terms of class number and in intra-class diversity, but may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of both crowd-generated audio dataset available in the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data. Hereby we refine generic knowledge with data from the user to train a personalized model. This technique has been tested on 7 users on mobile phones with a total data of 14 days and up to 9 context classes. 
Preliminary results show that a semi-supervised model can improve the recognition accuracy up to 21%.\",\"PeriodicalId\":173211,\"journal\":{\"name\":\"PDM '13\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PDM '13\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2509352.2509396\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PDM '13","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2509352.2509396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Combining crowd-generated media and personal data: semi-supervised learning for context recognition
The growing ubiquity of sensors in mobile phones has opened many opportunities for sensing personal daily activity. Most context recognition systems require cumbersome preparation: collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for freely available annotated training data has been proposed as a way to build context models. A crowd-generated dataset can capture large variety, both in the number of classes and in intra-class diversity, but it may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we combine, for the first time, crowd-generated audio datasets available on the web with unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to merge labeled data from the crowd-generated database with unlabeled personal recordings. In this way, we refine generic knowledge with data from the user to train a personalized model. The technique has been tested with 7 mobile-phone users, covering 14 days of data in total and up to 9 context classes. Preliminary results show that the semi-supervised model can improve recognition accuracy by up to 21%.
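The paper does not publish its implementation, but the core idea — a semi-supervised Gaussian mixture model whose labeled (crowd) points keep fixed class assignments while unlabeled (personal) points receive soft responsibilities during EM — can be sketched as follows. This is a minimal illustration with one Gaussian component per class and hypothetical function names, not the authors' code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def semi_supervised_gmm(X_lab, y_lab, X_unlab, n_iter=50, reg=1e-6):
    """EM for a GMM with one Gaussian per context class.

    Labeled points keep one-hot responsibilities throughout; unlabeled
    points get soft responsibilities in the E-step. Both contribute to
    the M-step, so personal (unlabeled) data refines the crowd model.
    """
    classes = np.unique(y_lab)
    K, d = len(classes), X_lab.shape[1]
    X = np.vstack([X_lab, X_unlab])
    n_lab = len(X_lab)

    # Responsibilities: labeled rows are fixed one-hot and never updated.
    R = np.zeros((len(X), K))
    for k, c in enumerate(classes):
        R[:n_lab, k] = (y_lab == c)

    # Initialize each class Gaussian from the labeled (crowd) data only.
    mus = np.array([X_lab[y_lab == c].mean(axis=0) for c in classes])
    covs = np.array([np.cov(X_lab[y_lab == c].T) + reg * np.eye(d)
                     for c in classes])
    pis = np.array([(y_lab == c).mean() for c in classes])

    for _ in range(n_iter):
        # E-step: soft responsibilities for unlabeled points only.
        dens = np.column_stack(
            [pis[k] * multivariate_normal.pdf(X[n_lab:], mus[k], covs[k])
             for k in range(K)])
        R[n_lab:] = dens / dens.sum(axis=1, keepdims=True)

        # M-step: re-estimate parameters from all points (labeled + unlabeled).
        Nk = R.sum(axis=0)
        pis = Nk / len(X)
        mus = (R.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            covs[k] = ((R[:, k][:, None] * diff).T @ diff) / Nk[k] \
                      + reg * np.eye(d)

    return classes, mus, covs, pis

def predict(X, classes, mus, covs, pis):
    """Assign each point to the class with the highest weighted density."""
    scores = np.column_stack(
        [pis[k] * multivariate_normal.pdf(X, mus[k], covs[k])
         for k in range(len(classes))])
    return classes[scores.argmax(axis=1)]
```

In practice the paper operates on audio features rather than raw points, but the mechanism is the same: the crowd-generated data anchors the class identities, while the user's unlabeled recordings shift the component means and covariances toward that user's acoustic environment.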