结合大众生成的媒体和个人数据:半监督学习的语境识别

PDM '13 Pub Date : 2013-10-22 DOI:10.1145/2509352.2509396
Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. Tröster
{"title":"结合大众生成的媒体和个人数据:半监督学习的语境识别","authors":"Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. Tröster","doi":"10.1145/2509352.2509396","DOIUrl":null,"url":null,"abstract":"The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for free annotated training data has been proposed to build context models. A crowd-generated dataset can capture a large variety both in terms of class number and in intra-class diversity, but may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of both crowd-generated audio dataset available in the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data. Hereby we refine generic knowledge with data from the user to train a personalized model. This technique has been tested on 7 users on mobile phones with a total data of 14 days and up to 9 context classes. Preliminary results show that a semi-supervised model can improve the recognition accuracy up to 21%.","PeriodicalId":173211,"journal":{"name":"PDM '13","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-10-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":"{\"title\":\"Combining crowd-generated media and personal data: semi-supervised learning for context recognition\",\"authors\":\"Long-Van Nguyen-Dinh, M. Rossi, Ulf Blanke, G. Tröster\",\"doi\":\"10.1145/2509352.2509396\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for free annotated training data has been proposed to build context models. A crowd-generated dataset can capture a large variety both in terms of class number and in intra-class diversity, but may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of both crowd-generated audio dataset available in the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data. Hereby we refine generic knowledge with data from the user to train a personalized model. This technique has been tested on 7 users on mobile phones with a total data of 14 days and up to 9 context classes. Preliminary results show that a semi-supervised model can improve the recognition accuracy up to 21%.\",\"PeriodicalId\":173211,\"journal\":{\"name\":\"PDM '13\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-10-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"12\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"PDM '13\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2509352.2509396\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"PDM '13","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2509352.2509396","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 12

摘要

手机中传感器的日益普及为个人日常活动传感提供了许多机会。大多数上下文识别系统需要通过收集和手动注释训练示例来进行繁琐的准备工作。最近,人们提出了从在线人群生成的库中挖掘免费的带注释的训练数据来构建上下文模型。群体生成的数据集可以捕获类数量和类内多样性方面的大量变化,但可能无法覆盖所有特定于用户的上下文。因此,性能通常比以用户为中心的训练差得多。在这项工作中,我们首次利用了网络上可用的人群生成的音频数据集和从用户手机上获得的未标记音频数据的组合。我们使用半监督高斯混合模型来组合来自人群生成数据库的标记数据和未标记的个人记录数据。在此基础上,我们利用用户数据对通用知识进行提炼,从而训练出个性化的模型。这项技术已经在7个手机用户身上进行了测试,总共有14天的数据和多达9个上下文类。初步结果表明,半监督模型可将识别准确率提高21%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Combining crowd-generated media and personal data: semi-supervised learning for context recognition
The growing ubiquity of sensors in mobile phones has opened many opportunities for personal daily activity sensing. Most context recognition systems require a cumbersome preparation by collecting and manually annotating training examples. Recently, mining online crowd-generated repositories for free annotated training data has been proposed to build context models. A crowd-generated dataset can capture a large variety both in terms of class number and in intra-class diversity, but may not cover all user-specific contexts. Thus, performance is often significantly worse than that of user-centric training. In this work, we exploit for the first time the combination of both crowd-generated audio dataset available in the web and unlabeled audio data obtained from users' mobile phones. We use a semi-supervised Gaussian mixture model to combine labeled data from the crowd-generated database and unlabeled personal recording data. Hereby we refine generic knowledge with data from the user to train a personalized model. This technique has been tested on 7 users on mobile phones with a total data of 14 days and up to 9 context classes. Preliminary results show that a semi-supervised model can improve the recognition accuracy up to 21%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信