Leveraging Multi-modal Prior Knowledge for Large-scale Concept Learning in Noisy Web Data

Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval. Pub Date: 2017-06-06. DOI: 10.1145/3078971.3079003

Junwei Liang, Lu Jiang, Deyu Meng, Alexander Hauptmann


Citations: 11

Abstract

Learning video concept detectors automatically from large but noisy web data, with no additional manual annotations, is a novel but challenging problem in the multimedia and machine learning communities. A considerable number of videos on the web are associated with rich but noisy contextual information, such as the title and other multi-modal information, which provides weak annotations or labels for the video content. To tackle the problem of large-scale noisy learning, we propose a novel method called Multi-modal WEbly-Labeled Learning (WELL-MM), which builds on a state-of-the-art machine learning algorithm inspired by the human learning process. WELL-MM introduces a novel multi-modal approach to incorporating meaningful prior knowledge, called curriculum, from the noisy web videos. We empirically study the curriculum constructed from the multi-modal features of Internet videos and images. Comprehensive experimental results on FCVID and YFCC100M demonstrate that WELL-MM outperforms state-of-the-art studies by a statistically significant margin on learning concepts from noisy web video data. In addition, the results verify that WELL-MM is robust to the level of noisiness in the video data. Notably, WELL-MM trained on sufficient noisy web labels is able to achieve better accuracy than supervised learning methods trained on clean, manually labeled data.


