在现实世界中推动图像识别:迈向识别数百万实体

WISMM '14 Pub Date : 2014-11-07 DOI:10.1145/2661714.2661716
Xiansheng Hua
{"title":"在现实世界中推动图像识别:迈向识别数百万实体","authors":"Xiansheng Hua","doi":"10.1145/2661714.2661716","DOIUrl":null,"url":null,"abstract":"Building a system that can recognize \"what,\" \"who,\" and \"where\" from arbitrary images has motivated researchers in computer vision, multimedia and machine learning areas for decades. Significant progresses have been made in recently years based on distributed computation and/or deep neural networks techniques. However, it is still very challenging to realize a general purpose real world image recognition engine that has reasonable recognition accuracy, semantic coverage, and recognition speed.\n In this talk, firstly we will review the current status of this area, analyze the difficulties, and discuss the potential solutions. Then two promising schemes to attack this challenge will be introduced, including (1) learning millions of concepts from search engine click logs, and (2) recognizing whatever you want without data labeling. The first work tries to build large-scale recognition models by mining search engine click logs. Challenges in training data selection and model selection will be discussed, and efficient and scalable approaches for model training and prediction will be introduced. The second work aims at building image recognition engines for any set of entities without using any human labeled training data, which helps generalize image recognition to a wide range of semantic concepts. Automatic training data generation steps will be presented, and techniques for improving recognition accuracy, which effectively leveraging massive amount of Internet data will be discussed. Different parallelization strategies for different computation tasks will be introduced, which guarantee the efficiency and scalability of the entire system. And last, we will discuss possible directions in pushing image recognition in the real world.","PeriodicalId":365687,"journal":{"name":"WISMM '14","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Pushing Image Recognition in the Real World: Towards Recognizing Millions of Entities\",\"authors\":\"Xiansheng Hua\",\"doi\":\"10.1145/2661714.2661716\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Building a system that can recognize \\\"what,\\\" \\\"who,\\\" and \\\"where\\\" from arbitrary images has motivated researchers in computer vision, multimedia and machine learning areas for decades. Significant progresses have been made in recently years based on distributed computation and/or deep neural networks techniques. However, it is still very challenging to realize a general purpose real world image recognition engine that has reasonable recognition accuracy, semantic coverage, and recognition speed.\\n In this talk, firstly we will review the current status of this area, analyze the difficulties, and discuss the potential solutions. Then two promising schemes to attack this challenge will be introduced, including (1) learning millions of concepts from search engine click logs, and (2) recognizing whatever you want without data labeling. The first work tries to build large-scale recognition models by mining search engine click logs. Challenges in training data selection and model selection will be discussed, and efficient and scalable approaches for model training and prediction will be introduced. The second work aims at building image recognition engines for any set of entities without using any human labeled training data, which helps generalize image recognition to a wide range of semantic concepts. Automatic training data generation steps will be presented, and techniques for improving recognition accuracy, which effectively leveraging massive amount of Internet data will be discussed. Different parallelization strategies for different computation tasks will be introduced, which guarantee the efficiency and scalability of the entire system. And last, we will discuss possible directions in pushing image recognition in the real world.\",\"PeriodicalId\":365687,\"journal\":{\"name\":\"WISMM '14\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"WISMM '14\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2661714.2661716\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"WISMM '14","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2661714.2661716","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

几十年来,建立一个能够从任意图像中识别“什么”、“谁”和“在哪里”的系统一直激励着计算机视觉、多媒体和机器学习领域的研究人员。近年来,基于分布式计算和/或深度神经网络技术取得了重大进展。然而,实现一个具有合理的识别精度、语义覆盖和识别速度的通用现实世界图像识别引擎仍然是非常具有挑战性的。在本次演讲中,我们将首先回顾这一领域的现状,分析其中的困难,并讨论可能的解决方案。然后将介绍两种有希望的解决这一挑战的方案,包括(1)从搜索引擎点击日志中学习数百万个概念,以及(2)在没有数据标记的情况下识别您想要的任何内容。第一项工作试图通过挖掘搜索引擎点击日志来建立大规模的识别模型。将讨论训练数据选择和模型选择方面的挑战,并介绍有效和可扩展的模型训练和预测方法。第二项工作旨在为任何实体集构建图像识别引擎,而不使用任何人类标记的训练数据,这有助于将图像识别推广到广泛的语义概念。本文将介绍自动训练数据生成的步骤,并讨论如何有效利用海量互联网数据来提高识别精度的技术。针对不同的计算任务引入不同的并行化策略,保证了整个系统的效率和可扩展性。最后,我们将讨论在现实世界中推动图像识别的可能方向。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Pushing Image Recognition in the Real World: Towards Recognizing Millions of Entities
Building a system that can recognize "what," "who," and "where" from arbitrary images has motivated researchers in computer vision, multimedia and machine learning areas for decades. Significant progresses have been made in recently years based on distributed computation and/or deep neural networks techniques. However, it is still very challenging to realize a general purpose real world image recognition engine that has reasonable recognition accuracy, semantic coverage, and recognition speed. In this talk, firstly we will review the current status of this area, analyze the difficulties, and discuss the potential solutions. Then two promising schemes to attack this challenge will be introduced, including (1) learning millions of concepts from search engine click logs, and (2) recognizing whatever you want without data labeling. The first work tries to build large-scale recognition models by mining search engine click logs. Challenges in training data selection and model selection will be discussed, and efficient and scalable approaches for model training and prediction will be introduced. The second work aims at building image recognition engines for any set of entities without using any human labeled training data, which helps generalize image recognition to a wide range of semantic concepts. Automatic training data generation steps will be presented, and techniques for improving recognition accuracy, which effectively leveraging massive amount of Internet data will be discussed. Different parallelization strategies for different computation tasks will be introduced, which guarantee the efficiency and scalability of the entire system. And last, we will discuss possible directions in pushing image recognition in the real world.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信