S. A. Chowdhury, Tanbirul Hashan, A. A. Rahman, A. Saif
{"title":"Category Specific Prediction Modules for Visual Relation Recognition","authors":"S. A. Chowdhury, Tanbirul Hashan, A. A. Rahman, A. Saif","doi":"10.5815/IJMSC.2019.02.02","DOIUrl":null,"url":null,"abstract":"Object classification in an image does not provide a complete understanding of the information contained in it. Visual relation information such as “person playing with dog” provides substantially more understanding than just “person, dog”. The visual inter-relations of the objects can provide substantial insight for truly understanding the complete picture. Due to the complex nature of such combinations, conventional computer vision techniques have not been able to show significant promise. Monolithic approaches are lacking in precision and accuracy due to the vastness of possible relation combinations. Solving this problem is crucial to development of advanced computer vision applications that impact every sector of the modern world. We propose a model using recent advances in novel applications of Convolution Neural Networks (Deep Learning) combined with a divide and conquer approach to relation detection. The possible relations are broken down to categories such as spatial (left, right), vehicle-related (riding, driving), etc. Then the task is divided to segmenting the objects, estimating possible relationship category and performing recognition on modules specially built for that relation category. The training process can be done for each module on significantly smaller datasets with less computation required. 
Additionally this approach provides recall rates that are comparable to state of the art research, while still being precise and accurate for the specific relation categories.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/IJMSC.2019.02.02","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Object classification in an image does not provide a complete understanding of the information contained in it. Visual relation information such as “person playing with dog” conveys substantially more than just “person, dog”. The visual inter-relations of objects can provide substantial insight toward truly understanding the complete picture. Due to the complex nature of such combinations, conventional computer vision techniques have not shown significant promise. Monolithic approaches lack precision and accuracy due to the vastness of possible relation combinations. Solving this problem is crucial to the development of advanced computer vision applications that impact every sector of the modern world. We propose a model that combines recent advances in Convolutional Neural Networks (deep learning) with a divide-and-conquer approach to relation detection. The possible relations are broken down into categories such as spatial (left, right), vehicle-related (riding, driving), etc. The task is then divided into segmenting the objects, estimating the likely relationship category, and performing recognition with modules specially built for that relation category. Each module can be trained on a significantly smaller dataset with less computation. Additionally, this approach achieves recall rates comparable to state-of-the-art research while remaining precise and accurate for the specific relation categories.
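The divide-and-conquer pipeline described above can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: the toy rules stand in for trained CNN modules, and every function name, category, and feature key here is a hypothetical placeholder.

```python
# Sketch of the category-dispatch idea: a router estimates the relation
# category for an object pair, then a small category-specific module
# performs the final relation recognition. All logic is illustrative.

def spatial_module(features):
    # Stand-in for a module trained only on spatial relations.
    return "left" if features["dx"] < 0 else "right"

def vehicle_module(features):
    # Stand-in for a module trained only on vehicle-related relations.
    return "riding" if features["subject_above"] else "driving"

# Each relation category maps to its own specialised predictor.
CATEGORY_MODULES = {
    "spatial": spatial_module,
    "vehicle": vehicle_module,
}

def estimate_category(subject_label, object_label):
    # Placeholder router; the paper estimates the relation category
    # before dispatching to the matching module.
    vehicles = {"bike", "horse", "car"}
    return "vehicle" if object_label in vehicles else "spatial"

def predict_relation(subject_label, object_label, features):
    category = estimate_category(subject_label, object_label)
    relation = CATEGORY_MODULES[category](features)
    return category, relation

print(predict_relation("person", "bike", {"subject_above": True, "dx": -1}))
# ('vehicle', 'riding')
```

Because each module only ever sees examples from its own category, the training sets (and the label space each module must discriminate) stay small, which is the source of the reduced computation the abstract claims.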