S. A. Chowdhury, Tanbirul Hashan, A. A. Rahman, A. Saif
{"title":"Category Specific Prediction Modules for Visual Relation Recognition","authors":"S. A. Chowdhury, Tanbirul Hashan, A. A. Rahman, A. Saif","doi":"10.5815/IJMSC.2019.02.02","DOIUrl":null,"url":null,"abstract":"Object classification in an image does not provide a complete understanding of the information contained in it. Visual relation information such as “person playing with dog” provides substantially more understanding than just “person, dog”. The visual inter-relations of the objects can provide substantial insight for truly understanding the complete picture. Due to the complex nature of such combinations, conventional computer vision techniques have not been able to show significant promise. Monolithic approaches are lacking in precision and accuracy due to the vastness of possible relation combinations. Solving this problem is crucial to development of advanced computer vision applications that impact every sector of the modern world. We propose a model using recent advances in novel applications of Convolution Neural Networks (Deep Learning) combined with a divide and conquer approach to relation detection. The possible relations are broken down to categories such as spatial (left, right), vehicle-related (riding, driving), etc. Then the task is divided to segmenting the objects, estimating possible relationship category and performing recognition on modules specially built for that relation category. The training process can be done for each module on significantly smaller datasets with less computation required. 
Additionally this approach provides recall rates that are comparable to state of the art research, while still being precise and accurate for the specific relation categories.","PeriodicalId":312036,"journal":{"name":"International Journal of Mathematical Sciences and Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Mathematical Sciences and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5815/IJMSC.2019.02.02","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Object classification in an image does not provide a complete understanding of the information contained in it. Visual relation information such as “person playing with dog” conveys substantially more than just “person, dog”. The visual inter-relations of objects can provide substantial insight toward truly understanding the complete picture. Due to the complex nature of such combinations, conventional computer vision techniques have not shown significant promise. Monolithic approaches lack precision and accuracy due to the vastness of possible relation combinations. Solving this problem is crucial to the development of advanced computer vision applications that impact every sector of the modern world. We propose a model that combines recent advances in Convolutional Neural Networks (deep learning) with a divide-and-conquer approach to relation detection. The possible relations are broken down into categories such as spatial (left, right), vehicle-related (riding, driving), etc. The task is then divided into segmenting the objects, estimating the likely relationship category, and performing recognition with modules specially built for that relation category. Each module can be trained on a significantly smaller dataset with less computation. Additionally, this approach achieves recall rates comparable to state-of-the-art research while remaining precise and accurate for the specific relation categories.
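The divide-and-conquer pipeline described above can be sketched as follows. This is a minimal illustrative mock-up, not the authors' implementation: the toy rules stand in for trained CNN modules, and every function name, category, and feature key here is a hypothetical placeholder.

```python
# Sketch of the category-dispatch idea: a router estimates the relation
# category for an object pair, then a small category-specific module
# performs the final relation recognition. All logic is illustrative.

def spatial_module(features):
    # Stand-in for a module trained only on spatial relations.
    return "left" if features["dx"] < 0 else "right"

def vehicle_module(features):
    # Stand-in for a module trained only on vehicle-related relations.
    return "riding" if features["subject_above"] else "driving"

# Each relation category maps to its own specialised predictor.
CATEGORY_MODULES = {
    "spatial": spatial_module,
    "vehicle": vehicle_module,
}

def estimate_category(subject_label, object_label):
    # Placeholder router; the paper estimates the relation category
    # before dispatching to the matching module.
    vehicles = {"bike", "horse", "car"}
    return "vehicle" if object_label in vehicles else "spatial"

def predict_relation(subject_label, object_label, features):
    category = estimate_category(subject_label, object_label)
    relation = CATEGORY_MODULES[category](features)
    return category, relation

print(predict_relation("person", "bike", {"subject_above": True, "dx": -1}))
# ('vehicle', 'riding')
```

Because each module only ever sees examples from its own category, the training sets (and the label space each module must discriminate) stay small, which is the source of the reduced computation the abstract claims.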