Enhancing Multimedia Imbalanced Concept Detection Using VIMP in Random Forests.

Proceedings of the ... IEEE International Conference on Information Reuse and Integration. Pub Date: 2016-07-01. Epub Date: 2016-12-19. DOI: 10.1109/IRI.2016.87

Saad Sadiq, Yilin Yan, Mei-Ling Shyu, Shu-Ching Chen, Hemant Ishwaran

Volume: 2016. Pages: 601-608.

Citations: 10

Abstract

Recent developments in social media and cloud storage have led to exponential growth in the amount of multimedia data, which increases the complexity of managing, storing, indexing, and retrieving information from such big data. Many current content-based concept detection approaches fall short of bridging the semantic gap. To address this problem, a multi-stage random forest framework is proposed that generates predictor variables from multivariate regressions using variable importance (VIMP). By fine-tuning the forests and significantly reducing the number of predictor variables, the concept detection scores are evaluated when the concept of interest is rare and imbalanced, i.e., has little collaboration with other high-level concepts. Using classical multivariate statistics, estimating the value of one coordinate from the other coordinates standardizes the covariates and depends on the variance of the correlations rather than on the mean. Thus, conditional dependence on the data being normally distributed is eliminated. Experimental results demonstrate that the proposed framework outperforms the compared approaches in terms of Mean Average Precision (MAP).
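The two quantities the abstract relies on, MAP and variable importance, can be illustrated with a small sketch. This is not the paper's multi-stage random forest framework: it assumes a generic, model-agnostic permutation importance (the drop in average precision after shuffling one predictor) in place of forest-based VIMP, and the names average_precision, mean_average_precision, permutation_vimp, and the toy data are hypothetical, chosen only for illustration.

```python
import random


def average_precision(labels, scores):
    """AP for one concept: mean of precision@k over the ranks k that hold a positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_concept):
    """MAP: average of AP over (labels, scores) pairs, one pair per concept."""
    aps = [average_precision(labels, scores) for labels, scores in per_concept]
    return sum(aps) / len(aps)


def permutation_vimp(score_fn, rows, labels, n_features, seed=0):
    """Importance of feature j = baseline AP minus AP after shuffling column j.

    Assumption: a generic stand-in for forest-based VIMP, not the paper's method.
    """
    rng = random.Random(seed)
    baseline = average_precision(labels, [score_fn(r) for r in rows])
    vimp = []
    for j in range(n_features):
        column = [r[j] for r in rows]
        rng.shuffle(column)  # break the link between feature j and the labels
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        permuted_ap = average_precision(labels, [score_fn(r) for r in permuted])
        vimp.append(baseline - permuted_ap)
    return vimp


# Toy imbalanced data (hypothetical): 2 positives out of 6 samples, 2 features.
rows = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.3], [0.2, 0.9], [0.3, 0.5], [0.05, 0.1]]
labels = [1, 1, 0, 0, 0, 0]

# A stand-in detector that scores samples using feature 0 only.
vimp = permutation_vimp(lambda r: r[0], rows, labels, n_features=2)
print(vimp)  # feature 1 is ignored by the detector, so its importance is 0.0
```

Predictors whose importance is near zero or negative are natural candidates for removal, which is the general idea behind reducing the predictor set before refitting. How the paper combines this with multivariate regression across stages is not specified in the abstract.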




Source journal

Proceedings of the ... IEEE International Conference on Information Reuse and Integration. IEEE International Conference on Information Reuse and Integration

Self-citation rate

0.00%

Number of publications