International Journal of Multimedia Information Retrieval: latest articles, page 7

A local representation-enhanced recurrent convolutional network for image captioning

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-04-12 DOI: 10.1007/s13735-022-00231-y

Xiaoyi Wang, Jun Huang

Citations: 0

Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-04-08 DOI: 10.48550/arXiv.2204.04014

Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris

Abstract: Estimating the preferences of consumers is of utmost importance for the fashion industry, as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to the lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the "target" time series of the product's attributes along with the "exogenous" time series of all other attributes. We utilize computer vision, specifically image classification and image captioning, to automatically extract visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually, and these features represent the products' unique characteristics without interfering with the creative process of their designers by requiring additional inputs (e.g. manually written texts). We employ the product's target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m, to assess the adequacy of MuQAR, and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing with and surpassing the domain's current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.

Volume/issue: 61 1; pages: 717-729

Citations: 7

PDS-Net: A novel point and depth-wise separable convolution for real-time object detection

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-03-24 DOI: 10.1007/s13735-022-00229-6

M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin

Citations: 3

Caption TLSTMs: combining transformer with LSTMs for image captioning

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-03-23 DOI: 10.1007/s13735-022-00228-7

Jie Yan, Yuxiang Xie, Xidao Luan, Yanming Guo, Quanzhi Gong, Suru Feng

Citations: 4

Few2Decide: towards a robust model via using few neuron connections to decide

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-30 DOI: 10.1007/s13735-021-00223-4

Jian Li, Yanming Guo, Songyang Lao, Xiang Zhao, Liang Bai, Haoran Wang

Citations: 1

Enhancing the performance of 3D auto-correlation gradient features in depth action classification

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-16 DOI: 10.1007/s13735-021-00226-1

Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali

Citations: 1

Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey.

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 DOI: 10.1007/s13735-022-00240-x

Ahmed Iqbal, Muhammad Sharif, Mussarat Yasmin, Mudassar Raza, Shabib Aftab

Abstract: Recent advancements in deep generative models have shown significant potential in the tasks of image synthesis, detection, segmentation, and classification. Segmenting medical images is considered a primary challenge in the biomedical imaging field. Various GAN-based models have been proposed in the literature to resolve medical segmentation challenges. Our literature search identified 151 papers; after a twofold screening, 138 papers were selected for the final survey. A comprehensive survey is conducted on the application of GANs to medical image segmentation, primarily focused on various GAN-based models, performance metrics, loss functions, datasets, augmentation methods, paper implementations, and source code. Secondly, this paper provides a detailed overview of the application of GANs to the segmentation of different human diseases. We conclude our research with a critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GAN implementations for biomedical image segmentation tasks.

Volume/issue: 11 3; pages: 333-368

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264294/pdf/

Citations: 17

A review on deep learning in medical image analysis.

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 Epub Date: 2021-09-04 DOI: 10.1007/s13735-021-00218-1

S Suganyadevi, V Seethalakshmi, K Balasamy

Citations: 77

A unified approach of detecting misleading images via tracing its instances on web and analyzing its past context for the verification of multimedia content.

IF 5.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 Epub Date: 2022-07-11 DOI: 10.1007/s13735-022-00235-8

Deepika Varshney, Dinesh Kumar Vishwakarma

Abstract: The verification of multimedia content on social media is a challenging and crucial issue, and it is gaining prominence in an age where user-generated content and online social web platforms are the leading sources in shaping and propagating news stories. As these sources allow users to share their opinions without restriction, opportunistic users often post misleading or unreliable content on social media such as Twitter and Facebook. At present, to lure users toward a news story, the text is often accompanied by multimedia content (images, videos, audio). Verifying this content to maintain the credibility and reliability of social media information is of paramount importance. Motivated by this, we propose a generalized system that supports the automatic classification of images as credible or misleading. In this paper, we investigate machine learning-based as well as deep learning-based approaches to verifying misleading multimedia content, where the available image traces are used to identify the credibility of the content. The experiment is performed on a real-world dataset (the Media-eval-2015 dataset) collected from Twitter. It also demonstrates the efficiency of our proposed approach and features using both machine learning and deep learning models (bi-directional LSTM). The experimental results reveal that the Microsoft Bing image search engine is quite effective in retrieving titles and performs better than the Google image search engine in our study. They also show that gathering clues from the attached multimedia content (image) is more effective than relying only on features of the posted content.

Pages: 445-459

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9272873/pdf/

Citations: 3

Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown.

IF 3.6, Zone 3 (Computer Science)

International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 Epub Date: 2022-01-26 DOI: 10.1007/s13735-021-00225-2

Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu

Abstract: The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten-year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks and discuss results, task design and methodological challenges. We highlight that almost all top-performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drives the currently top-performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.

Volume/issue: 11 1; pages: 1-18

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791088/pdf/

Citations: 0