International Journal of Multimedia Information Retrieval: Latest Articles

A local representation-enhanced recurrent convolutional network for image captioning
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-04-12 DOI: 10.1007/s13735-022-00231-y
Xiaoyi Wang, Jun Huang
Journal Article. Volume (as listed): 30 1, pages 149-157.
Citations: 0
Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-04-08 DOI: 10.48550/arXiv.2204.04014
Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris
Abstract: Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the “target” time series of the product’s attributes along with the “exogenous” time series of all other attributes. We utilize computer vision, image classification and image captioning, for automatically extracting visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually and these features represent the products’ unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g. manually written texts). We employ the product’s target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing and surpassing the domain’s current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.
Journal Article. Volume (as listed): 61 1, pages 717-729.
Citations: 7
PDS-Net: A novel point and depth-wise separable convolution for real-time object detection
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-03-24 DOI: 10.1007/s13735-022-00229-6
M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin
Journal Article. Volume (as listed): 8 1, pages 171-188.
Citations: 3
Caption TLSTMs: combining transformer with LSTMs for image captioning
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-03-23 DOI: 10.1007/s13735-022-00228-7
Jie Yan, Yuxiang Xie, Xidao Luan, Yanming Guo, Quanzhi Gong, Suru Feng
Journal Article. Volume (as listed): 9 1, pages 111-121.
Citations: 4
Few2Decide: towards a robust model via using few neuron connections to decide
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-01-30 DOI: 10.1007/s13735-021-00223-4
Jian Li, Yanming Guo, Songyang Lao, Xiang Zhao, Liang Bai, Haoran Wang
Journal Article. Volume (as listed): 45 1, pages 189-198.
Citations: 1
Enhancing the performance of 3D auto-correlation gradient features in depth action classification
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-01-16 DOI: 10.1007/s13735-021-00226-1
Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali
Journal Article. Volume (as listed): 360 1, pages 61-76.
Citations: 1
Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey.
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 DOI: 10.1007/s13735-022-00240-x
Ahmed Iqbal, Muhammad Sharif, Mussarat Yasmin, Mudassar Raza, Shabib Aftab
Abstract: Recent advancements with deep generative models have proven significant potential in the task of image synthesis, detection, segmentation, and classification. Segmenting the medical images is considered a primary challenge in the biomedical imaging field. There have been various GANs-based models proposed in the literature to resolve medical segmentation challenges. Our research outcome has identified 151 papers; after the twofold screening, 138 papers are selected for the final survey. A comprehensive survey is conducted on GANs network application to medical image segmentation, primarily focused on various GANs-based models, performance metrics, loss function, datasets, augmentation methods, paper implementation, and source codes. Secondly, this paper provides a detailed overview of GANs network application in different human diseases segmentation. We conclude our research with critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GANs network implementations for biomedical image segmentation tasks.
Journal Article. Volume 11, issue 3, pages 333-368. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264294/pdf/
Citations: 17
A review on deep learning in medical image analysis.
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 Epub Date: 2021-09-04 DOI: 10.1007/s13735-021-00218-1
S Suganyadevi, V Seethalakshmi, K Balasamy
Abstract: Ongoing improvements in AI, particularly concerning deep learning techniques, are assisting to identify, classify, and quantify patterns in clinical images. Deep learning is the quickest developing field in artificial intelligence and is effectively utilized lately in numerous areas, including medication. A brief outline is given on studies carried out on the region of application: neuro, brain, retinal, pneumonic, computerized pathology, bosom, heart, breast, bone, stomach, and musculoskeletal. For information exploration, knowledge deployment, and knowledge-based prediction, deep learning networks can be successfully applied to big data. In the field of medical image processing methods and analysis, fundamental information and state-of-the-art approaches with deep learning are presented in this paper. The primary goals of this paper are to present research on medical image processing as well as to define and implement the key guidelines that are identified and addressed.
Journal Article. Volume 11, issue 1, pages 19-38. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8417661/pdf/
Citations: 77
Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown.
IF 3.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2022-01-01 Epub Date: 2022-01-26 DOI: 10.1007/s13735-021-00225-2
Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu
Abstract: The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drive the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.
Journal Article. Volume 11, issue 1, pages 1-18. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791088/pdf/
Citations: 0
A fast and robust affine-invariant method for shape registration under partial occlusion
IF 5.6, Zone 3 (Computer Science)
International Journal of Multimedia Information Retrieval Pub Date: 2021-11-30 DOI: 10.1007/s13735-021-00224-3
Sinda Elghoul, F. Ghorbel
Journal Article. Volume (as listed): 136 1, pages 39-59.
Citations: 2