{"title":"A local representation-enhanced recurrent convolutional network for image captioning","authors":"Xiaoyi Wang, Jun Huang","doi":"10.1007/s13735-022-00231-y","DOIUrl":"https://doi.org/10.1007/s13735-022-00231-y","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"30 1","pages":"149 - 157"},"PeriodicalIF":5.6,"publicationDate":"2022-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78893857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris
{"title":"Multimodal Quasi-AutoRegression: forecasting the visual popularity of new fashion products","authors":"Stefanos Papadopoulos, C. Koutlis, S. Papadopoulos, Y. Kompatsiaris","doi":"10.48550/arXiv.2204.04014","DOIUrl":"https://doi.org/10.48550/arXiv.2204.04014","url":null,"abstract":"Estimating the preferences of consumers is of utmost importance for the fashion industry as appropriately leveraging this information can be beneficial in terms of profit. Trend detection in fashion is a challenging task due to the fast pace of change in the fashion industry. Moreover, forecasting the visual popularity of new garment designs is even more demanding due to lack of historical data. To this end, we propose MuQAR, a Multimodal Quasi-AutoRegressive deep learning architecture that combines two modules: (1) a multimodal multilayer perceptron processing categorical, visual and textual features of the product and (2) a Quasi-AutoRegressive neural network modelling the “target” time series of the product’s attributes along with the “exogenous” time series of all other attributes. We utilize computer vision, image classification and image captioning, for automatically extracting visual features and textual descriptions from the images of new products. Product design in fashion is initially expressed visually and these features represent the products’ unique characteristics without interfering with the creative process of its designers by requiring additional inputs (e.g. manually written texts). We employ the product’s target attributes time series as a proxy of temporal popularity patterns, mitigating the lack of historical data, while exogenous time series help capture trends among interrelated attributes. We perform an extensive ablation analysis on two large-scale image fashion datasets, Mallzee-P and SHIFT15m to assess the adequacy of MuQAR and also use the Amazon Reviews: Home and Kitchen dataset to assess generalization to other domains. A comparative study on the VISUELLE dataset shows that MuQAR is capable of competing and surpassing the domain’s current state of the art by 4.65% and 4.8% in terms of WAPE and MAE, respectively.","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"61 1","pages":"717-729"},"PeriodicalIF":5.6,"publicationDate":"2022-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84560561","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin
{"title":"PDS-Net: A novel point and depth-wise separable convolution for real-time object detection","authors":"M. Junayed, Md Baharul Islam, H. Imani, Tarkan Aydin","doi":"10.1007/s13735-022-00229-6","DOIUrl":"https://doi.org/10.1007/s13735-022-00229-6","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"8 1","pages":"171 - 188"},"PeriodicalIF":5.6,"publicationDate":"2022-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76468263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Few2Decide: towards a robust model via using few neuron connections to decide","authors":"Jian Li, Yanming Guo, Songyang Lao, Xiang Zhao, Liang Bai, Haoran Wang","doi":"10.1007/s13735-021-00223-4","DOIUrl":"https://doi.org/10.1007/s13735-021-00223-4","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"45 1","pages":"189 - 198"},"PeriodicalIF":5.6,"publicationDate":"2022-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79015916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali
{"title":"Enhancing the performance of 3D auto-correlation gradient features in depth action classification","authors":"Mohammad Farhad Bulbul, S. Islam, Zannatul Azme, Preksha Pareek, Md. Humaun Kabir, Hazrat Ali","doi":"10.1007/s13735-021-00226-1","DOIUrl":"https://doi.org/10.1007/s13735-021-00226-1","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"360 1","pages":"61 - 76"},"PeriodicalIF":5.6,"publicationDate":"2022-01-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78106335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ahmed Iqbal, Muhammad Sharif, Mussarat Yasmin, Mudassar Raza, Shabib Aftab
{"title":"Generative adversarial networks and its applications in the biomedical image segmentation: a comprehensive survey.","authors":"Ahmed Iqbal, Muhammad Sharif, Mussarat Yasmin, Mudassar Raza, Shabib Aftab","doi":"10.1007/s13735-022-00240-x","DOIUrl":"https://doi.org/10.1007/s13735-022-00240-x","url":null,"abstract":"<p><p>Recent advancements with deep generative models have proven significant potential in the task of image synthesis, detection, segmentation, and classification. Segmenting the medical images is considered a primary challenge in the biomedical imaging field. There have been various GANs-based models proposed in the literature to resolve medical segmentation challenges. Our research outcome has identified 151 papers; after the twofold screening, 138 papers are selected for the final survey. A comprehensive survey is conducted on GANs network application to medical image segmentation, primarily focused on various GANs-based models, performance metrics, loss function, datasets, augmentation methods, paper implementation, and source codes. Secondly, this paper provides a detailed overview of GANs network application in different human diseases segmentation. We conclude our research with critical discussion, limitations of GANs, and suggestions for future directions. We hope this survey is beneficial and increases awareness of GANs network implementations for biomedical image segmentation tasks.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 3","pages":"333-368"},"PeriodicalIF":5.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264294/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10253310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A review on deep learning in medical image analysis.","authors":"S Suganyadevi, V Seethalakshmi, K Balasamy","doi":"10.1007/s13735-021-00218-1","DOIUrl":"https://doi.org/10.1007/s13735-021-00218-1","url":null,"abstract":"<p><p>Ongoing improvements in AI, particularly concerning deep learning techniques, are assisting to identify, classify, and quantify patterns in clinical images. Deep learning is the quickest developing field in artificial intelligence and is effectively utilized lately in numerous areas, including medication. A brief outline is given on studies carried out on the region of application: neuro, brain, retinal, pneumonic, computerized pathology, bosom, heart, breast, bone, stomach, and musculoskeletal. For information exploration, knowledge deployment, and knowledge-based prediction, deep learning networks can be successfully applied to big data. In the field of medical image processing methods and analysis, fundamental information and state-of-the-art approaches with deep learning are presented in this paper. The primary goals of this paper are to present research on medical image processing as well as to define and implement the key guidelines that are identified and addressed.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 1","pages":"19-38"},"PeriodicalIF":5.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8417661/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39409372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu
{"title":"Interactive video retrieval evaluation at a distance: comparing sixteen interactive video search systems in a remote setting at the 10th Video Browser Showdown.","authors":"Silvan Heller, Viktor Gsteiger, Werner Bailer, Cathal Gurrin, Björn Þór Jónsson, Jakub Lokoč, Andreas Leibetseder, František Mejzlík, Ladislav Peška, Luca Rossetto, Konstantin Schall, Klaus Schoeffmann, Heiko Schuldt, Florian Spiess, Ly-Duyen Tran, Lucia Vadicamo, Patrik Veselý, Stefanos Vrochidis, Jiaxin Wu","doi":"10.1007/s13735-021-00225-2","DOIUrl":"10.1007/s13735-021-00225-2","url":null,"abstract":"<p><p>The Video Browser Showdown addresses difficult video search challenges through an annual interactive evaluation campaign attracting research teams focusing on interactive video retrieval. The campaign aims to provide insights into the performance of participating interactive video retrieval systems, tested by selected search tasks on large video collections. For the first time in its ten year history, the Video Browser Showdown 2021 was organized in a fully remote setting and hosted a record number of sixteen scoring systems. In this paper, we describe the competition setting, tasks and results and give an overview of state-of-the-art methods used by the competing systems. By looking at query result logs provided by ten systems, we analyze differences in retrieval model performances and browsing times before a correct submission. Through advances in data gathering methodology and tools, we provide a comprehensive analysis of ad-hoc video search tasks, discuss results, task design and methodological challenges. We highlight that almost all top performing systems utilize some sort of joint embedding for text-image retrieval and enable specification of temporal context in queries for known-item search. Whereas a combination of these techniques drive the currently top performing systems, we identify several future challenges for interactive video search engines and the Video Browser Showdown competition itself.</p>","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"11 1","pages":"1-18"},"PeriodicalIF":3.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8791088/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39872573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A fast and robust affine-invariant method for shape registration under partial occlusion","authors":"Sinda Elghoul, F. Ghorbel","doi":"10.1007/s13735-021-00224-3","DOIUrl":"https://doi.org/10.1007/s13735-021-00224-3","url":null,"abstract":"","PeriodicalId":48501,"journal":{"name":"International Journal of Multimedia Information Retrieval","volume":"136 1","pages":"39 - 59"},"PeriodicalIF":5.6,"publicationDate":"2021-11-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75080508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}