2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR): Latest Publications

GARGI: Selecting Gaze-Aware Representative Group Image from a Live Photo
Omkar N. Kulkarni, Shashank Arora, P. Atrey
DOI: 10.1109/MIPR54900.2022.00027
Abstract: The number of photos, especially group photos in live mode, has increased tremendously in today's world. Selecting a representative image in a live photo that preserves the aesthetic quality is a challenging task. In this paper, we propose a method to select a Gaze-Aware Representative Group Image, called GARGI, that considers the uniformity, or consequently the deviation, of the people's gaze in live-mode group photos to make it aesthetically pleasing. We tested this method on our own live-mode group image dataset. We argue that the inbuilt representative image selection mechanism in an Apple iPhone does not consider the subject's gaze, especially in a group image. GARGI considers the deviation of each subject's gaze with respect to their expected gaze direction and determines an aesthetically better representative image with the least amount of gaze deviation across all subjects. The results presented in the paper justify this claim and can pave the way toward a standard keyframe selection mechanism for live photos, burst-mode shots, or even videos that include human subjects.
Cited by: 1
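A minimal sketch of the selection rule the abstract describes: score each frame of a live photo by the total angular deviation between each subject's estimated gaze and their expected gaze direction, then pick the frame with the lowest total. Gaze estimation itself is assumed upstream, and the scoring function is an illustrative reading of the paper, not its exact formulation.

```python
import numpy as np

def gaze_deviation(gaze, expected):
    """Angle in radians between a gaze vector and the expected direction."""
    cos = np.dot(gaze, expected) / (np.linalg.norm(gaze) * np.linalg.norm(expected))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def select_representative_frame(frames_gazes, expected_dirs):
    """frames_gazes: (n_frames, n_subjects, 3) gaze vectors per frame.
    expected_dirs: (n_subjects, 3) expected gaze direction per subject."""
    scores = [sum(gaze_deviation(g, e) for g, e in zip(frame, expected_dirs))
              for frame in frames_gazes]
    return int(np.argmin(scores))  # frame with the least total gaze deviation

# Example: 3 frames, 2 subjects, both expected to look straight at the camera (+z).
rng = np.random.default_rng(0)
frames = rng.normal(size=(3, 2, 3))
expected = np.tile([0.0, 0.0, 1.0], (2, 1))
print(select_representative_frame(frames, expected))
```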
Creative Improvised Interaction with Generative Musical Systems
S. Dubnov, G. Assayag, V. Gokul
DOI: 10.1109/MIPR54900.2022.00028
Abstract: In this paper we survey methods for control of and creative interaction with pre-trained generative models for audio and music. By using reduced (lossy) encoding and symbolization steps, we are able to examine the level of information passing between the environment (the musician) and the agent (machine improvisation). We further use the concept of music information dynamics to find an optimal symbolization in terms of a predictive information measure. The surveyed methods and strategies, and their implications for creative interaction with the machine, are discussed in the framework of musical improvisation.
Cited by: 0
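A hedged sketch of the symbolization idea: music information dynamics selects a lossy symbolization of the signal by a predictive information criterion. Here a crude first-order proxy is used, the mutual information between consecutive symbols under k-means quantizations of different codebook sizes; the estimator, the candidate sizes, and k-means itself are illustrative assumptions, and a real system would also trade codebook size (rate) against prediction.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import mutual_info_score

def symbolize(features, k):
    """Lossy encoding: quantize continuous features into k symbols."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(features)

def predictive_information(symbols):
    # I(past; future) approximated by I(s_t; s_{t+1})
    return mutual_info_score(symbols[:-1], symbols[1:])

# Toy feature sequence with temporal structure (a random walk).
features = np.cumsum(np.random.default_rng(1).normal(size=(500, 4)), axis=0)
for k in (2, 4, 8, 16):
    print(k, predictive_information(symbolize(features, k)))
```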
Rate-Adaptive Streaming of 360-Degree Videos with Head-Motion-Aware Viewport Margins
Mehmet N. Akcay, Burak Kara, A. Begen, Saba Ahsan, I. Curcio, Emre B. Aksu
DOI: 10.1109/MIPR54900.2022.00056
Abstract: Efficient use of available bandwidth is vital when streaming 360-degree videos, as users rarely have enough bandwidth for a pleasant experience. A promising solution is the combination of viewport-dependent streaming using tiled video and rate adaptation, where the goal is to spend most of the available bandwidth on the viewport tiles. However, head motions that change the viewport tiles briefly cause low-quality rendering until the new tiles can be replaced with high-quality versions. Previously, viewport margins (fixed regions around the viewport rendered at a medium quality) were proposed to make viewport changes less abrupt. Later, head-motion-aware viewport margins (HMAVMs) were implemented to further smooth the transitions at the expense of increased bandwidth consumption. In this paper, we manage the overall bandwidth cost of HMAVMs better by first developing a set of algorithms that trade off the quality of some viewport tiles and then making the margin selection part of the rate-adaptation algorithm.
Cited by: 2
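An illustrative sketch of the margin idea on a simplified 1-D ring of tile columns: the medium-quality margin around the viewport widens with head speed, and the bandwidth budget is spent viewport-first. The constants, the ring layout, and the greedy allocation are assumptions; the paper's actual algorithms additionally trade off the quality of individual viewport tiles and fold margin selection into the rate-adaptation logic.

```python
def ring_distance(a, b, n):
    d = abs(a - b) % n
    return min(d, n - d)

def allocate_tiles(n_tiles, viewport, head_speed, budget, rates):
    """viewport: set of tile indices; rates: {'high','med','low'} -> kbps per tile."""
    margin_w = min(3, 1 + int(head_speed / 30))  # wider margin under fast head motion
    margin = {t for t in range(n_tiles)
              if t not in viewport
              and min(ring_distance(t, v, n_tiles) for v in viewport) <= margin_w}
    plan, spent = {}, 0
    for tier, quality in ((viewport, 'high'), (margin, 'med')):
        for t in sorted(tier):
            q = quality if spent + rates[quality] <= budget else 'low'
            plan[t] = q
            spent += rates[q]
    for t in range(n_tiles):
        plan.setdefault(t, 'low')  # everything else streams at the base quality
    return plan

print(allocate_tiles(12, {0, 1}, head_speed=45, budget=6000,
                     rates={'high': 2000, 'med': 800, 'low': 200}))
```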
A Local-Global Metric Learning Method for Facial Expression Animation
Pengcheng Gao, Bin Huang, Jiayi Lyu, Haifeng Ma, Jian Xue
DOI: 10.1109/MIPR54900.2022.00046
Abstract: Facial expression animation plays an important role in character animation. The Expression Blendshape Model (EBM) provides a simple representation of various expressions through a linear combination of base blendshapes with expression coefficients. However, it is challenging to distinguish subtle expression changes. In this paper, we propose a method that combines local features and global features to regress the expression coefficients. Furthermore, local metric learning (LML) and global metric learning (GML) are proposed to enhance the recognizability of cross-individual expression features. Specifically, LML increases the feature distance of each blendshape that appears or disappears from the perspective of local representation, resulting in better capture of local appearance changes, while GML raises the feature distance between neutral and emotional expressions in the high-dimensional feature space from the global perspective. Experimental results and feature visualizations on the FEAFA dataset show the effectiveness of local and global metric learning.
Cited by: 0
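A hedged PyTorch sketch of the two losses as the abstract describes them: LML pushes apart local features of sample pairs whose blendshape activations differ (a blendshape "appears or disappears"), and GML pushes emotional embeddings away from neutral ones globally. The margin forms, thresholds, and pairwise scheme are assumptions, not the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def lml_loss(local_feats, coeffs, margin=1.0, thresh=0.1):
    """local_feats: (B, D) per-patch features; coeffs: (B, K) blendshape coefficients."""
    loss, n = local_feats.new_zeros(()), 0
    for i in range(len(coeffs)):
        for j in range(i + 1, len(coeffs)):
            # a pair differs if some blendshape is active in one sample but not the other
            if ((coeffs[i] > thresh) != (coeffs[j] > thresh)).any():
                d = F.pairwise_distance(local_feats[i:i + 1], local_feats[j:j + 1])
                loss = loss + F.relu(margin - d).squeeze()
                n += 1
    return loss / max(n, 1)

def gml_loss(global_feats, is_neutral, margin=2.0):
    """Push the mean emotional embedding away from the mean neutral embedding.
    Assumes the batch contains both neutral and emotional samples."""
    neutral = global_feats[is_neutral].mean(dim=0)
    emotional = global_feats[~is_neutral].mean(dim=0)
    return F.relu(margin - torch.dist(neutral, emotional))
```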
Wasserstein Metric Attack on Person Re-identification
Astha Verma, A. Subramanyam, R. Shah
DOI: 10.1109/MIPR54900.2022.00049
Abstract: Adversarial attacks within the $l_p$ ball have recently been investigated against person re-identification (ReID) models. However, $l_p$ ball attacks disregard the geometry of the samples. To this end, the Wasserstein metric is a robust alternative, as the attack incorporates a cost matrix for pixel mass movement. In our work, we propose using the Wasserstein metric to perform adversarial attacks on the ReID system by projecting adversarial samples into the Wasserstein ball. We perform white-box and black-box attacks on state-of-the-art (SOTA) ReID models trained on the Market-1501, DukeMTMC-reID, and MSMT17 datasets. The performance of the best SOTA ReID models decreases drastically, from 90.2% to as low as 0.4%. Our model outperforms the SOTA attack methods by 17.2% in white-box attacks and 14.4% in black-box attacks. To the best of our knowledge, our work is the first to propose the Wasserstein metric for generating adversarial samples for the ReID task.
Cited by: 1
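A heavily hedged, simplified sketch of the idea: perturb an image to drive its ReID embedding away from the clean one while keeping the adversarial image close to the original as a pixel-mass distribution. The paper projects iterates into a Wasserstein ball; as a runnable stand-in, this sketch instead *penalizes* an entropic (Sinkhorn) optimal-transport distance between the two images. The model, image size, and every constant are illustrative assumptions.

```python
import torch

def sinkhorn_distance(a, b, coords, eps=1.0, iters=50):
    """Entropic OT between two normalized mass vectors over pixel coordinates."""
    C = torch.cdist(coords, coords)          # ground cost: pixel-to-pixel distance
    K = torch.exp(-C / eps)
    u = torch.ones_like(a)
    for _ in range(iters):
        u = a / (K @ (b / (K.t() @ u)))
    v = b / (K.t() @ u)
    P = u[:, None] * K * v[None, :]          # transport plan
    return (P * C).sum()

def wasserstein_attack(model, x, steps=20, lr=0.05, lam=10.0):
    h, w = x.shape[-2], x.shape[-1]
    coords = torch.stack(torch.meshgrid(torch.arange(h), torch.arange(w),
                                        indexing='ij'), -1).reshape(-1, 2).float()
    x_adv = x.clone().requires_grad_(True)
    feat_clean = model(x).detach()
    opt = torch.optim.Adam([x_adv], lr=lr)
    a = x.flatten() / x.sum()
    for _ in range(steps):
        b = x_adv.flatten().clamp(min=1e-8)
        b = b / b.sum()
        # maximize embedding drift, penalize mass transport from the clean image
        loss = -torch.norm(model(x_adv) - feat_clean) + lam * sinkhorn_distance(a, b, coords)
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            x_adv.clamp_(0, 1)
    return x_adv.detach()
```

For a toy run, `model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 8))` with `x = torch.rand(1, 1, 8, 8)` suffices; a faithful implementation would replace the penalty with a true projection onto the Wasserstein ball.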
Personalized Fashion Sequential Recommendation with Visual Feature Based on Conditional Hierarchical VAE
Keiichi Suekane, Ryoichi Osawa, Aozora Inagaki, Taiga Matsui, Tomohiro Tanabe, Keita Ishikawa, T. Takagi
DOI: 10.1109/MIPR54900.2022.00071
Abstract: With the growth of online shopping services, there has been much research on fashion item recommendation. Unlike standard recommendation systems, a recommendation for fashion items needs to take into account the context of the item IDs in the user behavior and that of fashion-specific visual features such as color and design. In this study, we propose the conditional hierarchical variational auto-encoder (CHVAE) for extracting fashion-specific visual features, and construct a fashion item recommendation system based on it. CHVAE is an extension of the VAE that enables conditional and hierarchical learning. It can capture the continuous latent space of color and design using item images and labels, and extract visual features for fashion recommendations. In our experiments, we show that the proposed method outperforms an extensive list of state-of-the-art sequential recommendation models and achieves the same or better performance than human stylists.
Cited by: 2
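A minimal PyTorch skeleton of a conditional, two-level hierarchical VAE in the spirit of CHVAE: a top latent for coarse factors (e.g., color), a bottom latent that refines them (e.g., design), both conditioned on the item label. Layer sizes and the exact factorization are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CHVAESketch(nn.Module):
    def __init__(self, x_dim=512, y_dim=10, z1_dim=32, z2_dim=16, h=256):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Linear(x_dim + y_dim, h), nn.ReLU(), nn.Linear(h, 2 * z1_dim))
        self.enc2 = nn.Sequential(nn.Linear(x_dim + z1_dim, h), nn.ReLU(), nn.Linear(h, 2 * z2_dim))
        self.dec = nn.Sequential(nn.Linear(z1_dim + z2_dim + y_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    @staticmethod
    def reparam(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, x, y):
        z1, mu1, lv1 = self.reparam(self.enc1(torch.cat([x, y], -1)))    # coarse level
        z2, mu2, lv2 = self.reparam(self.enc2(torch.cat([x, z1], -1)))   # refinement level
        recon = self.dec(torch.cat([z1, z2, y], -1))
        kl = sum(-0.5 * (1 + lv - mu.pow(2) - lv.exp()).sum(-1).mean()
                 for mu, lv in ((mu1, lv1), (mu2, lv2)))
        return recon, kl

model = CHVAESketch()
x = torch.randn(4, 512)                                    # item image features
y = nn.functional.one_hot(torch.tensor([1, 2, 3, 4]), 10).float()  # item labels
recon, kl = model(x, y)
loss = nn.functional.mse_loss(recon, x) + 0.1 * kl         # ELBO-style objective
```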
Fast VVC Intra Coding by Skipping Redundant Coding Block Structures and Unnecessary Directional Partition
Ziheng Zhang, Chang-Hong Fu, Kai Xie, Hong Hong, Guan-Ming Su
DOI: 10.1109/MIPR54900.2022.00022
Abstract: The Joint Video Exploration Team (JVET) released the latest video coding standard, Versatile Video Coding (H.266/VVC), in 2020. VVC adopts the quadtree with nested multi-type tree (QTMT) coding block structure, which brings a huge computational burden. To solve this problem, we design a fast intra coding algorithm based on skipping redundant coding block structures and unnecessary directional partitions in VVC. First, we analyze the coding block partitioning of I frames in VVC and find that some redundant partitions remain; based on this, a redundant partition skipping (RPS) scheme is designed. Second, we aim to skip unnecessary directional partitions (DPS) in VVC, inspired by the correlation between the optimal coding unit (CU) partition and other directional information such as the image texture direction, the directional intra prediction modes (IPM), and the intra sub-partitions (ISP) mode. Compared with VTM-6.0, the proposed algorithms achieve a 26.46% time saving with only a 0.45% BDBR increase on average.
Cited by: 1
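An illustrative sketch of the DPS intuition: the dominant gradient direction of a CU hints at which split directions are worth evaluating, so a strongly horizontal texture (dominant vertical gradients) lets the encoder skip vertical binary/ternary splits. The threshold and decision rule below are assumptions, not the paper's criteria, and the RPS scheme is a separate mechanism.

```python
import numpy as np
from scipy.ndimage import sobel

def candidate_split_directions(cu, ratio_thresh=2.0):
    gx = sobel(cu.astype(float), axis=1)   # horizontal intensity changes
    gy = sobel(cu.astype(float), axis=0)   # vertical intensity changes
    ex, ey = np.abs(gx).sum() + 1e-9, np.abs(gy).sum() + 1e-9
    if ey / ex > ratio_thresh:
        return ['horizontal']              # horizontal texture: skip vertical splits
    if ex / ey > ratio_thresh:
        return ['vertical']                # vertical texture: skip horizontal splits
    return ['horizontal', 'vertical']      # ambiguous: evaluate both directions

cu = np.tile(np.arange(32), (32, 1)).T     # smooth vertical ramp (horizontal stripes)
print(candidate_split_directions(cu))      # -> ['horizontal']
```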
ExpressionHop: A Lightweight Human Facial Expression Classifier
Chengwei Wei, C. J. Kuo, R. L. Testa, Ariane Machado-Lima, Fátima L. S. Nunes
DOI: 10.1109/MIPR54900.2022.00042
Abstract: A lightweight human facial expression recognition (FER) solution aimed at mobile/edge applications is proposed in this work. The solution, called ExpressionHop, consists of four modules: 1) cropping out patches based on facial landmarks, 2) applying filter banks to each patch to generate a rich set of joint spatial-spectral features, 3) conducting the discriminant feature test (DFT) to select features with higher discriminant power, and 4) performing the final classification task with a classifier. We conduct performance benchmarking of ExpressionHop against traditional and deep learning methods on several commonly used FER datasets such as JAFFE, CK+, and KDEF. Experimental results show that ExpressionHop achieves comparable or better classification accuracy, yet its model has only 30K parameters, significantly fewer than those of deep learning methods.
Cited by: 3
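A hedged sketch of the four-stage pipeline shape. The paper's filter banks and discriminant feature test come from the green-learning toolchain; as runnable stand-ins, this sketch uses random filters and ANOVA F-score selection (sklearn), which mimic the pipeline's structure rather than its exact operators. Landmarks and labels here are synthetic placeholders.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def crop_patches(img, landmarks, size=8):
    """Stage 1: cut a small patch around each facial landmark."""
    half = size // 2
    return [img[y - half:y + half, x - half:x + half] for (y, x) in landmarks]

def filter_bank_features(patches, filters):
    """Stage 2: per-patch responses to a bank of filters (mean |response|)."""
    return np.array([[np.abs(p * f).mean() for f in filters] for p in patches]).ravel()

rng = np.random.default_rng(0)
filters = [rng.normal(size=(8, 8)) for _ in range(6)]
landmarks = [(20, 20), (20, 44), (44, 32)]      # eyes and mouth, assumed positions
X = np.stack([filter_bank_features(crop_patches(rng.normal(size=(64, 64)), landmarks),
                                   filters) for _ in range(40)])
y = rng.integers(0, 2, size=40)                 # toy expression labels
# Stages 3-4: discriminant feature selection, then a lightweight classifier.
clf = make_pipeline(SelectKBest(f_classif, k=9), LogisticRegression(max_iter=500))
clf.fit(X, y)
```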
Interpretable Learning-Based Multi-Modal Hashing Analysis for Multi-View Feature Representation Learning
Lei Gao, L. Guan
DOI: 10.1109/MIPR54900.2022.00016
Abstract: In this work, an interpretable learning-based multi-modal hashing analysis (ILMMHA) model is proposed with application to multi-view feature representation learning. In the proposed model, a cascade network structure is first utilized to reveal the intrinsically semantic representation of the input variables. Then, a multi-modal hashing (MMH) method is integrated with the explored semantic representation, generating an interpretable learning-based model for multi-view feature representation. Since MMH is capable of measuring semantic similarity across multiple variables jointly, it provides a natural link between the explored intrinsically semantic representation and its similarity across multi-modal data/information. Benefiting from the integration of the cascade structure and MMH, the ILMMHA model leads to a new multi-view feature representation of high quality. To demonstrate the effectiveness and generic nature of the ILMMHA model, we conduct experiments on cross-modal audio-visual emotion recognition and text-image recognition tasks. Experimental results demonstrate the superiority of the proposed model for multi-view feature representation learning.
Cited by: 1
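A hedged stand-in for the multi-modal hashing step: project two modalities into a shared space (here via CCA, an assumption, not the paper's ILMMHA model) and binarize with sign() so that Hamming distance between the codes approximates cross-modal semantic similarity. The synthetic data simulates two views of a shared latent factor.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 8))                        # latent semantics
audio = shared @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(200, 32))
visual = shared @ rng.normal(size=(8, 48)) + 0.1 * rng.normal(size=(200, 48))

cca = CCA(n_components=8).fit(audio, visual)
za, zv = cca.transform(audio, visual)
ha, hv = np.sign(za), np.sign(zv)                         # binary hash codes
hamming = (ha[:, None, :] != hv[None, :, :]).mean(-1)     # (200, 200) code distances
print("matched-pair mean Hamming:", hamming.diagonal().mean())
print("mismatched mean Hamming:", hamming[~np.eye(200, dtype=bool)].mean())
```

Matched audio-visual pairs should show a clearly smaller mean Hamming distance than mismatched pairs, which is the property a hashing-based retrieval step relies on.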
Machine-Learning Based High Efficiency Rate Control for AV1
Yi Chen, Yunhao Mao, Shiqi Wang, Xianguo Zhang, S. Kwong
DOI: 10.1109/MIPR54900.2022.00019
Abstract: Recent years have witnessed an increasing demand for video coding technologies, which have been continuously developed to meet various requirements in video-related applications. Developed by the Alliance for Open Media (AOM), AOMedia Video 1 (AV1) is an open-source and royalty-free standard. Herein, we achieve high efficiency rate control for AV1 based on a machine-learning model, which establishes the rate-quantization relationship in a data-driven manner. More specifically, Support Vector Regression (SVR) is used for rate model parameter estimation. The model is trained using sufficient training data and incorporated into the encoder. Compared to the default rate control scheme in AV1, experimental results show that 2.01% of the bitrate can be saved with a tolerable bitrate error.
Cited by: 0
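A hedged sketch of a data-driven rate-quantization model: fit an SVR that predicts bits from (content features, quantizer), then pick the quantizer whose predicted rate best matches the target. The feature choice and the synthetic R-Q data are illustrative assumptions; the paper's exact parameterization is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
complexity = rng.uniform(1, 10, 300)                    # e.g., spatio-temporal activity
q = rng.uniform(20, 60, 300)                            # quantizer value
bits = 5000 * complexity / q + rng.normal(0, 50, 300)   # toy rate-quantization behavior
X = np.column_stack([complexity, q])

model = SVR(C=100.0, epsilon=5.0).fit(X, bits)          # data-driven R-Q relationship

def pick_quantizer(model, complexity, target_bits, q_grid=np.linspace(20, 60, 81)):
    """Invert the learned model: choose q whose predicted rate is nearest the target."""
    preds = model.predict(np.column_stack([np.full_like(q_grid, complexity), q_grid]))
    return q_grid[np.argmin(np.abs(preds - target_bits))]

print(pick_quantizer(model, complexity=6.0, target_bits=800.0))
```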