2022 IEEE International Symposium on Multimedia (ISM): Latest Papers

Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00052
M. Gurunath Reddy, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang
Abstract: We propose a deep attention-based alignment network that automatically predicts lyrics and melody from incomplete input lyrics, in a way similar to how humans create music. Most importantly, a deep neural lyrics-to-melody network is trained in an encoder-decoder fashion to predict plausible lyrics-melody pairs when given incomplete lyrics (a few keywords). An attention mechanism is used to align the predicted lyrics with the melody during lyrics-to-melody generation. Qualitative and quantitative evaluation metrics indicate that the proposed method can generate suitable lyrics and a corresponding melody for composing new songs from a piece of incomplete seed lyrics.
Citations: 0
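The abstract states that an attention mechanism aligns the predicted lyrics with the melody. The minimal Python sketch below shows generic scaled dot-product attention, in which one melody-decoder state is scored against encoded lyric tokens; it is not the authors' network. The encodings, the decoder state, and the helper names `attention_weights` and `context_vector` are hypothetical toy choices for illustration.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query: List[float], keys: List[List[float]]) -> List[float]:
    """Scaled dot-product attention of one decoder query over encoder keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


def context_vector(weights: List[float], values: List[List[float]]) -> List[float]:
    """Attention-weighted sum of the value vectors."""
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Hypothetical encoder outputs for three lyric tokens and one melody-decoder state.
lyric_encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
melody_state = [2.0, 0.0]

weights = attention_weights(melody_state, lyric_encodings)
ctx = context_vector(weights, lyric_encodings)
print([round(w, 3) for w in weights], [round(c, 3) for c in ctx])
```

The largest weights fall on the lyric tokens whose encodings have the largest dot product with the decoder state; in a trained model these vectors would come from learned encoder and decoder layers rather than fixed toy values.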
Optimizing storage and delivery of Omnidirectional Videos in Viewport-dependent streaming
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00039
Kashyap Kammachi Sreedhar, M. Hannuksela, Emre B. Aksu, Lauri Ilola, Lukasz Condrad
Abstract: The OMAF standard uses a framework called viewport-dependent delivery for streaming 360-degree videos. OMAF uses ISOBMFF for storage and MPEG-DASH as one of its delivery mechanisms. In viewport-dependent streaming, videos are spatially divided and encoded into multiple tracks, and each track is further segmented for DASH delivery. Segmentation requires additional metadata, which adds bitrate overhead. The main contributor to this overhead is the track fragment run box, identified by the four-character code 'trun'. The TRUN box records the following information for each sample in a track: size, duration, flags, and time offsets, and it uses a fixed byte size to record this information. To minimize the bitrate overhead of TRUN, four different representation algorithms were explored. This paper briefly describes the four TRUN representations and discusses the benefits and drawbacks of each algorithm. For evaluation, the algorithms were implemented in the MP4Box module of the GPAC suite. The results were evaluated for different segment durations (500 ms, 1 s, 2 s, 4 s), different tiling grids (8x4, 9x6), and two videos (bip-bop, countertiles) with different packaging techniques (no encryption, encryption of keyframes, encryption of all frames). The algorithms reduced the bitrate overhead by 59% on average compared to the original TRUN representation.
Citations: 2
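To make the overhead argument concrete, the sketch below computes the byte size of a standard `trun` box, in which every per-sample field enabled in `tr_flags` (duration, size, flags, composition time offset) occupies 32 bits, and compares it with a deliberately simple compact layout that stores each field at the narrowest byte width its values need. The compact layout and the helper names are hypothetical illustrations, not one of the four representations studied in the paper, and the sample values are made up.

```python
from typing import Dict, List

# tr_flags bits of the ISOBMFF TrackRunBox ('trun').
DATA_OFFSET_PRESENT = 0x000001
FIRST_SAMPLE_FLAGS_PRESENT = 0x000004
SAMPLE_DURATION_PRESENT = 0x000100
SAMPLE_SIZE_PRESENT = 0x000200
SAMPLE_FLAGS_PRESENT = 0x000400
SAMPLE_CTO_PRESENT = 0x000800  # sample_composition_time_offset

PER_SAMPLE_BITS = (SAMPLE_DURATION_PRESENT, SAMPLE_SIZE_PRESENT,
                   SAMPLE_FLAGS_PRESENT, SAMPLE_CTO_PRESENT)


def standard_trun_size(sample_count: int, tr_flags: int) -> int:
    """Size in bytes of a regular 'trun': every present per-sample field is 32 bits."""
    size = 16  # box size + box type + version/flags + sample_count, 4 bytes each
    if tr_flags & DATA_OFFSET_PRESENT:
        size += 4
    if tr_flags & FIRST_SAMPLE_FLAGS_PRESENT:
        size += 4
    per_sample = 4 * sum(1 for bit in PER_SAMPLE_BITS if tr_flags & bit)
    return size + sample_count * per_sample


def min_bytes(value: int) -> int:
    """Fewest whole bytes that can hold a non-negative integer (at least 1)."""
    return max(1, (value.bit_length() + 7) // 8)


def compact_trun_size(fields: Dict[str, List[int]]) -> int:
    """HYPOTHETICAL compact layout (not from the paper): the same 16-byte header,
    one width byte per field, then every value of a field stored with the byte
    width of that field's largest value."""
    size = 16 + len(fields)
    for values in fields.values():
        size += len(values) * min_bytes(max(values))
    return size


# Toy fragment: 30 samples (e.g. 1 s at 30 fps), all four per-sample fields present.
n = 30
all_fields = sum(PER_SAMPLE_BITS)
toy_fields = {
    "duration": [512] * n,
    "size": [40000] + [3000] * (n - 1),
    "flags": [0x02000000] + [0x01010000] * (n - 1),  # sync sample, then non-sync samples
    "cto": [1024] * n,  # assumes non-negative composition offsets
}
print(standard_trun_size(n, all_fields), compact_trun_size(toy_fields))
```

With these toy values the standard box needs 496 bytes and the hypothetical compact layout 320. Real savings depend on the actual sample values and on the representation chosen; this sketch also ignores the signed composition offsets allowed in version 1 of the box.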
Evaluation of Sampling Algorithms for a Pairwise Subjective Assessment Methodology
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.10040647
Shima Mohammadi, J. Ascenso
Abstract: Subjective assessment tests are often employed to evaluate image processing systems, notably image and video compression and super-resolution, among others, and have been regarded as an indisputable way to provide evidence of the performance of an algorithm or system. While several methodologies can be used in a subjective quality assessment test, pairwise comparison tests are currently attracting a lot of attention due to their accuracy and simplicity. However, the number of comparisons in a pairwise comparison test grows quadratically with the number of stimuli, which often leads to very long tests that are impractical in many cases. Not all pairs contribute equally to the final score, so it is possible to reduce the number of comparisons without degrading the final accuracy. To do so, pairwise sampling methods are often used to select the pairs that provide the most information about the quality of each stimulus. In this paper, a reliable and much-needed evaluation procedure is proposed and applied to methods already available in the literature, with particular attention to the subjective evaluation of image and video codecs. The results indicate that an appropriate selection of pairs makes it possible to achieve very reliable scores while comparing a much smaller number of pairs.
Citations: 1
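The quadratic growth mentioned above follows from the number of unordered pairs, n(n-1)/2. The sketch below counts the full design and shows one simple way to spend a smaller comparison budget. The `closest_score_pairs` heuristic is a hypothetical stand-in chosen for illustration, not one of the sampling methods evaluated in the paper.

```python
import itertools
from typing import List, Tuple


def full_design(n: int) -> List[Tuple[int, int]]:
    """Every unordered pair of n stimuli: n*(n-1)/2 comparisons."""
    return list(itertools.combinations(range(n), 2))


def closest_score_pairs(scores: List[float], budget: int) -> List[Tuple[int, int]]:
    """Hypothetical heuristic: keep the `budget` pairs whose current score
    estimates are closest, on the idea that their outcome is least predictable."""
    pairs = full_design(len(scores))
    pairs.sort(key=lambda p: abs(scores[p[0]] - scores[p[1]]))
    return pairs[:budget]


for n in (5, 10, 20, 40):
    print(n, len(full_design(n)))  # quadratic growth in the number of stimuli

current_scores = [1.0, 1.1, 2.0, 3.5, 3.6]  # toy quality estimates
print(closest_score_pairs(current_scores, 2))
```

For 5, 10, 20, and 40 stimuli the full design requires 10, 45, 190, and 780 comparisons, respectively, which is why practical tests sample only a subset.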
Effects of Color Stain Normalization in Histopathology Image Retrieval using Deep Learning
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00010
A. M. Rinaldi, Cristiano Russo, Cristian Tommasino
Abstract: In the last decade, many digital slides have become available in the pathology field thanks to the spread of new technologies for computerized acquisition. Hardware and software tools and devices often differ among biomedical analysis centers; consequently, digital slides do not share the same representation and differ in colorization, exposure, contrast, brightness, and other distortions. Many computer vision algorithms are sensitive to these differences, and in specific tasks such as image retrieval, color stain normalization can be a helpful technique to mitigate these discrepancies. In this paper, we explore the effects of color stain normalization on patch-based Hematoxylin and Eosin (H&E) image retrieval, measuring how and how much it affects the accuracy of this task through an exhaustive analysis on a standard dataset.
Citations: 0
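The abstract does not name the normalization methods that were compared. As an illustration of the general idea only, the sketch below performs a simplified Reinhard-style transfer that shifts and scales each channel of a query patch so that its mean and standard deviation match a reference patch. The original Reinhard method works in the lαβ color space rather than directly on the raw channels, and the patch values and helper names here are hypothetical.

```python
import statistics
from typing import List

Channel = List[float]  # one flattened color channel of an image patch


def match_channel(src: Channel, ref_mean: float, ref_std: float) -> Channel:
    """Shift and scale a channel so its mean and standard deviation match the reference."""
    mu = statistics.fmean(src)
    sd = statistics.pstdev(src)
    if sd == 0:
        return [ref_mean] * len(src)  # flat channel: only the mean can be matched
    return [(x - mu) / sd * ref_std + ref_mean for x in src]


def normalize(patch: List[Channel], reference: List[Channel]) -> List[Channel]:
    """Match each channel of `patch` to the statistics of the same channel in `reference`."""
    return [match_channel(src, statistics.fmean(ref), statistics.pstdev(ref))
            for src, ref in zip(patch, reference)]


# Toy 2x2 patches with three channels each (values are made up).
query_patch = [[10, 20, 30, 40], [100, 110, 120, 130], [50, 50, 60, 60]]
reference_patch = [[100, 120, 140, 160], [80, 90, 100, 110], [30, 40, 50, 60]]
normalized = normalize(query_patch, reference_patch)
print([[round(v, 2) for v in ch] for ch in normalized])
```

After normalization, patches from different centers share the reference's per-channel statistics, so a retrieval model compares tissue structure rather than scanner-specific color casts.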
Robust Depth Estimation in Foggy Environments Combining RGB Images and mmWave Radar
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00011
Mengchen Xiong, Xiao Xu, D. Yang, E. Steinbach
Abstract: In this paper, we propose a robust depth estimation strategy that uses RGB images and mmWave radar data to deal with limited visibility in foggy environments. While state-of-the-art RGB- or LiDAR-based depth estimation works well in scenarios with good visibility, its performance degrades dramatically in the presence of fog. In contrast, mmWave radar sensors are not affected by fog and are therefore a promising complement. To leverage this property of mmWave radar, we combine RGB image-based depth estimation with radar information. The proposed combination is an extension of the Sparse-to-Dense (S2D) model. Moreover, a weight-based sensor fusion strategy is presented to improve system performance. Our experiments show that fog with a meteorological optical range (MOR) of less than 50 m strongly degrades the performance of RGB image-based and LiDAR-based depth estimation. For a MOR of 30 m in our dataset, the proposed approach improves the mean square error by 26% compared with the combination of RGB images and LiDAR data.
Citations: 2
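The abstract mentions a weight-based fusion strategy and reports its gain as a relative reduction in mean square error, but it does not specify how the weights are computed. The sketch below shows only the generic pieces: a per-pixel convex combination of two depth estimates, the MSE, and the relative-improvement formula. The fixed 0.5 weights, the depth values, and the function names are hypothetical.

```python
from typing import List


def fuse(rgb_depth: List[float], radar_depth: List[float], radar_weight: List[float]) -> List[float]:
    """Per-pixel convex combination: w * radar + (1 - w) * rgb.
    How the weights are obtained is an assumption here (they are simply given)."""
    return [w * r + (1.0 - w) * c for c, r, w in zip(rgb_depth, radar_depth, radar_weight)]


def mse(pred: List[float], target: List[float]) -> float:
    """Mean square error between predicted and ground-truth depths."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def relative_improvement(baseline_mse: float, new_mse: float) -> float:
    """Fractional MSE reduction; 0.26 means the new MSE is 74% of the baseline."""
    return (baseline_mse - new_mse) / baseline_mse


# Toy depth values in metres for four pixels (made up for illustration).
ground_truth = [10.0, 20.0, 30.0, 40.0]
rgb_only = [12.0, 25.0, 30.0, 48.0]   # degraded by fog
radar = [10.5, 19.0, 33.0, 41.0]      # fog-insensitive but noisier
fused = fuse(rgb_only, radar, [0.5] * 4)
print(mse(rgb_only, ground_truth), mse(fused, ground_truth))
print(relative_improvement(mse(rgb_only, ground_truth), mse(fused, ground_truth)))
```

A learned or confidence-based weighting would replace the constant weights; the MSE and relative-improvement arithmetic stay the same.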
Roundwood Tracking from the Forest to the Sawmill using filter approaches to highlight the annual ring pattern
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00056
Georg Wimmer, R. Schraml, A. Uhl, A. Petutschnigg
Abstract: Proof of origin for wood logs is becoming increasingly important. In the context of Industry 4.0, and to combat illegal logging, there is growing interest in tracking each individual log. To track roundwood from the forest to the sawmill, this work applies log recognition to log end images of 100 logs that were captured first in the forest and later at the sawmill. The log images are segmented from the background, then preprocessed using a novel filtering approach, and features are extracted using two CNN-based methods. We show that filtering approaches that improve the visibility of the annual ring pattern and suppress unwanted image information, such as the saw cut pattern, clearly improve the recognition results.
Citations: 0
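The abstract does not say how the extracted CNN features are compared. A common choice for this kind of recognition, assumed here, is nearest-neighbour matching by cosine similarity: each sawmill image is assigned the forest log whose feature vector is most similar, and rank-1 accuracy counts the correct assignments. The short feature vectors and log IDs below are toy stand-ins for real CNN embeddings.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def identify(query: List[float], gallery: Dict[str, List[float]]) -> str:
    """Return the gallery ID whose feature vector is most similar to the query."""
    return max(gallery, key=lambda log_id: cosine(query, gallery[log_id]))


def rank1_accuracy(queries: Dict[str, List[float]], gallery: Dict[str, List[float]]) -> float:
    """Fraction of queries whose best match carries the same log ID."""
    hits = sum(identify(vec, gallery) == log_id for log_id, vec in queries.items())
    return hits / len(queries)


# Toy "embeddings" of log end images captured in the forest and at the sawmill.
forest = {"log01": [0.9, 0.1, 0.0], "log02": [0.1, 0.8, 0.3], "log03": [0.0, 0.2, 0.9]}
sawmill = {"log01": [0.8, 0.2, 0.1], "log02": [0.2, 0.7, 0.2], "log03": [0.1, 0.1, 0.95]}
print(rank1_accuracy(sawmill, forest))
```

Preprocessing that emphasizes the annual ring pattern would act upstream of this step, making embeddings of the same log more similar across the two capture sites.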
Singing Melody Extraction Based on Combined Frequency-Temporal Attention and Attentional Feature Fusion with Self-Attention
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00050
Xiaonan Qi, Lihua Tian, Chen Li, Hui Song, Jiahui Yan
Abstract: Main melody extraction from polyphonic music is a challenging task in music information retrieval. Traditional convolutional and recurrent neural networks have effectively improved performance on this task. In recent years, with the development of attention mechanisms in neural networks, the frequency and temporal attention information of audio has been fully exploited, and the amplitude properties of audio can also be better integrated through a good fusion module. This paper improves frequency-temporal attention based on prior work by others. By extracting attention information with frequency-temporal attention and performing additive feature fusion, we obtain a combined frequency-temporal attention. We then apply attentional feature fusion based on multi-scale channel attention, and finally the temporal dependencies are learned through a self-attention module. Experimental results on four datasets demonstrate that our model outperforms existing models.
Citations: 0
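As a rough illustration of what "frequency-temporal attention with additive fusion" means, the sketch below derives a frequency attention vector (pooled over time) and a temporal attention vector (pooled over frequency) from a time-by-frequency salience map, weights the map by each, and adds the two results. The paper's module uses learned layers, multi-scale channel attention, and self-attention; this simplified, parameter-free version and its names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def freq_temporal_attention(spec: Matrix) -> Matrix:
    """Weight a (time x frequency) map by frequency and temporal attention, then add."""
    n_t, n_f = len(spec), len(spec[0])
    # Frequency attention: average over time, normalize across frequency bins.
    a_f = softmax([sum(spec[t][f] for t in range(n_t)) / n_t for f in range(n_f)])
    # Temporal attention: average over frequency, normalize across time frames.
    a_t = softmax([sum(row) / n_f for row in spec])
    # Additive fusion: spec * a_f + spec * a_t, written as one product per cell.
    return [[spec[t][f] * (a_f[f] + a_t[t]) for f in range(n_f)] for t in range(n_t)]


def melody_bins(salience: Matrix) -> List[int]:
    """Pick the most salient frequency bin in each frame as the melody estimate."""
    return [row.index(max(row)) for row in salience]


spec = [[0.1, 0.9, 0.2], [0.1, 0.8, 0.1]]  # toy salience map: 2 frames x 3 bins
out = freq_temporal_attention(spec)
print(melody_bins(out))
```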
Experiences and Lessons Learned from a Crowdsourced-Remote Hybrid User Survey Framework
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00035
Cise Midoglu, A. Storås, S. Sabet, Malek Hammou, S. Hicks, Inga Strümke, M. Riegler, C. Griwodz, P. Halvorsen
Abstract: Subjective user studies are important to ensure the fidelity and usability of systems that generate multimedia content. Testing how end users and domain experts perceive multimedia assets can provide crucial information. In this paper, we present our experiences with Huldra, an open-source hybrid crowdsourced-remote user survey framework intended for conducting web-based subjective user studies, which aims to combine the individual benefits of traditional, crowdsourced, and remote methods. We share our experiences and insights from two actively deployed use cases and discuss the challenges and opportunities of using Huldra as a framework for conducting user studies.
Citations: 0
Teardrop Magnification: A Hybrid Linear-Fisheye Magnifier for the Border and Corner of the Screen
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00017
Florian Schniederjann, Darius Rausch, Jens Wiggenbrock, R. Mertens
Abstract: Eye-tracking-based interfaces have to solve two major problems: the Midas touch problem and accuracy. While the Midas touch problem can be tackled with innovative interaction concepts, accuracy problems result from human physiology, hardware limitations, and increasing screen resolutions. The Multi Modal Interaction Concept for Efficient Input (M2ice) aims to tackle these problems with a hybrid linear-fisheye magnifier. This magnifier works in the center of the screen but is problematic at the borders and corners, where it appears only as a half or quarter circle, limiting the size of the enlarged area. This paper presents a novel teardrop magnification approach for border and corner cases that solves these problems by using a different shape and by shifting the user's focus, making more of the magnified area available for interaction.
Citations: 0
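The abstract does not give M2ice's transfer function, so the sketch below is only one plausible way to build a hybrid linear-fisheye magnifier. It maps a content distance `s` from the gaze point to an on-screen distance: content inside the inner zone is magnified linearly by `m`, and the ring out to the lens radius `R` uses the Sarkar-Brown fisheye profile G(u) = (d+1)u / (du+1), so the lens joins the unmagnified screen seamlessly at `R`. The parameter names and values are assumptions, and the teardrop shape itself is not reproduced.

```python
def sarkar_brown(u: float, d: float) -> float:
    """Graphical-fisheye profile G(u) = (d+1)u / (d*u + 1) for u in [0, 1]."""
    return (d + 1.0) * u / (d * u + 1.0)


def lens_radius(s: float, m: float, r0: float, R: float, d: float = 3.0) -> float:
    """Map a content distance s (from the gaze point) to an on-screen distance.

    Content that ends up inside screen radius r0 is magnified linearly by m;
    between r0 and R a fisheye profile compresses the remaining content so the
    lens meets the unmagnified screen at R. Outside R nothing changes.
    """
    s0 = r0 / m  # content radius that fills the linear zone
    if s <= s0:
        return m * s
    if s >= R:
        return s
    u = (s - s0) / (R - s0)  # position within the transition ring, 0..1
    return r0 + (R - r0) * sarkar_brown(u, d)


# Hypothetical lens: 2x linear zone of radius 100 px inside a 200 px lens.
for s in (10.0, 50.0, 100.0, 199.0, 250.0):
    print(s, round(lens_radius(s, m=2.0, r0=100.0, R=200.0), 2))
```

At a screen border or corner, a circular lens like this one is clipped to a half or quarter disc, which is the limitation the teardrop shape is designed to avoid.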
Actor-Critic Bilateral Filter for Noise-Robust Image Smoothing
2022 IEEE International Symposium on Multimedia (ISM) Pub Date: 2022-12-01 DOI: 10.1109/ISM55400.2022.00061
Yi-Jie Chen, Yen-Chiao Wang, Bo-Hao Chen, Hsiang-Yin Cheng, Jia-Li Yin
Abstract: Bilateral filters have been used to achieve excellent edge-preserving image smoothing. However, most studies have focused on accelerating bilateral filtering rather than on the stability of the filtering process with respect to small perturbations of its inputs. In this paper, we propose a novel actor–critic bilateral filter trained with a multistep learning scheme for high-stability edge-preserving image smoothing. We first formulate edge-preserving smoothing as a Markov decision process that involves adjusting the width setting of the range kernel of a bilateral filter. Next, we train our actor–critic bilateral filter in a multistep manner to learn the optimal sequence of width settings. Through extensive experiments on five benchmark datasets, we found that the proposed actor–critic bilateral filter produces satisfactory edge-preserving smoothing results.
Citations: 0
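To show what "adjusting the width setting for the range kernel" acts on, the sketch below implements a plain 1-D bilateral filter and applies it repeatedly with a sequence of range widths. In the paper that sequence is chosen by a learned actor–critic policy; here the schedule is fixed by hand, and the signal, parameters, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def bilateral_1d(signal: List[float], sigma_s: float, sigma_r: float, radius: int) -> List[float]:
    """One pass of a 1-D bilateral filter.

    Each output sample is a normalized weighted mean of its neighbours; the weight
    is a spatial Gaussian (sigma_s) times a range Gaussian (sigma_r) on the
    intensity difference. A small sigma_r preserves edges; a large one blurs them.
    """
    n = len(signal)
    out = []
    for i in range(n):
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-((signal[i] - signal[j]) ** 2) / (2 * sigma_r ** 2)))
            num += w * signal[j]
            den += w  # never zero: the j == i term has weight 1
        out.append(num / den)
    return out


def run_schedule(signal: List[float], sigmas_r: Sequence[float],
                 sigma_s: float = 2.0, radius: int = 3) -> List[float]:
    """Apply one filtering step per range width (hand-picked stand-in for a learned policy)."""
    x = list(signal)
    for sr in sigmas_r:
        x = bilateral_1d(x, sigma_s, sr, radius)
    return x


noisy_step = [0.0, 0.1, -0.1, 0.05, 1.0, 0.95, 1.1, 1.0]
smoothed = run_schedule(noisy_step, [0.3, 0.2, 0.1])
print([round(v, 3) for v in smoothed])
```

The noise on each side of the step is averaged out while the jump between samples 3 and 4 survives, because neighbours across the edge receive near-zero range weights.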