M. Gurunath Reddy, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang
{"title":"Deep Attention-Based Alignment Network for Melody Generation from Incomplete Lyrics","authors":"M. Gurunath Reddy, Zhe Zhang, Yi Yu, Florian Harscoet, Simon Canales, Suhua Tang","doi":"10.1109/ISM55400.2022.00052","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00052","url":null,"abstract":"We propose a deep attention-based alignment network, which aims to automatically predict lyrics and melody from incomplete lyrics given as input, in a way similar to how humans create music. Most importantly, a deep neural lyrics-to-melody network is trained in an encoder-decoder fashion to predict possible lyrics-melody pairs when given incomplete lyrics (a few keywords). The attention mechanism is exploited to align the predicted lyrics with the melody during lyrics-to-melody generation. The qualitative and quantitative evaluation metrics show that the proposed method is capable of generating proper lyrics and a corresponding melody for composing new songs given a piece of incomplete seed lyrics.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125587557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kashyap Kammachi Sreedhar, M. Hannuksela, Emre B. Aksu, Lauri Ilola, Lukasz Condrad
{"title":"Optimizing storage and delivery of Omnidirectional Videos in Viewport-dependent streaming","authors":"Kashyap Kammachi Sreedhar, M. Hannuksela, Emre B. Aksu, Lauri Ilola, Lukasz Condrad","doi":"10.1109/ISM55400.2022.00039","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00039","url":null,"abstract":"The OMAF standard makes use of a framework called viewport-dependent delivery for the streaming of 360-degree videos. OMAF uses ISOBMFF for storage and MPEG-DASH as one of the delivery mechanisms. In viewport-dependent streaming, videos are spatially divided and encoded into multiple tracks, and each track is further segmented for DASH delivery. Segmentation requires additional metadata, which adds to the bitrate overhead. The main contributor to this overhead is the track fragment run, stored in a box with the four-character code ‘trun’. The TRUN records the following information for each sample in a track: the size, duration, flags, and time offsets, and it uses a fixed byte size to record this information. To minimize the bitrate overhead of TRUN, four different representation algorithms have been explored. This paper briefly describes the four TRUN representations and discusses the benefits and drawbacks of each algorithm. For evaluation, the algorithms were implemented in the MP4BOX module of the GPAC suite. The results were evaluated for different segment durations (500ms, 1s, 2s, 4s), different tiling grids (8x4, 9x6), and two videos (bip-bop, countertiles) with different packaging techniques (no encryption, encryption of keyframes, encryption of all frames). The algorithms reduced the bitrate overhead by 59% on average compared to the original TRUN representation.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124789817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Sampling Algorithms for a Pairwise Subjective Assessment Methodology","authors":"Shima Mohammadi, J. Ascenso","doi":"10.1109/ISM55400.2022.10040647","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.10040647","url":null,"abstract":"Subjective assessment tests are often employed to evaluate image processing systems, notably image and video compression and super-resolution, among others, and have been used as an indisputable way to provide evidence of the performance of an algorithm or system. While several methodologies can be used in a subjective quality assessment test, pairwise comparison tests are nowadays attracting a lot of attention due to their accuracy and simplicity. However, the number of comparisons in a pairwise comparison test increases quadratically with the number of stimuli and thus often leads to very long tests, which is impractical in many cases. Nevertheless, not all the pairs contribute equally to the final score, and thus it is possible to reduce the number of comparisons without degrading the final accuracy. To do so, pairwise sampling methods are often used to select the pairs that provide more information about the quality of each stimulus. In this paper, a reliable and much-needed evaluation procedure is proposed and applied to methods already available in the literature, especially considering the case of subjective evaluation of image and video codecs. The results indicate that an appropriate selection of the pairs makes it possible to achieve very reliable scores while requiring the comparison of a much lower number of pairs.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121204671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. M. Rinaldi, Cristiano Russo, Cristian Tommasino
{"title":"Effects of Color Stain Normalization in Histopathology Image Retrieval using Deep Learning","authors":"A. M. Rinaldi, Cristiano Russo, Cristian Tommasino","doi":"10.1109/ISM55400.2022.00010","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00010","url":null,"abstract":"In the last decade, many digital slides have become available in the pathology field thanks to the spread of new technologies for computerized acquisition. Hardware and software tools and devices often differ among biomedical analysis centers; consequently, digital slides are not represented consistently and differ in coloration, exposure, contrast, brightness, and other distortions. Many computer vision algorithms are sensitive to these differences, and in specific tasks such as image retrieval, color stain normalization can be a helpful technique to mitigate them. In this paper, we explore the effects of color stain normalization on patch-based Hematoxylin and Eosin (H&E) image retrieval to measure how, and how much, it affects the accuracy of this task, providing an exhaustive analysis on a standard dataset.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117337016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robust Depth Estimation in Foggy Environments Combining RGB Images and mmWave Radar","authors":"Mengchen Xiong, Xiao Xu, D. Yang, E. Steinbach","doi":"10.1109/ISM55400.2022.00011","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00011","url":null,"abstract":"In this paper, we propose a robust depth estimation strategy that uses RGB images and mmWave radar data to deal with limited visibility in foggy environments. While the state-of-the-art RGB or LiDAR-based depth estimation works well in scenarios with good visibility, their performance dramatically degrades in the presence of fog. In contrast, mmWave radar sensors are not affected by fog and hence are a promising complement. To leverage this property of mmWave radar, we combine RGB image-based depth estimation with radar information. The proposed combination is an extension of the Sparse-to-Dense (S2D) model. Moreover, a weight-based sensor fusion strategy is presented to improve system performance. Our experiments show that a fog density of meteorological optical range (MOR) less than 50m leads to strongly degraded performance for RGB image-based and LiDAR-based depth estimation. For a MOR of 30m in our dataset, the experiments show an improvement of 26% in mean square error for our proposed approach compared to the combination of RGB images and LiDAR data.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"94 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116680836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Roundwood Tracking from the Forest to the Sawmill using filter approaches to highlight the annual ring pattern","authors":"Georg Wimmer, R. Schraml, A. Uhl, A. Petutschnigg","doi":"10.1109/ISM55400.2022.00056","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00056","url":null,"abstract":"The proof of origin of wood logs is becoming more and more important. In the context of Industry 4.0 and to combat illegal logging, there is an increased interest in tracking each individual log. In order to track roundwood from the forest to the sawmill, this work applies log recognition based on log end images from 100 logs that were captured first in the forest and later at the sawmill. The log images are segmented from the background, then preprocessed using a novel filtering approach, and features are extracted using two CNN-based methods. In this work we show that using filtering approaches that improve the visibility of the annual ring pattern and suppress unwanted image information such as the saw cut pattern clearly improves the recognition results.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"141 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122658304","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xiaonan Qi, Lihua Tian, Chen Li, Hui Song, Jiahui Yan
{"title":"Singing Melody Extraction Based on Combined Frequency-Temporal Attention and Attentional Feature Fusion with Self-Attention","authors":"Xiaonan Qi, Lihua Tian, Chen Li, Hui Song, Jiahui Yan","doi":"10.1109/ISM55400.2022.00050","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00050","url":null,"abstract":"Main melody extraction from polyphonic music is a challenging task in music information retrieval. Traditional convolutional and recurrent neural networks have effectively improved performance on this task. In recent years, with the development of attention mechanisms in neural networks, the frequency and time attention information of audio has been fully exploited, and the amplitude properties of audio can also be better integrated with a good fusion module. This paper improves the frequency-temporal attention introduced in prior work by others. By extracting the attention information with the frequency-temporal attention and performing additive fusion of features, the combined frequency-temporal attention is obtained. Then we apply attentional feature fusion based on multi-scale channel attention, and finally the temporal dependencies are learned through the self-attention module. Our experimental results on four datasets demonstrate that our model outperforms existing models.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"100 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133236180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cise Midoglu, A. Storås, S. Sabet, Malek Hammou, S. Hicks, Inga Strümke, M. Riegler, C. Griwodz, P. Halvorsen
{"title":"Experiences and Lessons Learned from a Crowdsourced-Remote Hybrid User Survey Framework","authors":"Cise Midoglu, A. Storås, S. Sabet, Malek Hammou, S. Hicks, Inga Strümke, M. Riegler, C. Griwodz, P. Halvorsen","doi":"10.1109/ISM55400.2022.00035","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00035","url":null,"abstract":"Subjective user studies are important to ensure the fidelity and usability of systems that generate multimedia content. Testing how end-users and domain experts perceive multimedia assets might provide crucial information. In this paper, we present our experiences with the open source hybrid crowdsourced-remote user survey framework called Huldra, which is intended for conducting web-based subjective user studies and aims to integrate the individual benefits associated with traditional, crowdsourced, and remote methods. We disseminate our experiences and insights from two actively deployed use cases and discuss challenges and opportunities associated with using Huldra as a framework for conducting user studies.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115779028","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Florian Schniederjann, Darius Rausch, Jens Wiggenbrock, R. Mertens
{"title":"Teardrop Magnification: A Hybrid Linear-Fisheye Magnifier for the Border and Corner of the Screen","authors":"Florian Schniederjann, Darius Rausch, Jens Wiggenbrock, R. Mertens","doi":"10.1109/ISM55400.2022.00017","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00017","url":null,"abstract":"Eye-tracking-based interfaces have to solve two major problems: the Midas touch problem and accuracy. While the Midas touch problem can be tackled with innovative interaction concepts, accuracy problems result from human physiology, hardware limitations and increasing screen resolutions. The Multi Modal Interaction Concept for Efficient Input (M2ice) tries to tackle these problems with a hybrid linear-fisheye magnifier. This magnifier works in the center of the screen but is problematic at the borders and corners of the screen, as it only appears as a half or quarter circle, limiting the space of the enlarged area. This paper presents a novel teardrop magnification approach for border and corner cases that solves these problems by using a different shape and by shifting the user's focus, making more of the magnified area available for interaction.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"81 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122683993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Actor-Critic Bilateral Filter for Noise-Robust Image Smoothing","authors":"Yi-Jie Chen, Yen-Chiao Wang, Bo-Hao Chen, Hsiang-Yin Cheng, Jia-Li Yin","doi":"10.1109/ISM55400.2022.00061","DOIUrl":"https://doi.org/10.1109/ISM55400.2022.00061","url":null,"abstract":"Bilateral filters have been used for achieving excellent edge-preserving image smoothing. However, most studies have focused on the acceleration of bilateral filtering but not on the stability of filtering process in regard to small perturbations to its inputs. In this paper, we propose a novel actor–critic bilateral filter trained with a multistep learning scheme for high-stability edge-preserving image smoothing. We first designed an edge-preserving smoothing process as a Markov decision process that involves adjusting the width setting for the range kernel of a bilateral filter. Next, we trained our actor–critic bilateral filter in a multistep manner to learn the optimal sequence of width settings. Through extensive experiments on five benchmark datasets, we determined that the proposed actor–critic bilateral filter produced satisfactory edge-preserving smoothing results.","PeriodicalId":112060,"journal":{"name":"2022 IEEE International Symposium on Multimedia (ISM)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130236562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}