{"title":"Assessment framework for deepfake detection in real-world situations","authors":"Yuhang Lu, Touradj Ebrahimi","doi":"10.1186/s13640-024-00621-8","DOIUrl":"https://doi.org/10.1186/s13640-024-00621-8","url":null,"abstract":"<p>Detecting digital face manipulation in images and video has attracted extensive attention due to the potential risk to public trust. To counteract the malicious usage of such techniques, deep learning-based deepfake detection methods have been employed and have exhibited remarkable performance. However, the performance of such detectors is often assessed on related benchmarks that hardly reflect real-world situations. For example, the impact of various image and video processing operations and typical workflow distortions on detection accuracy has not been systematically measured. In this paper, a more reliable assessment framework is proposed to evaluate the performance of learning-based deepfake detectors in more realistic settings. To the best of our acknowledgment, it is the first systematic assessment approach for deepfake detectors that not only reports the general performance under real-world conditions but also quantitatively measures their robustness toward different processing operations. To demonstrate the effectiveness and usage of the framework, extensive experiments and detailed analysis of four popular deepfake detection methods are further presented in this paper. In addition, a stochastic degradation-based data augmentation method driven by realistic processing operations is designed, which significantly improves the robustness of deepfake detectors.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"46 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139763809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Edge-aware nonlinear diffusion-driven regularization model for despeckling synthetic aperture radar images
Authors: Anthony Bua, Goodluck Kapyela, Libe Massawe, Baraka Maiseli
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-023-00617-w
Published: 2024-01-11
Abstract: Speckle noise corrupts synthetic aperture radar (SAR) images and limits their applications in sensitive scientific and engineering fields. This challenge has attracted several scholars because of the wide demand for SAR images in forestry, oceanography, geology, glaciology, and topography. Despite significant efforts to address the challenge, it remains an open research question how to simultaneously suppress speckle noise and restore semantic features in SAR images. Therefore, this work establishes a diffusion-driven nonlinear method with edge-awareness capabilities to restore corrupted SAR images while protecting critical image features, such as contours and textures. The proposed method incorporates two terms that promote effective noise removal: (1) a high-order diffusion kernel; and (2) a fractional regularization term that is sensitive to speckle noise. These terms have been carefully designed to ensure that the restored SAR images contain stronger edges and well-preserved textures. Empirical results show that the proposed model produces content-rich images with higher subjective and objective quality scores. Furthermore, our model generates images without the noticeable staircase and block artifacts commonly found in the classical Perona-Malik and total variation models.

{"title":"Multimodal few-shot classification without attribute embedding","authors":"Jun Qing Chang, Deepu Rajan, Nicholas Vun","doi":"10.1186/s13640-024-00620-9","DOIUrl":"https://doi.org/10.1186/s13640-024-00620-9","url":null,"abstract":"<p>Multimodal few-shot learning aims to exploit complementary information inherent in multiple modalities for vision tasks in low data scenarios. Most of the current research focuses on a suitable embedding space for the various modalities. While solutions based on embedding provide state-of-the-art results, they reduce the interpretability of the model. Separate visualization approaches enable the models to become more transparent. In this paper, a multimodal few-shot learning framework that is inherently interpretable is presented. This is achieved by using the textual modality in the form of attributes without embedding them. This enables the model to directly explain which attributes caused it to classify an image into a particular class. The model consists of a variational autoencoder to learn the visual latent representation, which is combined with a semantic latent representation that is learnt from a normal autoencoder, which calculates a semantic loss between the latent representation and a binary attribute vector. A decoder reconstructs the original image from concatenated latent vectors. The proposed model outperforms other multimodal methods when all test classes are used, e.g., 50 classes in a 50-way 1-shot setting, and is comparable for lesser number of ways. Since raw text attributes are used, the datasets for evaluation are CUB, SUN and AWA2. The effectiveness of interpretability provided by the model is evaluated by analyzing how well it has learnt to identify the attributes.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"4 1","pages":""},"PeriodicalIF":2.4,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139422411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Secure image transmission through LTE wireless communications systems
Authors: Farouk Abduh Kamil Al-Fahaidy, Radwan AL-Bouthigy, Mohammad Yahya H. Al-Shamri, Safwan Abdulkareem
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-024-00619-2
Published: 2024-01-10
Abstract: Secure transmission of images over wireless communication systems can be achieved by combining RSA, one of the best-known and most efficient cryptographic algorithms, with OFDMA, a preferred signal processing choice in wireless communications. This paper investigates the performance of OFDMA systems for the wireless transmission of RSA-encrypted images. Specifically, OFDMA systems based on different signal processing transforms, namely the discrete sine transform (DST), the discrete cosine transform (DCT), and the conventional discrete Fourier transform (DFT), are tested for the wireless transmission of grayscale images with and without RSA encryption. The image is first encrypted with the RSA algorithm, then modulated with DFT-based, DCT-based, or DST-based OFDMA, and transmitted over a wireless multipath fading channel. The reverse operations are carried out at the receiver, together with frequency-domain equalization to mitigate the channel effect. An exhaustive set of scenarios is used to study the performance of the different OFDMA systems in terms of PSNR and MSE for different subcarrier mappings and modulation techniques. The results confirm the ability of the different OFDMA systems to transmit images securely over wireless channels, with the DCT-OFDMA system showing superiority over the DST-OFDMA and the conventional DFT-OFDMA systems.

Title: An optimized capsule neural networks for tomato leaf disease classification
Authors: Lobna M. Abouelmagd, Mahmoud Y. Shams, Hanaa Salem Marie, Aboul Ella Hassanien
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-023-00618-9
Published: 2024-01-08
Abstract: Plant diseases have a significant impact on leaves, with each disease exhibiting specific spots characterized by unique colors and locations. It is therefore crucial to develop a method for detecting these diseases based on spot shape, color, and location within the leaves. While convolutional neural networks (CNNs) have been widely used in deep learning applications, they have limitations in capturing relative spatial and orientation relationships. This paper presents a computer vision methodology that uses an optimized capsule neural network (CapsNet) to detect and classify ten tomato leaf diseases using standard dataset images. To mitigate overfitting, data augmentation and preprocessing techniques were employed during the training phase. CapsNet was chosen over CNNs due to its superior ability to capture spatial positioning within the image. The proposed CapsNet approach achieved an accuracy of 96.39% with minimal loss, using an Adam optimizer with a 0.00001 learning rate. By comparing the results with existing state-of-the-art approaches, the study demonstrates the effectiveness of CapsNet in accurately identifying and classifying tomato leaf diseases based on spot shape, color, and location. The findings highlight the potential of CapsNet as an alternative to CNNs for improving disease detection and classification in plant pathology research.

Title: Multi-layer features template update object tracking algorithm based on SiamFC++
Authors: Xiaofeng Lu, Xuan Wang, Zhengyang Wang, Xinhong Hei
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-023-00616-x
Published: 2024-01-04
Abstract: SiamFC++ extracts the object feature of only the first frame as the tracking template, and uses only the highest-level feature maps in both the classification branch and the regression branch, so the respective characteristics of the two branches are not fully exploited. In view of this, this paper proposes an object tracking algorithm based on SiamFC++ that uses multi-layer features of the Siamese network to update the template. First, an FPN is used to extract feature maps from different layers of the backbone for the classification branch and the regression branch. Second, a 3D convolution is used to update the tracking template. Next, a template update judgment condition based on mutual information is proposed. Finally, AlexNet is used as the backbone and GOT-10k as the training set. Compared with SiamFC++, the proposed algorithm obtains improved results on the OTB100, VOT2016, VOT2018, and GOT-10k datasets, while tracking in real time.

{"title":"Subjective performance evaluation of bitrate allocation strategies for MPEG and JPEG Pleno point cloud compression.","authors":"Davi Lazzarotto, Michela Testolina, Touradj Ebrahimi","doi":"10.1186/s13640-024-00629-0","DOIUrl":"10.1186/s13640-024-00629-0","url":null,"abstract":"<p><p>The recent rise in interest in point clouds as an imaging modality has motivated standardization groups such as JPEG and MPEG to launch activities aiming at developing compression standards for point clouds. Lossy compression usually introduces visual artifacts that negatively impact the perceived quality of media, which can only be reliably measured through subjective visual quality assessment experiments. While MPEG standards have been subjectively evaluated in previous studies on multiple occasions, no work has yet assessed the performance of the recent JPEG Pleno standard in comparison to them. In this study, a comprehensive performance evaluation of JPEG and MPEG standards for point cloud compression is conducted. The impact of different configuration parameters on the performance of the codecs is first analyzed with the help of objective quality metrics. The results from this analysis are used to define three rate allocation strategies for each codec, which are employed to compress a set of point clouds at four target rates. The set of distorted point clouds is then subjectively evaluated following two subjective quality assessment protocols. Finally, the obtained results are used to compare the performance of these compression standards and draw insights about best coding practices.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"2024 1","pages":"14"},"PeriodicalIF":2.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11166754/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141318743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learned scalable video coding for humans and machines.","authors":"Hadi Hadizadeh, Ivan V Bajić","doi":"10.1186/s13640-024-00657-w","DOIUrl":"10.1186/s13640-024-00657-w","url":null,"abstract":"<p><p>Video coding has traditionally been developed to support services such as video streaming, videoconferencing, digital TV, and so on. The main intent was to enable human viewing of the encoded content. However, with the advances in deep neural networks (DNNs), encoded video is increasingly being used for automatic video analytics performed by machines. In applications such as automatic traffic monitoring, analytics such as vehicle detection, tracking and counting, would run continuously, while human viewing could be required occasionally to review potential incidents. To support such applications, a new paradigm for video coding is needed that will facilitate efficient representation and compression of video for both machine and human use in a scalable manner. In this manuscript, we introduce an end-to-end learnable video codec that supports a machine vision task in its base layer, while its enhancement layer, together with the base layer, supports input reconstruction for human viewing. The proposed system is constructed based on the concept of conditional coding to achieve better compression gains. Comprehensive experimental evaluations conducted on four standard video datasets demonstrate that our framework outperforms both state-of-the-art learned and conventional video codecs in its base layer, while maintaining comparable performance on the human vision task in its enhancement layer.</p>","PeriodicalId":49322,"journal":{"name":"Eurasip Journal on Image and Video Processing","volume":"2024 1","pages":"41"},"PeriodicalIF":2.4,"publicationDate":"2024-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11564357/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142649470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Title: Robust steganography in practical communication: a comparative study
Authors: Tong Qiao, Shengwang Xu, Shuai Wang, Xiaoshuai Wu, Bo Liu, Ning Zheng, Ming Xu, Binmin Pan
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-023-00615-y
Published: 2023-10-31
Abstract: Steganography was proposed to realize covert communication over a public channel. Modern adaptive steganography currently plays a dominant role due to its high undetectability. However, its effectiveness is challenged when applied in practical communication, such as over social networks. Several robust steganographic methods have been proposed, but a comparative study between them has been missing. We therefore propose a framework that generalizes the current typical steganographic methods designed to resist compression attacks, and empirically analyze their advantages and disadvantages based on four baseline indicators: capacity, imperceptibility, undetectability, and robustness. More importantly, the robustness of the methods is compared in real applications, such as Facebook, Twitter, and WeChat, which has not been comprehensively addressed in this community. In particular, the methods that modify the sign of DCT coefficients show superior performance on social media applications.

Title: Multi-attention-based approach for deepfake face and expression swap detection and localization
Authors: Saima Waseem, S. Abu-Bakar, Z. Omar, Bilal Ashfaq Ahmed, Saba Baloch, Adel Hafeezallah
Journal: EURASIP Journal on Image and Video Processing
DOI: 10.1186/s13640-023-00614-z
Published: 2023-08-18