Deepfake Detection Based on Incompatibility Between Multiple Modes
Authors: Yu-xin Zhang, Jinyu Zhan, Wei Jiang, Zhufeng Fan
Venue: 2021 International Conference on Intelligent Technology and Embedded Systems (ICITES)
Publication date: 2021-10-31
DOI: 10.1109/ICITES53477.2021.9637096
We propose a multi-modal detection method for deepfake videos, called Incompatibility Between Multiple Modes (IBMM) detection. The algorithm classifies a video as real or fake and may be embedded in monitoring equipment in the future. The model combines EfficientNet with a simple 3D-CNN and identifies deepfake videos through three modes: facial motion, lip motion, and audio. In the facial motion and lip motion modes, we use EfficientNet for feature learning; this network scales its dimensions uniformly with a set of fixed scaling coefficients and achieves good results in learning image features. In the audio mode, we use a 3D-CNN trained on the hot coding diagram of the audio data. Within a single mode, the cross-entropy loss measures the irrationality of that mode. Across modes, the contrastive loss measures the incongruity between them, such as incompatibility between lips and voice. Experimental results show that the proposed method achieves higher accuracy (95.87%) on the DFDC dataset than existing fake detection methods, an increase of 5.21%.
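The abstract names two loss terms but gives no formulas, so the following is only a minimal sketch of how a per-mode cross-entropy term and a cross-mode contrastive term might be combined. It assumes the common margin-based form of contrastive loss, a label convention of 0 for real and 1 for fake, and a simple weighted sum; the function names, the `margin` and `weight` parameters, and the mode names in the usage note are hypothetical and not taken from the paper.

```python
import math


def cross_entropy(p_fake, label):
    """Binary cross-entropy for one mode (assumed label: 1 = fake, 0 = real)."""
    eps = 1e-12
    p = min(max(p_fake, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Margin-based contrastive loss between two mode embeddings.

    Assumption: for real videos (label 0) the modes should agree, so the
    distance is penalised; for fake videos (label 1) embeddings closer
    than `margin` are penalised.
    """
    d = euclidean(emb_a, emb_b)
    if label == 0:
        return d ** 2
    return max(0.0, margin - d) ** 2


def combined_loss(probs, embeddings, label, margin=1.0, weight=1.0):
    """Sum of per-mode cross-entropy plus weighted pairwise contrastive terms.

    `probs` maps a mode name to its predicted fake probability;
    `embeddings` maps a mode name to its feature vector.
    The weighted sum is an illustrative choice, not the paper's formula.
    """
    ce = sum(cross_entropy(p, label) for p in probs.values())
    names = sorted(embeddings)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    con = sum(
        contrastive_loss(embeddings[a], embeddings[b], label, margin)
        for a, b in pairs
    )
    return ce + weight * con
```

For instance, calling `combined_loss({"face": 0.9, "lip": 0.8, "audio": 0.7}, {"face": [0.1, 0.2], "lip": [0.1, 0.3], "audio": [0.9, 0.8]}, label=1)` would score one fake sample: the lip and audio embeddings lie farther apart than the margin and add nothing to the contrastive term, whereas the nearly identical face and lip embeddings are penalised.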