Multimodal Person Verification With Generative Thermal Data Augmentation

Madina Abdrakhmanova, Timur Unaspekov, Huseyin Atakan Varol

IEEE Transactions on Biometrics, Behavior, and Identity Science, vol. 6, no. 1, pp. 43-53, 2023. DOI: 10.1109/TBIOM.2023.3346938. Available: https://ieeexplore.ieee.org/document/10374245/
The fusion of audio, visual, and thermal modalities has proven effective in developing reliable person verification systems. In this study, we enhanced multimodal person verification performance by augmenting training data using domain transfer methods. Specifically, we enriched the audio-visual-thermal SpeakingFaces dataset with a combination of real audio-visual data and synthetic thermal data from the VoxCeleb dataset. We adapted visual images in VoxCeleb to the thermal domain using CycleGAN, trained on SpeakingFaces. Our results demonstrate the positive impact of augmented training data on all unimodal and multimodal models. The score fusion of unimodal audio, unimodal visual, bimodal, and trimodal systems trained on the combined data achieved the best results on both datasets and exhibited robustness in low-illumination and noisy conditions. Our findings emphasize the importance of utilizing synthetic data, produced by generative methods, to improve deep learning model performance. To facilitate reproducibility and further research in multimodal person verification, we have made our code, pretrained models, and preprocessed dataset freely available in our GitHub repository.
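To make the score-fusion step concrete, the sketch below shows one common way to combine per-subsystem similarity scores with a weighted sum and evaluate the result with equal error rate (EER). This is a minimal illustration, not the authors' implementation: the number of subsystems, the equal weights, and the toy scores are all assumptions for demonstration.

```python
# Minimal sketch of score-level fusion for person verification.
# Assumes each subsystem (e.g., audio, visual, bimodal, trimodal)
# already outputs one similarity score per verification trial.
# Weights and toy data below are illustrative assumptions.
import numpy as np

def fuse_scores(subsystem_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weighted sum of per-subsystem scores.

    subsystem_scores: (n_trials, n_subsystems) array, each column
        normalized to a comparable range (e.g., [0, 1]) beforehand.
    weights: (n_subsystems,) non-negative weights summing to 1.
    """
    return subsystem_scores @ weights

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """Compute EER by sweeping the decision threshold over the scores."""
    thresholds = np.unique(scores)
    far = np.array([np.mean(scores[labels == 0] >= t) for t in thresholds])
    frr = np.array([np.mean(scores[labels == 1] < t) for t in thresholds])
    idx = np.argmin(np.abs(far - frr))  # threshold where FAR and FRR cross
    return float((far[idx] + frr[idx]) / 2)

# Example: four subsystems scored on 1000 toy verification trials.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)                        # 1 = same person
scores = rng.random((1000, 4)) * 0.3 + labels[:, None] * 0.4  # synthetic scores
fused = fuse_scores(scores, np.full(4, 0.25))                 # equal weights
print(f"Fused EER: {equal_error_rate(fused, labels):.3f}")
```

In practice the fusion weights can be fixed (as in this equal-weight example) or tuned on a held-out validation set; either way, fusing at the score level keeps each unimodal or multimodal subsystem independently trainable.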