Arnab Kumar Das, Aritra Bose, Priya Manohar, Anurag Dutta, Ruchira Naskar, Rajat Subhra Chakraborty
{"title":"InDeepFake:一个新颖的多模态多语言印度深度假视频数据集","authors":"Arnab Kumar Das , Aritra Bose , Priya Manohar , Anurag Dutta , Ruchira Naskar , Rajat Subhra Chakraborty","doi":"10.1016/j.patrec.2025.07.002","DOIUrl":null,"url":null,"abstract":"<div><div>Recent advancements in Generative AI have resulted in decline of online digital contents credibility, at all levels of the human society. In spite of numerous discussions in popular media on the grave risks exposed by deepfakes and the relative lack of human awareness, deepfake based illegal activities are on the rise all over the world. India as a nation has seen rapid surge in deepfake cases reported in recent times, with news channels and media flooded with cases of financial fraudulence, personal vendetta, and false political propaganda, especially before the national and state elections. This can prove detrimental against the democratic future of the nation, indicating a serious need for efficient deepfake detectors in the coming days, tailored to investigate and solve Indian deepfake cases. The task is particularly challenging given the great linguistic and ethnic diversity of India. Based on this motivation, in our work, we develop an extensive deepfake dataset for the Indian population. To the best of our knowledge, this is the first such effort that is reported. We have developed a multimodal audio–video deepfake dataset, in seven major Indian languages, and seven state-of-the-art (SOTA) deepfake generators, covering a wide range of age and gender diversity. We evaluated SOTA detector results on the proposed dataset, to highlight its relevance in furthering multimodal deepfake research. We have open-sourced the dataset and code to implement the baseline methods at: <span><span>https://github.com/arnabdasphd/InDeepFake</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54638,"journal":{"name":"Pattern Recognition Letters","volume":"197 ","pages":"Pages 16-23"},"PeriodicalIF":3.3000,"publicationDate":"2025-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"InDeepFake: A novel multimodal multilingual indian deepfake video dataset\",\"authors\":\"Arnab Kumar Das , Aritra Bose , Priya Manohar , Anurag Dutta , Ruchira Naskar , Rajat Subhra Chakraborty\",\"doi\":\"10.1016/j.patrec.2025.07.002\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Recent advancements in Generative AI have resulted in decline of online digital contents credibility, at all levels of the human society. In spite of numerous discussions in popular media on the grave risks exposed by deepfakes and the relative lack of human awareness, deepfake based illegal activities are on the rise all over the world. India as a nation has seen rapid surge in deepfake cases reported in recent times, with news channels and media flooded with cases of financial fraudulence, personal vendetta, and false political propaganda, especially before the national and state elections. This can prove detrimental against the democratic future of the nation, indicating a serious need for efficient deepfake detectors in the coming days, tailored to investigate and solve Indian deepfake cases. The task is particularly challenging given the great linguistic and ethnic diversity of India. Based on this motivation, in our work, we develop an extensive deepfake dataset for the Indian population. To the best of our knowledge, this is the first such effort that is reported. 
We have developed a multimodal audio–video deepfake dataset, in seven major Indian languages, and seven state-of-the-art (SOTA) deepfake generators, covering a wide range of age and gender diversity. We evaluated SOTA detector results on the proposed dataset, to highlight its relevance in furthering multimodal deepfake research. We have open-sourced the dataset and code to implement the baseline methods at: <span><span>https://github.com/arnabdasphd/InDeepFake</span><svg><path></path></svg></span>.</div></div>\",\"PeriodicalId\":54638,\"journal\":{\"name\":\"Pattern Recognition Letters\",\"volume\":\"197 \",\"pages\":\"Pages 16-23\"},\"PeriodicalIF\":3.3000,\"publicationDate\":\"2025-07-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Pattern Recognition Letters\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167865525002545\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition Letters","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167865525002545","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
InDeepFake: A novel multimodal multilingual Indian deepfake video dataset
Recent advancements in Generative AI have led to a decline in the credibility of online digital content at all levels of society. Despite extensive discussion in popular media of the grave risks posed by deepfakes, and the relatively low level of public awareness, deepfake-based illegal activities are on the rise worldwide. India has seen a rapid surge in reported deepfake cases in recent times, with news channels and media flooded with instances of financial fraud, personal vendetta, and false political propaganda, especially before national and state elections. This can prove detrimental to the democratic future of the nation, indicating a serious need in the coming days for efficient deepfake detectors tailored to investigate and solve Indian deepfake cases. The task is particularly challenging given India's great linguistic and ethnic diversity. Motivated by this, in our work we develop an extensive deepfake dataset for the Indian population. To the best of our knowledge, this is the first such effort to be reported. We have developed a multimodal audio–video deepfake dataset spanning seven major Indian languages, produced with seven state-of-the-art (SOTA) deepfake generators and covering a wide range of age and gender diversity. We evaluated SOTA detectors on the proposed dataset to highlight its relevance in furthering multimodal deepfake research. We have open-sourced the dataset and the code implementing the baseline methods at: https://github.com/arnabdasphd/InDeepFake.
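To make the evaluation setup more concrete, the sketch below shows how a video-level deepfake dataset of this kind could be scored with a generic frame-level detector. The directory layout, file naming, label convention, and the placeholder classifier are all assumptions for illustration only and are not taken from the paper; the authors' actual baseline code is in the linked repository.

```python
# Hypothetical sketch: scoring deepfake videos with a frame-level classifier.
# The dataset layout and the classifier are assumptions, not the authors' code.
import os
import cv2          # pip install opencv-python
import numpy as np


def sample_frames(video_path, num_frames=16, size=224):
    """Uniformly sample and resize frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            continue
        frames.append(cv2.resize(frame, (size, size)))
    cap.release()
    return frames


def score_video(frames, frame_classifier):
    """Average per-frame fake probabilities into one video-level score."""
    probs = [frame_classifier(f) for f in frames]  # classifier returns P(fake)
    return float(np.mean(probs)) if probs else 0.5


if __name__ == "__main__":
    # Assumed layout: InDeepFake/<real|fake>/<language>/<clip>.mp4 (hypothetical).
    root = "InDeepFake"
    dummy_classifier = lambda frame: 0.5  # placeholder for a trained detector
    for label in ("real", "fake"):
        label_dir = os.path.join(root, label)
        if not os.path.isdir(label_dir):
            continue
        for dirpath, _, files in os.walk(label_dir):
            for name in files:
                if name.endswith(".mp4"):
                    frames = sample_frames(os.path.join(dirpath, name))
                    print(label, name, score_video(frames, dummy_classifier))
```

Replacing the dummy classifier with a trained detector and comparing its video-level scores against the ground-truth labels would reproduce the kind of baseline evaluation the paper reports, assuming a comparable labeling scheme.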
Journal introduction:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.