E. Singer, B. J. Borgstrom, Kenneth Alperin, Trang Nguyen, C. Dagli, Melissa R. Dale, A. Ross

2023 11th International Workshop on Biometrics and Forensics (IWBF), 2023-04-19. DOI: 10.1109/IWBF57495.2023.10157658
On the Design of the MITLL Trimodal Dataset for Identity Verification
Recent advances in deep learning have led to increased interest in developing techniques for multimodal identity verification applications, particularly in the area of biometric fusion. Associated with these efforts is a corresponding need for large-scale multimodal datasets to provide the basis for establishing performance baselines for proposed approaches. After examining the characteristics of existing multimodal datasets, this paper describes the development of the MITLL Trimodal dataset, a new triple-modality collection comprising parallel samples of audio, image, and text for 553 subjects. The dataset is drawn from YouTube videos and Twitter tweets. Baseline single-modality results using a common processing pipeline are presented, along with the results of applying a conventional fusion algorithm to the individual stream scores.
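The abstract does not specify which fusion algorithm is applied to the individual stream scores; a common conventional baseline is weighted-sum score-level fusion of per-modality verification scores after z-normalization. The sketch below is a hypothetical illustration of that baseline, not the paper's actual implementation; the function name and equal-weight default are assumptions.

```python
import numpy as np

def fuse_scores(scores, weights=None):
    """Weighted-sum score-level fusion across modalities.

    scores  : (n_trials, n_modalities) array of raw verification scores,
              one column per stream (e.g. audio, image, text).
    weights : optional per-modality weights; defaults to equal weighting.

    Each column is z-normalized (zero mean, unit variance) so scores from
    different matchers are comparable before they are combined.
    """
    scores = np.asarray(scores, dtype=float)
    n_modalities = scores.shape[1]
    if weights is None:
        weights = np.full(n_modalities, 1.0 / n_modalities)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # ensure weights sum to 1
    # Per-column z-normalization, then weighted sum across modalities.
    z = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    return z @ weights
```

Equal weighting is the simplest choice; in practice the weights would typically be tuned on a held-out development set to reflect each modality's reliability.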