Robust Automatic Face Clustering in News Video
Kaneswaran Anantharajah, S. Denman, D. Tjondronegoro, S. Sridharan, C. Fookes
2015 International Conference on Digital Image Computing: Techniques and Applications (DICTA), published 2015-11-01
DOI: 10.1109/DICTA.2015.7371301 (https://doi.org/10.1109/DICTA.2015.7371301)
Citations: 1
Abstract
Clustering identities in a video is a useful task to aid in video search, annotation and retrieval, and cast identification. However, reliably clustering faces across multiple videos is a challenging task due to variations in the appearance of the faces, as videos are captured in an uncontrolled environment. A person's appearance may vary due to session variations, including lighting and background changes, occlusions, and changes in expression and make-up. In this paper we propose the novel Local Total Variability Modelling (Local TVM) approach to cluster faces across a news video corpus, and incorporate this into a novel two-stage video clustering system. We first cluster faces within a single video using colour, spatial and temporal cues, after which we use face track modelling and hierarchical agglomerative clustering to cluster faces across the entire corpus. We compare different face recognition approaches within this framework. Experiments on a news video database show that the Local TVM technique is able to effectively model the session variation observed in the data, resulting in improved clustering performance, with much greater computational efficiency than other methods.
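The corpus-level stage mentioned in the abstract relies on hierarchical agglomerative clustering of face tracks. The sketch below illustrates that general technique only; it is not the authors' implementation and does not include Local TVM. It assumes each face track has already been reduced to a fixed-length, non-zero feature vector, and the cosine distance, average linkage, stopping threshold, and all function names are illustrative assumptions rather than details from the paper.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def agglomerative_cluster(tracks, threshold):
    """Average-linkage agglomerative clustering of track vectors.

    tracks: list of equal-length feature vectors, one per face track
            (hypothetical stand-ins for whatever track model is used).
    threshold: merging stops once the closest pair of clusters is
               farther apart than this value (hypothetical parameter).
    Returns a sorted list of clusters, each a sorted list of track indices.
    """
    # Start with every track in its own cluster.
    clusters = [[i] for i in range(len(tracks))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(cosine_distance(tracks[i], tracks[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        (a, b), dist = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if dist > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(clusters)
```

For example, with four toy two-dimensional vectors, agglomerative_cluster([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], 0.2) groups the first two tracks together and the last two together. In practice the threshold would need to be tuned on held-out data, and a library routine would usually replace this quadratic-time loop.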