SAIVT-BNEWS: An Australian Broadcast News Video Dataset for Entity Extraction, and More

David Dean

DOI: 10.1145/2802558.2814653
Published in: Proceedings of the Third Edition Workshop on Speech, Language & Audio in Multimedia
Publication date: 2015-10-30
Citations: 0
Abstract
QUT has recently released a set of annotated broadcast news videos (SAIVT-BNEWS), available from our website (https://www.qut.edu.au/research/saivt). This presentation will outline the dataset itself, which comprises around 50 short news clips surrounding a single political event, with many entities appearing in multiple records, and will cover research that QUT has performed, is currently performing, and plans to perform on this dataset in the future. The presentation will cover existing published research, including image processing tasks such as face detection and clustering, and speech processing tasks (including the use of visual speech) such as speech detection, speaker recognition, and speaker diarisation. We have also begun research on fusing multiple sources of information, including metadata, OCR, faces, speech, and scene detection, to improve the performance of many of these techniques, with a particular focus on improving the automatic extraction of entities (people, places, companies, and organisations) from large volumes of audio-visual data; this work will also be addressed in the talk. As the dataset is freely available to all researchers, QUT hopes that others will make use of, and improve upon, this dataset as well.