{"title":"一个轻松的内容注释系统,以展现视频中的语义","authors":"R. Lienhart","doi":"10.1109/IVL.2000.853838","DOIUrl":null,"url":null,"abstract":"We propose and investigate a new but simple and natural extension of the way people record video. This extension allows one to unfold the semantics of video clips and thus enables a completely new set of applications for raw video footage. Two microphones are connected to a camcorder: a headworn speech input microphone and an environmental microphone. During recording the cameraman speaks out loud content-descriptive annotations and/or editing commands. Due to the two-microphones setup the sound of annotations and editing commands can be removed from the environmental audio by adaptive filtering enabling people to play back the video as if there had been no annotations. Simultaneously, these annotations are transcribed to ASCII by means of a standard speech recognition engine. The viability of this approach is demonstrated by means of an important application for video libraries: the automatic abstraction of raw video footage.","PeriodicalId":333664,"journal":{"name":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","volume":"54 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2000-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":"{\"title\":\"A system for effortless content annotation to unfold the semantics in videos\",\"authors\":\"R. Lienhart\",\"doi\":\"10.1109/IVL.2000.853838\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose and investigate a new but simple and natural extension of the way people record video. This extension allows one to unfold the semantics of video clips and thus enables a completely new set of applications for raw video footage. Two microphones are connected to a camcorder: a headworn speech input microphone and an environmental microphone. During recording the cameraman speaks out loud content-descriptive annotations and/or editing commands. Due to the two-microphones setup the sound of annotations and editing commands can be removed from the environmental audio by adaptive filtering enabling people to play back the video as if there had been no annotations. Simultaneously, these annotations are transcribed to ASCII by means of a standard speech recognition engine. 
The viability of this approach is demonstrated by means of an important application for video libraries: the automatic abstraction of raw video footage.\",\"PeriodicalId\":333664,\"journal\":{\"name\":\"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries\",\"volume\":\"54 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2000-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"16\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IVL.2000.853838\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2000 Proceedings Workshop on Content-based Access of Image and Video Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IVL.2000.853838","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A system for effortless content annotation to unfold the semantics in videos
We propose and investigate a new but simple and natural extension of the way people record video. This extension allows one to unfold the semantics of video clips and thus enables a completely new set of applications for raw video footage. Two microphones are connected to a camcorder: a headworn speech-input microphone and an environmental microphone. During recording, the cameraman speaks content-descriptive annotations and/or editing commands aloud. Thanks to the two-microphone setup, the sound of the annotations and editing commands can be removed from the environmental audio by adaptive filtering, so the video can be played back as if no annotations had been made. Simultaneously, the annotations are transcribed to ASCII text by a standard speech recognition engine. The viability of this approach is demonstrated with an important application for video libraries: the automatic abstraction of raw video footage.
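As a rough illustration of the two-microphone idea, the sketch below shows one common form of adaptive filtering (a normalized LMS filter); it is not the paper's actual implementation, and the function name, filter length, and step size are illustrative assumptions. The headworn microphone supplies a clean reference of the spoken annotations, and the filter estimates how that speech leaks into the environmental microphone so it can be subtracted, leaving the scene audio.

```python
# Minimal NLMS adaptive-cancellation sketch (hypothetical, not from the paper).
import numpy as np

def nlms_cancel(reference, primary, num_taps=256, mu=0.5, eps=1e-8):
    """Suppress annotation speech in the environmental-microphone signal.

    reference -- samples from the headworn annotation microphone
    primary   -- samples from the environmental microphone (scene + leaked speech)
    """
    w = np.zeros(num_taps)              # adaptive weights: estimate of the leakage path
    cleaned = np.zeros_like(primary, dtype=float)
    for n in range(num_taps, len(primary)):
        x = reference[n - num_taps:n][::-1]        # most recent reference samples
        y = np.dot(w, x)                           # estimated leaked annotation speech
        e = primary[n] - y                         # residual = cleaned environmental audio
        w += (mu / (eps + np.dot(x, x))) * e * x   # normalized LMS weight update
        cleaned[n] = e
    return cleaned
```

In this setup the "error" signal of the adaptive filter is exactly the desired output: once the filter has converged, subtracting its prediction removes the annotation speech while leaving the uncorrelated environmental audio intact, which is what allows playback as if no annotations had been spoken.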