{"title":"Beyond the Informedia digital video library: video and audio analysis for remembering conversations","authors":"Alexander Hauptmann, Wei-Hao Lin","doi":"10.1109/ASRU.2001.1034646","DOIUrl":null,"url":null,"abstract":"The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system, which listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: \"That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels\". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition methods and speaker identification technology have serious shortfalls that must be overcome.","PeriodicalId":118671,"journal":{"name":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","volume":"102 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2001-12-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2001.1034646","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5
Abstract
The Informedia Project digital video library pioneered the automatic analysis of television broadcast news and its retrieval on demand. Building on that system, we have developed a wearable, personalized Informedia system, which listens to and transcribes the wearer's part of a conversation, recognizes the face of the current dialog partner and remembers his/her voice. The next time the system sees the same person's face and hears the same voice, it can retrieve the audio from the last conversation, replaying in compressed form the names and major issues that were mentioned. All of this happens unobtrusively, somewhat like an intelligent assistant who whispers to you: "That's Bob Jones from Tech Solutions; two weeks ago in London you discussed solar panels". This paper outlines the general system components as well as interface considerations. Initial implementations showed that both face recognition methods and speaker identification technology have serious shortfalls that must be overcome.