Theodoros Giannakopoulos, A. Pikrakis, S. Theodoridis
Music tracking in audio streams from movies
2008 IEEE 10th Workshop on Multimedia Signal Processing, published 2008-11-05. DOI: https://doi.org/10.1109/MMSP.2008.4665211
This paper presents a robust and computationally efficient method for tracking music in audio streams from movies. The audio stream is first processed on a mid-term basis with a fixed-length moving window, and four features are extracted per window. Each feature is fed to a simple classifier that produces a soft output for the binary problem of music vs. all other types of audio. The soft outputs are then combined into a confidence measure quantifying whether the segment corresponds to music. As a final step, thresholding filters out segments where the confidence measure is low. The proposed approach was tested on audio streams from various movies, and its performance was measured on both a mid-term segment basis and an event detection basis. The reported results show that the method performs well even when music is mixed with other types of audio in the stream.
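The pipeline described in the abstract (windowing, four features per window, one soft classifier per feature, combination, thresholding) can be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the abstract does not name the four features, the classifier type, or the combination rule, so the features used here (energy, zero-crossing rate, spectral centroid, spectral flatness), the per-feature logistic classifiers, the averaging step, and all function and parameter names are assumptions.

```python
import numpy as np

def mid_term_windows(signal, win_len, step):
    """Split a 1-D signal into fixed-length moving windows (one per row)."""
    if len(signal) < win_len:
        raise ValueError("signal is shorter than one window")
    n = 1 + (len(signal) - win_len) // step
    return np.stack([signal[i * step : i * step + win_len] for i in range(n)])

def extract_features(windows):
    """Four placeholder features per window (assumed, not from the paper)."""
    eps = 1e-12
    energy = np.mean(windows ** 2, axis=1)
    # Fraction of adjacent sample pairs whose sign changes.
    zcr = np.mean(np.abs(np.diff(np.sign(windows), axis=1)), axis=1) / 2
    mag = np.abs(np.fft.rfft(windows, axis=1))
    freqs = np.fft.rfftfreq(windows.shape[1])  # normalized, 0 .. 0.5
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + eps)
    # Geometric mean over arithmetic mean of the magnitude spectrum.
    flatness = np.exp(np.mean(np.log(mag + eps), axis=1)) / (mag.mean(axis=1) + eps)
    return np.column_stack([energy, zcr, centroid, flatness])

def soft_outputs(features, weights, biases):
    """One logistic classifier per feature column (assumed classifier type)."""
    return 1.0 / (1.0 + np.exp(-(features * weights + biases)))

def track_music(signal, win_len, step, weights, biases, threshold=0.5):
    """Return the per-window confidence and a boolean music mask."""
    feats = extract_features(mid_term_windows(signal, win_len, step))
    # Averaging is an assumed combination rule; the abstract does not specify one.
    confidence = soft_outputs(feats, weights, biases).mean(axis=1)
    return confidence, confidence >= threshold
```

In practice the `weights` and `biases` would be fitted on labelled music and non-music segments, and `threshold` would be tuned to trade missed music against false detections; the values are left to the caller here because the abstract gives none.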