Novelty measures as cues for temporal salience in audio similarity
M. Cartwright, Bryan Pardo
MIRUM '12, published 2012-11-02
DOI: https://doi.org/10.1145/2390848.2390862
Most algorithms for estimating audio similarity either disregard time entirely or treat every moment in time equally. However, many studies have identified factors that affect how much attention we give to certain sounds or parts of sounds (e.g., loudness, the attack, novelty). These findings suggest that some time segments of audio may be more salient than others when listeners make similarity judgments. We believe that if we could estimate this salience, we could improve audio similarity measures. This paper presents the results of a human subject study designed to test the hypothesis that sound segments with high timbral change are more salient than segments with low timbral change. We then investigate whether this information can be used to improve two audio similarity measures: a "bag-of-frames" approach and a dynamic time warping approach.
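
To make the idea concrete, the following is a minimal sketch of how a novelty estimate might weight the two kinds of similarity measure named above. It is an illustration only, not the method evaluated in the paper: the abstract does not specify the novelty measure or the weighting scheme, so the frame-to-frame Euclidean distance used as "timbral change", the `floor` parameter, the mean-vector simplification of bag-of-frames, and all function names are assumptions. Each sound is assumed to be a list of per-frame timbre feature vectors (for example, MFCCs computed elsewhere).

```python
import math


def novelty(frames):
    """Assumed novelty measure: Euclidean distance from each frame to the previous one."""
    nov = [0.0]  # the first frame has no predecessor
    for prev, cur in zip(frames, frames[1:]):
        nov.append(math.dist(prev, cur))
    return nov


def salience_weights(frames, floor=1e-3):
    """Turn novelty into weights that sum to 1; `floor` keeps static frames nonzero."""
    raw = [n + floor for n in novelty(frames)]
    total = sum(raw)
    return [r / total for r in raw]


def frame_weights(frames, use_salience):
    """Salience weights, or uniform weights when every moment is treated equally."""
    if use_salience:
        return salience_weights(frames)
    return [1.0 / len(frames)] * len(frames)


def weighted_mean(frames, weights):
    """Weighted average feature vector over all frames."""
    dims = len(frames[0])
    return [sum(w * f[d] for f, w in zip(frames, weights)) for d in range(dims)]


def bag_of_frames_distance(a, b, use_salience=True):
    """Order-free comparison: distance between (weighted) mean feature vectors."""
    mean_a = weighted_mean(a, frame_weights(a, use_salience))
    mean_b = weighted_mean(b, frame_weights(b, use_salience))
    return math.dist(mean_a, mean_b)


def dtw_distance(a, b, use_salience=True):
    """Dynamic time warping where each local cost is scaled by the mean frame weight."""
    wa = frame_weights(a, use_salience)
    wb = frame_weights(b, use_salience)
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = 0.5 * (wa[i - 1] + wb[j - 1]) * math.dist(a[i - 1], b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy example: one sound changes timbre in its last frame, the other never changes.
changing = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0]]
static = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
print(bag_of_frames_distance(changing, static, use_salience=False))
print(bag_of_frames_distance(changing, static, use_salience=True))
print(dtw_distance(changing, static))
```

In this toy example the salience-weighted bag-of-frames distance is larger than the unweighted one, because the single high-change frame dominates the weighted mean. Whether such weighting actually tracks human similarity judgments is the question the study addresses, and this sketch makes no claim about it.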