Guest editorial: Music perception and cognition in music technology

IF 1.2 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Zijin Li, Stephen McAdams
Cognitive Computation and Systems · DOI: 10.1049/ccs2.12066 · Published: 2022-06-30 · Citations: 0
Open-access PDF: https://ietresearch.onlinelibrary.wiley.com/doi/epdf/10.1049/ccs2.12066


Interest in music technology, a multi-disciplinary research area, has grown remarkably in the past few years. It involves digital signal processing, acoustics, mechanics, computer science, electronic engineering, artificial intelligence, psychophysiology, cognitive neuroscience, and music performance, theory and analysis. Among these sub-domains, Music Perception and Cognition are important parts of Computational Musicology, since Musicking is a whole activity extending from the music itself to its perception and cognition by human beings. In addition to computing the basic elements of music itself, such as rhythm, pitch, timbre, harmony and structure, the perception of music by the human ear and the creative cognitive process deserve more attention from researchers, because they serve as a bridge between the humanities and technology.

Music perception enters almost every activity related to music, such as composing, playing, improvising, performing, teaching and learning. The field is broad enough to incorporate a range of disciplines, including cognitive musicology, musical timbre perception, music emotion, acoustics, audio-based music signal processing, music interaction, cognitive modelling and music information retrieval.

This special issue aims to bring together humanities and technology researchers working on music technology in areas such as music performance art, creativity, computer science, experimental psychology, and cognitive science. It comprises 10 outstanding contributions covering auditory attention selection behaviours, emotional music generation, instrument and performance-skill recognition, perception and musical elements, music educational robots, affective computing, music-related social behaviour, and cross-cultural music datasets.

Li et al. studied the automatic recognition of traditional Chinese musical instrument audio. In the instrument-type identification experiment, the Mel-spectrum is used as input to train an 8-layer convolutional neural network, a configuration that achieves 99.3% accuracy. Performance-skill recognition experiments were then conducted at both the single-instrument level and the same-kind-instrument level, where the regularity of the same playing technique across different instruments can be exploited. With a similar training configuration, the recognition accuracy for the four instrument families is 95.7% for wind instruments, 82.2% for plucked string instruments, 88.3% for bowed string instruments, and 97.5% for percussion instruments.
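The Mel-spectrum input rests on the mel frequency scale, which spaces analysis bands according to pitch perception rather than raw frequency. A minimal sketch of that mapping (using the common 2595·log10 variant of the formula; the paper's exact audio front end is not specified here):

```python
import numpy as np

def hz_to_mel(f_hz):
    # HTK-style mel scale: compresses high frequencies, mimicking pitch perception
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    # Inverse mapping back to Hz
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    # Band edges equally spaced on the mel scale, as used to build a mel filterbank
    mels = np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_bands + 2)
    return mel_to_hz(mels)

edges = mel_band_edges(0.0, 8000.0, 40)
print(edges[:3], edges[-3:])
```

The edges come out much denser at low frequencies than at high ones, which is why mel-scaled inputs suit perception-oriented tasks such as instrument classification.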

Wang et al. used a cross-cultural approach to explore the correlations between perception and musical elements by comparing music emotion recognition models. Participants rated valence, tension arousal and energy arousal on labelled nine-point analogical-categorical scales for four types of classical music: Chinese ensemble, Chinese solo, Western ensemble and Western solo. Fifteen musical elements in five categories (timbre, rhythm, articulation, dynamics and register) were annotated through manual evaluation or automatic algorithms. Results showed that tempo, rhythm complexity, and articulation are culturally universal, whereas musical elements related to timbre, register and dynamics are culturally specific.
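Analyses of this kind ultimately reduce to correlating annotated element values with averaged listener ratings. A toy sketch with fabricated data (the variables and numbers below are invented for illustration and are not the study's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: an annotated musical element (tempo) and a mean
# listener rating (energy arousal), constructed so that faster tempo tends
# to raise arousal in this fabricated sample of 50 excerpts.
tempo_bpm = rng.uniform(60, 180, size=50)
energy_arousal = 0.03 * tempo_bpm + rng.normal(0, 0.5, size=50)

r = np.corrcoef(tempo_bpm, energy_arousal)[0, 1]
print(f"Pearson r between tempo and energy arousal: {r:.2f}")
```

Repeating such correlations per culture and per element is one way the "universal vs. culturally specific" distinction can be drawn.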

Du et al. proposed a multi-scale ASA model based on the binary Logit model, drawing on the information value and saliency-driven factors of the listener's attention behaviour. A verification experiment showed that the proposed ASA model effectively predicts human selective auditory attention. Compared with earlier auditory attention studies and traditional attention models, its improvements lie in cognitive characteristics that coincide more closely with the authentic auditory attention process and in its application to practical HMS optimisation. Furthermore, with the proposed ASA model, auditory attention behaviour can be predicted before the task, helping researchers analyse listeners' behaviours and evaluate ergonomics in 'cocktail party effect' environments.
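A binary Logit model in this setting amounts to logistic regression from listener-side predictors to an attend/ignore decision. The sketch below uses fabricated 'information value' and 'saliency' features and plain gradient descent; it illustrates the model family only, not Du et al.'s actual multi-scale formulation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.uniform(0, 1, size=(n, 2))             # columns: info value, saliency (invented)
logits_true = 4.0 * X[:, 0] + 3.0 * X[:, 1] - 3.5
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-logits_true))).astype(float)

# Fit the logit model by gradient descent on the average log-loss
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    grad_w, grad_b = X.T @ (p - y) / n, np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

p = 1 / (1 + np.exp(-(X @ w + b)))
acc = np.mean((p > 0.5) == (y > 0.5))
print(f"learned weights {w.round(2)}, training accuracy {acc:.2f}")
```

Both learned weights come out positive, matching the intuition that higher information value and higher saliency make a source more likely to capture attention.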

Ma et al. proposed an emotional music generation model that considers structure features along with emotional labels. Specifically, the emotional labels and music structure features are embedded as conditional input, a conditional generative GRU model generates music in an auto-regressive manner, and a perceptual loss is optimised together with the cross-entropy loss during training. Both subjective and objective experiments show that the model can generate emotional music correlated with the specified emotion and music structure.
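A conditional auto-regressive GRU of the kind described can be sketched as follows: a condition vector (standing in for the emotion-label and structure embeddings) is concatenated with the previous token's one-hot encoding at every step, and generation feeds each greedy prediction back as the next input. The weights, sizes and toy vocabulary are all illustrative; this is not the authors' model.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

H, V, C = 16, 8, 4   # hidden size, token vocabulary, condition size (toy values)
Wz, Wr, Wh = (rng.normal(0, 0.1, (H, V + C)) for _ in range(3))
Uz, Ur, Uh = (rng.normal(0, 0.1, (H, H)) for _ in range(3))
Wout = rng.normal(0, 0.1, (V, H))     # projection to token logits

def gru_step(x, cond, h):
    """One conditional GRU step: the condition vector is concatenated
    with the current input at every time step."""
    xc = np.concatenate([x, cond])
    z = sigmoid(Wz @ xc + Uz @ h)            # update gate
    r = sigmoid(Wr @ xc + Ur @ h)            # reset gate
    h_tilde = np.tanh(Wh @ xc + Uh @ (r * h))
    return (1 - z) * h + z * h_tilde

def generate(cond, steps=5):
    """Greedy auto-regressive sampling: each predicted token is fed
    back as the next input."""
    h = np.zeros(H)
    token = 0
    out = []
    for _ in range(steps):
        h = gru_step(np.eye(V)[token], cond, h)
        token = int(np.argmax(Wout @ h))     # greedy choice over vocabulary
        out.append(token)
    return out

seq = generate(cond=np.array([1.0, 0.0, 0.0, 1.0]))
print(seq)
```

In training, the cross-entropy loss over these logits would be combined with the perceptual loss mentioned above.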

Jiang et al. analysed the mechanism of sound production in terms of the coupling between the edge tone and the vibration of the air column in the tube. Numerical simulations show that the oscillation frequency of the edge tone increases with the jet velocity and jumps to a higher stage at certain values, and that the dominant modes can be altered by varying the impinging jet angle. Experiments on a musical pipe model further demonstrate that the tonal quality of the flue pipe depends on changes in the oscillation frequency of the edge tone: increasing the jet velocity produces greater amplitude and higher dominant frequencies in the pipe's acoustic response. With these properties, the flutist can obtain subtle variations in the perceived tonal quality by adjusting the blowing velocity during the attack transient.

Li et al. presented the design and development of a virtual fretless Chinese stringed instrument app, taking the Duxianqin as an example. The digital simulation of a fretless instrument consists of simulating the continuous pitch variation of the string and the sound produced by plucking it. Drawing on mechanics and wave theory, they obtain the quantitative relationship between the string's frequency and its deformation and elongation, and use physical acoustics to quantitatively reproduce the way the instrument is played.
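The quantitative relationship mentioned above rests on the standard vibrating-string result f = (1/2L)·sqrt(T/mu), with string elongation raising the tension, here via a Hooke's-law approximation. The sketch below uses toy parameters and is only an illustration of the underlying physics, not the authors' implementation.

```python
import math

def string_frequency(tension_N, length_m, mass_per_length):
    """Fundamental frequency of an ideal vibrating string:
    f = (1 / 2L) * sqrt(T / mu)."""
    return math.sqrt(tension_N / mass_per_length) / (2.0 * length_m)

def tension_from_elongation(k_N_per_m, elongation_m, base_tension_N):
    """Hooke's-law approximation: bending or pressing the string
    stretches it, raising the tension and hence the pitch -- the
    continuous pitch-bend effect of a fretless instrument."""
    return base_tension_N + k_N_per_m * elongation_m

# Toy parameters loosely in the range of a long single string.
L, mu, T0, k = 1.1, 0.004, 80.0, 5.0e4
f0 = string_frequency(T0, L, mu)
f1 = string_frequency(tension_from_elongation(k, 0.002, T0), L, mu)
print(f1 > f0)   # elongation raises the tension, so the pitch goes up
```

Mapping a touch gesture to an elongation, and the elongation to a frequency, is one simple way such continuous pitch control can be digitised.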

Zhang et al. proposed an optimisation method, based on Iterative Adaptive Inverse Filtering (IAIF), that automatically determines the vocal-tract linear prediction analysis order according to the specific situation of different voicing scenes. The aim is to obtain a more accurate glottal wave from speech or singing voice signals in a non-invasive way. Compared with existing methods that use a fixed, empirically chosen order, the proposed method achieves up to an 8.41% improvement in the correlation coefficient with the real glottal wave.
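A minimal sketch of the linear prediction machinery behind IAIF-style glottal inverse filtering: autocorrelation-method LPC via the Levinson-Durbin recursion, followed by inverse (whitening) filtering to obtain the residual. This is textbook LPC, not the authors' order-selection method, and the sinusoidal test signal is only illustrative.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method linear prediction via the Levinson-Durbin
    recursion. Convention: x[n] ~= sum_k a[k] * x[n-1-k], k = 0..order-1."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    coeffs = np.zeros(order)
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] - np.dot(coeffs[:i - 1], r[1:i][::-1])
        k = acc / e                          # reflection coefficient
        new = coeffs.copy()
        new[i - 1] = k
        new[:i - 1] = coeffs[:i - 1] - k * coeffs[:i - 1][::-1]
        coeffs = new
        e *= (1.0 - k * k)                   # updated prediction error
    return coeffs, e

def inverse_filter(x, a):
    """Inverse filter: residual[n] = x[n] - sum_k a[k] * x[n-1-k].
    For a vocal-tract LPC model, this residual approximates the
    glottal source signal."""
    res = x.copy()
    for k in range(1, len(a) + 1):
        res[k:] -= a[k - 1] * x[:-k]
    return res

# A pure sinusoid obeys a 2nd-order recursion, so order-2 LPC should
# remove almost all of its energy.
x = np.cos(0.3 * np.arange(400))
a, err = lpc(x, 2)
res = inverse_filter(x, a)
print(np.sum(res ** 2) < 0.1 * np.sum(x ** 2))
```

Choosing the analysis order well is exactly the problem the paper addresses: too low an order under-models the vocal tract, while too high an order starts cancelling the glottal source itself.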

Chen et al. constructed the first large-scale labelled Music Video (MV) dataset, Next-MV, consisting of 6000 30-s MV fragments annotated with five music-style labels and four cultural labels. They further propose a Next-Net framework to study the correlation between music style and visual style. The experimental accuracy reached 71.1%, and in a cross-cultural experiment the accuracy of the general fusion model fell between those of the models trained within-dataset and cross-dataset, showing that culture has a significant influence on the correlation between music and visuals.

Zhang et al. proposed a pipeline for a perceptual survey designed to explore how different musical elements influence people's perception of 'Chinese style' in music. Participants with various backgrounds were presented with categorised music excerpts performed on the Erhu or violin and gave 'Chinese style' ratings. Statistical analysis indicates that music content generally contributes more than the instrument, and that musicians showed higher sensitivity to both music content and instrument, with responses more concentrated than those of non-musicians. A supplementary automatic music classification experiment is compared with the survey results to discuss the choice of stimuli and the similarities between computer audition and human perception.

Chen et al. derived a new research model based on the environmental psychology model in the literature and designed an empirical experiment to examine changes in consumers' non-behavioural shopping outcomes under different conditions. Specifically, they built a virtual shopping website and chose the Mid-Autumn Festival as the experimental scenario, using a questionnaire to measure the differences in dependent variables produced by the different treatments. The results show that background music leads to more positive shopping experiences regardless of its theme.

Xie et al. proposed an evaluation method for aesthetic categories of traditional Chinese music, established a dataset of 500 clips across five aesthetic categories, and analysed the distribution characteristics of the different categories in the emotional dimension space. They then extracted corresponding acoustic features and tested the accuracy of different classifiers for aesthetic classification on this dataset; the highest classification accuracy, 65.37%, was achieved by logistic regression.

Wang et al. conducted a subjective user study on the perceived hardness of drum sounds, taking the bass drum as an example, and examined the impact of different audio effects on this perception. The results show that appropriate low-frequency and high-frequency excitation processing respectively weakens and strengthens the ear's perception of the bass drum's hardness, and this change in perception is pronounced. Properly raising the fundamental frequency of the bass drum, or changing its sound envelope to create a faster 'attack', can also increase the perceived hardness, but to a less obvious degree. Furthermore, frequency changes and envelope changes interact, and this interaction is also a main factor in altering the ear's perception of the bass drum's hardness.

All the papers selected for this Special Issue demonstrate the importance of music perception to the advancement of music technology. Most of them contain real-world validation with experimental data, and most demonstrate innovative system design and processing solutions. At the same time, many challenges in this field still call for future research attention. Future work can help music technology extend its applications and accelerate market adoption.

We would like to express our gratitude and congratulations to all the authors of the selected papers in this Special Issue on Music Perception and Cognition in Music Technology for their contributions of great value in terms of quality and innovation. We also thank all the reviewers for their contribution to the selection and improvement of the publications in this Special Issue. We hope that this Special Issue will stimulate researchers in both industry and academia to undertake further research in this challenging field. We are also grateful to the Editor-in-Chief of IET Cognitive Computation and Systems and the Editorial Office for their support throughout the editorial process.
