{"title":"The Interpretation Gap in Text-to-Music Generation Models","authors":"Yongyi Zang, Yixiao Zhang","doi":"arxiv-2407.10328","DOIUrl":null,"url":null,"abstract":"Large-scale text-to-music generation models have significantly enhanced music\ncreation capabilities, offering unprecedented creative freedom. However, their\nability to collaborate effectively with human musicians remains limited. In\nthis paper, we propose a framework to describe the musical interaction process,\nwhich includes expression, interpretation, and execution of controls. Following\nthis framework, we argue that the primary gap between existing text-to-music\nmodels and musicians lies in the interpretation stage, where models lack the\nability to interpret controls from musicians. We also propose two strategies to\naddress this gap and call on the music information retrieval community to\ntackle the interpretation challenge to improve human-AI musical collaboration.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2407.10328","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Large-scale text-to-music generation models have significantly enhanced music creation capabilities, offering unprecedented creative freedom. However, their ability to collaborate effectively with human musicians remains limited. In this paper, we propose a framework to describe the musical interaction process, which includes expression, interpretation, and execution of controls. Following this framework, we argue that the primary gap between existing text-to-music models and musicians lies in the interpretation stage, where models lack the ability to interpret controls from musicians. We also propose two strategies to address this gap and call on the music information retrieval community to tackle the interpretation challenge to improve human-AI musical collaboration.