SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers

Impact factor: 2.7 (JCR Q2, Engineering, Electrical & Electronic)

IEEE Open Journal of Signal Processing. Pub Date: 2025-01-28. DOI: 10.1109/OJSP.2025.3534686

Junghyun Koo;Gordon Wichern;François G. Germain;Sameer Khurana;Jonathan Le Roux

Volume 6, pages 266-275. Full text: https://ieeexplore.ieee.org/document/10856829/


Abstract

We introduce Self-Monitored Inference-Time INtervention (SMITIN), an approach for controlling an autoregressive generative music transformer using classifier probes. These simple logistic regression probes are trained on the output of each attention head in the transformer using a small dataset of audio examples both exhibiting and missing a specific musical trait (e.g., the presence/absence of drums, or real/synthetic music). We then steer the attention heads in the probe direction, ensuring the generative model output captures the desired musical trait. Additionally, we monitor the probe output to avoid adding an excessive amount of intervention into the autoregressive generation, which could lead to temporally incoherent music. We validate our results objectively and subjectively for both audio continuation and text-to-music applications, demonstrating the ability to add controls to large generative models for which retraining or even fine-tuning is impractical for most musicians. Audio samples of the proposed intervention approach are available on our demo page.
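The probe-and-steer idea described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation: it assumes synthetic 16-dimensional vectors standing in for the output of a single attention head, a plain gradient-descent logistic regression, and a fixed step size `alpha` with a probability threshold `target` as the monitoring rule. The names `train_probe`, `steer`, `alpha`, and `target` are hypothetical and chosen only for illustration; the abstract does not specify how the intervention strength is actually scheduled.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(acts, labels, lr=0.5, steps=500):
    """Fit a logistic regression probe on head activations (plain gradient descent)."""
    n, d = acts.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = sigmoid(acts @ w + b)
        grad = p - labels                 # gradient of the mean log-loss w.r.t. the logits
        w -= lr * (acts.T @ grad) / n
        b -= lr * grad.mean()
    return w, b


def steer(act, w, b, alpha=1.0, target=0.9):
    """Push one activation along the probe direction unless the probe already fires.

    Returns (possibly shifted activation, probe probability before the shift,
    whether an intervention was applied). The threshold rule is an assumption.
    """
    prob = sigmoid(act @ w + b)
    if prob >= target:                    # self-monitoring: trait already present, do nothing
        return act, prob, False
    direction = w / np.linalg.norm(w)     # unit vector pointing toward the "trait" class
    return act + alpha * direction, prob, True


# Synthetic stand-in for head outputs: the trait shifts activations along axis 0.
rng = np.random.default_rng(0)
dim = 16
trait_axis = np.zeros(dim)
trait_axis[0] = 1.0
pos = rng.normal(size=(200, dim)) + 2.0 * trait_axis   # examples exhibiting the trait
neg = rng.normal(size=(200, dim)) - 2.0 * trait_axis   # examples missing the trait
acts = np.vstack([pos, neg])
labels = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_probe(acts, labels)
accuracy = ((sigmoid(acts @ w + b) > 0.5) == (labels == 1)).mean()

# Toy "generation" loop: re-steer the same activation until the probe is satisfied.
act = -2.0 * trait_axis
history = []
for _ in range(50):
    act, prob, intervened = steer(act, w, b)
    history.append((prob, intervened))
```

In a real transformer each generation step would produce a fresh activation per head, and the probe would be evaluated on that new activation; the loop above reuses one vector only to show that the intervention switches off once the probe output exceeds the threshold.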
