{"title":"The NES Video-Music Database: A Dataset of Symbolic Video Game Music Paired with Gameplay Videos","authors":"Igor Cardoso, Rubens O. Moraes, Lucas N. Ferreira","doi":"arxiv-2404.04420","DOIUrl":null,"url":null,"abstract":"Neural models are one of the most popular approaches for music generation,\nyet there aren't standard large datasets tailored for learning music directly\nfrom game data. To address this research gap, we introduce a novel dataset\nnamed NES-VMDB, containing 98,940 gameplay videos from 389 NES games, each\npaired with its original soundtrack in symbolic format (MIDI). NES-VMDB is\nbuilt upon the Nintendo Entertainment System Music Database (NES-MDB),\nencompassing 5,278 music pieces from 397 NES games. Our approach involves\ncollecting long-play videos for 389 games of the original dataset, slicing them\ninto 15-second-long clips, and extracting the audio from each clip.\nSubsequently, we apply an audio fingerprinting algorithm (similar to Shazam) to\nautomatically identify the corresponding piece in the NES-MDB dataset.\nAdditionally, we introduce a baseline method based on the Controllable Music\nTransformer to generate NES music conditioned on gameplay clips. We evaluated\nthis approach with objective metrics, and the results showed that the\nconditional CMT improves musical structural quality when compared to its\nunconditional counterpart. Moreover, we used a neural classifier to predict the\ngame genre of the generated pieces. Results showed that the CMT generator can\nlearn correlations between gameplay videos and game genres, but further\nresearch has to be conducted to achieve human-level performance.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":"85 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Sound","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2404.04420","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Neural models are among the most popular approaches to music generation, yet there are no standard large-scale datasets tailored for learning music directly from game data. To address this gap, we introduce NES-VMDB, a novel dataset containing 98,940 gameplay videos from 389 NES games, each paired with its original soundtrack in symbolic format (MIDI). NES-VMDB is built upon the Nintendo Entertainment System Music Database (NES-MDB), which encompasses 5,278 music pieces from 397 NES games. Our approach involves collecting longplay videos for 389 games of the original dataset, slicing them into 15-second clips, and extracting the audio from each clip.
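The abstract does not name the slicing tooling; the sketch below shows one plausible implementation using ffmpeg driven from Python. All paths, the clip naming scheme, and the 16 kHz mono output format are illustrative assumptions, not the authors' pipeline.

```python
import subprocess
from pathlib import Path

CLIP_SECONDS = 15  # clip length used by NES-VMDB

def slice_longplay(video_path: str, out_dir: str) -> None:
    """Split one longplay video into fixed-length clips, then extract audio."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # Segment the video into ~15-second clips. Stream copy (-c copy) cuts at
    # keyframes, so clip boundaries are approximate; re-encode for exact cuts.
    subprocess.run([
        "ffmpeg", "-i", video_path,
        "-f", "segment", "-segment_time", str(CLIP_SECONDS),
        "-c", "copy", "-reset_timestamps", "1",
        f"{out_dir}/clip_%05d.mp4",
    ], check=True)
    # Extract mono WAV audio from each clip for fingerprinting
    # (the 16 kHz sample rate is an assumption, not a reported value).
    for clip in sorted(Path(out_dir).glob("clip_*.mp4")):
        subprocess.run([
            "ffmpeg", "-i", str(clip), "-vn", "-ac", "1", "-ar", "16000",
            str(clip.with_suffix(".wav")),
        ], check=True)
```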
Subsequently, we apply an audio fingerprinting algorithm (similar to Shazam) to
automatically identify the corresponding piece in the NES-MDB dataset.
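The paper likens its fingerprinting step to Shazam; below is a minimal sketch of a Shazam-style constellation fingerprint, assuming numpy and scipy. The STFT window, peak-neighborhood size, fan-out, and time-delta limit are illustrative parameters, not the values used to build NES-VMDB.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import maximum_filter

def fingerprint(samples: np.ndarray, sr: int, fan_out: int = 5):
    """Return (hash, anchor_time) pairs for one audio clip.

    Each hash encodes two spectral peaks and their time gap, as in the
    classic Shazam constellation-map scheme.
    """
    _, _, sxx = spectrogram(samples, fs=sr, nperseg=1024, noverlap=512)
    log_sxx = np.log(sxx + 1e-10)
    # A point is a peak if it is the maximum of its local neighborhood
    # and louder than the clip's mean log-energy.
    peaks = (maximum_filter(log_sxx, size=20) == log_sxx) & (log_sxx > log_sxx.mean())
    f_idx, t_idx = np.nonzero(peaks)
    order = np.argsort(t_idx)
    f_idx, t_idx = f_idx[order], t_idx[order]
    hashes = []
    for i in range(len(t_idx)):
        # Pair each anchor peak with the next few peaks in time.
        for j in range(i + 1, min(i + 1 + fan_out, len(t_idx))):
            dt = int(t_idx[j] - t_idx[i])
            if 0 < dt <= 200:
                hashes.append(((int(f_idx[i]), int(f_idx[j]), dt), int(t_idx[i])))
    return hashes
```

Matching then amounts to fingerprinting audio rendered from every NES-MDB piece, looking up each clip's hashes in that index, and voting for the piece whose matches agree on a consistent time offset.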
Additionally, we introduce a baseline method based on the Controllable Music Transformer (CMT) to generate NES music conditioned on gameplay clips. We evaluated this approach with objective metrics, and the results showed that the conditional CMT improves musical structural quality compared to its unconditional counterpart.
Moreover, we used a neural classifier to predict the game genre of the generated pieces. The results showed that the CMT generator can learn correlations between gameplay videos and game genres, but further research is needed to reach human-level performance.
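The classifier architecture is not described in the abstract; as a sketch of the genre-prediction evaluation itself, assuming a trained PyTorch classifier and batched piece encodings (all names below are hypothetical):

```python
import torch

@torch.no_grad()
def genre_accuracy(classifier: torch.nn.Module,
                   pieces: torch.Tensor,
                   true_genres: torch.Tensor) -> float:
    """Fraction of generated pieces whose predicted genre matches the
    genre of the game whose clip conditioned the generation."""
    logits = classifier(pieces)        # (batch, num_genres)
    predicted = logits.argmax(dim=-1)  # (batch,)
    return (predicted == true_genres).float().mean().item()
```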