arXiv - CS - Sound: Latest Publications

Guitar Chord Diagram Suggestion for Western Popular Music
arXiv - CS - Sound Pub Date: 2024-07-15 DOI: arxiv-2407.14260
Alexandre d'Hooge (LaBRI, SCRIME), Louis Bigo (LaBRI, SCRIME), Ken Déguernel, Nicolas Martin
{"title":"Guitar Chord Diagram Suggestion for Western Popular Music","authors":"Alexandre d'HoogeLaBRI, SCRIME, Louis BigoLaBRI, SCRIME, Ken Déguernel, Nicolas Martin","doi":"arxiv-2407.14260","DOIUrl":"https://doi.org/arxiv-2407.14260","url":null,"abstract":"Chord diagrams are used by guitar players to show where and how to play a\u0000chord on the fretboard. They are useful to beginners learning chords or for\u0000sharing the hand positions required to play a song.However, the diagrams\u0000presented on guitar learning toolsare usually selected from an existing\u0000databaseand rarely represent the actual positions used by performers.In this\u0000paper, we propose a tool which suggests a chord diagram for achord label,taking\u0000into account the diagram of the previous chord.Based on statistical analysis of\u0000the DadaGP and mySongBook datasets, we show that some chord diagrams are\u0000over-represented in western popular musicand that some chords can be played in\u0000more than 20 different ways.We argue that taking context into account can\u0000improve the variety and the quality of chord diagram suggestion, and compare\u0000this approach with a model taking only the current chord label into account.We\u0000show that adding previous context improves the F1-score on this task by up to\u000027% and reduces the propensity of the model to suggest standard open chords.We\u0000also define the notion of texture in the context of chord diagrams andshow\u0000through a variety of metrics that our model improves textureconsistencywith the\u0000previous diagram.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141745581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion
arXiv - CS - Sound Pub Date: 2024-07-15 DOI: arxiv-2407.10373
Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng
{"title":"Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion","authors":"Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng","doi":"arxiv-2407.10373","DOIUrl":"https://doi.org/arxiv-2407.10373","url":null,"abstract":"Visual acoustic matching (VAM) is pivotal for enhancing the immersive\u0000experience, and the task of dereverberation is effective in improving audio\u0000intelligibility. Existing methods treat each task independently, overlooking\u0000the inherent reciprocity between them. Moreover, these methods depend on paired\u0000training data, which is challenging to acquire, impeding the utilization of\u0000extensive unpaired data. In this paper, we introduce MVSD, a mutual learning\u0000framework based on diffusion models. MVSD considers the two tasks\u0000symmetrically, exploiting the reciprocal relationship to facilitate learning\u0000from inverse tasks and overcome data scarcity. Furthermore, we employ the\u0000diffusion model as foundational conditional converters to circumvent the\u0000training instability and over-smoothing drawbacks of conventional GAN\u0000architectures. Specifically, MVSD employs two converters: one for VAM called\u0000reverberator and one for dereverberation called dereverberator. The\u0000dereverberator judges whether the reverberation audio generated by reverberator\u0000sounds like being in the conditional visual scenario, and vice versa. By\u0000forming a closed loop, these two converters can generate informative feedback\u0000signals to optimize the inverse tasks, even with easily acquired one-way\u0000unpaired data. Extensive experiments on two standard benchmarks, i.e.,\u0000SoundSpaces-Speech and Acoustic AVSpeech, exhibit that our framework can\u0000improve the performance of the reverberator and dereverberator and better match\u0000specified visual scenarios.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718912","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DDFAD: Dataset Distillation Framework for Audio Data
arXiv - CS - Sound Pub Date: 2024-07-15 DOI: arxiv-2407.10446
Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu
{"title":"DDFAD: Dataset Distillation Framework for Audio Data","authors":"Wenbo Jiang, Rui Zhang, Hongwei Li, Xiaoyuan Liu, Haomiao Yang, Shui Yu","doi":"arxiv-2407.10446","DOIUrl":"https://doi.org/arxiv-2407.10446","url":null,"abstract":"Deep neural networks (DNNs) have achieved significant success in numerous\u0000applications. The remarkable performance of DNNs is largely attributed to the\u0000availability of massive, high-quality training datasets. However, processing\u0000such massive training data requires huge computational and storage resources.\u0000Dataset distillation is a promising solution to this problem, offering the\u0000capability to compress a large dataset into a smaller distilled dataset. The\u0000model trained on the distilled dataset can achieve comparable performance to\u0000the model trained on the whole dataset. While dataset distillation has been demonstrated in image data, none have\u0000explored dataset distillation for audio data. In this work, for the first time,\u0000we propose a Dataset Distillation Framework for Audio Data (DDFAD).\u0000Specifically, we first propose the Fused Differential MFCC (FD-MFCC) as\u0000extracted features for audio data. After that, the FD-MFCC is distilled through\u0000the matching training trajectory distillation method. Finally, we propose an\u0000audio signal reconstruction algorithm based on the Griffin-Lim Algorithm to\u0000reconstruct the audio signal from the distilled FD-MFCC. Extensive experiments\u0000demonstrate the effectiveness of DDFAD on various audio datasets. In addition,\u0000we show that DDFAD has promising application prospects in many applications,\u0000such as continual learning and neural architecture search.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features
arXiv - CS - Sound Pub Date: 2024-07-15 DOI: arxiv-2407.10462
Jing Luo, Xinyu Yang, Dorien Herremans
{"title":"BandControlNet: Parallel Transformers-based Steerable Popular Music Generation with Fine-Grained Spatiotemporal Features","authors":"Jing Luo, Xinyu Yang, Dorien Herremans","doi":"arxiv-2407.10462","DOIUrl":"https://doi.org/arxiv-2407.10462","url":null,"abstract":"Controllable music generation promotes the interaction between humans and\u0000composition systems by projecting the users' intent on their desired music. The\u0000challenge of introducing controllability is an increasingly important issue in\u0000the symbolic music generation field. When building controllable generative\u0000popular multi-instrument music systems, two main challenges typically present\u0000themselves, namely weak controllability and poor music quality. To address\u0000these issues, we first propose spatiotemporal features as powerful and\u0000fine-grained controls to enhance the controllability of the generative model.\u0000In addition, an efficient music representation called REMI_Track is designed to\u0000convert multitrack music into multiple parallel music sequences and shorten the\u0000sequence length of each track with Byte Pair Encoding (BPE) techniques.\u0000Subsequently, we release BandControlNet, a conditional model based on parallel\u0000Transformers, to tackle the multiple music sequences and generate high-quality\u0000music samples that are conditioned to the given spatiotemporal control\u0000features. More concretely, the two specially designed modules of\u0000BandControlNet, namely structure-enhanced self-attention (SE-SA) and\u0000Cross-Track Transformer (CTT), are utilized to strengthen the resulting musical\u0000structure and inter-track harmony modeling respectively. Experimental results\u0000tested on two popular music datasets of different lengths demonstrate that the\u0000proposed BandControlNet outperforms other conditional music generation models\u0000on most objective metrics in terms of fidelity and inference speed and shows\u0000great robustness in generating long music samples. The subjective evaluations\u0000show BandControlNet trained on short datasets can generate music with\u0000comparable quality to state-of-the-art models, while outperforming them\u0000significantly using longer datasets.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
arXiv - CS - Sound Pub Date: 2024-07-15 DOI: arxiv-2407.10387
Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas, Joan Serrà
{"title":"Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity","authors":"Santiago Pascual, Chunghsin Yeh, Ioannis Tsiamas, Joan Serrà","doi":"arxiv-2407.10387","DOIUrl":"https://doi.org/arxiv-2407.10387","url":null,"abstract":"Video-to-audio (V2A) generation leverages visual-only video features to\u0000render plausible sounds that match the scene. Importantly, the generated sound\u0000onsets should match the visual actions that are aligned with them, otherwise\u0000unnatural synchronization artifacts arise. Recent works have explored the\u0000progression of conditioning sound generators on still images and then video\u0000features, focusing on quality and semantic matching while ignoring\u0000synchronization, or by sacrificing some amount of quality to focus on improving\u0000synchronization only. In this work, we propose a V2A generative model, named\u0000MaskVAT, that interconnects a full-band high-quality general audio codec with a\u0000sequence-to-sequence masked generative model. This combination allows modeling\u0000both high audio quality, semantic matching, and temporal synchronicity at the\u0000same time. Our results show that, by combining a high-quality codec with the\u0000proper pre-trained audio-visual features and a sequence-to-sequence parallel\u0000structure, we are able to yield highly synchronized results on one hand, whilst\u0000being competitive with the state of the art of non-codec generative audio\u0000models. Sample videos and generated audios are available at\u0000https://maskvat.github.io .","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification
arXiv - CS - Sound Pub Date: 2024-07-14 DOI: arxiv-2407.10048
Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie
{"title":"Whisper-SV: Adapting Whisper for Low-data-resource Speaker Verification","authors":"Li Zhang, Ning Jiang, Qing Wang, Yue Li, Quan Lu, Lei Xie","doi":"arxiv-2407.10048","DOIUrl":"https://doi.org/arxiv-2407.10048","url":null,"abstract":"Trained on 680,000 hours of massive speech data, Whisper is a multitasking,\u0000multilingual speech foundation model demonstrating superior performance in\u0000automatic speech recognition, translation, and language identification.\u0000However, its applicability in speaker verification (SV) tasks remains\u0000unexplored, particularly in low-data-resource scenarios where labeled speaker\u0000data in specific domains are limited. To fill this gap, we propose a\u0000lightweight adaptor framework to boost SV with Whisper, namely Whisper-SV.\u0000Given that Whisper is not specifically optimized for SV tasks, we introduce a\u0000representation selection module to quantify the speaker-specific\u0000characteristics contained in each layer of Whisper and select the top-k layers\u0000with prominent discriminative speaker features. To aggregate pivotal\u0000speaker-related features while diminishing non-speaker redundancies across the\u0000selected top-k distinct layers of Whisper, we design a multi-layer aggregation\u0000module in Whisper-SV to integrate multi-layer representations into a singular,\u0000compacted representation for SV. In the multi-layer aggregation module, we\u0000employ convolutional layers with shortcut connections among different layers to\u0000refine speaker characteristics derived from multi-layer representations from\u0000Whisper. In addition, an attention aggregation layer is used to reduce\u0000non-speaker interference and amplify speaker-specific cues for SV tasks.\u0000Finally, a simple classification module is used for speaker classification.\u0000Experiments on VoxCeleb1, FFSVC, and IMSV datasets demonstrate that Whisper-SV\u0000achieves EER/minDCF of 2.22%/0.307, 6.14%/0.488, and 7.50%/0.582, respectively,\u0000showing superior performance in low-data-resource SV scenarios.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141718915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Interpretation Gap in Text-to-Music Generation Models
arXiv - CS - Sound Pub Date: 2024-07-14 DOI: arxiv-2407.10328
Yongyi Zang, Yixiao Zhang
{"title":"The Interpretation Gap in Text-to-Music Generation Models","authors":"Yongyi Zang, Yixiao Zhang","doi":"arxiv-2407.10328","DOIUrl":"https://doi.org/arxiv-2407.10328","url":null,"abstract":"Large-scale text-to-music generation models have significantly enhanced music\u0000creation capabilities, offering unprecedented creative freedom. However, their\u0000ability to collaborate effectively with human musicians remains limited. In\u0000this paper, we propose a framework to describe the musical interaction process,\u0000which includes expression, interpretation, and execution of controls. Following\u0000this framework, we argue that the primary gap between existing text-to-music\u0000models and musicians lies in the interpretation stage, where models lack the\u0000ability to interpret controls from musicians. We also propose two strategies to\u0000address this gap and call on the music information retrieval community to\u0000tackle the interpretation challenge to improve human-AI musical collaboration.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141722119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks
arXiv - CS - Sound Pub Date: 2024-07-10 DOI: arxiv-2407.08658
Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva
{"title":"Evaluating Voice Command Pipelines for Drone Control: From STT and LLM to Direct Classification and Siamese Networks","authors":"Lucca Emmanuel Pineli Simões, Lucas Brandão Rodrigues, Rafaela Mota Silva, Gustavo Rodrigues da Silva","doi":"arxiv-2407.08658","DOIUrl":"https://doi.org/arxiv-2407.08658","url":null,"abstract":"This paper presents the development and comparative evaluation of three voice\u0000command pipelines for controlling a Tello drone, using speech recognition and\u0000deep learning techniques. The aim is to enhance human-machine interaction by\u0000enabling intuitive voice control of drone actions. The pipelines developed\u0000include: (1) a traditional Speech-to-Text (STT) followed by a Large Language\u0000Model (LLM) approach, (2) a direct voice-to-function mapping model, and (3) a\u0000Siamese neural network-based system. Each pipeline was evaluated based on\u0000inference time, accuracy, efficiency, and flexibility. Detailed methodologies,\u0000dataset preparation, and evaluation metrics are provided, offering a\u0000comprehensive analysis of each pipeline's strengths and applicability across\u0000different scenarios.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141609941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Benchmark for Multi-speaker Anonymization
arXiv - CS - Sound Pub Date: 2024-07-08 DOI: arxiv-2407.05608
Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang
{"title":"A Benchmark for Multi-speaker Anonymization","authors":"Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang","doi":"arxiv-2407.05608","DOIUrl":"https://doi.org/arxiv-2407.05608","url":null,"abstract":"Privacy-preserving voice protection approaches primarily suppress\u0000privacy-related information derived from paralinguistic attributes while\u0000preserving the linguistic content. Existing solutions focus on single-speaker\u0000scenarios. However, they lack practicality for real-world applications, i.e.,\u0000multi-speaker scenarios. In this paper, we present an initial attempt to\u0000provide a multi-speaker anonymization benchmark by defining the task and\u0000evaluation protocol, proposing benchmarking solutions, and discussing the\u0000privacy leakage of overlapping conversations. Specifically, ideal multi-speaker\u0000anonymization should preserve the number of speakers and the turn-taking\u0000structure of the conversation, ensuring accurate context conveyance while\u0000maintaining privacy. To achieve that, a cascaded system uses speaker\u0000diarization to aggregate the speech of each speaker and speaker anonymization\u0000to conceal speaker privacy and preserve speech content. Additionally, we\u0000propose two conversation-level speaker vector anonymization methods to improve\u0000the utility further. Both methods aim to make the original and corresponding\u0000pseudo-speaker identities of each speaker unlinkable while preserving or even\u0000improving the distinguishability among pseudo-speakers in a conversation. The\u0000first method minimizes the differential similarity across speaker pairs in the\u0000original and anonymized conversations to maintain original speaker\u0000relationships in the anonymized version. The other method minimizes the\u0000aggregated similarity across anonymized speakers to achieve better\u0000differentiation between speakers. Experiments conducted on both non-overlap\u0000simulated and real-world datasets demonstrate the effectiveness of the\u0000multi-speaker anonymization system with the proposed speaker anonymizers.\u0000Additionally, we analyzed overlapping speech regarding privacy leakage and\u0000provide potential solutions.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141575856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
MERGE -- A Bimodal Dataset for Static Music Emotion Recognition
arXiv - CS - Sound Pub Date: 2024-07-08 DOI: arxiv-2407.06060
Pedro Lima Louro, Hugo Redinho, Ricardo Santos, Ricardo Malheiro, Renato Panda, Rui Pedro Paiva
{"title":"MERGE -- A Bimodal Dataset for Static Music Emotion Recognition","authors":"Pedro Lima Louro, Hugo Redinho, Ricardo Santos, Ricardo Malheiro, Renato Panda, Rui Pedro Paiva","doi":"arxiv-2407.06060","DOIUrl":"https://doi.org/arxiv-2407.06060","url":null,"abstract":"The Music Emotion Recognition (MER) field has seen steady developments in\u0000recent years, with contributions from feature engineering, machine learning,\u0000and deep learning. The landscape has also shifted from audio-centric systems to\u0000bimodal ensembles that combine audio and lyrics. However, a severe lack of\u0000public and sizeable bimodal databases has hampered the development and\u0000improvement of bimodal audio-lyrics systems. This article proposes three new\u0000audio, lyrics, and bimodal MER research datasets, collectively called MERGE,\u0000created using a semi-automatic approach. To comprehensively assess the proposed\u0000datasets and establish a baseline for benchmarking, we conducted several\u0000experiments for each modality, using feature engineering, machine learning, and\u0000deep learning methodologies. In addition, we propose and validate fixed\u0000train-validate-test splits. The obtained results confirm the viability of the\u0000proposed datasets, achieving the best overall result of 79.21% F1-score for\u0000bimodal classification using a deep neural network.","PeriodicalId":501178,"journal":{"name":"arXiv - CS - Sound","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141576058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0