Multimodal Fusion for Segment Classification in Folk Music

Aravind Krishnan, Amal Vincent, Geevar Jos, R. Rajan
Published in: 2021 IEEE 18th India Council International Conference (INDICON), 2021-12-19
DOI: 10.1109/INDICON52576.2021.9691751 (https://doi.org/10.1109/INDICON52576.2021.9691751)
Citation count: 2

Abstract

This paper proposes a folk music segment classification system that uses multimodal fusion of acoustic features, textual information and a duration-based feature on a Thiruvathirakali music corpus. Acoustic features are learned from musical texture features (MTF) using a long short-term memory (LSTM) model. A term frequency-inverse document frequency (TF-IDF) model derives text-based features from transcription data. Multimodal fusion is performed by early integration of the LSTM-derived features, the TF-IDF features and the duration feature. The LSTM model is further optimised through frame fusion in the temporal domain, which increases classification efficiency by 13 percent and reduces computational expense tenfold. With frame fusion, the LSTM model alone achieves an overall precision, recall and F1 measure of 0.53, 0.52 and 0.51 respectively, outperforming a baseline SVM classifier. Classification efficiency improves by 15 percent (absolute) with the addition of each multimodal component. With complete multimodal fusion, precision, recall and F1 improve to 0.83, 0.78 and 0.80 respectively.
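
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: frame fusion is modelled here as averaging every k consecutive frames, the LSTM is replaced by a simple mean over frames as a stand-in for a learned embedding, and the TF-IDF weighting uses the plain tf x log(N/df) form. The function names (fuse_frames, mean_pool, tfidf, early_fusion) and all sample values are hypothetical.

```python
import math
from collections import Counter


def fuse_frames(frames, k):
    """Assumed form of frame fusion: average every k consecutive frames.

    Shortens the sequence by roughly a factor of k, which is what
    would reduce the cost of a sequence model run over it.
    """
    fused = []
    for start in range(0, len(frames), k):
        group = frames[start:start + k]
        dim = len(group[0])
        fused.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return fused


def mean_pool(frames):
    """Stand-in for the LSTM: one fixed-length vector per segment."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def tfidf(docs):
    """Plain TF-IDF over tokenised, non-empty documents."""
    vocab = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    doc_freq = Counter(w for doc in docs for w in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            [(counts[w] / len(doc)) * math.log(n_docs / doc_freq[w]) for w in vocab]
        )
    return vocab, vectors


def early_fusion(acoustic, text, duration):
    """Early integration: concatenate all modalities into one vector."""
    return list(acoustic) + list(text) + [duration]


# Hypothetical segment: five 2-dimensional frames, a short transcription,
# and a duration in seconds.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
transcripts = [["a", "b", "a"], ["b", "c"]]

fused = fuse_frames(frames, k=2)          # 5 frames become 3
acoustic_vec = mean_pool(fused)           # placeholder acoustic embedding
vocab, text_vecs = tfidf(transcripts)
feature_vec = early_fusion(acoustic_vec, text_vecs[0], duration=12.5)
print(len(feature_vec))                   # acoustic dims + vocab size + 1
```

The fused vector would then be passed to a single classifier, which is what distinguishes early integration from late fusion, where each modality is classified separately and the decisions are combined afterwards.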