Extending Radio Broadcasting Semantics through Adaptive Audio Segmentation Automations

Science of aging knowledge environment : SAGE KE. Pub Date: 2022-07-18. DOI: 10.3390/knowledge2030020

Rigas Kotsakis, Charalampos A. Dimoulas


Citations: 2

Abstract


Extending Radio Broadcasting Semantics through Adaptive Audio Segmentation Automations

The present paper focuses on adaptive audio detection, segmentation and classification techniques for audio broadcasting content, dedicated mainly to voice data. The suggested framework addresses a real-case scenario encountered in media services, especially radio streams, and aims to fulfill diverse (semi-)automated indexing/annotation and management needs. In this context, aggregated radio content is collected into small input datasets, which are used for adaptive classification experiments without searching, at this point, for a generic pattern recognition solution. Hierarchical and hybrid taxonomies are proposed: first to discriminate voice data in radio streams, then to detect single-speaker voices, and, when a single speaker is detected, to proceed to a final layer of gender classification. Stand-alone and combined supervised and clustering techniques are tested along with multivariate window tuning, in order to extract meaningful results based on overall and partial performance rates. Furthermore, through data augmentation mechanisms, the current work contributes to the formulation of a dynamic Generic Audio Classification Repository that will, in the future, be subjected to adaptive multilabel experimentation with more sophisticated techniques, such as deep architectures.
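The three-layer hierarchy described in the abstract (voice detection, then single-speaker detection, then gender classification) can be pictured as a cascade applied to fixed-length windows of the stream. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the label strings, the `window_len` and `hop` parameters, and the idea of passing each layer in as a plain callable are all hypothetical, and the abstract does not specify which features or classifiers are used at each layer.

```python
from typing import Callable, List, Sequence

Window = Sequence[float]
Stage = Callable[[Window], bool]  # hypothetical: one binary decision per layer


def split_windows(samples: Sequence[float], window_len: int, hop: int) -> List[Window]:
    """Cut a signal into fixed-length windows; a trailing partial window is dropped."""
    last_start = len(samples) - window_len
    return [samples[i:i + window_len] for i in range(0, last_start + 1, hop)]


def classify_window(window: Window,
                    is_voice: Stage,
                    is_single_speaker: Stage,
                    is_male: Stage) -> str:
    """Run the cascade; a later layer is consulted only if the earlier one passes."""
    if not is_voice(window):
        return "non-voice"                 # layer 1: voice vs. everything else
    if not is_single_speaker(window):
        return "voice/other"               # layer 2: label name is an assumption
    # layer 3: gender decision, reached only for single-speaker voice
    return "voice/single/male" if is_male(window) else "voice/single/female"


def label_stream(samples: Sequence[float], window_len: int, hop: int,
                 is_voice: Stage, is_single_speaker: Stage, is_male: Stage) -> List[str]:
    """Label every window of the stream with its position in the hierarchy."""
    return [classify_window(w, is_voice, is_single_speaker, is_male)
            for w in split_windows(samples, window_len, hop)]


if __name__ == "__main__":
    # Toy stand-ins for trained models: a crude energy threshold and two constants.
    stream = [0.0] * 4 + [1.0] * 4
    labels = label_stream(stream, window_len=4, hop=4,
                          is_voice=lambda w: sum(abs(x) for x in w) > 1.0,
                          is_single_speaker=lambda w: True,
                          is_male=lambda w: False)
    print(labels)
```

In a sketch like this, window tuning would amount to repeating the labelling with different `window_len` and `hop` values and comparing the overall and per-layer performance rates against annotated ground truth.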

