Work-in-Progress: ARTIC: An Adaptive Real-Time Imprecise Computation Pipeline for Audio Analysis

2019 IEEE Real-Time Systems Symposium (RTSS). Pub Date: 2019-12-01. DOI: 10.1109/RTSS46320.2019.00071

Michael Yantosca, A. Cheng


Citations: 1

Abstract

One of the more complex issues facing natural language processing (NLP) is how to deal with overlapped speech, i.e., when two or more speakers interfere with or talk over each other, and the more general case of co-channel speech, i.e., when two or more speakers are present in an audio stream regardless of interference. Frequently, one speaker is selected as a primary speaker for the purpose of analysis with other speakers relegated to the category of interfering speakers. Despite the breadth of research into overlapped speech detection, few endeavors have been made into preserving the speech of so-called interfering speakers. A compelling case can be made for a more comprehensive analysis of co-channel speech in the fields of computational linguistics, accessibility automation, and entertainment, particularly under real-time constraints. Currently available open-source audio libraries, while technically capable of supporting such research endeavors, are cumbersome to work with. To this end, the work introduces the Adaptive Real-Time Imprecise Computation (ARTIC) pipeline for audio analysis, a simple but flexible approach to stream processing that tracks computation times and deadlines for the various pipeline stages and affords the user the ability to specify automatic precision reductions to avoid projected deadline misses as well as automatic precision increases to combat underutilization. A proof of concept is tested with the intent to build upon this groundwork for a more comprehensive project having the goal of multi-speaker interference detection and eventually speaker separation.
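
The adaptation idea described in the abstract (track each stage's computation time against its deadline, lower precision when a miss is projected, raise it when the stage is underutilized) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `AdaptiveStage`, the integer precision levels, the exponential smoothing of measured times, and the `low_water` utilization threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdaptiveStage:
    """Hypothetical pipeline stage whose precision level adapts to its deadline."""

    deadline_s: float                    # per-frame time budget for this stage
    level: int                           # current precision (higher = more precise)
    min_level: int = 0
    max_level: int = 4
    low_water: float = 0.5               # assumed threshold for "underutilized"
    alpha: float = 0.5                   # assumed smoothing factor for time estimates
    estimate_s: Optional[float] = None   # smoothed computation time at this level

    def record(self, elapsed_s: float) -> int:
        """Fold in one measured computation time and return the next level."""
        if self.estimate_s is None:
            self.estimate_s = elapsed_s
        else:
            self.estimate_s = (
                self.alpha * elapsed_s + (1.0 - self.alpha) * self.estimate_s
            )

        utilization = self.estimate_s / self.deadline_s
        if utilization > 1.0 and self.level > self.min_level:
            # Projected deadline miss: trade precision for time.
            self.level -= 1
            self.estimate_s = None       # old timings do not apply to the new level
        elif utilization < self.low_water and self.level < self.max_level:
            # Slack available: spend it on higher precision.
            self.level += 1
            self.estimate_s = None
        return self.level


stage = AdaptiveStage(deadline_s=0.010, level=2)
after_overrun = stage.record(0.012)   # over budget, so precision drops
after_slack = stage.record(0.004)     # well under budget, so precision rises
after_steady = stage.record(0.008)    # within budget, so precision holds
```

In this sketch, what a precision level means (for example, an analysis window size or a feature resolution) is left to the stage itself; the controller only decides when to move between levels.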


