Introduction to the Special Issue on Data Mining of Speech, Audio, and Dialog

IEEE Trans. Speech Audio Process. Pub Date: 2005-08-15. DOI: 10.1109/TSA.2005.852677

M. Gilbert, Roger K. Moore, G. Zweig

Journal article, pp. 633-634.

Citations: 1

Abstract

Data mining is concerned with the science, technology, and engineering of discovering patterns and extracting potentially useful or interesting information automatically or semi-automatically from data. Data mining was introduced in the 1990s and has deep roots in the fields of statistics, artificial intelligence, and machine learning. With the advent of inexpensive storage space and faster processing over the past decade or so, data mining research has started to break new ground in speech and audio processing as well as spoken language dialog. It has been fueled by the influx of audio data that are becoming more widely available from a variety of multimedia sources including webcasts, conversations, music, meetings, voice messages, lectures, television, and radio. Algorithmic advances in automatic speech recognition have also been a major enabling technology behind the growth in data mining. Current state-of-the-art, large-vocabulary, continuous speech recognizers are now trained on a record amount of data: several hundred million words and thousands of hours of speech. Pioneering research in robust speech processing, large-scale discriminative training, finite state automata, and statistical hidden Markov modeling has resulted in real-time recognizers that are able to transcribe spontaneous speech with a word accuracy exceeding 85%. With this level of accuracy, the technology is now highly attractive for a variety of speech mining applications. Speech mining research includes many ways of applying machine learning, speech processing, and language processing algorithms to benefit and serve commercial applications. It also raises and addresses several new and interesting fundamental research challenges in the areas of prediction, search, explanation, learning, and language understanding. These basic challenges are becoming increasingly important in revolutionizing business processes by providing essential sales and marketing information about services, customers, and product offerings. They are also enabling the creation of a new class of learning systems that can infer knowledge and trends automatically from data, analyze and report application performance, and adapt and improve over time with minimal or zero human involvement. Effective techniques for mining speech, audio, and dialog data can impact numerous business and government applications. The technology for monitoring conversational speech to discover patterns, capture useful trends, and generate alarms is essential for intelligence and law enforcement organizations as well as for enhancing call center operation. It is useful for an

