{"title":"Topic segmentation in ASR transcripts using bidirectional RNNS for change detection","authors":"I. Sheikh, D. Fohr, I. Illina","doi":"10.1109/ASRU.2017.8268979","DOIUrl":null,"url":null,"abstract":"Topic segmentation methods are mostly based on the idea of lexical cohesion, in which lexical distributions are analysed across the document and segment boundaries are marked in areas of low cohesion. We propose a novel approach for topic segmentation in speech recognition transcripts by measuring lexical cohesion using bidirectional Recurrent Neural Networks (RNN). The bidirectional RNNs capture context in the past and the following set of words. The past and following contexts are compared to perform topic change detection. In contrast to existing works based on sequence and discriminative models for topic segmentation, our approach does not use a segmented corpus nor (pseudo) topic labels for training. Our model is trained using news articles obtained from the internet. Evaluation on ASR transcripts of French TV broadcast news programs demonstrates the effectiveness of our proposed approach.","PeriodicalId":290868,"journal":{"name":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","volume":"119 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"26","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASRU.2017.8268979","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 26
Abstract
Topic segmentation methods are mostly based on the idea of lexical cohesion, in which lexical distributions are analysed across the document and segment boundaries are marked in areas of low cohesion. We propose a novel approach for topic segmentation in speech recognition transcripts by measuring lexical cohesion using bidirectional Recurrent Neural Networks (RNN). The bidirectional RNNs capture context in the past and the following set of words. The past and following contexts are compared to perform topic change detection. In contrast to existing works based on sequence and discriminative models for topic segmentation, our approach does not use a segmented corpus nor (pseudo) topic labels for training. Our model is trained using news articles obtained from the internet. Evaluation on ASR transcripts of French TV broadcast news programs demonstrates the effectiveness of our proposed approach.