Topic embedding of sentences for story segmentation
J. Yu, Xiong Xiao, Lei Xie, Chng Eng Siong
2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), 2017-12-01
DOI: 10.1109/APSIPA.2017.8282280 (https://doi.org/10.1109/APSIPA.2017.8282280)
Citations: 1
Abstract
In this paper, we propose embedding sentences into fixed-dimensional vectors that carry topic information for story segmentation. Because a sentence consists of a sequence of words and sentences vary in length, we use a long short-term memory recurrent neural network (LSTM-RNN) to summarize the information of the whole sentence, and we predict the topic class only at the last word of the sentence. The output of the network at the last word can then be used as an embedding of the sentence in the topic space. We used the resulting sentence embeddings in an HMM-based story segmentation framework and obtained promising results. On the TDT2 corpus, the F1 measure improves from 0.765, obtained by a competitive system using a DNN and bag-of-words features, to 0.789.
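The idea of reading a sentence word by word and keeping only the output at the last word can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TinyLSTM`, the dimensions, the random untrained weights, the omission of bias terms, and the choice of a softmax topic posterior as the "output" are all assumptions made for illustration. In the paper the network is trained to predict topic classes, and word inputs would come from real text rather than random vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


class TinyLSTM:
    """Hypothetical single-layer LSTM with a topic output layer (untrained)."""

    def __init__(self, input_dim, hidden_dim, num_topics, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # One weight matrix per gate, applied to the concatenation [x; h].
        # Bias terms are omitted to keep the sketch short.
        self.gates = {name: rand_matrix(hidden_dim, input_dim + hidden_dim)
                      for name in ("input", "forget", "output", "cand")}
        self.topic_layer = rand_matrix(num_topics, hidden_dim)

    def step(self, x, h, c):
        z = list(x) + h
        i = [sigmoid(v) for v in matvec(self.gates["input"], z)]
        f = [sigmoid(v) for v in matvec(self.gates["forget"], z)]
        o = [sigmoid(v) for v in matvec(self.gates["output"], z)]
        g = [math.tanh(v) for v in matvec(self.gates["cand"], z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def embed(self, word_vectors):
        # Run over every word, but compute the topic output only once,
        # from the hidden state reached at the last word.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in word_vectors:
            h, c = self.step(x, h, c)
        logits = matvec(self.topic_layer, h)
        peak = max(logits)
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Random stand-ins for word vectors; sentence lengths 3 and 7 are arbitrary.
rng = random.Random(1)
model = TinyLSTM(input_dim=4, hidden_dim=8, num_topics=5)
short_sentence = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
long_sentence = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(7)]
short_embedding = model.embed(short_sentence)
long_embedding = model.embed(long_sentence)
print(len(short_embedding), len(long_embedding))  # prints "5 5"
```

Both sentences map to vectors of the same size regardless of their length, which is the property that lets a downstream segmentation model treat each sentence as one fixed-dimensional observation.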