A topical VAEGAN-IHMM approach for automatic story segmentation.

IF 2.6 · Zone 4 (Engineering and Technology) · JCR Q1 (Mathematics)
Jia Yu, Huiling Peng, Guoqiang Wang, Nianfeng Shi
Mathematical Biosciences and Engineering, vol. 21, no. 7, pp. 6608-6630. Journal Article. Published 2024-07-16. DOI: 10.3934/mbe.2024289 (https://doi.org/10.3934/mbe.2024289)
Citations: 0

Abstract

Feature representations with rich topic information can greatly improve the performance of story segmentation tasks. VAEGAN offers distinct advantages in feature learning by combining a variational autoencoder (VAE) with a generative adversarial network (GAN): it captures intricate data representations through the VAE's probabilistic encoding and decoding mechanism, and it enhances feature diversity and quality through the GAN's adversarial training. To better learn topical domain representations, we used a topical classifier to supervise the training of the VAEGAN. Based on the learned features, a segmenter splits a document into shorter segments, each on a different topic. The hidden Markov model (HMM) is a popular approach to story segmentation, in which stories are viewed as instances of topics (hidden states). The number of states has to be set manually, but it is often unknown in real scenarios. To solve this problem, we proposed an infinite HMM (IHMM) approach, which places a hierarchical Dirichlet process (HDP) prior on transition matrices over a countably infinite state space so that the number of states is inferred automatically from the data. Given a running text, a blocked Gibbs sampler labeled the states with topic classes, and each position where the topic changed was taken as a story boundary. Experimental results on the TDT2 corpus demonstrated that the proposed topical VAEGAN-IHMM approach was significantly better than the traditional HMM method in story segmentation tasks and achieved state-of-the-art performance.
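The final step of the abstract, reading story boundaries off the inferred topic labels, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `story_boundaries` and `split_stories`, the integer label encoding, and the per-sentence granularity are all assumptions made for illustration. The sketch only shows the rule that a boundary falls wherever two consecutive labels differ.

```python
from typing import List, Sequence


def story_boundaries(states: Sequence[int]) -> List[int]:
    """Return every index i at which states[i] differs from states[i - 1].

    `states` is assumed to hold one inferred topic label per text unit
    (for example, per sentence), in reading order.
    """
    return [i for i in range(1, len(states)) if states[i] != states[i - 1]]


def split_stories(units: Sequence[str], states: Sequence[int]) -> List[List[str]]:
    """Group text units into stories by cutting at each topic change."""
    if len(units) != len(states):
        raise ValueError("units and states must have the same length")
    # Cut points: start of text, every boundary, end of text.
    cuts = [0] + story_boundaries(states) + [len(units)]
    # The a < b filter drops the empty slice produced by empty input.
    return [list(units[a:b]) for a, b in zip(cuts, cuts[1:]) if a < b]


if __name__ == "__main__":
    # Hypothetical labels, standing in for the output of a sampler.
    labels = [3, 3, 3, 7, 7, 1, 1, 1]
    print(story_boundaries(labels))  # [3, 5]
```

Because a boundary depends only on whether neighbouring labels are equal, the result does not change if the sampler numbers its states differently, which matters when the number of states is inferred rather than fixed in advance.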

Source journal
Mathematical Biosciences and Engineering (Engineering and Technology: Mathematics, Interdisciplinary Applications)
CiteScore: 3.90
Self-citation rate: 7.70%
Articles published: 586
Review time: >12 weeks
Journal description: Mathematical Biosciences and Engineering (MBE) is an interdisciplinary Open Access journal promoting cutting-edge research, technology transfer and knowledge translation about complex data and information processing. MBE publishes Research articles (long and original research); Communications (short and novel research); Expository papers; Technology Transfer and Knowledge Translation reports (descriptions of new technologies and products); and Announcements and Industrial Progress and News (announcements and even advertisements, including major conferences).