Pseudo Conditional Random Fields: Joint Training Approach to Segmenting and Labeling Sequence Data

Shing-Kit Chan, Wai Lam
{"title":"Pseudo Conditional Random Fields: Joint Training Approach to Segmenting and Labeling Sequence Data","authors":"Shing-Kit Chan, Wai Lam","doi":"10.1109/ICDM.2010.99","DOIUrl":null,"url":null,"abstract":"Cascaded approach has been used for a long time to conduct sub-tasks in order to accomplish a major task. We put cascaded approach in a probabilistic framework and analyze possible reasons for cascaded errors. To reduce the occurrence of cascaded errors, we need to add a constraint when performing joint training. We suggest a pseudo Conditional Random Field (pseudo-CRF) approach that models two sub-tasks as two Conditional Random Fields (CRFs). We then present the formulation in the context of a linear chain CRF for solving problems on sequence data. In conducting joint training for a pseudo-CRF, we reuse all existing well-developed efficient inference algorithms for a linear chain CRF, which would otherwise require the use of approximate inference algorithms or simulations that involve long computational time. Our experimental results show an interesting fact that a jointly trained CRF model in a pseudo-CRF may perform worse than a separately trained CRF on a sub-task. However the overall system performance of a pseudo-CRF would outperform that of a cascaded approach. We implement the implicit constraint in the form of a soft constraint such that users can define the penalty cost for violating the constraint. In order to work on large-scale datasets, we further suggest a parallel implementation of the pseudo-CRF approach, which can be implemented on a multi-core CPU or GPU on a graphics card that supports multi-threading. Our experimental results show that it can achieve a 12 times increase in speedup.","PeriodicalId":294061,"journal":{"name":"2010 IEEE International Conference on Data Mining","volume":"275 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2010.99","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The cascaded approach has long been used to carry out sub-tasks in order to accomplish a larger task. We place the cascaded approach in a probabilistic framework and analyze possible causes of cascaded errors. To reduce the occurrence of cascaded errors, a constraint must be added when performing joint training. We propose a pseudo Conditional Random Field (pseudo-CRF) approach that models two sub-tasks as two Conditional Random Fields (CRFs), and we present the formulation in the context of a linear-chain CRF for solving problems on sequence data. In conducting joint training for a pseudo-CRF, we reuse all the existing, well-developed, efficient inference algorithms for a linear-chain CRF, which would otherwise require approximate inference algorithms or simulations with long computation times. Our experimental results reveal an interesting fact: a jointly trained CRF model in a pseudo-CRF may perform worse than a separately trained CRF on an individual sub-task, yet the overall system performance of a pseudo-CRF outperforms that of a cascaded approach. We implement the implicit constraint as a soft constraint, so that users can define the penalty cost for violating it. To handle large-scale datasets, we further propose a parallel implementation of the pseudo-CRF approach, which can run on a multi-core CPU or on a GPU that supports multi-threading. Our experimental results show that it achieves a 12-fold speedup.
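The abstract does not reproduce the paper's formulation, so the following is only a rough illustrative sketch of what a soft-constrained joint-training objective of this kind could look like; the notation is our own, not the authors'. With two linear-chain CRFs, one for segmentation with parameters \(\theta_1\) and one for labeling with parameters \(\theta_2\), joint training over instances \(i\) could maximize

\[
\mathcal{L}(\theta_1, \theta_2) \;=\; \sum_{i} \log p_{\theta_1}\big(\mathbf{y}^{(1)}_i \mid \mathbf{x}_i\big) \;+\; \sum_{i} \log p_{\theta_2}\big(\mathbf{y}^{(2)}_i \mid \mathbf{x}_i\big) \;-\; \lambda \sum_{i} C\big(\mathbf{y}^{(1)}_i, \mathbf{y}^{(2)}_i\big),
\]

where \(C(\cdot,\cdot)\) measures violations of the consistency constraint between the two sub-task outputs and \(\lambda\) is the user-defined penalty cost mentioned in the abstract. Under this hypothetical form, \(\lambda = 0\) recovers two separately trained CRFs, while a large \(\lambda\) approximates a hard constraint; since each likelihood term remains a linear-chain CRF, standard exact inference (e.g., forward-backward) can be reused, consistent with the abstract's claim of avoiding approximate inference.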