Developing Probabilistic Models for Identifying Semantic Patterns in Texts

2011 IEEE Fifth International Conference on Semantic Computing. Pub Date: 2011-09-18. DOI: 10.1109/ICSC.2011.35

Minhua Huang, R. Haralick

Abstract

We present a probabilistic graphical model that finds an optimal sequence of categories for a sequence of input symbols. Based on this model, we develop three algorithms for identifying semantic patterns in text: one for extracting the semantic arguments of a verb, one for classifying the sense of an ambiguous word, and one for identifying noun phrases in a sentence. Experiments conducted on standard data sets show good results. For example, our method achieves an average precision of 92.96% and an average recall of 94.94% for extracting semantic argument boundaries of verbs on WSJ data from the Penn Treebank and PropBank, an average accuracy of 81.12% for recognizing the six-sense word "line", and an average precision of 97.7% and an average recall of 98.8% for recognizing noun phrases on WSJ data from the Penn Treebank.
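The abstract does not spell out the structure of the graphical model, so the following is only a generic illustration of what "finding an optimal sequence of categories for a sequence of input symbols" can look like. It assumes a first-order chain decoded with the Viterbi algorithm, which may differ from the authors' model. The function name `viterbi`, the category labels `NP` and `O`, and all probabilities in the toy tables are hypothetical values chosen for the example, not figures from the paper.

```python
import math

def logs(probs):
    """Convert a dict of probabilities to log-probabilities."""
    return {key: math.log(value) for key, value in probs.items()}

def viterbi(symbols, categories, log_start, log_trans, log_emit):
    """Return the highest-scoring category sequence for `symbols`
    under a first-order chain model (an assumption of this sketch)."""
    if not symbols:
        return []
    # best[t][c]: best log-score of any path that ends in category c at position t
    best = [{c: log_start[c] + log_emit[c].get(symbols[0], -math.inf)
             for c in categories}]
    back = []  # back[t][c]: predecessor of c on the best path into position t + 1
    for sym in symbols[1:]:
        scores, pointers = {}, {}
        for c in categories:
            prev = max(categories, key=lambda p: best[-1][p] + log_trans[p][c])
            scores[c] = (best[-1][prev] + log_trans[prev][c]
                         + log_emit[c].get(sym, -math.inf))
            pointers[c] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final category.
    path = [max(categories, key=lambda c: best[-1][c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy tables (made-up numbers): label part-of-speech tags as inside (NP)
# or outside (O) a noun phrase.
CATEGORIES = ["NP", "O"]
LOG_START = logs({"NP": 0.6, "O": 0.4})
LOG_TRANS = {"NP": logs({"NP": 0.6, "O": 0.4}),
             "O": logs({"NP": 0.5, "O": 0.5})}
LOG_EMIT = {"NP": logs({"DT": 0.4, "NN": 0.5, "VB": 0.1}),
            "O": logs({"DT": 0.1, "NN": 0.1, "VB": 0.8})}

if __name__ == "__main__":
    tags = ["DT", "NN", "VB", "DT", "NN"]
    print(viterbi(tags, CATEGORIES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences. In this sketch, the same decoder could be reused for each of the three tasks by changing only the symbol and category inventories.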
