Optimal distance metric function with trigram features for case based word sense disambiguation using artificial neural network

2011 Third International Conference on Advanced Computing. Pub Date: 2011-12-01. DOI: 10.1109/ICOAC.2011.6165190

P. Tamilselvi, S. Srivatsa


Citations: 7

Abstract

Word sense disambiguation generally draws on several levels of knowledge. This paper uses only three knowledge features or sources (a trigram) and applies a case-based approach to the disambiguation process. Cases are refined forms of words collected from SemCor and are used to derive the sense of the ambiguous input word. All Part of Speech (PoS) tags listed in the Brown Corpus are collected into seventeen groups, and each group is assigned a constant value. The trigram features of the input (the ambiguous word) and of each case are represented as a vector of size 1×3; the vector values for the ambiguous word and its two neighboring words are taken from the weights assigned to their PoS groups. Ten distance metric functions are analyzed empirically to improve disambiguation accuracy with minimal knowledge sources. A neural network extracts the correct sense of the ambiguous word from the selected minimum-distance cases. A long sentence is used to demonstrate the performance of the disambiguation process. The results show that the post-trigram Hamming function (F9) achieved a disambiguation accuracy of 78.57% (eleven of fourteen ambiguous words recognized).
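The case-retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the PoS group constants, the example cases, the sense labels, and the function names (`trigram_vector`, `hamming`, `nearest_cases`) are all assumptions made for illustration, since the abstract gives neither the seventeen group values nor the case base. The sketch encodes a trigram as a 1×3 vector of PoS group constants, computes the Hamming distance to each stored case, and returns the senses of the minimum-distance cases. The neural network stage that picks the final sense is not modeled here.

```python
from typing import Dict, List, Tuple

# Hypothetical PoS-group constants. The paper uses seventeen groups derived
# from the Brown Corpus tag set; their actual values are not in the abstract.
POS_GROUP_VALUE: Dict[str, int] = {
    "NOUN": 1, "VERB": 2, "ADJ": 3, "ADV": 4, "DET": 5, "PREP": 6,
}

Vector = Tuple[int, ...]


def trigram_vector(pos_tags: List[str]) -> Vector:
    """Encode the PoS tags of a three-word window as a 1x3 vector."""
    if len(pos_tags) != 3:
        raise ValueError("a trigram needs exactly three PoS tags")
    return tuple(POS_GROUP_VALUE[tag] for tag in pos_tags)


def hamming(a: Vector, b: Vector) -> int:
    """Hamming distance: the number of positions where the vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cases(query: Vector, cases: List[Tuple[Vector, str]]) -> List[str]:
    """Return the senses of all cases at minimum Hamming distance."""
    best = min(hamming(query, vec) for vec, _ in cases)
    return [sense for vec, sense in cases if hamming(query, vec) == best]


# Hypothetical case base: (trigram vector, sense label) pairs.
CASES: List[Tuple[Vector, str]] = [
    (trigram_vector(["DET", "NOUN", "VERB"]), "bank/financial_institution"),
    (trigram_vector(["PREP", "NOUN", "PREP"]), "bank/river_side"),
    (trigram_vector(["NOUN", "VERB", "ADV"]), "bank/rely_on"),
]

query = trigram_vector(["DET", "NOUN", "PREP"])
candidates = nearest_cases(query, CASES)

# The reported accuracy follows from eleven correct senses out of fourteen.
accuracy = round(11 / 14 * 100, 2)

print(candidates)
print(accuracy)
```

In this toy case base two cases tie at distance 1, which suggests why a further selection step, such as the neural network the paper describes, is needed after the distance-based filtering.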

