Optimal distance metric function with trigram features for case based word sense disambiguation using artificial neural network

P. Tamilselvi, S. Srivatsa
2011 Third International Conference on Advanced Computing. Published 2011-12-01. DOI: 10.1109/ICOAC.2011.6165190 (https://doi.org/10.1109/ICOAC.2011.6165190)
Citations: 7

Abstract

Word sense disambiguation usually draws on several levels of knowledge. This paper uses only three knowledge features (a trigram) to disambiguate word senses, following a case-based approach. The cases are refined forms of words collected from Semcor, and they are used to derive the sense of an ambiguous input word. All Part of Speech (PoS) tags listed in the Brown Corpus are collected and divided into seventeen groups, and each group is assigned a constant value. The trigram features of both the input (ambiguous words) and the cases are represented as vectors of size 1×3. The vector values for the ambiguous word and its two neighboring words are taken from the weights assigned to their PoS groups. Ten different distance metric functions are analyzed empirically to improve disambiguation accuracy with minimal knowledge sources. A neural network extracts the correct sense of the ambiguous word from the selected minimum-distance cases. A single long sentence is used to demonstrate the performance of the disambiguation process. The results show that the post-trigram Hamming function (F9) achieved a disambiguation accuracy of 78.57% (eleven of fourteen ambiguous words recognized).
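
The retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The PoS weights, the padding value, the toy case base, and the sense labels are all hypothetical, since the abstract does not list the seventeen group constants. The sketch also assumes that "post-trigram" means the ambiguous word followed by its next two words, and it stops at case retrieval, leaving out the neural network that selects the final sense.

```python
from typing import Dict, List, Tuple

# Hypothetical PoS-group constants. The paper assigns one constant to each of
# seventeen PoS groups, but the abstract does not give the actual values.
POS_WEIGHT: Dict[str, int] = {
    "NOUN": 1, "VERB": 2, "ADJ": 3, "ADV": 4, "DET": 5, "PREP": 6,
}
PAD = 0  # assumed filler for positions past the end of the sentence

Vector = Tuple[int, int, int]
Case = Tuple[Vector, str]  # (trigram vector, sense label)


def post_trigram_vector(pos_tags: List[str], i: int) -> Vector:
    """Encode the word at index i and the next two words as a 1x3 vector."""
    weights = [POS_WEIGHT[tag] for tag in pos_tags[i:i + 3]]
    weights += [PAD] * (3 - len(weights))
    return (weights[0], weights[1], weights[2])


def hamming(a: Vector, b: Vector) -> int:
    """Count the positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_cases(query: Vector, cases: List[Case]) -> List[Case]:
    """Return every case whose Hamming distance to the query is minimal."""
    best = min(hamming(query, vec) for vec, _ in cases)
    return [case for case in cases if hamming(query, case[0]) == best]


if __name__ == "__main__":
    # Toy case base with made-up sense labels, for illustration only.
    cases: List[Case] = [
        ((1, 6, 5), "bank/river"),
        ((1, 2, 5), "bank/finance"),
        ((2, 5, 1), "bank/verb"),
    ]
    tags = ["DET", "NOUN", "VERB", "DET", "NOUN"]
    query = post_trigram_vector(tags, 1)  # encodes NOUN VERB DET
    print(query, nearest_cases(query, cases))
    # Reported accuracy: eleven of fourteen ambiguous words.
    print(f"{11 / 14:.2%}")
```

In this sketch the minimum-distance cases would be the candidates handed to the sense-selection stage. Swapping `hamming` for another function with the same signature is one way the ten distance metrics could be compared.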