Combined WSD algorithms with LSA to identify semantic similarity in unstructured textual data

Mohammed Ahmed Taiye, S. S. Kamaruddin, F. Ahmad
Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing
Published: 2017-03-22
DOI: 10.1145/3018896.3056785 (https://doi.org/10.1145/3018896.3056785)
Citations: 0

Abstract

Semantically related sentences may not have any words in common. However, identifying semantic similarity between words at the sentence level poses difficult challenges such as polysemy, synonymy, heterogeneity, and sparsity of unstructured textual datasets. It is commonly assumed that sentences with similar text or words in common are semantically related. This means that standard Information Retrieval (IR) measures based on word co-occurrence are not appropriate for tackling the aforementioned challenges of identifying semantics in unstructured text documents. Many semantic similarity measures have been proposed to resolve this non-trivial issue, but many existing studies did not properly combine corpus-based and knowledge-based approaches to account for syntactic structure and the role of part of speech in identifying semantic similarity between sentences. In this research, we aim to propose a method for measuring sentence semantic similarity that combines two knowledge-based Word Sense Disambiguation (WSD) algorithms with Latent Semantic Analysis (LSA), and to compare its results with human evaluation.
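
The following minimal Python sketch illustrates two ideas the abstract touches on; it is not the authors' method. First, a word co-occurrence measure (cosine over raw term counts) scores two related sentences as 0 when they share no words. Second, a knowledge-based WSD step can pick a word sense from context. The abstract does not name the two WSD algorithms used, so simplified Lesk is assumed here purely for illustration, and `TOY_SENSES`, `STOPWORDS`, and all function names are hypothetical. The LSA step is omitted.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or",
             "is", "that", "he", "she", "his", "her"}


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def bow_cosine(s1, s2):
    """Cosine similarity between raw term-frequency vectors of two sentences."""
    v1, v2 = Counter(tokenize(s1)), Counter(tokenize(s2))
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm = (math.sqrt(sum(c * c for c in v1.values()))
            * math.sqrt(sum(c * c for c in v2.values())))
    return dot / norm if norm else 0.0


# Hypothetical toy sense inventory; a real system would use a lexical
# resource such as WordNet instead of hand-written glosses.
TOY_SENSES = {
    "bank": {
        "bank.finance": "an institution that accepts deposits of money and makes loans",
        "bank.river": "sloping land beside a river or lake",
    }
}


def simplified_lesk(word, sentence, senses=TOY_SENSES):
    """Return the sense whose gloss shares the most words with the sentence.

    Returns None when the word has no entry in the sense inventory.
    """
    context = set(tokenize(sentence)) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.get(word, {}).items():
        overlap = len(context & set(tokenize(gloss)))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    # Related sentences with no shared content words: the overlap score is zero.
    print(bow_cosine("The physician treated the patient.",
                     "A doctor cured the sick man."))
    # The same surface word resolves to different senses depending on context.
    print(simplified_lesk("bank", "She deposited money at the bank"))
    print(simplified_lesk("bank", "They fished from the river bank"))
```

The zero score for the first pair is the failure mode that motivates adding sense-level and latent-space information on top of plain word overlap.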