Combined WSD algorithms with LSA to identify semantic similarity in unstructured textual data

Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing. Pub Date: 2017-03-22. DOI: 10.1145/3018896.3056785

Mohammed Ahmed Taiye, S. S. Kamaruddin, F. Ahmad

Abstract

Semantically related sentences may not have any words in common. However, identifying semantic similarity between words at the sentence level poses difficult challenges such as polysemy, synonymy, heterogeneity, and the sparsity of unstructured textual datasets. It is commonly assumed that sentences with similar text or words in common are semantically related. This means that standard Information Retrieval (IR) measures based on word co-occurrence are not appropriate for tackling the aforementioned challenges of identifying semantics in unstructured text documents. Many semantic similarity measures have been proposed to resolve these non-trivial issues, but many existing studies did not properly combine corpus-based and knowledge-based approaches to account for syntactic constructs and the role of part of speech in identifying semantic similarity between sentences. In this research, we aim to propose a method for measuring sentence semantic similarity that combines two knowledge-based Word Sense Disambiguation (WSD) algorithms with Latent Semantic Analysis (LSA), and to compare the results with human evaluation.
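
The limitation of word co-occurrence measures described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it only computes a plain bag-of-words cosine similarity, and the function names and example sentences are hypothetical, chosen for illustration. Under these assumptions, two sentences that share no words score zero even if they are close in meaning, while two sentences that use "bank" in different senses score fairly high.

```python
import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    """Lowercase the sentence and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", sentence.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-count vectors of two sentences."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Only terms present in both sentences contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical example sentences, not taken from the paper.
# Similar meaning, no shared words (synonymy).
related_no_overlap = cosine_similarity(
    "Physicians treat illness", "Doctors cure disease"
)
# Different senses of "bank", several shared words (polysemy).
unrelated_with_overlap = cosine_similarity(
    "He sat on the river bank", "He robbed the bank"
)

print(f"related, no overlap:    {related_no_overlap:.3f}")
print(f"unrelated, with overlap: {unrelated_with_overlap:.3f}")
```

A measure that depends only on surface word overlap ranks the second pair above the first, which is the kind of failure that sense disambiguation and a latent semantic space are intended to address.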
