Combined WSD algorithms with LSA to identify semantic similarity in unstructured textual data
Mohammed Ahmed Taiye, S. S. Kamaruddin, F. Ahmad
Proceedings of the Second International Conference on Internet of things, Data and Cloud Computing, 22 March 2017
DOI: 10.1145/3018896.3056785
Citations: 0
Abstract
Semantically related sentences may have no words in common. Identifying semantic similarity between words at the sentence level poses difficult challenges such as polysemy, synonymy, and the heterogeneity and sparsity of unstructured textual datasets. Standard Information Retrieval (IR) measures based on word co-occurrence assume that sentences sharing similar text or words are semantically related, so they are not suited to these challenges of identifying semantics in unstructured text documents. Many semantic similarity measures have been proposed to resolve these non-trivial issues, but many existing studies have not properly combined corpus-based and knowledge-based approaches to account for syntactic construction and the role of part of speech (POS) in identifying semantic similarity between sentences. In this research, we propose a method for measuring sentence semantic similarity that combines two knowledge-based Word Sense Disambiguation (WSD) algorithms with Latent Semantic Analysis (LSA), and we compare its results with human evaluation.
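The abstract does not name the two WSD algorithms or describe how they are combined with LSA. The following is therefore only a minimal sketch of the general idea under stated assumptions, not the authors' method. Simplified Lesk (gloss overlap) stands in as the knowledge-based WSD step. Plain LSA is computed from the top-k eigenvectors of AAᵀ, which are the left singular vectors of the term-document matrix A. The sense inventory `SENSES`, the toy `CORPUS`, and every function and class name are hypothetical. The sketch illustrates two points from the abstract. First, a word-overlap cosine scores "car drive" and "automobile road" as 0, while LSA over a small background corpus scores them as highly similar. Second, sense tagging keeps the financial and river senses of "bank" apart.

```python
import math
from collections import Counter

# Hypothetical toy sense inventory (a stand-in for a lexical resource such as WordNet).
SENSES = {
    "bank": {
        "bank#finance": "institution that accepts deposit money and makes loan",
        "bank#river": "sloping land beside a river or water",
    },
}

# Hypothetical background corpus used to build the LSA space.
CORPUS = [
    "car engine drive road",
    "automobile engine drive road",
    "bank money loan deposit",
    "bank river water",
]


def simplified_lesk(word, context):
    """Simplified Lesk: pick the sense whose gloss shares the most words with the context."""
    senses = SENSES.get(word)
    if not senses:
        return word  # unambiguous or unknown word: keep the surface form
    return max(senses, key=lambda s: len(set(senses[s].split()) & context))


def tag(sentence):
    """Lower-case, split on whitespace, and replace ambiguous words by sense labels."""
    words = sentence.lower().split()
    return [simplified_lesk(w, set(words) - {w}) for w in words]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def lexical_cosine(s1, s2):
    """Bag-of-words cosine: always 0 when the sentences share no word."""
    c1, c2 = Counter(s1.lower().split()), Counter(s2.lower().split())
    vocab = sorted(set(c1) | set(c2))
    return cosine([c1[w] for w in vocab], [c2[w] for w in vocab])


def top_eigenvectors(m, k, iters=500):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration with deflation."""
    n = len(m)
    m = [row[:] for row in m]
    vecs = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        vecs.append(v)
        # Deflate: remove the found eigen-direction before searching for the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vecs


class LSA:
    """Hypothetical minimal LSA over sense-tagged tokens."""

    def __init__(self, corpus, k):
        docs = [tag(d) for d in corpus]
        self.vocab = sorted({t for d in docs for t in d})
        self.index = {t: i for i, t in enumerate(self.vocab)}
        # Term-document count matrix A (terms x documents).
        a = [[0.0] * len(docs) for _ in self.vocab]
        for j, d in enumerate(docs):
            for t in d:
                a[self.index[t]][j] += 1.0
        # Eigenvectors of A A^T are the left singular vectors U of A.
        aat = [[sum(x * y for x, y in zip(r1, r2)) for r2 in a] for r1 in a]
        self.u = top_eigenvectors(aat, k)

    def project(self, sentence):
        """Map a sentence's term vector q to U_k^T q in the latent space."""
        q = [0.0] * len(self.vocab)
        for t in tag(sentence):
            if t in self.index:  # out-of-vocabulary tokens are ignored
                q[self.index[t]] += 1.0
        return [sum(ui * qi for ui, qi in zip(u, q)) for u in self.u]

    def similarity(self, s1, s2):
        return cosine(self.project(s1), self.project(s2))


if __name__ == "__main__":
    lsa = LSA(CORPUS, k=3)
    print("lexical:", lexical_cosine("car drive", "automobile road"))
    print("LSA:", lsa.similarity("car drive", "automobile road"))
    print("finance sense:", lsa.similarity("bank deposit", "money loan"))
    print("river sense:", lsa.similarity("river bank", "money loan"))
```

In this sketch, "car" and "automobile" end up close because both co-occur with "engine", "drive", and "road" in the background corpus, even though the two target sentences share no word. Tagging "bank" before building the matrix places its two senses in separate latent dimensions. A realistic setup would draw glosses from a resource such as WordNet and use a far larger corpus. The power-iteration decomposition is there only to keep the example dependency-free.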