Automatic Construction of Interval-Valued Fuzzy Hindi WordNet using Lexico-Syntactic Patterns and Word Embeddings

Impact factor: 2.0; Zone 4 (Computer Science); JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

ACM Transactions on Asian and Low-Resource Language Information Processing, Pub Date: 2024-02-02, DOI: 10.1145/3643132

Minni Jain, Rajni Jindal, Amita Jain


Citations: 0

Abstract

A computational lexicon is the backbone of any language processing system. It helps computers handle the complexity of language as a human does by encoding words and their semantic associations. The well-known, manually constructed Hindi WordNet (HWN) consists of various classical (crisp) semantic relations. To handle uncertainty and give Hindi WordNet a richer semantic representation, Type-1 fuzzy graphs have been applied to its relations. However, a Type-1 fuzzy set (T1FS) does not account for uncertainty in the crisp membership degree itself. Moreover, collecting billions of membership values (HWN contains 5,55,69,51,753 relations) from human experts is not feasible. This paper applies the concept of interval-valued fuzzy graphs and proposes the Interval-Valued Fuzzy Hindi WordNet (IVFHWN). IVFHWN automatically identifies interval-valued fuzzy relations between words, along with their degrees of membership, using word embeddings and lexico-syntactic patterns. Experimental results on the word sense disambiguation problem show better outcomes when IVFHWN is used in place of the Type-1 Fuzzy Hindi WordNet and the classical Hindi WordNet.
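To make the distinction between a crisp Type-1 membership degree and an interval-valued one concrete, the following is a minimal sketch of how a single interval-valued fuzzy relation could be represented. It is not the authors' method: the abstract does not say how IVFHWN combines its two evidence sources, so the `make_relation` helper, the min/max combination rule, the toy vectors, and the pattern score of 0.5 are all assumptions made purely for illustration. The only property taken as given is the standard constraint that an interval-valued membership satisfies 0 ≤ lower ≤ upper ≤ 1.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalRelation:
    """A fuzzy relation whose membership degree is an interval [lower, upper]."""
    source: str
    target: str
    relation: str
    lower: float
    upper: float

    def __post_init__(self):
        # Standard constraint for an interval-valued membership degree.
        if not (0.0 <= self.lower <= self.upper <= 1.0):
            raise ValueError("need 0 <= lower <= upper <= 1")

    @property
    def width(self) -> float:
        """Width of the interval; 0 reduces to a Type-1 (single-valued) degree."""
        return self.upper - self.lower


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def clamp01(x):
    """Clip a score into [0, 1]."""
    return max(0.0, min(1.0, x))


def make_relation(source, target, relation, embedding_score, pattern_score):
    """ASSUMPTION (illustrative only): use the two scores as interval endpoints."""
    a, b = clamp01(embedding_score), clamp01(pattern_score)
    return IntervalRelation(source, target, relation, min(a, b), max(a, b))


# Toy 2-d vectors and a hypothetical pattern score, not data from the paper.
rel = make_relation("कुत्ता", "जानवर", "hypernym",
                    cosine([1.0, 0.0], [1.0, 1.0]), 0.5)
print(rel.lower, round(rel.upper, 4), round(rel.width, 4))
```

Under this sketch, a Type-1 relation is the special case where `lower == upper`, which is one way to see why an interval can carry uncertainty about the membership degree that a single value cannot.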

Source journal

ACM Transactions on Asian and Low-Resource Language Information Processing (Computer Science: General Computer Science)

CiteScore: 3.60

Self-citation rate: 15.00%

Articles published: 241

Journal description: The ACM Transactions on Asian and Low-Resource Language Information Processing (TALLIP) publishes high quality original archival papers and technical notes in the areas of computation and processing of information in Asian languages, low-resource languages of Africa, Australasia, Oceania and the Americas, as well as related disciplines. The subject areas covered by TALLIP include, but are not limited to:

- Computational Linguistics: including computational phonology, computational morphology, computational syntax (e.g. parsing), computational semantics, computational pragmatics, etc.
- Linguistic Resources: including computational lexicography, terminology, electronic dictionaries, cross-lingual dictionaries, electronic thesauri, etc.
- Hardware and software algorithms and tools for Asian or low-resource language processing, e.g., handwritten character recognition.
- Information Understanding: including text understanding, speech understanding, character recognition, discourse processing, dialogue systems, etc.
- Machine Translation involving Asian or low-resource languages.
- Information Retrieval: including natural language processing (NLP) for concept-based indexing, natural language query interfaces, semantic relevance judgments, etc.
- Information Extraction and Filtering: including automatic abstraction, user profiling, etc.
- Speech processing: including text-to-speech synthesis and automatic speech recognition.
- Multimedia Asian Information Processing: including speech, image, video, image/text translation, etc.
- Cross-lingual information processing involving Asian or low-resource languages.

Papers that deal with theory, systems design, evaluation and applications in the aforesaid subjects are appropriate for TALLIP. Emphasis will be placed on the originality and the practical significance of the reported research.