A semantic model of morphological information retrieval: A comparative accumulative analysis

2020 2nd Annual International Conference on Information and Sciences (AiCIS), Pub Date: 2020-11-01, DOI: 10.1109/AiCIS51645.2020.00011

N. Z. Tawfeeq, W. S. Abed, Omar Ghazal


Citations: 2

Abstract



The main function of an information retrieval (IR) system is to retrieve, efficiently and accurately, a minimal subset of documents relevant to the user's need. Synonymy and polysemy are barriers for natural language processing algorithms because they cause overestimation and misrepresentation. The proposed model uses the implicit higher-order structure in the association of terms with documents to improve the identification of relevant documents based on the terms used in queries, and an enhanced automatic indexing approach is suggested. The study uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to assign a weight to each term in a document. Each document is represented as a vector of weights in the term space, and the user query is likewise represented as a vector of weights. Finally, Singular Value Decomposition (SVD) is used to factorize the large weighted term-document matrix into a collection of vectors that approximates the original matrix. Cosine similarity is then used to determine the document vector closest to the user query. For English information retrieval, TF-IDF showed higher performance before the term percentage reached 0.3, while Latent Semantic Indexing (LSI) was more stable than TF-IDF, especially with respect to the use of word association.
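The pipeline described in the abstract (TF-IDF weighting, a truncated SVD of the term-document matrix, cosine ranking of documents against the query) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the four-document corpus, the function names, the whitespace tokenizer, the `tf * log(N / df)` weighting variant, the rank `k = 2`, the fold-in of the query through the left singular vectors, and the power-iteration SVD are all assumptions chosen to keep the example free of external dependencies.

```python
import math
import random


def tfidf_matrix(docs):
    """Return (vocab, idf, A), where A[t][d] is the TF-IDF weight of term t in doc d."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for doc in tokenized for term in doc})
    n_docs = len(docs)
    # Assumed variant: idf = log(N / df), tf = count / document length.
    idf = {t: math.log(n_docs / sum(t in doc for doc in tokenized)) for t in vocab}
    A = [[doc.count(t) / len(doc) * idf[t] for doc in tokenized] for t in vocab]
    return vocab, idf, A


def query_vector(query, vocab, idf):
    """Weight a non-empty query with the corpus idf; unknown terms are ignored."""
    tokens = query.lower().split()
    return [tokens.count(t) / len(tokens) * idf[t] for t in vocab]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def top_left_singular_vectors(A, k, iters=200, seed=0):
    """Approximate the k leading left singular vectors of A.

    Uses power iteration with deflation on the Gram matrix G = A^T A.
    Suitable for tiny matrices only; assumes k does not exceed rank(A).
    """
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    G = [[sum(A[t][i] * A[t][j] for t in range(m)) for j in range(n)] for i in range(n)]
    U = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[i] * G[i][j] * v[j] for i in range(n) for j in range(n))
        sigma = math.sqrt(lam)  # singular value of A
        U.append([sum(A[t][j] * v[j] for j in range(n)) / sigma for t in range(m)])
        # Deflate: remove the component just found before the next pass.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return U


def project(U, x):
    """Map a term-space vector into the k-dimensional latent space (U_k^T x)."""
    return [sum(u_t * x_t for u_t, x_t in zip(u, x)) for u in U]


docs = [
    "car engine repair",
    "automobile engine repair",
    "apple banana fruit",
    "banana fruit salad",
]
vocab, idf, A = tfidf_matrix(docs)
q = query_vector("car", vocab, idf)
columns = [[A[t][d] for t in range(len(vocab))] for d in range(len(docs))]

# Ranking in the original term space (plain TF-IDF).
tfidf_scores = [cosine(q, col) for col in columns]

# Ranking in the rank-2 latent space (LSI).
U = top_left_singular_vectors(A, k=2)
q_hat = project(U, q)
lsi_scores = [cosine(q_hat, project(U, col)) for col in columns]

for doc, s_tfidf, s_lsi in zip(docs, tfidf_scores, lsi_scores):
    print(f"{doc!r}: tfidf={s_tfidf:.3f} lsi={s_lsi:.3f}")
```

In this toy corpus the query "car" shares no term with "automobile engine repair", so that document's TF-IDF cosine score is exactly zero. In the rank-2 latent space the two engine documents collapse onto the same direction, because "car" and "automobile" co-occur with the same words, so both score equally against the query while the fruit documents stay near zero. This is the synonymy effect the abstract attributes to word association. A real system would use a linear algebra library for the SVD rather than power iteration.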
