A semantic model of morphological information retrieval: A comparative accumulative analysis

N. Z. Tawfeeq, W. S. Abed, Omar Ghazal
2020 2nd Annual International Conference on Information and Sciences (AiCIS), November 2020
DOI: 10.1109/AiCIS51645.2020.00011
Citations: 2

Abstract

The main function of an information retrieval (IR) system is to retrieve, efficiently and accurately, the minimal subset of documents relevant to the user's need. Synonymy and polysemy are barriers for natural language processing algorithms because they lead to overestimation and misrepresentation. The proposed model exploits the implicit higher-order structure in the association of terms with documents to improve the identification of relevant documents from the terms used in queries, and an enhanced automatic indexing approach is proposed. The study uses the term frequency-inverse document frequency (TF-IDF) method to assign a weight to each term in each document. Each document is represented as a vector of term weights in the vector space, and the user query is likewise represented as a vector of weights. Finally, singular value decomposition (SVD) is applied to factorize the large weighted term-document matrix into a set of vectors that approximate the original matrix. Cosine similarity is then used to find the document vectors closest to the user query. For English information retrieval, TF-IDF was observed to perform better below a term percentage of 0.3, whereas Latent Semantic Indexing (LSI) was more stable than TF-IDF, especially with respect to word association.
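The abstract describes a pipeline of TF-IDF weighting, vector-space representation of documents and queries, truncated SVD (LSI), and cosine ranking. The Python sketch below illustrates that pipeline on a hypothetical three-document toy corpus; it is not the authors' implementation, and all function names are hypothetical. Assumptions not stated in the abstract: whitespace tokenization, the weighting tf × ln(N/df), a user-chosen rank k, and scoring by projecting both documents and the query onto the top-k left singular vectors U_k (one common LSI variant). To stay dependency-free, the SVD is obtained from a Jacobi eigen-decomposition of AᵀA, which is only practical for tiny matrices; a real system would use a library SVD.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Term-document matrix (rows = terms, columns = docs), assumed weight tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log(len(docs) / df[t]) for t in vocab}
    counts = [Counter(toks) for toks in tokenized]
    return vocab, idf, [[c[t] * idf[t] for c in counts] for t in vocab]


def query_vector(query, vocab, idf):
    """Weight the query with the corpus IDF; out-of-vocabulary words are dropped."""
    c = Counter(query.lower().split())
    return [c[t] * idf[t] for t in vocab]


def cosine(x, y):
    nx, ny = math.sqrt(sum(a * a for a in x)), math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def jacobi_eigh(m, sweeps=50, tol=1e-12):
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns (eigenvalues, V); column i of V is the eigenvector for eigenvalue i.
    """
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def tfidf_scores(docs, query):
    """Baseline vector-space model: cosine between raw TF-IDF vectors."""
    vocab, idf, a = tfidf_matrix(docs)
    q = query_vector(query, vocab, idf)
    return [cosine([row[j] for row in a], q) for j in range(len(docs))]


def lsi_scores(docs, query, k):
    """Cosine scores in a rank-k LSI space from a truncated SVD A ~ U_k S_k V_k^T."""
    vocab, idf, a = tfidf_matrix(docs)
    n_terms, n_docs = len(vocab), len(docs)
    # Eigenpairs of A^T A are (sigma_i^2, v_i): squared singular values and right singular vectors.
    ata = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
           for i in range(n_docs)]
    evals, evecs = jacobi_eigh(ata)
    top = sorted(range(n_docs), key=lambda i: evals[i], reverse=True)[:k]
    u_k = []  # left singular vectors u_i = A v_i / sigma_i
    for i in top:
        sigma = math.sqrt(max(evals[i], 0.0))
        if sigma > 1e-10:  # skip (near-)zero singular values
            u_k.append([sum(a[t][j] * evecs[j][i] for j in range(n_docs)) / sigma
                        for t in range(n_terms)])

    def project(x):  # reduced-space coordinates U_k^T x
        return [sum(u[t] * x[t] for t in range(n_terms)) for u in u_k]

    q_k = project(query_vector(query, vocab, idf))
    return [cosine(project([row[j] for row in a]), q_k) for j in range(n_docs)]


docs = ["car engine", "automobile engine", "flower garden"]  # hypothetical toy corpus
print(tfidf_scores(docs, "car"))    # doc 1 scores 0.0: it never contains "car"
print(lsi_scores(docs, "car", k=2))  # doc 1 now scores like doc 0 via the shared "engine"
```

On this toy corpus the plain TF-IDF baseline gives "automobile engine" a score of 0 for the query "car" because the two share no term, while the rank-2 LSI space places it next to "car engine" through the co-occurring term "engine", which illustrates the kind of word association the abstract refers to. The rank k governs this trade-off; the paper's term-percentage setting is not reproduced here.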