A semantic model of morphological information retrieval: A comparative accumulative analysis
N. Z. Tawfeeq, W. S. Abed, Omar Ghazal
2020 2nd Annual International Conference on Information and Sciences (AiCIS), 2020-11-01
DOI: 10.1109/AiCIS51645.2020.00011
The main function of an information retrieval (IR) system is to efficiently retrieve a minimal subset of documents that is relevant to the user's need. Synonymy and polysemy act as barriers for natural language processing algorithms because they lead to overestimation and misrepresentation. The proposed model exploits the implicit higher-order structure in the association of terms with documents to improve the identification of relevant documents based on the terms used in queries, and an enhanced automatic indexing approach is suggested. The study uses the Term Frequency-Inverse Document Frequency (TF-IDF) method to assign a weight to each term in a document. Each document is represented as a vector of weights in the term space, and the user query is likewise represented as a vector of weights. Finally, Singular Value Decomposition (SVD) is used to factorize the large weighted term-document matrix into a set of vectors that approximate the original matrix. Cosine similarity is then used to determine which document vectors are closest to the user query. For English information retrieval, TF-IDF showed higher performance before a term percentage of 0.3, while Latent Semantic Indexing (LSI) was more stable than TF-IDF, especially with respect to the use of word association.
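The pipeline described in the abstract (TF-IDF weighting, a truncated SVD of the term-document matrix, and cosine ranking) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the whitespace tokenizer, the smoothed IDF variant, the rank k = 2, the query fold-in convention, and all function names are assumptions made for the example.

```python
import math
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Build a terms x documents TF-IDF matrix from raw strings."""
    tokenized = [d.lower().split() for d in docs]  # naive tokenizer (assumption)
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Smoothed IDF; the abstract does not say which variant the paper uses.
    idf = np.array([math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab])
    A = np.zeros((len(vocab), n_docs))
    for j, doc in enumerate(tokenized):
        for term, count in Counter(doc).items():
            A[index[term], j] = count * idf[index[term]]
    return vocab, idf, A


def query_vector(query, vocab, idf):
    """Weight the query in the same term space as the documents."""
    index = {t: i for i, t in enumerate(vocab)}
    q = np.zeros(len(vocab))
    for term, count in Counter(query.lower().split()).items():
        if term in index:  # terms outside the vocabulary are ignored
            q[index[term]] = count * idf[index[term]]
    return q


def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def lsi_scores(A, q, k):
    """Score every document against q in a rank-k latent space."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]
    # Projecting document j onto the top-k left singular vectors gives
    # Uk.T @ A[:, j], which equals column j of diag(sk) @ Vtk.
    doc_latent = (sk[:, None] * Vtk).T  # shape (n_docs, k)
    q_latent = Uk.T @ q                 # project the query the same way
    return [cosine(q_latent, d) for d in doc_latent]


docs = [
    "car engine repair",
    "automobile engine repair",
    "car automobile dealer",
    "banana fruit salad",
    "fruit juice banana",
]
vocab, idf, A = tfidf_matrix(docs)
q = query_vector("car", vocab, idf)

tfidf_scores = [cosine(q, A[:, j]) for j in range(A.shape[1])]
latent_scores = lsi_scores(A, q, k=2)

for doc, t, l in zip(docs, tfidf_scores, latent_scores):
    print(f"{doc:28s} tfidf={t:.3f} lsi={l:.3f}")
```

In this toy corpus the query "car" gets a TF-IDF score of zero for "automobile engine repair", because the two share no term. The rank-2 latent space still scores that document highly, because "car" and "automobile" co-occur with the same neighbouring terms. This is the synonymy effect the abstract attributes to word association. Other fold-in conventions exist, for example scaling the projected query by the inverse singular values, and they can change the ranking.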