Vector model improvement using suffix trees

2007 2nd International Conference on Digital Information Management. Pub Date: 2007-10-01. DOI: 10.5019/J.IJCIR.2009.168

J. Martinovič, T. Novosad, V. Snášel


Citations: 9

Abstract



There are many ways to search for documents in a document collection. These methods rely on Boolean, vector, probabilistic, and other models to represent documents, queries, and the rules and procedures that determine the correspondence between user requests and documents. Each of these models has several restrictions, which prevent a user from finding all relevant documents: the system returns many irrelevant documents, and some relevant documents are missed entirely. This article proposes a new method that uses suffix trees to improve vector queries. The method treats a document as a set of phrases (sentences), not just as a set of words. A sentence has a specific semantic meaning because its words are ordered. This is an advantage over treating a document as a bag of words.
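The contrast between a bag of words and an ordered phrase representation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an uncompressed word-level suffix trie in place of a true suffix tree, splits on whitespace only, and the names `Node`, `build_suffix_trie`, and `shared_phrases` are hypothetical.

```python
from collections import Counter


def bag_of_words(text):
    """Order-insensitive representation: word -> count."""
    return Counter(text.lower().split())


class Node:
    """One trie node; `docs` holds ids of sentences passing through it."""

    def __init__(self):
        self.children = {}
        self.docs = set()


def build_suffix_trie(sentences):
    """Insert every word-level suffix of every sentence into a trie.

    Assumption: a plain trie, quadratic in sentence length, stands in for
    a compressed suffix tree to keep the example short.
    """
    root = Node()
    for sid, sentence in enumerate(sentences):
        words = sentence.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node.children.setdefault(word, Node())
                node.docs.add(sid)
    return root


def shared_phrases(root, min_words=2):
    """Return {phrase: sorted sentence ids} for phrases in 2+ sentences."""
    found = {}
    stack = [(root, ())]
    while stack:
        node, phrase = stack.pop()
        for word, child in node.children.items():
            # A child seen in a single sentence cannot lead to a shared
            # phrase, so that branch is pruned.
            if len(child.docs) > 1:
                longer = phrase + (word,)
                if len(longer) >= min_words:
                    found[" ".join(longer)] = sorted(child.docs)
                stack.append((child, longer))
    return found


docs = [
    "the dog bit the man",
    "the man bit the dog",
    "yesterday the dog bit the postman",
]

# Sentences 0 and 1 contain the same words, so their bags are identical.
print(bag_of_words(docs[0]) == bag_of_words(docs[1]))

# Ordered phrases tell them apart: "the dog bit the" links 0 and 2 only.
phrases = shared_phrases(build_suffix_trie(docs))
for phrase, ids in sorted(phrases.items()):
    print(phrase, ids)
```

In this toy input the first two sentences are indistinguishable as bags of words, while the shared phrases group the first and third sentences, which describe the same action by the same subject. How such phrases are then weighted into the vector query is not specified in the abstract and is left out here.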

