Proceedings Eighth Symposium on String Processing and Information Retrieval: Latest Papers

Genome rearrangements distance by fusion, fission, and transposition is easy
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989776
Zanoni Dias, J. Meidanis
{"title":"Genome rearrangements distance by fusion, fission, and transposition is easy","authors":"Zanoni Dias, J. Meidanis","doi":"10.1109/SPIRE.2001.989776","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989776","url":null,"abstract":"Given two genomes represented as circularly ordered sequences of genes, we show a polynomial time algorithm for the minimum weight series of fusions, fissions, and transpositions (with transpositions weighing twice as much as fusions and fissions) that transforms one genome into the other. The algorithm is based on classical results of permutation group theory and is the first polynomial result for a genome rearrangement problem involving transpositions. It has been observed in real biological instances that transpositions occur with about half the frequency of reversals. Although we are not using reversals in this study, this observation motivated the double weight assigned to transpositions.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122214562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 39
A stemming algorithm for the Portuguese language
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989755
Viviane Moreira Orengo, C. Huyck
{"title":"A stemming algorithm for the Portuguese language","authors":"Viviane Moreira Orengo, C. Huyck","doi":"10.1109/SPIRE.2001.989755","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989755","url":null,"abstract":"Stemming algorithms are traditionally used in Information Retrieval with the goal of enhancing recall, as they conflate the variant forms of a word into a common representation. This paper describes the development of a simple and effective suffix-stripping algorithm for Portuguese. The stemmer is evaluated using a method proposed by Paice [9]. The results show that it performs significantly better than the Portuguese version of the Porter algorithm.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126870550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 112
Distributed query processing using partitioned inverted files
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989733
C. Badue, Ricardo Baeza-Yates, B. Ribeiro-Neto, N. Ziviani
{"title":"Distributed query processing using partitioned inverted files","authors":"C. Badue, Ricardo Baeza-Yates, B. Ribeiro-Neto, N. Ziviani","doi":"10.1109/SPIRE.2001.989733","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989733","url":null,"abstract":"In this paper, we study query processing in a distributed text database. The novelty is a real distributed architecture implementation that offers concurrent query service. The distributed system adopts a network of workstations model and the client-server paradigm. The document collection is indexed with an inverted file. We adopt two distinct strategies of index partitioning in the distributed system, namely local index partitioning and global index partitioning. In both strategies, documents are ranked using the vector space model along with a document filtering technique for fast ranking. We evaluate and compare the impact of the two index partitioning strategies on query processing performance. Experimental results on retrieval efficiency show that, within our framework, the global index partitioning outperforms the local index partitioning.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131440674","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 115
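The two partitioning strategies named in the abstract above can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the sample `docs`, and the round-robin assignment of documents and terms to servers are all assumptions made for illustration, and ranking with the vector space model is left out.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def local_partition(docs, n_servers):
    """Local partitioning: split the documents, then index each subset on its own."""
    subsets = [dict() for _ in range(n_servers)]
    for i, doc_id in enumerate(sorted(docs)):
        # Round-robin assignment of documents is an assumption of this sketch.
        subsets[i % n_servers][doc_id] = docs[doc_id]
    return [build_inverted_index(subset) for subset in subsets]

def global_partition(docs, n_servers):
    """Global partitioning: build one index, then split its terms across servers."""
    full_index = build_inverted_index(docs)
    parts = [dict() for _ in range(n_servers)]
    for i, term in enumerate(sorted(full_index)):
        # Round-robin assignment of terms is an assumption of this sketch.
        parts[i % n_servers][term] = full_index[term]
    return parts

# Hypothetical four-document collection.
docs = {
    1: "string matching",
    2: "string processing",
    3: "information retrieval",
    4: "string retrieval",
}
local_parts = local_partition(docs, 2)
global_parts = global_partition(docs, 2)
```

With local partitioning every server may hold a partial posting list for a term, so a query is sent to all servers; with global partitioning the full posting list of a term lives on exactly one server.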
Adding security to compressed information retrieval systems
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 2001-11-13 DOI: 10.1109/SPIRE.2001.989778
R. Milidiú, C. G. Mello, José Rodrigues Fernandes
{"title":"Adding security to compressed information retrieval systems","authors":"R. Milidiú, C. G. Mello, José Rodrigues Fernandes","doi":"10.1109/SPIRE.2001.989778","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989778","url":null,"abstract":"Word-based Huffman coding has widespread use in information retrieval systems. Besides its compressing power, it also enables the implementation of both indexing and searching schema in the compressed file. In this work, an algorithm that adds security to compressed data is proposed. It shows a small loss in coding, decoding and compression performances. The algorithm uses homophonic substitution, canonical Huffman codes and a secret key for enciphering.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130273732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
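The abstract above mentions canonical Huffman codes over words. As a rough illustration of that one ingredient only, the sketch below assigns canonical codewords from a given table of code lengths. The function name and the sample `lengths` table are hypothetical, and the homophonic substitution and secret-key steps of the proposed algorithm are not shown.

```python
def canonical_codes(code_lengths):
    """Assign canonical Huffman codewords from a {symbol: bit length} mapping.

    Symbols are processed by increasing length (ties broken alphabetically);
    each codeword is the previous one plus one, shifted left whenever the
    length grows.
    """
    code = 0
    prev_len = 0
    codes = {}
    ordered = sorted(code_lengths.items(), key=lambda kv: (kv[1], kv[0]))
    for symbol, length in ordered:
        code <<= length - prev_len
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1
        prev_len = length
    return codes

# Hypothetical code lengths for a four-word vocabulary.
lengths = {"the": 1, "of": 2, "string": 3, "index": 3}
codes = canonical_codes(lengths)
```

Because the codewords follow from the lengths alone, a decoder only needs the length table, which is one reason canonical codes are convenient in compressed text retrieval.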
Storing semistructured data in relational databases
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 2001-11-01 DOI: 10.1109/SPIRE.2001.989749
K. V. Magalhães, Alberto H. F. Laender, A. D. Silva
{"title":"Storing semistructured data in relational databases","authors":"K. V. Magalhães, Alberto H. F. Laender, A. D. Silva","doi":"10.1109/SPIRE.2001.989749","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989749","url":null,"abstract":"This paper presents an approach to storing semistructured data in relational databases. We focus on semistructured data as extracted from Web pages by a tool called DEByE (Data Extraction By Example), and organized according to its data model, the DEByE Object Model (DEByEOM). The approach presented here consists in representing the structure of the objects extracted by DEByE by a relational schema and populating the corresponding database accordingly. We also show how to retrieve such objects by automatically transforming high-level query specifications (query patterns) into SQL queries that are executed over the relational database. Results of experiments carried out to evaluate our approach are also described.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132298584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Speed-up of Aho-Corasick pattern matching machines by rearranging states
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989753
T. Nishimura, S. Fukamachi, T. Shinohara
{"title":"Speed-up of Aho-Corasick pattern matching machines by rearranging states","authors":"T. Nishimura, S. Fukamachi, T. Shinohara","doi":"10.1109/SPIRE.2001.989753","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989753","url":null,"abstract":"This article describes a speed-up of string pattern matching obtained by rearranging states in the Aho-Corasick pattern matching machine, which is a kind of finite automaton. We previously realized a speed-up of string pattern matching using data compression. Although we obtain a higher compression ratio using a finite state model, it does not lead to a speed-up of string pattern matching, because the pattern matching machine becomes very large when the compression codes are complex. Frequently used states are scattered across Random Access Memory (RAM). Such states are close to the initial state of the pattern matching machine. We rearrange states so as to collect frequently used states together for CPU cache efficiency. We renumber states in breadth-first order. In experiments, the elapsed time is reduced to about 55% in the case of a compressed English text.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122165342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 21
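The breadth-first renumbering mentioned in the abstract above can be sketched on the goto function (trie) of a small pattern matching machine. This is a minimal illustration under stated assumptions, not the authors' code: the helper names and the sample patterns are hypothetical, failure and output functions are omitted, and the list-of-dicts layout only models state numbering, not the actual memory layout that matters for cache behavior.

```python
from collections import deque

def build_trie(patterns):
    """Build the goto function as a list of dicts, one dict per state.

    State 0 is the initial state; states are numbered in insertion order.
    """
    goto = [{}]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
    return goto

def renumber_breadth_first(goto):
    """Return an equivalent goto table whose states are numbered in BFS order."""
    order = [0]  # old state ids, listed in breadth-first visiting order
    queue = deque([0])
    while queue:
        state = queue.popleft()
        for ch in sorted(goto[state]):
            child = goto[state][ch]
            order.append(child)
            queue.append(child)
    new_id = {old: new for new, old in enumerate(order)}
    return [
        {ch: new_id[child] for ch, child in goto[old].items()}
        for old in order
    ]

goto = build_trie(["he", "she", "his", "hers"])
renumbered = renumber_breadth_first(goto)
```

After renumbering, states near the initial state receive the smallest ids, so a table indexed by state id keeps them next to each other.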
On using two-phase filtering in indexed approximate string matching with application to searching unique oligonucleotides
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/spire.2001.989742
H. Hyyro
{"title":"On using two-phase filtering in indexed approximate string matching with application to searching unique oligonucleotides","authors":"H. Hyyro","doi":"10.1109/spire.2001.989742","DOIUrl":"https://doi.org/10.1109/spire.2001.989742","url":null,"abstract":"We discuss using an indexing scheme to accelerate approximate search over a static text in the case of using unit cost edit distance as the measure of similarity between strings. First we generally consider the filtering criteria that can be used as a basis for the index, and then propose using filtering twice before the final checking phase. The last part consists of presenting an indexed approximate string matching application in bioinformatics, which is the search of unique oligonucleotides. We present practical comparisons and results for using different filtering schemes in this application. Our tests have involved a total of 15 different genomes, from which we present some results involving the largest two of these: The genome of Saccharomyces cerevisiae (baker's yeast) and a recent draft of the human genome, the latter being also the main target of the application.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121070692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
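The similarity measure named in the abstract above, unit cost edit distance, can be computed with the standard dynamic programming recurrence. The sketch below shows only that measure, which a final checking phase could use; the indexing and two-phase filtering scheme of the paper is not reproduced, and the function name is an assumption.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between strings a and b.

    Counts the minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b, keeping one row at a time.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

For example, `edit_distance("ACGT", "AGT")` is 1, since deleting the `C` is enough.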
A comparative study of topic identification on newspaper and e-mail
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989770
B. Bigi, A. Brun, J. Haton, K. Smaïli, I. Zitouni
{"title":"A comparative study of topic identification on newspaper and e-mail","authors":"B. Bigi, A. Brun, J. Haton, K. Smaïli, I. Zitouni","doi":"10.1109/SPIRE.2001.989770","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989770","url":null,"abstract":"This work presents several statistical methods for topic identification on two kinds of textual data: newspaper articles and e-mails. Five methods are tested on these two corpora: topic unigrams, cache model, TFIDF classifier, topic perplexity, and weighted model. Our work aims to study these methods by confronting them to very different data. This study is very fruitful for our research. Statistical topic identification methods depend not only on a corpus, but also on its type. One of the methods achieves a topic identification of 80% on a general newspaper corpus but does not exceed 30% on e-mail corpus. Another method gives the best result on e-mails, but has not the same behavior on a newspaper corpus. We also show in this paper that almost all our methods achieve good results in retrieving the first two manually annotated labels.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133594057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 35
Exact distribution of deletion sizes for unavoidable strings
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.10014
Christine E. Heitsch
{"title":"Exact distribution of deletion sizes for unavoidable strings","authors":"Christine E. Heitsch","doi":"10.1109/SPIRE.2001.10014","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.10014","url":null,"abstract":"We constructively prove the exact distribution of deletion sizes for unavoidable strings, under the reductive decidability method of Zimin and Bean et al. Bounds such as these on the unique initial reductions of unavoidable strings were instrumental in proving the computational intractability of the reduction algorithm. We also provide the necessary supporting results, including some useful approximations on the deletion sizes of individual strings. This work improves upon previous results that, although sufficient to establish the desired exponential lower bound, were far from optimal.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132638381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Semantic thesaurus for automatic expanded query in information retrieval
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.10023
Marco González, Vera Lúcia Strube de Lima
{"title":"Semantic thesaurus for automatic expanded query in information retrieval","authors":"Marco González, Vera Lúcia Strube de Lima","doi":"10.1109/SPIRE.2001.10023","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.10023","url":null,"abstract":"This article proposes (a) a semantic structuring for thesauri and (b) a procedure that handles it, for automatic query expansion in information retrieval. The thesaurus for this experiment was built manually, based on a traditional dictionary, adopting aspects from the Generative Lexicon Theory by James Pustejovsky as well as concepts from object oriented software modeling. We show how to select new terms for query expansion and to calculate their weights. This last task is performed according to intersections of the derived lexical sets and to the depth level for descriptors search with respect to each considered term. Also, an evaluation of the use of these resources is presented.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115286041","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3