6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268): Latest Publications

CoBWeb-a crawler for the Brazilian Web
A. D. Silva, Eveline Veloso, P. B. Golgher, B. Ribeiro-Neto, Alberto H. F. Laender, N. Ziviani
{"title":"CoBWeb-a crawler for the Brazilian Web","authors":"A. D. Silva, Eveline Veloso, P. B. Golgher, B. Ribeiro-Neto, Alberto H. F. Laender, N. Ziviani","doi":"10.1109/SPIRE.1999.796594","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796594","url":null,"abstract":"One of the key components of current Web search engines is the document collector. The paper describes CoBWeb, an automatic document collector whose architecture is distributed and highly scalable. CoBWeb aims at collecting large amounts of documents per time period while observing operational and ethical limits in the crawling process. CoBWeb is part of the SIAM (Information Systems in Mobile Computing Environments) search engine which is being implemented to support the Brazilian Web. Thus, several results related to the Brazilian Web are presented.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124668121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
A unifying framework for compressed pattern matching
T. Kida, Yusuke Shibata, M. Takeda, A. Shinohara, S. Arikawa
{"title":"A unifying framework for compressed pattern matching","authors":"T. Kida, Yusuke Shibata, M. Takeda, A. Shinohara, S. Arikawa","doi":"10.1109/SPIRE.1999.796582","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796582","url":null,"abstract":"We introduce a general framework which is suitable to capture an essence of compressed pattern matching according to various dictionary based compressions, and propose a compressed pattern matching algorithm for the framework. The goal is to find all occurrences of a pattern in a text without decompression, which is one of the most active topics in string matching. Our framework includes such compression methods as Lempel-Ziv family, (LZ77, LZSS, LZ78, LZW) (J. Ziv and A. Lempel, 1978), byte-pair encoding, and the static dictionary based method. Technically, our pattern matching algorithm extends that for LZW compressed text presented by A. Amir et al. (1996).","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133626698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 42
A fast algorithm on average for all-against-all sequence matching
Ricardo Baeza-Yates, G. Gonnet
{"title":"A fast algorithm on average for all-against-all sequence matching","authors":"Ricardo Baeza-Yates, G. Gonnet","doi":"10.1109/SPIRE.1999.796573","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796573","url":null,"abstract":"We present an algorithm which attempts to align pairs of subsequences from a database of genetic sequences. The algorithm simulates the classical dynamic programming alignment algorithm over a suffix array of the database. We provide a detailed average case analysis which shows that the running time of the algorithm is subquadratic with respect to the database size. A similar algorithm solves the approximate string matching problem in sublinear average time.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131358470","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 38
The EC query language applied to old manuscripts
J. Vegas, P. Fuente, Ricardo Baeza-Yates
{"title":"The EC query language applied to old manuscripts","authors":"J. Vegas, P. Fuente, Ricardo Baeza-Yates","doi":"10.1109/SPIRE.1999.796597","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796597","url":null,"abstract":"We show the possibilities of the EC query language in a very structured environment as a catalog of old manuscripts. The EC language can deal with simple queries and with more complex ones, as approximate searches. We have done two classes of experiments. The first one shows that the structure does not change the statistical behaviour of the system with regard to the frequency of the words. The second kind of experiments tends to show the statistical behaviour of the database when we use different structural elements in the queries.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116826472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Effects of term segmentation on Chinese/English cross-language information retrieval
Douglas W. Oard, Jianqiang Wang
{"title":"Effects of term segmentation on Chinese/English cross-language information retrieval","authors":"Douglas W. Oard, Jianqiang Wang","doi":"10.1109/SPIRE.1999.796590","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796590","url":null,"abstract":"The majority of recent Cross-Language Information Retrieval (CLIR) research has focused on European languages. CLIR problems that involve East Asian languages such as Chinese introduce additional challenges, because written Chinese texts lack boundaries between terms. The paper examines three Chinese segmentation techniques in combination with two variants of dictionary-based Chinese to English query translation. The results indicate that failure to segment terms, particularly technical terms and names, can have a cascading effect that reduces retrieval effectiveness. Task-tuned segmentation algorithms and alternative term weighting strategies are suggested as productive directions for future work.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130270014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 14
Bounds for parametric sequence comparison
David Fernández-Baca, T. Seppäläinen, G. Slutzki
{"title":"Bounds for parametric sequence comparison","authors":"David Fernández-Baca, T. Seppäläinen, G. Slutzki","doi":"10.1109/SPIRE.1999.796578","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796578","url":null,"abstract":"We consider the problem of computing a global alignment between two or more sequences subject to varying mismatch and indel penalties. We prove a tight $3(n/2\\pi)^{2/3} + O(n^{1/3}\\log n)$ bound on the worst-case number of distinct optimum alignments for two sequences of length $n$ as the parameters are varied. This refines a $O(n^{2/3})$ upper bound by D. Gusfield et al. (1994). Our lower bound requires an unbounded alphabet. For strings over a binary alphabet, we prove a $\\Omega(n^{1/2})$ lower bound. For the parametric global alignment of $k \\geq 2$ sequences under sum-of-pairs scoring, we prove a $3(\\binom{k}{2} n/2\\pi)^{2/3} + O(k^{2/3} n^{1/3}\\log n)$ upper bound on the number of distinct optimality regions and a $\\Omega(n^{2/3})$ lower bound. Based on experimental evidence, we conjecture that for two random sequences, the number of optimality regions is approximately $\\sqrt{n}$ with high probability.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129917907","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 13
Top-down extraction of semi-structured data
B. Ribeiro-Neto, Alberto H. F. Laender, A. D. Silva
{"title":"Top-down extraction of semi-structured data","authors":"B. Ribeiro-Neto, Alberto H. F. Laender, A. D. Silva","doi":"10.1109/SPIRE.1999.796593","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796593","url":null,"abstract":"We propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects, decomposing them in objects less complex, until atomic objects have been extracted. Through experimentation, we demonstrate that with a small number of given examples, our strategy is able to extract most of the objects present in a Web source given as input.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"57 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122135369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Emotional awareness in collaborative systems
O. García, J. Favela, R. Machorro
{"title":"Emotional awareness in collaborative systems","authors":"O. García, J. Favela, R. Machorro","doi":"10.1109/SPIRE.1999.796607","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796607","url":null,"abstract":"Emotions play an important role in human interaction. Both, our own emotional state and our perception of that of others with which we collaborate influence the outcome of cooperative work. With the growing interest in providing computational support for the recognition and representation of emotions, there is a clear interest in adding such facilities to groupware systems and to evaluate the positive and negative effects of using this additional channel of communication. We discuss the issues involved in supporting a new type of collaborative awareness in groupware, namely, emotional awareness. We also present two emotion-based sample applications, and discussion to further motivate work in this area within the collaborative community.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132655508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
An efficient method for in memory construction of suffix arrays
Hideo Itoh, Hozumi Tanaka
{"title":"An efficient method for in memory construction of suffix arrays","authors":"Hideo Itoh, Hozumi Tanaka","doi":"10.1109/SPIRE.1999.796581","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796581","url":null,"abstract":"The suffix array is a string-indexing structure and a memory efficient alternative to the suffix tree. It has many advantages for text processing. We propose an efficient algorithm for sorting suffixes. We call this algorithm the two-stage suffix sort. One of our ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514-MB Japanese newspapers) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than K. Sadakane's (1998) algorithm, which is considered to be the fastest algorithm in previous work.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125570310","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 74
String-oriented databases
A. Rajasekar
{"title":"String-oriented databases","authors":"A. Rajasekar","doi":"10.1109/SPIRE.1999.796591","DOIUrl":"https://doi.org/10.1109/SPIRE.1999.796591","url":null,"abstract":"Relational databases and Datalog view each attribute as indivisible. This view, though useful in several applications, does not provide a suitable database paradigm for use in genetic, multimedia or scientific databases. Data in these applications are unstructured; querying on sub-strings of attribute values is often necessary. Moreover due to imprecision and incompleteness in the data, approximate reasoning also becomes indispensable. Our aim is to view strings as database objects that can be compared, divided, subsumed, interpreted and approximated. Allowing such operations on strings enriches the semantics and increases the expressive power of database languages. We develop an extension to the relational algebra, augmenting it with the concept of a string expression with a rich structure of string variables, mapping functions, interpreted string operations and approximate evaluations. We study properties of such expressions and show that many of the well-known properties of relational algebra hold in the extension. We also discuss an extension to Datalog(String) and an implementation of a prototype system called S-log. S-log integrates pattern matching in Datalog framework. We contend that string oriented database systems would be useful in applications that require efficient sub-structure analysis, such as aligning DNA strings using motifs, retrieving and synthesizing iconic images based on content.","PeriodicalId":131279,"journal":{"name":"6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1999-09-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129265384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5