{"title":"Speeding-up hirschberg and hunt-szymanski LCS algorithms","authors":"M. Crochemore, C. Iliopoulos, Y. Pinzón","doi":"10.1109/SPIRE.2001.989737","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989737","url":null,"abstract":"Two algorithms are presented that solve the problem of recovering the longest common subsequence of two strings. The first algorithm is an improvement of Hirschberg's divide-and-conquer algorithm. The second algorithm is an improvement of the Hunt-Szymanski algorithm based on an efficient computation of all dominant match points. These two algorithms use bit-vector operations and are shown to work very efficiently in practice.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132348809","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A model for the representation and focussed retrieval of structured documents based on fuzzy aggregation","authors":"G. Kazai, M. Lalmas, T. Roelleke","doi":"10.1109/SPIRE.2001.989746","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989746","url":null,"abstract":"Effective retrieval of structured documents should exploit the content and structural knowledge associated with the documents. This knowledge can be used to focus retrieval to the best entry points: document components that contain relevant information, and from which users can browse to retrieve further relevant components. To enable this, suitable representation methods must be developed. This paper presents a model for representing structured documents to allow for their focussed retrieval. The model is founded on fuzzy aggregation, an approach based on the fuzzy representation of linguistic quantifiers and ordered weighted averaging operators. By defining the representation of a document component as the fuzzy aggregation of its related components, we arrive at a document representation that supports the selection of best entry points.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128045097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a graphical user interface for focussed retrieval of structured documents","authors":"F. Crestani, P. de la Fuente, J. Vegas","doi":"10.1109/SPIRE.2001.989775","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989775","url":null,"abstract":"Many document collections contain documents that have significant structure. Structured document retrieval requires different models and interfaces from standard Information Retrieval. An Information Retrieval system dealing with structured documents has to enable a user to query, browse retrieved documents, and provide query refinement and relevance feedback based not only on full documents but also on specific parts of them, according to their structure. Currently, very few IR systems enable such level of flexibility and interaction, because of limitations in indexing and retrieval models and in interfaces. In this paper, we present the design of a new graphical user interface for structured document retrieval. This interface provides the user with an intuitive and yet powerful set of tools for structured document searching, retrieved list navigation, and search refinement.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134133466","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using semantics for paragraph selection in question answering systems","authors":"J. Vicedo","doi":"10.1109/SPIRE.2001.989765","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989765","url":null,"abstract":"Efficiency of term-based Question Answering systems is limited to answering questions whose answer is expressed in documents by using mainly the same terms appearing in questions. The system presented in this paper overcomes this fact by performing open domain Question Answering (QA) from a semantic perspective. For this purpose, we define a general semantic model that represents the concepts referenced into the questions as well as a relevance measure that allows locating and ranking fragments of documents from whose content it is possible to infer the answer to specific questions. With the purpose of evaluation, this model has been embedded into a full QA system. Comparison of performance between our model and term-based approaches shows that QA measures improve significantly when this model is applied to the paragraph selection process.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125584931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Musical sequence comparison for melodic and rhythmic similarities","authors":"T. Kadota, Masahiro Hirao, A. Ishino, M. Takeda, A. Shinohara, F. Matsuo","doi":"10.1109/SPIRE.2001.989744","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989744","url":null,"abstract":"We address the problem of musical sequence comparison for melodic similarity. Starting with a very simple similarity measure, we improve it step-by-step to finally obtain an acceptable measure. While the measure is still simple and has only two tuning parameters, it is better than that proposed by Mongeau and Sankoff (1990) in the sense that it can distinguish variations on a particular theme from a mixed collection of variations on multiple themes by Mozart, more successfully than the Mongeau-Sankoff measure. We also present a measure for quantifying rhythmic similarity and evaluate its performance on popular Japanese songs.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"52 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126960898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast categorisation of large document collections","authors":"Vaughan R. Shanks, H. Williams","doi":"10.1109/SPIRE.2001.989757","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989757","url":null,"abstract":"As the volume of data stored online increases, careful management of large document collections becomes increasingly important. Categorisation is one important document management technique. It has been effectively employed in the Web, where links to documents are maintained in topic or interest areas in, for example, the manually categorised Yahoo! hierarchy. The drawback of manual categorisation is that it is practical only on small numbers of documents, it is not scalable, and relies on the subjective judgement of human assessors. Automatic categorisation has been shown to be an accurate alternative to manual categorisation. In automatic categorisation, documents are processed and automatically assigned to pre-defined categories that represent an interest or topic area. We propose and investigate heuristics for fast categorisation of large collections of documents that are focused on selecting a minimal set of representative features from uncategorised documents. We show that these new heuristics are accurate (in some cases more accurate than the baseline techniques) and also permit more than three-fold reductions in processing time for categorising large collections.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115830031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Compaction techniques for nextword indexes","authors":"D. Bahle, H. Williams, J. Zobel","doi":"10.1109/SPIRE.2001.989735","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989735","url":null,"abstract":"Most queries to text search engines are ranked or Boolean. Phrase querying is a powerful technique for refining searches, but is expensive to implement on conventional indexes. In previous work we introduced the nextword index, a structure specifically designed for phrase queries, which however is relatively large. In this paper we introduce new compaction techniques for nextword indexes. In contrast to most index compression schemes, these techniques are lossy, yet as we show allow full resolution of phrase queries without false match checking. We show experimentally that our novel techniques lead to significant savings in index size.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"54 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126007681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Re-store: a system for compressing, browsing, and searching large documents","authors":"Alistair Moffat, R. Wan","doi":"10.1109/SPIRE.2001.989752","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989752","url":null,"abstract":"A constant temperature box comprises a body and a lid therefor which are of adiabatic construction, and is incorporated with a container used as a cooling or heating source, the container being made flat and arranged opposite to each other at the side walls of the box body, so that the container may cool or warm foodstuffs and beverages kept within the constant temperature box.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"76 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127089805","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An efficient bottom-up distance between trees","authors":"G. Valiente","doi":"10.1109/SPIRE.2001.989761","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989761","url":null,"abstract":"A new bottom-up distance measure for labeled trees, which is based on the largest common forest of the trees and has the threefold advantage of independence of particular edit costs, low complexity, and coverage of ordered and unordered trees, is introduced and related in this paper with other distance measures published in the literature. Algorithms for computing the bottom-up distance in time linear in the number of nodes are given in full detail.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"132 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121775181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A documental database query language","authors":"N. Brisaboa, Miguel R. Penabad, Á. Places, F. J. Rodríguez","doi":"10.1109/SPIRE.2001.989772","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989772","url":null,"abstract":"This work presents a natural language based technique to build user interfaces to query document databases through the web. We call this technique Bounded Natural Language (BNL). Interfaces based on BNL are useful to query document databases containing only structured data, containing only text or containing both of them. That is, the underlying formalism of BNL can integrate restrictions over structured and non-structured data (as text). Interfaces using BNL can be programmed ad hoc for any document database but in this paper we present a system with an ontology based architecture in which the user interface is automatically generated by a software module (User Interface Generator) capable of reading and following the ontology. This ontology is a conceptualization of the database model, which uses a label in natural language for any concept in the ontology. Each label represents the usual name for a concept in the real world. The ontology includes general concepts useful when the user is interested in documents in any corpus in the database, and specific concepts useful when the user is interested in a specific corpus. That is, databases can store one or more corpora of documents and queries can be issued either over the whole database or over a specific corpus. The ontology guides the execution of the User Interface Generator and other software modules in such a way that any change in the database does not imply making changes in the program code, because the whole system runs following the ontology. That is, if a modification in the database schema occurs, only the ontology must be changed and the User Interface Generator will produce a new and different user interface adapted to the new database.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121872303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}