Proceedings Eighth Symposium on String Processing and Information Retrieval: Latest Papers

A subquadratic algorithm for cluster and outlier detection in massive metric data
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.10018
Edgar Chávez
{"title":"A subquadratic algorithm for cluster and outlier detection in massive metric data","authors":"Edgar Chávez","doi":"10.1109/SPIRE.2001.10018","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.10018","url":null,"abstract":"The problem of cluster and outlier detection is a classic problem of non-parametric statistics. In recent times, the need for cluster analysis in massive multimedia data sets (terabytes of data sampled from a metric space) has demonstrated the need for solutions capable of clustering metric data both automatically and at reasonable speed. Since cluster properties involve the relationship between each pair of data set elements, a good clustering algorithm must examine (in principle) every distance pair and hence has quadratic complexity. An appealing trend to achieve subquadratic complexity is either a) to use an approximation for a classic clustering algorithm or b) to design a new algorithm for clustering. This paper presents a new clustering algorithm performing O(n^(1+α)) distance computations (the operation of leading complexity), with 0 ⩽ α ⩽ 1 a constant depending on the intrinsic dimension of the sample data. The algorithm can detect outliers in the sample data and, if desired, it can produce a hierarchical structure (a dendrogram) pointing to clusters at different resolutions.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130567924","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Relating web characteristics with link based web page ranking
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989734
Ricardo Baeza-Yates, C. Castillo
{"title":"Relating web characteristics with link based web page ranking","authors":"Ricardo Baeza-Yates, C. Castillo","doi":"10.1109/SPIRE.2001.989734","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989734","url":null,"abstract":"In recent years, several techniques based on link analysis have been proposed and used in search engines to rank Web pages. As links are generated by humans, link based ranking seems to give better results than traditional automatic techniques such as word based ranking. However, no studies have examined their real impact. In this paper we extend global page ranking techniques to Web site ranking, and present a first experimental analysis of link ranking regarding the structure and dynamics of the Web.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133003392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 43
On-line construction of symmetric compact directed acyclic word graphs
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989743
Shunsuke Inenaga, H. Hoshino, A. Shinohara, M. Takeda, S. Arikawa
{"title":"On-line construction of symmetric compact directed acyclic word graphs","authors":"Shunsuke Inenaga, H. Hoshino, A. Shinohara, M. Takeda, S. Arikawa","doi":"10.1109/SPIRE.2001.989743","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989743","url":null,"abstract":"The Compact Directed Acyclic Word Graph (CDAWG) is a space-efficient data structure that supports indices of a string. The Symmetric Compact Directed Acyclic Word Graph (SCDAWG) for a string w is a dual structure that supports indices of both w and the reverse of w simultaneously. Blumer et al. gave the first algorithm to construct an SCDAWG from a given string, which works in an off-line manner. In this paper, we show an on-line algorithm that constructs an SCDAWG from a given string directly.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127826724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 12
Using edit distance in point-pattern matching
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989751
V. Makinen
{"title":"Using edit distance in point-pattern matching","authors":"V. Makinen","doi":"10.1109/SPIRE.2001.989751","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989751","url":null,"abstract":"Edit distance is a powerful measure of similarity in string matching, measuring the minimum number of insertions, deletions, and substitutions needed to convert a string into another string. This measure is often contrasted with time warping in speech processing, which measures how close two trajectories are by allowing compression and expansion operations on the time scale. Time warping can be easily generalized to measure the similarity between 1D point-patterns (ascending lists of real values), as the difference between the ith and (i-1)th points in a point-pattern can be considered as the value of a trajectory at the time i. However, we show that edit distance is a more natural choice, and derive a measure by calculating the minimum amount of space needed to insert and delete between points to convert a point-pattern into another. We show that this measure defines a metric. We also define a substitution operation such that the distance calculation automatically separates the points into matching and mismatching points. The algorithms are based on dynamic programming. The main motivation for these methods is two and higher dimensional point-pattern matching, and therefore we generalize these methods into the 2D case, and show that this generalization leads to an NP-complete problem. There are also applications for the 1D case; we briefly discuss the matching of tree ring sequences in dendrochronology.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127863073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
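The abstract above starts from the classic string edit distance and notes that the algorithms are based on dynamic programming. As a point of reference only, here is a minimal sketch of that classic unit-cost string measure. It is not the point-pattern measure derived in the paper, and the function name `edit_distance` is an illustrative choice, not something taken from the source.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between strings a and b (illustrative sketch).

    Counts the minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Keeping only two rows of the table brings memory down to O(|b|) while the running time stays O(|a|·|b|).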
On compression of parse trees
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989759
J. Tarhio
{"title":"On compression of parse trees","authors":"J. Tarhio","doi":"10.1109/SPIRE.2001.989759","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989759","url":null,"abstract":"We consider methods for compressing parse trees, especially techniques based on statistical modeling. We regard a sequence of productions corresponding to a suffix of the path from the root of a tree to a node x as the context of a node x. The contexts are augmented with branching information of the nodes. By applying the text compression algorithm PPM on such contexts we achieve good compression results. We compare experimentally the PPM approach with other methods.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"127 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123071128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10
Of maps bigger than the empire
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.989732
A. Apostolico
{"title":"Of maps bigger than the empire","authors":"A. Apostolico","doi":"10.1109/SPIRE.2001.989732","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.989732","url":null,"abstract":"In a passage by J.L. Borges on the \"exactitude of Science,\" a fictitious author describes an Empire in which the art of Cartography \"logro tal perfeccion que el mapa de una sola Provincia ocupaba toda la Ciudad, y el mapa del Imperio toda una Provincia.\" With time, these huge maps wouldn't be enough, and the Colleges of the Cartographers erected a map of the Empire that equalled in width the Empire itself... This paper concerns itself with increasing cases of pattern discovery and data mining in which synopses, indices and relationships thereof seem to grow faster and bigger than the phenomena they were meant to encapsulate. The paper then reviews specific examples of algorithmic and combinatorial constructs that proved capable of alleviating such paradoxes in the author's recent work experience.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114398253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Semantic labeling - unveiling the main components of meaning of free-text
Proceedings Eighth Symposium on String Processing and Information Retrieval Pub Date : 1900-01-01 DOI: 10.1109/SPIRE.2001.10027
Y. Zieman, R. Salas
{"title":"Semantic labeling - unveiling the main components of meaning of free-text","authors":"Y. Zieman, R. Salas","doi":"10.1109/SPIRE.2001.10027","DOIUrl":"https://doi.org/10.1109/SPIRE.2001.10027","url":null,"abstract":"An experimentally proven methodology for computing semantic labels for natural language and its use in semantic processing of text is described. A combinatorial model of the conceptual space is created where semantic labels result as combinations of primary or atomic concepts called Semantic Factors. The set of about 2,500 Semantic Factors is defined. The basic semantic element of a language is a morpheme-type element (s-morpheme), the minimal part of a language that bears its own meaning. All s-morphemes in the Knowledge Base (about 15,000 for English) are labeled. The label for a phrase (its \"Concept Code\") results as a combination of the labels for the s-morphemes constituting it. Algorithms are designed to identify the s-morphemes in a phrase and to generate the phrase's Concept Code. The matching procedure compares Concept Codes and identifies conceptually close ones - those sharing a maximal number of Semantic Factors. Similarity is identified here as a match between the Concept Codes of two Text objects. Since a Concept Code is essentially language independent, this technology is appropriate for implementation in cross-language applications. An example is described of an application in the bio-medical domain, where documents of a database of more than 12 million titles are being successfully retrieved in about 50% of the queries normally rejected by traditional search methods.","PeriodicalId":107511,"journal":{"name":"Proceedings Eighth Symposium on String Processing and Information Retrieval","volume":"190 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132747928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
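The matching step described in the abstract above, which picks the Concept Codes sharing a maximal number of Semantic Factors, can be pictured as a set-overlap comparison. The sketch below assumes a Concept Code can be modeled as a plain set of factor names. That representation, the function name `closest_codes`, and the sample factors are hypothetical illustrations, not details taken from the paper.

```python
def closest_codes(query: frozenset[str],
                  candidates: dict[str, frozenset[str]]) -> list[str]:
    """Return the candidates sharing the most factors with the query.

    Hypothetical model: a Concept Code is a set of Semantic Factor names.
    Ties are all returned, sorted by name; with no shared factor at all,
    the result is empty.
    """
    # Count the shared factors for every candidate code.
    overlap = {name: len(query & code) for name, code in candidates.items()}
    best = max(overlap.values(), default=0)
    if best == 0:
        return []
    return sorted(name for name, shared in overlap.items() if shared == best)


if __name__ == "__main__":
    # Made-up factor names, for illustration only.
    codes = {
        "heart attack": frozenset({"HEART", "DAMAGE", "SUDDEN"}),
        "myocardial infarction": frozenset({"HEART", "DAMAGE", "TISSUE"}),
        "broken leg": frozenset({"LEG", "DAMAGE"}),
    }
    print(closest_codes(frozenset({"HEART", "DAMAGE"}), codes))
```

Because the comparison works on factor sets rather than surface words, two phrases with no words in common can still match, which is in line with the language-independence claim in the abstract.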