Performance of Data Structures for Small Sets of Strings

Australasian Computer Science Conference | Pub Date: 1900-01-01 | DOI: 10.1145/563857.563812

S. Heinz, J. Zobel

Citations: 12

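Abstract

Fundamental structures such as trees and hash tables are used for managing data in a huge variety of circumstances. Making the right choice of structure is essential to efficiency. In previous work we have explored the performance of a range of data structures (different forms of trees, tries, and hash tables) for the task of managing sets of millions of strings, and have developed new variants of each that are more efficient for this task than previous alternatives. In this paper we test the performance of the same data structures on small sets of strings, in the context of document processing for index construction. Our results show that the new structures, in particular our burst trie, are the most efficient choice for this task, thus demonstrating that they are suitable for managing sets of hundreds to millions of distinct strings, and for input of hundreds to billions of occurrences.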
