Title: Querying Large Databases
Author: Nathan Beneke
Journal: Scholarly Horizons: University of Minnesota, Morris Undergraduate Journal, volume 4 1
Publication date: 2018-06-21
Publication type: Journal Article
DOI: 10.61366/2576-2176.1065 (https://doi.org/10.61366/2576-2176.1065)

This paper investigates two approaches to improving query times on large relational databases. The first exploits what is typically known about a database's structure and properties. It can execute some queries exactly in constant, bounded time. When a query cannot be executed exactly this way, we show how the same technique can still drastically reduce its run time while returning a good approximation of the exact result. We also discuss the complexity, both theoretical and practical, of deciding whether a query can be evaluated in this way. The second approach approximates aggregate queries by using only part of the data the query pertains to, rather than all of it. We briefly examine an established method that samples a random subset of the data, and then a newer method that partially reads every tuple and places deterministic error bounds on the results.
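
To make the second approach concrete, the following is a minimal sketch of the two ideas as the abstract describes them, not the paper's own implementation. The function names `sample_mean` and `bounded_sum_from_high_bits` are hypothetical. The sketch assumes that "partially reads every tuple" can be modeled as reading only the high-order bits of unsigned fixed-width integers, and it uses a normal-approximation 95% interval for the sampled estimate; both are assumptions made for illustration.

```python
import math
import random


def sample_mean(values, sample_size, seed=0):
    """Estimate AVG from a uniform random sample (hypothetical helper).

    Returns (estimate, half_width), where half_width is a 95% interval
    half-width under a normal approximation. The bound is probabilistic.
    """
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # sampling without replacement
    mean = sum(sample) / sample_size
    variance = sum((x - mean) ** 2 for x in sample) / (sample_size - 1)
    half_width = 1.96 * math.sqrt(variance / sample_size)
    return mean, half_width


def bounded_sum_from_high_bits(values, total_bits, bits_read):
    """Bound SUM using only the top `bits_read` bits of every value.

    Assumes each value is an unsigned integer of `total_bits` bits.
    The unread low bits are taken as all zeros for the lower bound and
    all ones for the upper bound, so the true sum always lies in range.
    """
    dropped = total_bits - bits_read
    unread_max = (1 << dropped) - 1
    low = 0
    high = 0
    for v in values:
        prefix = (v >> dropped) << dropped  # the part that was "read"
        low += prefix
        high += prefix | unread_max
    return low, high


if __name__ == "__main__":
    data = list(range(1000))
    estimate, half_width = sample_mean(data, 100)
    print("sampled AVG:", estimate, "+/-", half_width)
    print("SUM bounds:", bounded_sum_from_high_bits(data, 10, 4))
```

The contrast is in the kind of guarantee: the sampled estimate can fall outside its interval with small probability, while the bit-prefix bounds always contain the exact sum and tighten as more bits are read.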