Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language
Immanuel Trummer
Companion of the 2023 International Conference on Management of Data, 2023-06-04
DOI: https://doi.org/10.1145/3555041.3589694
Citations: 1
Abstract
The NaturalMiner system seeks to extract facts from large relational data sets that match abstract patterns defined in natural language. For instance, this enables users to search, with regard to a specific airline, for evidence that "the airline underperforms" or "the airline outperforms" within a data set containing flight statistics, hinting at areas for improvement or strengths to advertise. Internally, NaturalMiner iteratively generates statistical facts from data by processing SQL queries, using a reinforcement learning approach to select which facts to generate. It uses pre-trained language models to score candidate facts with regard to user-specified search patterns, returning the fact combination with the maximal score once a user-specified time budget has elapsed. To deal with large data sets, NaturalMiner features customized caching and sampling strategies. The proposed demonstration will showcase search for different patterns described in natural language, covering different data sets and scenarios.
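The loop described in the abstract (generate a fact via SQL, score it against the pattern, use the score to decide what to generate next) can be illustrated with a small sketch. This is not NaturalMiner's implementation. Everything in it is an assumption made for illustration: the `flights` table and its rows are made up, `toy_score` is a hand-written placeholder standing in for a pre-trained language model, an epsilon-greedy bandit stands in for the reinforcement learning approach, a fixed step count stands in for the time budget, a plain dictionary stands in for the caching strategy, and the sketch returns a single best fact rather than a fact combination.

```python
import random
import sqlite3


def build_demo_db():
    """Create a tiny in-memory flights table (made-up rows, illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flights (airline TEXT, season TEXT, delay REAL)")
    rows = [
        ("AA", "winter", 30.0), ("AA", "summer", 12.0),
        ("BB", "winter", 10.0), ("BB", "summer", 11.0),
        ("CC", "winter", 8.0), ("CC", "summer", 9.0),
    ]
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?)", rows)
    return conn


def generate_fact(conn, airline, season, cache):
    """Compare the airline's average delay with all other airlines via SQL.

    Results are memoized in `cache` so repeated picks do not re-run queries.
    """
    key = (airline, season)
    if key not in cache:
        own = conn.execute(
            "SELECT AVG(delay) FROM flights WHERE airline = ? AND season = ?",
            (airline, season),
        ).fetchone()[0]
        rest = conn.execute(
            "SELECT AVG(delay) FROM flights WHERE airline <> ? AND season = ?",
            (airline, season),
        ).fetchone()[0]
        cache[key] = (own, rest)
    return cache[key]


def toy_score(pattern, own, rest):
    """Hypothetical placeholder for a language-model relevance score.

    A larger delay than the other airlines supports "underperforms";
    a smaller one supports any other pattern (here: "outperforms").
    """
    gap = own - rest
    return gap if "underperforms" in pattern else -gap


def search(conn, airline, pattern, seasons, steps=20, epsilon=0.2, seed=0):
    """Epsilon-greedy search over seasons; returns (score, fact text)."""
    rng = random.Random(seed)
    cache = {}
    totals = {s: 0.0 for s in seasons}
    counts = {s: 0 for s in seasons}
    best = None
    for _ in range(steps):  # step budget standing in for a time budget
        untried = [s for s in seasons if counts[s] == 0]
        if untried:
            season = untried[0]  # try every option once first
        elif rng.random() < epsilon:
            season = rng.choice(seasons)  # explore
        else:
            season = max(seasons, key=lambda s: totals[s] / counts[s])  # exploit
        own, rest = generate_fact(conn, airline, season, cache)
        reward = toy_score(pattern, own, rest)
        totals[season] += reward
        counts[season] += 1
        if best is None or reward > best[0]:
            text = (f"{airline} average delay in {season}: "
                    f"{own:.1f} vs {rest:.1f} for other airlines")
            best = (reward, text)
    return best


if __name__ == "__main__":
    print(search(build_demo_db(), "AA", "the airline underperforms",
                 ["winter", "summer"]))
```

In this toy setting the search space has only two candidate facts, so the bandit adds little; the structure matters more once the number of possible facts is too large to enumerate within the budget.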