Demonstrating NaturalMiner: Searching Large Data Sets for Abstract Patterns Described in Natural Language

Companion of the 2023 International Conference on Management of Data. Pub Date: 2023-06-04. DOI: 10.1145/3555041.3589694

Immanuel Trummer


Citations: 1

Abstract



The NaturalMiner system seeks to extract facts from large relational data sets that match abstract patterns defined in natural language. For instance, this enables users to search, with regard to a specific airline, for evidence that "the airline underperforms" or "the airline outperforms" within a data set containing flight statistics, hinting at areas for improvement or strengths to advertise. Internally, NaturalMiner iteratively generates statistical facts from the data by processing SQL queries, using a reinforcement learning approach to select which facts to generate. It uses pre-trained language models to score candidate facts with regard to user-specified search patterns and returns the fact combination with the maximal score once a user-specified time budget expires. To deal with large data sets, NaturalMiner features customized caching and sampling strategies. The proposed demonstration will showcase searches for different patterns described in natural language, covering different data sets and scenarios.
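The search loop described in the abstract can be illustrated with a rough sketch. This is not NaturalMiner's implementation: the table layout, the toy rows, and every function name below are hypothetical, an epsilon-greedy bandit stands in for the reinforcement learning component, and a simple numeric difference stands in for the language-model score. It only shows the overall shape: pick a candidate fact, compute it with SQL, score it against the pattern, and stop when the budget runs out.

```python
import random
import sqlite3
import time

# Hypothetical toy data: (airline, month, delay in minutes, cancelled flag).
ROWS = [
    ("A", "Jan", 30, 1), ("A", "Feb", 25, 0), ("A", "Mar", 5, 0),
    ("B", "Jan", 10, 0), ("B", "Feb", 12, 0), ("B", "Mar", 8, 0),
]
COLUMNS = ("delay", "cancelled")      # fixed whitelist, safe to interpolate
SCOPES = (None, "Jan", "Feb", "Mar")  # None means "all months"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE flights "
        "(airline TEXT, month TEXT, delay REAL, cancelled INTEGER)"
    )
    conn.executemany("INSERT INTO flights VALUES (?, ?, ?, ?)", ROWS)
    return conn


def evaluate_fact(conn, airline, column, month):
    """Run SQL for one statistical fact: airline average vs. overall average."""
    scope, params = ("", []) if month is None else (" AND month = ?", [month])
    target = conn.execute(
        f"SELECT AVG({column}) FROM flights WHERE airline = ?{scope}",
        [airline] + params,
    ).fetchone()[0]
    overall = conn.execute(
        f"SELECT AVG({column}) FROM flights WHERE 1 = 1{scope}", params
    ).fetchone()[0]
    return target, overall


def score_fact(pattern, target, overall):
    """Toy stand-in for a language-model score (assumption, not the real scorer)."""
    sign = 1.0 if pattern == "underperforms" else -1.0
    return sign * (target - overall)


def mean_reward(stat):
    total, pulls = stat
    return total / pulls if pulls else float("inf")  # try untested arms first


def search(conn, airline, pattern, budget_s=1.0, max_steps=100, eps=0.2, seed=0):
    rng = random.Random(seed)
    pending = {c: list(SCOPES) for c in COLUMNS}  # unevaluated facts per arm
    stats = {c: [0.0, 0] for c in COLUMNS}        # [reward sum, pull count]
    found = []
    deadline = time.monotonic() + budget_s
    for _ in range(max_steps):
        arms = [c for c in COLUMNS if pending[c]]
        if not arms or time.monotonic() > deadline:
            break
        if rng.random() < eps:
            arm = rng.choice(arms)                               # explore
        else:
            arm = max(arms, key=lambda c: mean_reward(stats[c]))  # exploit
        month = pending[arm].pop(0)
        target, overall = evaluate_fact(conn, airline, arm, month)
        reward = score_fact(pattern, target, overall)
        stats[arm][0] += reward
        stats[arm][1] += 1
        found.append((reward, arm, month, target, overall))
    # Facts ranked by score, best first.
    return sorted(found, key=lambda f: f[0], reverse=True)


if __name__ == "__main__":
    for fact in search(make_db(), "A", "underperforms")[:2]:
        print(fact)
```

The sketch ranks individual facts rather than searching over fact combinations, and it omits the caching and sampling strategies the abstract mentions. The raw difference used as a score also mixes columns with different scales, which a real scorer would need to handle.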

