Title: NAVINACT: Combining Navigation and Imitation Learning for Bootstrapping Reinforcement Learning
Authors: Amisha Bhaskar, Zahiruddin Mahammad, Sachin R Jadhav, Pratap Tokekar
DOI: arxiv-2408.04054 (https://doi.org/arxiv-2408.04054)
Published: 2024-08-07, arXiv - CS - Artificial Intelligence
Abstract: Reinforcement Learning (RL) has shown remarkable progress in simulation environments, yet its application to real-world robotic tasks remains limited due to challenges in exploration and generalisation. To address these issues, we introduce NAVINACT, a framework that chooses when the robot should use classical motion planning-based navigation and when it should learn a policy. To further improve the efficiency in exploration, we use imitation data to bootstrap the exploration. NAVINACT dynamically switches between two modes of operation: navigating to a waypoint using classical techniques when away from the objects, and reinforcement learning for fine-grained manipulation control when about to interact with objects. NAVINACT consists of a multi-head architecture composed of ModeNet for mode classification, NavNet for waypoint prediction, and InteractNet for precise manipulation. By combining the strengths of RL and Imitation Learning (IL), NAVINACT improves sample efficiency and mitigates distribution shift, ensuring robust task execution. We evaluate our approach across multiple challenging simulation environments and real-world tasks, demonstrating superior performance in terms of adaptability, efficiency, and generalization compared to existing methods. In both simulated and real-world settings, NAVINACT demonstrates robust performance. In simulations, NAVINACT surpasses baseline methods by 10-15% in training success rates at 30k samples and by 30-40% during evaluation phases. In real-world scenarios, it demonstrates a 30-40% higher success rate on simpler tasks compared to baselines and uniquely succeeds in complex, two-stage manipulation tasks. Datasets and supplementary materials can be found on our website: https://raaslab.org/projects/NAVINACT/.

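The mode-switching idea in the abstract above can be pictured as a small dispatch step: classical navigation while far from the object, a learned policy once close. This is only an illustrative sketch, not the paper's implementation; the function and parameter names (`navinact_step`, `interact_radius`, the callables standing in for ModeNet/NavNet/InteractNet) are all hypothetical, and here the mode decision is approximated by a simple distance threshold rather than a learned classifier.

```python
import numpy as np

def navinact_step(ee_pos, target_pos, predict_waypoint, plan_to, rl_act,
                  interact_radius=0.1):
    """One hypothetical NAVINACT-style control step.

    ee_pos, target_pos: 1-D numpy arrays (end-effector and object positions).
    predict_waypoint:   stand-in for a NavNet-style waypoint predictor.
    plan_to:            stand-in for a classical motion planner.
    rl_act:             stand-in for an InteractNet-style RL policy.
    """
    # Mode selection (stand-in for ModeNet): switch on distance to the object.
    if np.linalg.norm(ee_pos - target_pos) > interact_radius:
        waypoint = predict_waypoint(ee_pos)          # coarse navigation target
        return "navigate", plan_to(ee_pos, waypoint)  # classical control
    return "interact", rl_act(ee_pos)                 # fine-grained RL control
```

Under this sketch, the expensive learned policy is only queried in the small region where contact-rich control matters, which is the sample-efficiency argument the abstract makes.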
Title: Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic
Authors: Thomy Phan, Benran Zhang, Shao-Hung Chan, Sven Koenig
DOI: arxiv-2408.02960 (https://doi.org/arxiv-2408.02960)
Published: 2024-08-06, arXiv - CS - Artificial Intelligence
Abstract: Anytime multi-agent path finding (MAPF) is a promising approach to scalable path optimization in multi-agent systems. MAPF-LNS, based on Large Neighborhood Search (LNS), is the current state-of-the-art approach, where a fast initial solution is iteratively optimized by destroying and repairing selected paths of the solution. Current MAPF-LNS variants commonly use an adaptive selection mechanism to choose among multiple destroy heuristics. However, to determine promising destroy heuristics, MAPF-LNS requires a considerable amount of exploration time. As common destroy heuristics are non-adaptive, any performance bottleneck caused by these heuristics cannot be overcome via adaptive heuristic selection alone, thus limiting the overall effectiveness of MAPF-LNS in terms of solution cost. In this paper, we propose Adaptive Delay-based Destroy-and-Repair Enhanced with Success-based Self-Learning (ADDRESS) as a single-destroy-heuristic variant of MAPF-LNS. ADDRESS applies restricted Thompson Sampling to the top-K set of the most delayed agents to select a seed agent for adaptive LNS neighborhood generation. We evaluate ADDRESS in multiple maps from the MAPF benchmark set and demonstrate cost improvements of at least 50% in large-scale scenarios with up to a thousand agents, compared with the original MAPF-LNS and other state-of-the-art methods.

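The seed-agent selection described above can be sketched as a Beta-Bernoulli bandit restricted to the K most delayed agents. This is a minimal illustration under assumed details: the class name `TopKThompsonSelector` and the exact prior/update are hypothetical, and the paper's restricted Thompson Sampling formulation may differ.

```python
import random

class TopKThompsonSelector:
    """Sketch of ADDRESS-style seed-agent selection: Thompson Sampling
    (Beta-Bernoulli arms) restricted to the top-K most delayed agents."""

    def __init__(self, num_agents, k=8):
        self.k = k
        self.alpha = [1.0] * num_agents  # Beta prior: 1 + observed successes
        self.beta = [1.0] * num_agents   # Beta prior: 1 + observed failures

    def select(self, delays):
        # Restrict the arm set to the K agents with the largest delay,
        # then draw one posterior sample per arm and pick the best.
        top_k = sorted(range(len(delays)), key=lambda a: -delays[a])[:self.k]
        samples = {a: random.betavariate(self.alpha[a], self.beta[a])
                   for a in top_k}
        return max(samples, key=samples.get)

    def update(self, agent, improved):
        # "Success" = the destroy-and-repair round seeded by this agent
        # reduced the overall solution cost.
        if improved:
            self.alpha[agent] += 1.0
        else:
            self.beta[agent] += 1.0
```

Restricting sampling to the top-K delayed agents keeps the bandit focused on agents whose paths plausibly bottleneck the solution, which is the stated motivation for the delay-based heuristic.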
Title: Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction
Authors: Albert Sawczyn, Katsiaryna Viarenich, Konrad Wojtasik, Aleksandra Domogała, Marcin Oleksy, Maciej Piasecki, Tomasz Kajdanowicz
DOI: arxiv-2408.02337 (https://doi.org/arxiv-2408.02337)
Published: 2024-08-05, arXiv - CS - Artificial Intelligence
Abstract: Advancements in AI and natural language processing have revolutionized machine-human language interactions, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, utilizing structured knowledge graphs (KG), allows for handling extensive knowledge-intensive questions. However, a significant gap exists in KBQA datasets, especially for low-resource languages. Many existing construction pipelines for these datasets are outdated and inefficient in human labor, and modern assisting tools like Large Language Models (LLM) are not utilized to reduce the workload. To address this, we have designed and implemented a modern, semi-automated approach for creating datasets, encompassing tasks such as KBQA, Machine Reading Comprehension (MRC), and Information Retrieval (IR), tailored explicitly for low-resource environments. We executed this pipeline and introduced the PUGG dataset, the first Polish KBQA dataset, and novel datasets for MRC and IR. Additionally, we provide a comprehensive implementation, insightful findings, detailed statistics, and evaluation of baseline models.

Title: Counterfactual Shapley Values for Explaining Reinforcement Learning
Authors: Yiwei Shi, Qi Zhang, Kevin McAreavey, Weiru Liu
DOI: arxiv-2408.02529 (https://doi.org/arxiv-2408.02529)
Published: 2024-08-05, arXiv - CS - Artificial Intelligence
Abstract: This paper introduces a novel approach, Counterfactual Shapley Values (CSV), which enhances explainability in reinforcement learning (RL) by integrating counterfactual analysis with Shapley Values. The approach aims to quantify and compare the contributions of different state dimensions to various action choices. To more accurately analyze these impacts, we introduce new characteristic value functions, the "Counterfactual Difference Characteristic Value" and the "Average Counterfactual Difference Characteristic Value". These functions help calculate the Shapley values to evaluate the differences in contributions between optimal and non-optimal actions. Experiments across several RL domains, such as GridWorld, FrozenLake, and Taxi, demonstrate the effectiveness of the CSV method. The results show that this method not only improves transparency in complex RL systems but also quantifies the differences across various decisions.

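The Shapley-value machinery underlying the approach above can be shown with the standard exact formula over feature coalitions. This is a generic sketch, not the paper's CSV method: the characteristic function passed in as `value_fn` is a placeholder, whereas CSV defines specific counterfactual-difference characteristic functions over Q-values.

```python
from itertools import combinations
from math import factorial

def shapley_values(num_features, value_fn):
    """Exact Shapley values phi_i for a characteristic function over
    coalitions of features. value_fn takes a frozenset of feature indices;
    in a CSV-style analysis it could be a counterfactual difference in
    action values when only those features take their actual values."""
    n = num_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):  # coalition sizes 0 .. n-1
            for s in combinations(others, r):
                coalition = frozenset(s)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(r) * factorial(n - r - 1) / factorial(n)
                phi[i] += w * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi
```

For an additive game the Shapley value of each feature equals its individual contribution, which is a useful sanity check; the exact computation is exponential in the number of features, so it is only practical for low-dimensional states like those in GridWorld or Taxi.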
Title: Perfect Information Monte Carlo with Postponing Reasoning
Authors: Jérôme Arjonilla, Abdallah Saffidine, Tristan Cazenave
DOI: arxiv-2408.02380 (https://doi.org/arxiv-2408.02380)
Published: 2024-08-05, arXiv - CS - Artificial Intelligence
Abstract: Imperfect information games, such as Bridge and Skat, present challenges due to state-space explosion and hidden information, posing formidable obstacles for search algorithms. Determinization-based algorithms offer a resolution by sampling hidden information and solving the game in a perfect information setting, facilitating rapid and effective action estimation. However, transitioning to perfect information introduces challenges, notably one called strategy fusion. This research introduces Extended Perfect Information Monte Carlo (EPIMC), an online algorithm inspired by the state-of-the-art determinization-based approach Perfect Information Monte Carlo (PIMC). EPIMC enhances the capabilities of PIMC by postponing the perfect information resolution, alleviating issues related to strategy fusion. However, the decision to postpone the leaf evaluator introduces novel considerations, such as the interplay between prior levels of reasoning and the newly deferred resolution. In our empirical analysis, we investigate the performance of EPIMC across a range of games, with a particular focus on those characterized by varying degrees of strategy fusion. Our results demonstrate notable performance enhancements, particularly in games where strategy fusion significantly impacts gameplay. Furthermore, our research contributes to the theoretical foundation of determinization-based algorithms addressing challenges associated with strategy fusion.

Title: Operationalizing Contextual Integrity in Privacy-Conscious Assistants
Authors: Sahra Ghalebikesabi, Eugene Bagdasaryan, Ren Yi, Itay Yona, Ilia Shumailov, Aneesh Pappu, Chongyang Shi, Laura Weidinger, Robert Stanforth, Leonard Berrada, Pushmeet Kohli, Po-Sen Huang, Borja Balle
DOI: arxiv-2408.02373 (https://doi.org/arxiv-2408.02373)
Published: 2024-08-05, arXiv - CS - Artificial Intelligence
Abstract: Advanced AI assistants combine frontier LLMs and tool access to autonomously perform complex tasks on behalf of users. While the helpfulness of such assistants can increase dramatically with access to user information including emails and documents, this raises privacy concerns about assistants sharing inappropriate information with third parties without user supervision. To steer information-sharing assistants to behave in accordance with privacy expectations, we propose to operationalize contextual integrity (CI), a framework that equates privacy with the appropriate flow of information in a given context. In particular, we design and evaluate a number of strategies to steer assistants' information-sharing actions to be CI-compliant. Our evaluation is based on a novel form-filling benchmark composed of synthetic data and human annotations, and it reveals that prompting frontier LLMs to perform CI-based reasoning yields strong results.

Title: Development of REGAI: Rubric Enabled Generative Artificial Intelligence
Authors: Zach Johnson, Jeremy Straub
DOI: arxiv-2408.02811 (https://doi.org/arxiv-2408.02811)
Published: 2024-08-05, arXiv - CS - Artificial Intelligence
Abstract: This paper presents and evaluates a new retrieval augmented generation (RAG) and large language model (LLM)-based artificial intelligence (AI) technique: rubric enabled generative artificial intelligence (REGAI). REGAI uses rubrics, which can be created manually or automatically by the system, to enhance the performance of LLMs for evaluation purposes. REGAI improves on the performance of both classical LLMs and RAG-based LLM techniques. This paper describes REGAI, presents data regarding its performance, and discusses several possible application areas for the technology.

Title: SR-CIS: Self-Reflective Incremental System with Decoupled Memory and Reasoning
Authors: Biqing Qi, Junqi Gao, Xinquan Chen, Dong Li, Weinan Zhang, Bowen Zhou
DOI: arxiv-2408.01970 (https://doi.org/arxiv-2408.01970)
Published: 2024-08-04, arXiv - CS - Artificial Intelligence
Abstract: The ability of humans to rapidly learn new knowledge while retaining old memories poses a significant challenge for current deep learning models. To handle this challenge, we draw inspiration from human memory and learning mechanisms and propose the Self-Reflective Complementary Incremental System (SR-CIS). Comprising the deconstructed Complementary Inference Module (CIM) and Complementary Memory Module (CMM), SR-CIS features a small model for fast inference and a large model for slow deliberation in CIM, enabled by the Confidence-Aware Online Anomaly Detection (CA-OAD) mechanism for efficient collaboration. CMM consists of a task-specific Short-Term Memory (STM) region and a universal Long-Term Memory (LTM) region. By setting task-specific Low-Rank Adaptation (LoRA) and corresponding prototype weights and biases, it instantiates external storage for parameter and representation memory, thus deconstructing the memory module from the inference module. By storing textual descriptions of images during training and combining them with the Scenario Replay Module (SRM) post-training for memory combination, along with periodic short-to-long-term memory restructuring, SR-CIS achieves stable incremental memory with limited storage requirements. Balancing model plasticity and memory stability under constraints of limited storage and low data resources, SR-CIS surpasses existing competitive baselines on multiple standard and few-shot incremental learning benchmarks.

Title: Visual Grounding for Object-Level Generalization in Reinforcement Learning
Authors: Haobin Jiang, Zongqing Lu
DOI: arxiv-2408.01942 (https://doi.org/arxiv-2408.01942)
Published: 2024-08-04, arXiv - CS - Artificial Intelligence
Abstract: Generalization is a pivotal challenge for agents following natural language instructions. To approach this goal, we leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning (RL) for object-centric tasks, which makes the agent capable of zero-shot generalization to unseen objects and instructions. By visual grounding, we obtain an object-grounded confidence map for the target object indicated in the instruction. Based on this map, we introduce two routes to transfer VLM knowledge into RL. Firstly, we propose an object-grounded intrinsic reward function derived from the confidence map to more effectively guide the agent towards the target object. Secondly, the confidence map offers a more unified, accessible task representation for the agent's policy, compared to language embeddings. This enables the agent to process unseen objects and instructions through comprehensible visual confidence maps, facilitating zero-shot object-level generalization. Single-task experiments prove that our intrinsic reward significantly improves performance on challenging skill learning. In multi-task experiments, through testing on tasks beyond the training set, we show that the agent, when provided with the confidence map as the task representation, possesses better generalization capabilities than language-based conditioning. The code is available at https://github.com/PKU-RL/COPL.

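One way to picture the confidence-map-derived intrinsic reward mentioned above is to pay the agent for improving the target-object confidence at its current position. This is a speculative sketch, not the reward from the paper: the function name `grounded_intrinsic_reward`, the improvement-over-best shaping, and the 2-D grid indexing are all assumptions made for illustration.

```python
import numpy as np

def grounded_intrinsic_reward(confidence_map, agent_xy, prev_best=0.0):
    """Hypothetical intrinsic reward from a VLM confidence map.

    confidence_map: 2-D array of target-object confidences in [0, 1].
    agent_xy:       the agent's (row, col) position on that grid.
    prev_best:      best confidence observed so far in the episode.
    Returns (reward, updated prev_best).
    """
    h, w = confidence_map.shape
    r = int(np.clip(agent_xy[0], 0, h - 1))
    c = int(np.clip(agent_xy[1], 0, w - 1))
    conf_here = confidence_map[r, c]
    # Only pay for improvement over the best confidence seen so far, so the
    # cumulative intrinsic return stays bounded by the peak confidence.
    reward = max(0.0, conf_here - prev_best)
    return reward, max(prev_best, conf_here)
```

Because the reward depends only on the confidence map and not on any language embedding, the same shaping would apply unchanged to unseen objects, which mirrors the zero-shot argument in the abstract.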
Title: Integrating Large Language Models and Knowledge Graphs for Extraction and Validation of Textual Test Data
Authors: Antonio De Santis, Marco Balduini, Federico De Santis, Andrea Proia, Arsenio Leo, Marco Brambilla, Emanuele Della Valle
DOI: arxiv-2408.01700 (https://doi.org/arxiv-2408.01700)
Published: 2024-08-03, arXiv - CS - Artificial Intelligence
Abstract: Aerospace manufacturing companies, such as Thales Alenia Space, design, develop, integrate, verify, and validate products characterized by high complexity and low volume. They carefully document all phases for each product, but analyses across products are challenging due to the heterogeneity and unstructured nature of the data in documents. In this paper, we propose a hybrid methodology that leverages Knowledge Graphs (KGs) in conjunction with Large Language Models (LLMs) to extract and validate data contained in these documents. We consider a case study focused on test data related to electronic boards for satellites. To do so, we extend the Semantic Sensor Network ontology. We store the metadata of the reports in a KG, while the actual test results are stored in Parquet files accessible via a Virtual Knowledge Graph. The validation process is managed using an LLM-based approach. We also conduct a benchmarking study to evaluate the performance of state-of-the-art LLMs in executing this task. Finally, we analyze the costs and benefits of automating the preexisting manual data extraction and validation processes for subsequent cross-report analyses.