Defining a Metric-Driven Approach for Learning Hazardous Situations
Mario Fiorino, Muddasar Naeem, Mario Ciampi, Antonio Coronato
Technologies, published 2024-07-04. DOI: 10.3390/technologies12070103
Abstract
Artificial intelligence has brought many innovations to our lives. At the same time, designing machine learning (ML) algorithms with robust safety properties is essential to obtaining the full benefits of the technology. Reinforcement learning (RL), an important ML method, is widely applied in safety-critical scenarios, where learning safety constraints is necessary to avoid undesired outcomes. Within the traditional RL paradigm, agents typically focus on identifying states associated with high rewards in order to maximize their long-term returns. This prioritization can lead to the neglect of potentially hazardous situations. The exploration phase in particular can pose significant risks, as it requires actions whose consequences may be unpredictable. For instance, in autonomous driving applications, an RL agent might discover routes that yield high efficiency but fail to account for sudden hazards such as sharp turns or pedestrian crossings, potentially leading to catastrophic failures. Ensuring the safety of agents operating in unpredictable environments with potentially catastrophic failure states remains a critical challenge. This paper introduces a novel metric-driven approach for containing risk in RL applications. Central to this approach are two newly developed indicators: the Hazard Indicator and the Risk Indicator. These metrics evaluate the safety of an environment by quantifying the likelihood of transitioning from safe states to failure states and assessing the associated risks. They are particularly appealing because they are straightforward to implement, rest on a highly generalizable probabilistic mathematical foundation, and are domain-independent. To demonstrate their efficacy, we conducted experiments across various use cases, showcasing the feasibility of the proposed metrics. By enabling RL agents to manage hazardous states effectively, this approach paves the way for more reliable and readily implementable RL in practical applications.
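The abstract does not give the formal definitions of the two indicators, but its description, quantifying the likelihood of transitioning from safe states to failure states and the associated risk, suggests a simple probabilistic construction over an estimated MDP transition model. The sketch below is a minimal illustration under that assumption only; the function names `hazard_indicator` and `risk_indicator`, the transition tensor `P`, and the `severity` vector are hypothetical and do not reproduce the paper's actual formulation.

```python
import numpy as np

# Minimal sketch, NOT the paper's formulation: we assume a finite MDP with
# an estimated transition model and take
#   hazard(s) = worst-case (over actions) one-step probability of entering
#               a failure state from state s
#   risk(s)   = hazard(s) weighted by a user-supplied failure severity.

def hazard_indicator(P: np.ndarray, failure_states: set) -> np.ndarray:
    """P has shape (n_states, n_actions, n_states); P[s, a, t] is the
    estimated probability of moving from s to t under action a."""
    fail = np.array(sorted(failure_states))
    p_fail = P[:, :, fail].sum(axis=2)   # mass sent to failure, per (s, a)
    return p_fail.max(axis=1)            # worst case over actions, per state

def risk_indicator(hazard: np.ndarray, severity: np.ndarray) -> np.ndarray:
    """Expected-loss style aggregation: hazard scaled by failure severity."""
    return hazard * severity

# Toy environment: states 0 and 1 are safe, state 2 is an absorbing failure.
P = np.zeros((3, 2, 3))
P[0, 0] = [0.9, 0.1, 0.0]    # cautious action from state 0
P[0, 1] = [0.6, 0.1, 0.3]    # risky action: 30% chance of failure
P[1, 0] = [0.2, 0.8, 0.0]
P[1, 1] = [0.0, 0.5, 0.5]
P[2, :, 2] = 1.0             # failure state is absorbing

h = hazard_indicator(P, {2})                       # -> [0.3, 0.5, 1.0]
r = risk_indicator(h, np.array([1.0, 5.0, 0.0]))   # -> [0.3, 2.5, 0.0]
print(h, r)
```

An agent (or a safety shield wrapped around it) could consult such indicators during exploration, for example by masking actions whose hazard exceeds a threshold; whether the paper uses the indicators in this way is not stated in the abstract.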