Explainable activity recognition in videos: Lessons learned

Applied AI Letters. Pub Date: 2021-11-26. DOI: 10.1002/ail2.59

Chiradeep Roy, Mahsan Nourani, Donald R. Honeycutt, Jeremy E. Block, Tahrima Rahman, Eric D. Ragan, Nicholas Ruozzi, Vibhav Gogate


Citations: 4

Abstract

We consider the following activity recognition task: given a video, infer the set of activities being performed in the video and assign each frame to an activity. This task can be solved using modern deep learning architectures based on neural networks or conventional classifiers such as linear models and decision trees. While neural networks exhibit superior predictive performance compared with decision trees and linear models, they are also harder to interpret and explain. We address this accuracy-explainability gap using a novel framework that feeds the output of a deep neural network to an interpretable, tractable probabilistic model called a dynamic cutset network, and performs joint reasoning over the two to answer questions. The neural network helps achieve high accuracy, while the dynamic cutset network, because of its polynomial-time probabilistic reasoning capabilities, makes the system more explainable. We demonstrate the efficacy of our approach by using it to build three prototype systems that solve human-machine tasks of varying difficulty, using cooking videos as an accessible domain. We describe high-level technical details and key lessons learned in our human-subjects evaluations of these systems.
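The idea of passing per-frame network outputs to a tractable probabilistic model can be illustrated with a deliberately simplified sketch. It is not the authors' dynamic cutset network: a first-order chain with Viterbi decoding is swapped in as a much simpler tractable model, and the function name `smooth_frame_labels`, the `stay_prob` parameter, and the example probabilities are all assumptions made for illustration.

```python
import math


def smooth_frame_labels(frame_probs, stay_prob=0.9):
    """Assign one activity index per frame with Viterbi decoding.

    frame_probs[t][k] is a hypothetical network probability that frame t
    shows activity k. stay_prob is an assumed probability of remaining in
    the same activity between consecutive frames (illustrative value).
    """
    n_frames = len(frame_probs)
    n_acts = len(frame_probs[0])
    if n_acts == 1:
        return [0] * n_frames
    switch_prob = (1.0 - stay_prob) / (n_acts - 1)
    eps = 1e-12  # avoids log(0) for zero network scores

    def log_trans(i, j):
        return math.log(stay_prob if i == j else switch_prob)

    # score[k]: best log-score of any labeling that ends in activity k
    score = [math.log(1.0 / n_acts) + math.log(p + eps) for p in frame_probs[0]]
    back = []  # back[t-1][j]: best predecessor of activity j at frame t
    for t in range(1, n_frames):
        new_score, pointers = [], []
        for j in range(n_acts):
            best_i = max(range(n_acts), key=lambda i: score[i] + log_trans(i, j))
            new_score.append(
                score[best_i] + log_trans(best_i, j) + math.log(frame_probs[t][j] + eps)
            )
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final activity.
    label = max(range(n_acts), key=lambda k: score[k])
    labels = [label]
    for pointers in reversed(back):
        label = pointers[label]
        labels.append(label)
    labels.reverse()
    return labels


# Hypothetical network outputs for 7 frames and 2 activities.
probs = [[0.9, 0.1], [0.8, 0.2], [0.4, 0.6], [0.9, 0.1],
         [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
labels = smooth_frame_labels(probs)
print(labels)               # [0, 0, 0, 0, 1, 1, 1]
print(sorted(set(labels)))  # [0, 1]: the set of activities in the video
```

In this toy run, taking the highest network score frame by frame would label the third frame as activity 1, whereas the chain model keeps it in activity 0 because two switches in a row are unlikely under the assumed transition probabilities. Decoding takes time linear in the number of frames and quadratic in the number of activities.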


