ASELMAR: Active and semi-supervised learning-based framework to reduce multi-labeling efforts for activity recognition

IF 4.3 | Zone 3, Computer Science | JCR Q2, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Aydin Saribudak, Sifan Yuan, Chenyang Gao, Waverly V. Gestrich-Thompson, Zachary P. Milestone, Randall S. Burd, Ivan Marsic
Journal: Computer Vision and Image Understanding, vol. 251, Article 104269. Published 2025-02-01. DOI: 10.1016/j.cviu.2024.104269. URL: https://www.sciencedirect.com/science/article/pii/S1077314224003503
Citations: 0

Abstract

Manual annotation of unlabeled data for model training is expensive and time-consuming, especially for visual datasets whose multi-labeling requires domain-specific experience, such as video records generated in hospital settings. Frameworks are needed that reduce human labeling effort while improving training performance. Semi-supervised learning is widely used to generate predictions for the unlabeled samples in partially labeled datasets. Active learning can be combined with semi-supervised learning to annotate unlabeled samples and so reduce the sampling bias introduced by the label predictions. We developed the ASELMAR framework, based on active and semi-supervised learning techniques, to reduce the time and effort associated with multi-labeling of unlabeled samples for activity recognition. ASELMAR (i) categorizes the predictions for unlabeled data by their confidence level, using fixed and adaptive threshold settings, (ii) applies a label verification procedure to the samples with ambiguous predictions, and (iii) retrains the model iteratively using samples with either their high-confidence predictions or manual annotations. We also designed a software tool to guide domain experts in verifying ambiguous predictions. We applied ASELMAR to recognize eight selected activities from our trauma resuscitation video dataset and evaluated its performance based on the label verification time and the mean average precision (AP) score. The label verification required by ASELMAR was 12.1% of the manual annotation effort for the unlabeled video records. Compared to the baseline model, the fixed threshold-based method improved the mean AP score by 5.7% in the first iteration and 8.3% in the second iteration. The p-values were below 0.05 for the target activities. Using an adaptive-threshold method, ASELMAR achieved a decrease in AP score deviation, implying an improvement in model robustness. For a speech-based case study, the word error rate decreased by 6.2%, and the average transcription factor increased 2.6 times, supporting the broad applicability of ASELMAR in reducing labeling efforts from domain experts.
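The confidence-based categorization in step (i) can be illustrated with a minimal sketch. It assumes the model outputs one sigmoid probability per activity label; the function name `partition_by_confidence` and the threshold values 0.2 and 0.8 are hypothetical choices for illustration, not values or code from the paper. Passing per-label arrays for `low` and `high` is one possible way to mimic an adaptive setting; the abstract does not specify how the adaptive thresholds are computed.

```python
import numpy as np

def partition_by_confidence(probs, low=0.2, high=0.8):
    """Split multi-label predictions into confident and ambiguous entries.

    probs: array of shape (n_samples, n_labels) holding per-label probabilities.
    low, high: hypothetical thresholds (scalars, or per-label arrays that
               broadcast against probs); not taken from the paper.
    Returns (pseudo_labels, ambiguous_mask), where pseudo_labels is 1 for a
    confident positive, 0 for a confident negative, and -1 for an ambiguous
    entry that would be routed to manual verification.
    """
    probs = np.asarray(probs, dtype=float)
    confident_pos = probs >= high
    confident_neg = probs <= low
    ambiguous = ~(confident_pos | confident_neg)
    pseudo_labels = np.where(confident_pos, 1, np.where(confident_neg, 0, -1))
    return pseudo_labels, ambiguous

# Two unlabeled clips, three activity labels (made-up probabilities).
probs = np.array([[0.95, 0.50, 0.05],
                  [0.10, 0.85, 0.60]])
pseudo_labels, ambiguous = partition_by_confidence(probs)

# Fraction of label decisions that would need expert verification.
verification_fraction = ambiguous.mean()
```

In this sketch, only the entries marked -1 would be shown to a domain expert; the remaining entries would enter the next training round with their predicted labels, which corresponds to steps (ii) and (iii) above.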
Source journal
Computer Vision and Image Understanding (Engineering and Technology - Engineering: Electrical and Electronic)
CiteScore: 7.80
Self-citation rate: 4.40%
Articles published: 112
Review time: 79 days
Journal description: The central focus of this journal is the computer analysis of pictorial information. Computer Vision and Image Understanding publishes papers covering all aspects of image analysis from the low-level, iconic processes of early vision to the high-level, symbolic processes of recognition and interpretation. A wide range of topics in the image understanding area is covered, including papers offering insights that differ from predominant views. Research areas include: theory; early vision; data structures and representations; shape; range; motion; matching and recognition; architecture and languages; vision systems.