A multi-level semantic web for hard-to-specify domain concept, Pedestrian, in ML-based software

IF 3.3 | Tier 3 (Computer Science) | Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

Requirements Engineering Pub Date : 2022-01-08 DOI:10.1007/s00766-021-00366-0

Hamed Barzamini, Murtuza Shahzad, Hamed Alhoori, Mona Rahimi


Citations: 1

Abstract

Machine Learning (ML) algorithms are widely used in building software-intensive systems, including safety-critical ones. Unlike traditional software components, Machine-Learned Components (MLCs), software components built using ML algorithms, learn their specifications by generalizing the common features that they find in a limited set of collected examples. While this inductive nature overcomes the limitations of programming hard-to-specify concepts, the same feature becomes problematic for verifying safety in ML-based software systems. One reason is that, due to the data-driven nature of MLCs, there is often no set of explicitly written and pre-defined specifications against which the MLC can be verified. In this regard, we propose to partially specify the hard-to-specify domain concepts that MLCs tend to classify, instead of fully relying on their inductive learning ability from arbitrarily collected datasets. In this paper, we propose a semi-automated approach to construct a multi-level semantic web that partially outlines the hard-to-specify, yet crucial, domain concept “pedestrian” in the automotive domain. We evaluate the applicability of the generated semantic web in two ways: first, with reference to the web, we augment a pedestrian dataset for a missing feature, wheelchair, to show that training a state-of-the-art ML-based object detector on the augmented dataset improves its accuracy in detecting pedestrians; second, we evaluate the coverage of the generated semantic web based on multiple state-of-the-art pedestrian and human datasets.
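To make the two evaluation ideas in the abstract more concrete, the following is a minimal sketch only, not the authors' implementation. It assumes the semantic web can be approximated as a tree of named concepts and that "coverage" means the fraction of a dataset's labels that appear in the web; the `Concept` class, the `coverage` and `missing_features` functions, and all concept names and labels are hypothetical illustrations that do not come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One node of a hypothetical multi-level concept web (an assumption)."""
    name: str
    children: list = field(default_factory=list)

    def terms(self):
        """Yield this node's name and the name of every descendant."""
        yield self.name
        for child in self.children:
            yield from child.terms()


def coverage(web, dataset_labels):
    """Fraction of distinct dataset labels that appear somewhere in the web."""
    labels = {label.lower() for label in dataset_labels}
    if not labels:
        return 0.0
    known = {term.lower() for term in web.terms()}
    return len(labels & known) / len(labels)


def missing_features(web, dataset_labels):
    """Web terms with no matching dataset label (candidates for augmentation)."""
    labels = {label.lower() for label in dataset_labels}
    known = {term.lower() for term in web.terms()}
    return sorted(known - labels)


# Illustrative content only; these concepts and labels are not from the paper.
pedestrian = Concept("pedestrian", [
    Concept("mobility aid", [Concept("wheelchair"), Concept("crutches")]),
    Concept("posture", [Concept("standing"), Concept("walking")]),
])

labels = ["pedestrian", "walking", "standing", "bicycle"]
print(coverage(pedestrian, labels))          # share of labels found in the web
print(missing_features(pedestrian, labels))  # web terms absent from the dataset
```

In this toy setup, a term such as "wheelchair" that exists in the web but has no counterpart among the dataset labels is flagged as a gap, which mirrors the kind of missing feature the abstract describes augmenting a dataset for.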


Source journal

Requirements Engineering (Engineering & Technology; Computer Science: Software Engineering)

CiteScore

7.10

Self-citation rate

10.70%

Articles published

Review time

>12 weeks

About the journal: The journal provides a focus for the dissemination of new results about the elicitation, representation and validation of requirements of software-intensive information systems or applications. Theoretical and applied submissions are welcome, but all papers must explicitly address:
- the practical consequences of the ideas for the design of complex systems
- how the ideas should be evaluated by the reflective practitioner

The journal is motivated by a multi-disciplinary view that considers requirements not only in terms of software component specification but also in terms of the activities for their elicitation, representation and agreement, carried out within an organisational and social context. To this end, contributions are sought from fields such as software engineering, information systems, occupational sociology, cognitive and organisational psychology, human-computer interaction, computer-supported cooperative work, linguistics and philosophy for work that specifically addresses requirements engineering issues.