MLTEing Models: Negotiating, Evaluating, and Documenting Model and System Qualities

Katherine R. Maffey, Kyle Dotterrer, Jennifer Niemann, Iain J. Cruickshank, G. Lewis, Christian Kästner

2023 IEEE/ACM 45th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER). DOI: 10.1109/ICSE-NIER58687.2023.00012
Many organizations seek to ensure that machine learning (ML) and artificial intelligence (AI) systems work as intended in production but currently do not have a cohesive methodology in place to do so. To fill this gap, we propose MLTE (Machine Learning Test and Evaluation, colloquially referred to as "melt"), a framework and implementation to evaluate ML models and systems. The framework compiles state-of-the-art evaluation techniques into an organizational process for interdisciplinary teams, including model developers, software engineers, system owners, and other stakeholders. MLTE tooling supports this process by providing a domain-specific language that teams can use to express model requirements, an infrastructure to define, generate, and collect ML evaluation metrics, and the means to communicate results.
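To make the workflow described in the abstract concrete, the sketch below shows how a team might declare negotiated model requirements and record evaluation results against them. This is a minimal, hypothetical illustration of the idea only; the names used here (Requirement, Report, record, summarize) are assumptions for illustration and are not MLTE's actual domain-specific language or API.

```python
# Hypothetical sketch of the requirement-negotiation and metric-collection idea
# described in the abstract. Class and method names are illustrative assumptions,
# not MLTE's actual API.
from dataclasses import dataclass, field


@dataclass
class Requirement:
    """A negotiated model/system requirement with a pass threshold."""
    name: str
    metric: str
    threshold: float
    higher_is_better: bool = True

    def satisfied(self, value: float) -> bool:
        # A requirement passes if the measured value is on the right side
        # of its threshold.
        if self.higher_is_better:
            return value >= self.threshold
        return value <= self.threshold


@dataclass
class Report:
    """Collects measured values and renders a simple pass/fail summary."""
    requirements: list[Requirement]
    results: dict[str, float] = field(default_factory=dict)

    def record(self, metric: str, value: float) -> None:
        self.results[metric] = value

    def summarize(self) -> str:
        lines = []
        for req in self.requirements:
            value = self.results.get(req.metric)
            status = "PASS" if value is not None and req.satisfied(value) else "FAIL"
            lines.append(
                f"{req.name}: {req.metric}={value} "
                f"(threshold {req.threshold}) -> {status}"
            )
        return "\n".join(lines)


if __name__ == "__main__":
    # Requirements negotiated by model developers, software engineers,
    # system owners, and other stakeholders.
    reqs = [
        Requirement("Accuracy on holdout set", "accuracy", 0.90),
        Requirement("P95 inference latency (ms)", "latency_p95_ms", 50.0,
                    higher_is_better=False),
    ]
    report = Report(reqs)
    report.record("accuracy", 0.93)        # e.g., from a test-set evaluation run
    report.record("latency_p95_ms", 41.2)  # e.g., from a load-test harness
    print(report.summarize())
```

In this sketch, requirements capture what the interdisciplinary team agreed on, measured values are collected from whatever evaluation infrastructure produces them, and the summary is the artifact used to communicate results back to stakeholders; the abstract describes MLTE as providing these three pieces in a cohesive process and tooling.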