PDFA Distillation via String Probability Queries

arXiv - CS - Formal Languages and Automata Theory, Pub Date: 2024-06-26, DOI: arxiv-2406.18328

Robert Baumgartner, Sicco Verwer

Citations: 0

Abstract

Probabilistic deterministic finite automata (PDFA) are discrete event systems modeling conditional probabilities over languages: given a sequence of tokens already seen, they return the probability of each token of interest appearing next. These models have gained interest in explainable machine learning, where they serve as surrogate models for neural networks trained as language models. In this work we present an algorithm to distill PDFA from neural networks. Our algorithm is a derivative of the L# algorithm and can learn PDFA from a new type of query, in which the algorithm infers conditional probabilities from the probability of the queried string occurring. We show its effectiveness on a recent public dataset by distilling PDFA from a set of trained neural networks.
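The following is a minimal Python sketch of the two ideas in the abstract, not the authors' L#-derived algorithm. It assumes that a string-probability query returns the probability that the model emits the queried string as a prefix (the abstract does not say whether prefix or complete-string probability is meant). Under that assumption, the prefix probability of a string is the product of next-token conditionals along its path, so a conditional can be recovered as a ratio: P(a | w) = P(wa) / P(w). The class `PDFA`, the functions `prefix_prob` and `conditional_from_string_queries`, the end marker `"$"`, and the toy probabilities are all hypothetical, chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical end-of-string marker; its probability is the chance of stopping.
END = "$"


@dataclass
class PDFA:
    """Minimal PDFA sketch: deterministic transitions plus a next-symbol
    distribution (including END) attached to every state."""
    initial: int
    delta: dict[tuple[int, str], int]       # (state, symbol) -> next state
    next_prob: dict[int, dict[str, float]]  # state -> {symbol: P(symbol | state)}

    def run(self, word: str) -> int | None:
        """State reached after reading `word`, or None if a transition is missing."""
        q = self.initial
        for a in word:
            q = self.delta.get((q, a))
            if q is None:
                return None
        return q

    def conditional(self, word: str, symbol: str) -> float:
        """P(symbol next | word already seen), read directly off the automaton."""
        q = self.run(word)
        return 0.0 if q is None else self.next_prob[q].get(symbol, 0.0)

    def prefix_prob(self, word: str) -> float:
        """Probability that a generated string starts with `word`: the product
        of next-symbol probabilities along the (unique) path for `word`."""
        p, q = 1.0, self.initial
        for i, a in enumerate(word):
            p *= self.next_prob[q].get(a, 0.0)
            if a == END:
                # END is only meaningful as the final symbol.
                return p if i == len(word) - 1 else 0.0
            if p == 0.0:
                return 0.0
            q = self.delta[(q, a)]  # assumes a transition exists wherever P > 0
        return p


def conditional_from_string_queries(string_prob, word: str, symbol: str) -> float:
    """Recover P(symbol | word) using only a string-probability oracle,
    via P(symbol | word) = P(word + symbol) / P(word)."""
    denom = string_prob(word)
    return 0.0 if denom == 0.0 else string_prob(word + symbol) / denom


# Toy two-state example (numbers are made up for illustration).
example = PDFA(
    initial=0,
    delta={(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
    next_prob={0: {"a": 0.5, "b": 0.3, END: 0.2},
               1: {"a": 0.1, "b": 0.6, END: 0.3}},
)

if __name__ == "__main__":
    for w, s in [("a", "b"), ("ab", "a"), ("a", END)]:
        direct = example.conditional(w, s)
        inferred = conditional_from_string_queries(example.prefix_prob, w, s)
        print(f"P({s!r} | {w!r}): direct={direct:.3f}, inferred={inferred:.3f}")
```

Running the script prints, for each (prefix, token) pair, the conditional read directly from the automaton next to the one inferred purely from string-probability queries; the two agree. In a distillation setting the oracle would be the neural network rather than `example.prefix_prob`, which is the point of querying only string probabilities.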