The impact of state merging on predictive accuracy in probabilistic tree automata: Dietze's conjecture revisited
Journal of Computer and System Sciences, volume 146, Article 103563. Published 2024-06-27.
DOI: 10.1016/j.jcss.2024.103563
https://www.sciencedirect.com/science/article/pii/S0022000024000588
Citation count: 0
Abstract
Dietze's conjecture concerns the problem of equipping a tree automaton M with weights to make it probabilistic, in such a way that the resulting automaton N predicts a given corpus C as accurately as possible. The conjecture states that the accuracy cannot increase if the states in M are merged with respect to an equivalence relation ∼ so that the result is a smaller automaton M∼. Put differently, merging states can never improve predictions. This is under the assumption that both M and M∼ are bottom-up deterministic and accept every tree in C. We prove that the conjecture holds, using a construction that turns any probabilistic version N∼ of M∼ into a probabilistic version N of M, such that N assigns at least as great a weight to each tree in C as N∼ does.
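The statement can be checked numerically on a toy instance. The Python sketch below is not taken from the paper and rests on several assumptions: "probabilistic" is read as meaning that the weights of all transitions entering the same state sum to one, together with a probability distribution over root states; accuracy is read as corpus log-likelihood; and the best weights are obtained by relative-frequency estimation, which maximizes likelihood when every tree has a unique run. The helper names (`run`, `max_log_likelihood`, `merge`) and the example automaton are hypothetical.

```python
import math
from collections import Counter

# A tree is a nested tuple: (symbol, child_1, ..., child_k).
# A bottom-up deterministic automaton is a dict mapping
# (symbol, tuple_of_child_states) -> target_state.

def run(delta, tree, used):
    """Return the state reached at the root, counting each transition used."""
    symbol, *children = tree
    child_states = tuple(run(delta, c, used) for c in children)
    key = (symbol, child_states)
    used[key] += 1
    return delta[key]  # a KeyError means the tree is not accepted

def max_log_likelihood(delta, corpus):
    """Corpus log-likelihood under relative-frequency weights (assumed
    normalization: transitions entering the same state sum to one)."""
    used = Counter()
    roots = Counter(run(delta, t, used) for t in corpus)
    per_state = Counter()
    for key, n in used.items():
        per_state[delta[key]] += n
    ll = sum(n * math.log(n / per_state[delta[key]]) for key, n in used.items())
    ll += sum(n * math.log(n / len(corpus)) for n in roots.values())
    return ll

def merge(delta, cls):
    """Quotient automaton: replace every state q by its class cls[q]."""
    merged = {}
    for (symbol, child_states), target in delta.items():
        key = (symbol, tuple(cls[q] for q in child_states))
        if merged.setdefault(key, cls[target]) != cls[target]:
            raise ValueError("merged automaton is not bottom-up deterministic")
    return merged

# Toy automaton M: it remembers whether a leaf was 'a' or 'b'.
M = {
    ("a", ()): "qa",
    ("b", ()): "qb",
    ("f", ("qa", "qa")): "s",
    ("f", ("qb", "qb")): "s",
}
corpus = [("f", ("a",), ("a",)), ("f", ("b",), ("b",))]

# Merging qa and qb forgets which leaf was read.
classes = {"qa": "p", "qb": "p", "s": "s"}

fine = max_log_likelihood(M, corpus)                     # 2 * log(1/2)
coarse = max_log_likelihood(merge(M, classes), corpus)   # 4 * log(1/2)
print(fine, coarse)
```

In this instance M gives each corpus tree probability 1/2, while the merged automaton also accepts f(a, b) and f(b, a) and therefore gives each corpus tree only 1/4. This is consistent with the statement that merging cannot improve predictions, but a single example is an illustration, not evidence for the general result.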
About the journal:
The Journal of Computer and System Sciences publishes original research papers in computer science and related subjects in system science, with attention to the relevant mathematical theory. Applications-oriented papers may also be accepted and they are expected to contain deep analytic evaluation of the proposed solutions.
Research areas include traditional subjects such as:
• Theory of algorithms and computability
• Formal languages
• Automata theory
Contemporary subjects such as:
• Complexity theory
• Algorithmic complexity
• Parallel & distributed computing
• Computer networks
• Neural networks
• Computational learning theory
• Database theory & practice
• Computer modeling of complex systems
• Security and privacy