DDM: Data-Driven Modeling of Physical Phenomenon with Application to METOC

S. Rubin, Lydia Bouzar-Benlabiod
{"title":"DDM: Data-Driven Modeling of Physical Phenomenon with Application to METOC","authors":"S. Rubin, Lydia Bouzar-Benlabiod","doi":"10.1109/IRI.2019.00057","DOIUrl":null,"url":null,"abstract":"The problem addressed by this paper pertains to the representation, acquisition, and randomization of experiential knowledge for autonomous systems in expert reconnaissance. Such systems are characterized by the requirement to render proper decisions not explicitly programmed for. Cases are defined to consist of domain-specific data (e.g., heterogeneous sensory data), which may not be fully general due to the inclusion of (a) extraneous predicates and/or because (b) the predicates are overly specific. Rules satisfy the definition of cases and result from cases (rules), which have undergone at least one of the aforementioned generalizations. Extraneous antecedent predicates may be discovered from cases (rules) sharing a common consequent, if binary tautologies are found in case (rule) pairings, or if higher tautologies are found in a multiplicity of such cases (rules). Eliminating such extraneous antecedent predicates allows for the discovery of possible additional extraneous antecedent predicates - where the antecedent of one is a proper subset of the other. Candidate rules are formed from the intersection of combinations of two or more case (rule) antecedent sets implying a common consequent. The removed antecedent subsets are acquired as new rules implying the common consequent, which are conditioned to fire by the non-monotonic actions of their common antecedent (i.e., by way of an embedded antecedent predicate) - reducing the specificity of the parents by generalizing them into smaller, more reusable rules. Similarly, more general consequent sequences are formed from common subsequences shared by two or more consequent sequences being non-deterministically implied by a common antecedent. The removed consequent subsequences are acquired as new rules, which are set to fire before or after that of its parent's common dependency - reducing the specificity of the parents by generalizing them into smaller, more reusable rules. The rule to fire first will non-monotonically trigger the rule to fire next. This process iterates, since randomization of one side may enable further randomization of the other side. Tautologies are extracted and common subsets or subsequences form candidate rules as previously described (i.e., without creating duplicate productions). The context for the transformations is provided by the cases (rules), which are effectively acquired as previously described. Knowledge is segmented on the basis of whether it is a case, or a rule. Knowledge is further dynamically segmented on the basis of maximally shared left-hand sides (LHS) and maximally shared right-hand sides (RHS) - using logical pointers to minimize space-time requirements. It is proven that the allowance for non determinism is required, which implies that candidate rules cannot be invalidated by syntactically checking them for contradiction with a known valid dependency. A possibility metric is provided for each production, which cumulatively tracks the similarity of the context and a selected production's situation. Each context may be associated with a minimum possibility metric in order that no production creating a lesser metric may fire. 
If the application of a production(s) is deemed to be unsuccessful and the production(s) are deemed not to be erroneous, then a correct case is preferentially acquired, where available (i.e., in lieu of rule deletion - by default), which will always fire in lieu of the erroneous production(s) on the given context, since it is more specific, by definition (i.e., using a most-specific-first inference engine). Random and symmetric search are integrated to insure broad coverage of the search space. The (transformed) context may be fuzzily matched to a situation, which it does not cover. Not only does this allow for the generation of questions to confirm the missing predicate information, but provides for the abduction of a possible response as well. Cases (rules) are stored in segments so as to maximize their coherency across parallel processors. Less useful knowledge is expunged in keeping with this policy. Segments or processors are subdivided 2 into local groups to minimize contention on the bus and memory in a massively parallel architecture. Acyclic transformations serve to provide a metaphorical explanation for the inference engine upon request. Finally, examples of the methodology, as applied to METOC modeling, are provided throughout. A high level of intelligence is consistent with the theory of randomization, which is fundamental for the development of truly autonomous systems for use in domains ranging from meteorology to C4ISR.","PeriodicalId":295028,"journal":{"name":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IRI.2019.00057","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

The problem addressed by this paper pertains to the representation, acquisition, and randomization of experiential knowledge for autonomous systems in expert reconnaissance. Such systems are characterized by the requirement to render proper decisions in situations not explicitly programmed for. Cases are defined to consist of domain-specific data (e.g., heterogeneous sensory data), which may not be fully general because (a) extraneous predicates are included and/or (b) the predicates are overly specific. Rules satisfy the definition of cases and result from cases (rules) that have undergone at least one of the aforementioned generalizations.

Extraneous antecedent predicates may be discovered from cases (rules) sharing a common consequent: binary tautologies may be found in case (rule) pairings, or higher-order tautologies in a multiplicity of such cases (rules). Eliminating these extraneous antecedent predicates allows for the discovery of further extraneous antecedent predicates - where the antecedent of one case (rule) is a proper subset of the other's. Candidate rules are formed from the intersection of combinations of two or more case (rule) antecedent sets implying a common consequent. The removed antecedent subsets are acquired as new rules implying the common consequent, conditioned to fire by the non-monotonic actions of their common antecedent (i.e., by way of an embedded antecedent predicate) - reducing the specificity of the parents by generalizing them into smaller, more reusable rules.

Similarly, more general consequent sequences are formed from common subsequences shared by two or more consequent sequences that are non-deterministically implied by a common antecedent. The removed consequent subsequences are acquired as new rules, set to fire before or after their parents' common dependency - again generalizing the parents into smaller, more reusable rules. The rule to fire first non-monotonically triggers the rule to fire next. This process iterates, since randomization of one side may enable further randomization of the other. Tautologies are extracted, and common subsets or subsequences form candidate rules as previously described (i.e., without creating duplicate productions). The context for the transformations is provided by the cases (rules), which are acquired as previously described.

Knowledge is segmented on the basis of whether it is a case or a rule. It is further dynamically segmented on the basis of maximally shared left-hand sides (LHS) and maximally shared right-hand sides (RHS) - using logical pointers to minimize space-time requirements. It is proven that an allowance for non-determinism is required, which implies that candidate rules cannot be invalidated by syntactically checking them for contradiction with a known valid dependency. A possibility metric is provided for each production, which cumulatively tracks the similarity between the context and a selected production's situation. Each context may be associated with a minimum possibility metric, so that no production yielding a lesser metric may fire.
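At its core, the antecedent generalization above reduces to set intersection over rules that share a consequent. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation: rules are modeled as (antecedent set, consequent) pairs, and a hypothetical ('fired', core) token stands in for the embedded antecedent predicate through which the generalized core non-monotonically triggers the residue rules; consequent-sequence generalization and the tautology tests themselves are not treated.

```python
from itertools import combinations

def generalize_rules(rules):
    """Intersect antecedent sets of rules sharing a common consequent.

    `rules` is a list of (antecedent frozenset, consequent) pairs - an
    assumed data layout. Each qualifying pairing yields a generalized
    core rule plus residue rules gated by a hypothetical ('fired', core)
    trigger predicate, standing in for the paper's embedded antecedent
    predicate asserted non-monotonically when the core fires.
    """
    candidates = set()
    for (ants_a, con_a), (ants_b, con_b) in combinations(rules, 2):
        if con_a != con_b:
            continue  # pair only rules implying a common consequent
        core = ants_a & ants_b
        if not core or core in (ants_a, ants_b):
            continue  # nothing shared, or proper-subset case (not treated)
        # The shared core becomes a smaller, more reusable candidate rule.
        candidates.add((core, con_a))
        # Removed antecedent subsets are re-acquired as new rules implying
        # the same consequent, conditioned to fire by the core's firing.
        trigger = ('fired', core)
        for residue in (ants_a - core, ants_b - core):
            candidates.add((residue | {trigger}, con_a))
    return candidates - set(rules)  # never create duplicate productions

# Hypothetical METOC-flavored rules, for illustration only.
rules = [
    (frozenset({'low_pressure', 'high_humidity', 'onshore_wind'}), 'fog'),
    (frozenset({'low_pressure', 'high_humidity', 'night'}), 'fog'),
]
for antecedent, consequent in sorted(generalize_rules(rules), key=str):
    print(set(antecedent), '->', consequent)
```

Here the single pairing yields the generalized rule {low_pressure, high_humidity} -> fog, plus two residue rules (onshore_wind and night) gated by the core's trigger predicate.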
If the application of a production(s) is deemed unsuccessful, and the production(s) are deemed not to be erroneous, then a correct case is preferentially acquired where available (i.e., in lieu of rule deletion, by default). That case will always fire in lieu of the unsuccessful production(s) on the given context, since it is more specific by definition (i.e., under a most-specific-first inference engine). Random and symmetric search are integrated to ensure broad coverage of the search space. The (transformed) context may be fuzzily matched to a situation that it does not cover; this not only allows for the generation of questions to confirm the missing predicate information, but also provides for the abduction of a possible response.

Cases (rules) are stored in segments so as to maximize their coherency across parallel processors, and less useful knowledge is expunged in keeping with this policy. Segments or processors are subdivided into local groups to minimize contention for the bus and memory in a massively parallel architecture. Acyclic transformations serve to provide a metaphorical explanation of the inference engine upon request. Finally, examples of the methodology, as applied to METOC modeling, are provided throughout. A high level of intelligence is consistent with the theory of randomization, which is fundamental to the development of truly autonomous systems for use in domains ranging from meteorology to C4ISR.
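A most-specific-first selection gated by a possibility threshold can likewise be sketched compactly. The fragment below is an assumed reading, not the authors' engine: the (antecedent set, consequent) layout, the function name, and the Jaccard-style possibility score are all stand-ins, since the paper's cumulative metric is not specified at this level of detail.

```python
def select_production(context, productions, min_possibility=0.0):
    """Most-specific-first selection gated by a possibility threshold.

    `context` is a set of predicates; `productions` is a list of
    (antecedent frozenset, consequent) pairs - an assumed layout. The
    possibility score is a Jaccard-style stand-in for the paper's
    cumulative similarity metric.
    """
    best, best_key = None, None
    for antecedent, consequent in productions:
        if not antecedent <= context:
            continue  # context does not cover this antecedent
        possibility = len(antecedent & context) / len(antecedent | context)
        if possibility < min_possibility:
            continue  # below the context's minimum possibility metric
        key = (len(antecedent), possibility)  # most specific first
        if best_key is None or key > best_key:
            best, best_key = (antecedent, consequent, possibility), key
    return best

# Hypothetical context and productions, for illustration only.
context = {'low_pressure', 'high_humidity', 'night', 'calm_sea'}
productions = [
    (frozenset({'low_pressure', 'high_humidity'}), 'forecast_fog'),
    (frozenset({'low_pressure', 'high_humidity', 'night'}), 'forecast_dense_fog'),
]
print(select_production(context, productions, min_possibility=0.5))
# -> the more specific 'forecast_dense_fog' production, possibility 0.75
```

Because an acquired correct case is, by definition, more specific than the unsuccessful production it supersedes on that context, ordering on antecedent size first realizes the override behavior described above.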