EM-LDA model of user behavior detection for energy efficiency

Z. Zhao, Weisheng Xu, D. Chen
2014 IEEE International Conference on System Science and Engineering (ICSSE), published 2014-07-11. DOI: 10.1109/ICSSE.2014.6887952 (https://doi.org/10.1109/ICSSE.2014.6887952)
Citations: 5

Abstract

In energy-efficiency analysis, detecting user behavior related to dynamic energy demand is critical for supporting the intelligent control scheme of a Building Management System. This paper identifies anomalous occupancy in user behavior from multiple time series of occupancy records. The task involves two problems: detecting the time stamp of an anomalous event and identifying its time span. Most inference models based on Markov chains handle time-stamp detection reasonably well but explain time-span identification only vaguely. A Latent Dirichlet Allocation (LDA) model is therefore proposed to address both problems efficiently. First, the discrete occupancy data are expressed as a mixture of Poisson distributions and transformed, via the Expectation-Maximization (EM) algorithm, into a dataset with several semantic concepts. Then, the LDA components (the words, the topics, the documents, and the relevant parameters and hyper-parameters) are defined in terms of this semantic dataset. Finally, a particle filter algorithm is used to sample the latent topic variable according to the conditional posterior probability of a word given a specific topic. After a number of iterations, the distribution of the samples closely approximates the true marginal distribution of words for a specific topic. From the word-topic relation matrix, the most probable topic of a given document can be determined. If a document's topic differs from the topics of the other documents, that document is identified as a point anomaly (the number of topics is generally set to two). Because a word can cover several time stamps of the time series at once, other contextual anomalies near the point anomaly can also be marked, and these indicate the time span of the anomalous event. By stepping along the time series so that every time stamp is traversed as a document, all contextual anomalies can be explained as following the occurrence of a point anomalous event.
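The first step of the abstract, expressing occupancy counts as a Poisson mixture fitted with EM, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_poisson_mixture`, `to_symbols`), the initialisation of the rates, the sample counts, and the choice to map each count to its most probable component as a stand-in for a "semantic concept" are all assumptions made for illustration.

```python
import math


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at count k with rate lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def e_step(counts, weights, rates):
    """Return per-observation component posteriors and the data log-likelihood."""
    resp, log_lik = [], 0.0
    for k in counts:
        logs = [math.log(w) + poisson_logpmf(k, lam)
                for w, lam in zip(weights, rates)]
        top = max(logs)
        # Log-sum-exp keeps the normaliser numerically stable.
        norm = top + math.log(sum(math.exp(v - top) for v in logs))
        log_lik += norm
        resp.append([math.exp(v - norm) for v in logs])
    return resp, log_lik


def fit_poisson_mixture(counts, n_components=2, n_iter=200, tol=1e-9):
    """Fit a Poisson mixture to non-negative integer counts with EM.

    Assumption: rates start evenly spread over the observed range and
    mixing weights start uniform. Returns (weights, rates, responsibilities).
    """
    n = len(counts)
    lo, hi = min(counts), max(counts)
    rates = [max(lo + (hi - lo) * (j + 0.5) / n_components, 1e-3)
             for j in range(n_components)]
    weights = [1.0 / n_components] * n_components
    prev = -math.inf
    for _ in range(n_iter):
        resp, log_lik = e_step(counts, weights, rates)
        if abs(log_lik - prev) < tol:
            break
        prev = log_lik
        # M-step: re-estimate mixing weights and Poisson rates.
        for j in range(n_components):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            rates[j] = max(sum(r[j] * k for r, k in zip(resp, counts)) / nj, 1e-3)
    # Recompute posteriors so they match the final parameters.
    resp, _ = e_step(counts, weights, rates)
    return weights, rates, resp


def to_symbols(resp):
    """Map each observation to the index of its most probable component."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]


# Made-up occupancy counts per time stamp (hypothetical data).
counts = [0, 1, 0, 2, 1, 0, 1, 9, 11, 10, 12, 1, 0, 2]
weights, rates, resp = fit_poisson_mixture(counts)
print("rates:", rates)
print("symbols:", to_symbols(resp))
```

In this sketch the symbol sequence is a discrete stream that could be windowed into words and documents for a topic model. How the paper itself builds words and documents from the EM output is not specified in the abstract, so that step is left out.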