Random vector generation from mixed-attribute datasets using random walk
A. Skabar. 2016 Winter Simulation Conference (WSC), published 2016-12-11. DOI: 10.1109/WSC.2016.7822168 (https://doi.org/10.1109/WSC.2016.7822168)
Given data in a matrix X in which rows represent vectors and columns comprise a mix of discrete and continuous variables, the method presented in this paper can be used to generate random vectors whose elements display the same marginal distributions and correlations as the variables in X. The data is represented as a bipartite graph consisting of object nodes (representing vectors) and attribute value nodes. Random walk can be used to estimate the distribution of a target variable conditioned on the remaining variables, allowing a random value to be drawn for that variable. This leads to the use of Gibbs sampling to generate entire vectors. Unlike conventional methods, the proposed method requires neither the joint distribution nor the correlations to be specified, learned, or modeled explicitly in any way. Application to the Australian Credit dataset demonstrates the feasibility of the approach in generating random vectors on challenging real-world datasets.
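The pipeline in the abstract (bipartite graph, random-walk estimate of a conditional distribution, Gibbs sweep) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not give the walk length, the transition probabilities, or the treatment of continuous columns. The sketch assumes every column is discrete (continuous columns would need to be binned first) and uses a two-step walk from the conditioning value nodes to object nodes and then to the target attribute's value nodes. The names `build_graph`, `conditional_distribution`, and `gibbs_sample` are hypothetical.

```python
import random
from collections import defaultdict


def build_graph(X):
    """Bipartite graph: map each attribute-value node (column, value)
    to the indices of the object nodes (rows) that hold that value."""
    value_to_objects = defaultdict(list)
    for i, row in enumerate(X):
        for j, v in enumerate(row):
            value_to_objects[(j, v)].append(i)
    return value_to_objects


def conditional_distribution(X, value_to_objects, current, target):
    """Estimate P(target | other variables) with a short walk (an assumption):
    conditioning value nodes -> object nodes -> target value nodes."""
    others = [j for j in range(len(current)) if j != target]
    object_mass = defaultdict(float)
    for j in others:
        objs = value_to_objects.get((j, current[j]), [])
        for i in objs:
            # Start uniformly over the conditioning value nodes,
            # then step uniformly to one of that node's object neighbours.
            object_mass[i] += 1.0 / (len(others) * len(objs))
    dist = defaultdict(float)
    for i, mass in object_mass.items():
        # Each object node has exactly one value node for the target column.
        dist[X[i][target]] += mass
    if not dist:
        # No conditioning information: fall back to the empirical marginal.
        for row in X:
            dist[row[target]] += 1.0 / len(X)
    return dict(dist)


def gibbs_sample(X, n_samples, burn_in=50, seed=0):
    """Generate vectors by repeatedly redrawing each variable from its
    estimated conditional distribution (one Gibbs sweep per sample)."""
    rng = random.Random(seed)
    graph = build_graph(X)
    current = list(rng.choice(X))  # start from an observed row
    samples = []
    for sweep in range(burn_in + n_samples):
        for t in range(len(current)):
            dist = conditional_distribution(X, graph, current, t)
            values = list(dist)
            weights = [dist[v] for v in values]
            current[t] = rng.choices(values, weights=weights)[0]
        if sweep >= burn_in:
            samples.append(tuple(current))
    return samples


if __name__ == "__main__":
    # Toy dataset with two discrete columns.
    data = [("a", 0), ("a", 0), ("b", 1), ("b", 0)]
    for vector in gibbs_sample(data, n_samples=5):
        print(vector)
```

As in the abstract, no joint distribution or correlation matrix is specified anywhere: dependence between columns enters only through the rows that share attribute values. A longer walk would let rows that share no value with the current vector still contribute, at the cost of flattening the estimated conditional.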