Simulating Random Multivariate Correlated Data (Categorical Variables)

This is a repost of the second part of an example that I posted last year. At the time I only had the PDF document (written in $\LaTeXe$).

This is the second example of generating multivariate random associated data. It shows how to generate ordinal categorical data. This is a little more complex than generating continuous data, because both the correlation matrix and the marginal distribution of each variable are required. The example uses the R library GenOrd.
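GenOrd takes each marginal distribution as a vector of cumulative probabilities, with the final value of 1 left out. As a minimal sketch in base R (the names `cumulative`, `cell_probs` and `back_to_cumulative` are only illustrative), the cumulative values used later in this post can be converted to per-category probabilities and back:

```r
# Cumulative marginals for two ordinal variables with four categories each.
# The last cumulative value (1) is omitted, as GenOrd expects.
cumulative <- list(c(0.1, 0.3, 0.6), c(0.4, 0.7, 0.9))

# Per-category probabilities are the successive differences
# once 0 and 1 are added at the ends.
cell_probs <- lapply(cumulative, function(p) diff(c(0, p, 1)))
print(cell_probs)

# Going the other way: cumulative sums of the category probabilities,
# dropping the final 1.
back_to_cumulative <- lapply(cell_probs, function(p) cumsum(p)[-length(p)])
print(back_to_cumulative)
```

Writing the marginals as category probabilities first and converting them with `cumsum` can make it easier to see that each set of probabilities sums to 1.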

The graph above plots the randomly generated data with the given correlation matrix, grouped by the second variable. There are many other approaches to graphing categorical data; one source is available here.

This example creates a 2-variable dataset, but it can easily be extended to many more variables. The target correlation matrix R for this 2-dimensional example is:

$R = \left( \begin{smallmatrix} 1&-0.6\\ -0.6&1 \end{smallmatrix} \right)$

With a sample of 100 observations, the R code below generates an ordinal dataset whose sample correlation matrix is:

$R = \left( \begin{smallmatrix} 1&-0.5469243\\ -0.5469243&1 \end{smallmatrix} \right)$

The sample correlation differs from the target because the data are random. Increasing the sample size lets the sample correlation coefficients converge on the target correlations.

[sourcecode language="r"]
library(GenOrd)
set.seed(1)

# Set the marginals as cumulative probabilities (the final 1 is omitted).
# For the first variable the category probabilities are therefore
# 0.1, 0.2, 0.3 and 0.4; for the second they are 0.4, 0.3, 0.2 and 0.1.
marginal <- list(c(0.1, 0.3, 0.6), c(0.4, 0.7, 0.9))

# Check the lower and upper bounds of the correlation coefficients.
corrcheck(marginal)

# Set the target correlation matrix.
R <- matrix(c(1, -0.6, -0.6, 1), 2, 2)
n <- 100

# Draw an ordinal sample with the given correlation R and the given marginals.
m <- ordsample(n, marginal, R)

# Compare the sample correlation with the pre-defined R.
cor(m)
table(m[, 1], m[, 2])

# Chi-squared test of independence on the contingency table of the two
# variables (chisq.test(m) would treat the raw n x 2 data as a table of counts).
chisq.test(table(m[, 1], m[, 2]))

# Counts of the first variable within each level of the second.
gbar <- tapply(m[, 1], list(m[, 1], m[, 2]), length)
par(mfrow = c(1, 1))
barplot(gbar, beside = TRUE, col = cm.colors(4),
        main = "Example Bar Chart of Counts by Group",
        xlab = "Group", ylab = "Frequency")
[/sourcecode]
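The claim that larger samples bring the sample correlation closer to the target can be checked directly. The following is a rough sketch, assuming the GenOrd package is installed; the sample sizes and the names `sizes`, `bounds` and `sample_cor` are my own choices for illustration. It first confirms that the target of -0.6 lies inside the bounds reported by `corrcheck`, then draws samples of increasing size.

```r
library(GenOrd)
set.seed(1)

# Same marginals and target correlation as in the example above.
marginal <- list(c(0.1, 0.3, 0.6), c(0.4, 0.7, 0.9))
R <- matrix(c(1, -0.6, -0.6, 1), 2, 2)

# corrcheck() returns a list of two matrices: lower and upper bounds
# of the correlations that are feasible for these marginals.
bounds <- corrcheck(marginal)
stopifnot(R[1, 2] >= bounds[[1]][1, 2], R[1, 2] <= bounds[[2]][1, 2])

# Illustrative sample sizes (an assumption, not from the original post).
sizes <- c(100, 1000, 10000)

# Sample correlation between the two variables at each sample size.
sample_cor <- sapply(sizes, function(n) cor(ordsample(n, marginal, R))[1, 2])

# Compare each sample correlation with the target.
print(data.frame(n = sizes,
                 sample_cor = sample_cor,
                 abs_error = abs(sample_cor - R[1, 2])))
```

Because each sample is random, the error need not shrink at every step in a single run, but it should tend to be smaller for the larger sample sizes.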


2 replies on “Simulating Random Multivariate Correlated Data (Categorical Variables)”

1. Maxim K. says:

I wonder how a dataset could be generated with both continuous and categorical variables interrelated by a pre-given correlation matrix.