For given parameters `alpha` and `Beta` and document-specific total word counts `N`, simulate a document-by-term matrix.
Additional structuring variables (the number of topics k, documents M, and terms V) are inferred from the input objects.
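The inference of the structuring variables can be pictured with a small base-R sketch. `infer_lda_dims()` is a hypothetical helper written for illustration, not a function of this package. It assumes, as in the examples, that `Beta` stores topics in rows and terms in columns and that `Theta` (if given) stores documents in rows and topics in columns.

```r
# Hypothetical helper (illustration only, not part of the package):
# infer the number of documents, topics, and terms from the inputs.
infer_lda_dims <- function(N, Beta, Theta = NULL) {
  dims <- list(
    M = length(N),   # documents: one entry of N per document
    k = nrow(Beta),  # topics: one row of Beta per topic (assumed layout)
    V = ncol(Beta)   # terms: one column of Beta per term (assumed layout)
  )
  # If document topic probabilities are given, their shape must agree
  if (!is.null(Theta)) {
    stopifnot(nrow(Theta) == dims$M, ncol(Theta) == dims$k)
  }
  dims
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
str(infer_lda_dims(N, Beta))  # M = 4, k = 2, V = 3
```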

    sim_LDA_data(N, Beta, alpha = NULL, Theta = NULL, seed = NULL)

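Argument | Description |
---|---|
`N` | A vector of document sizes (total word counts). Must be integer-conformable. Used to infer the number of documents (M). |
`Beta` | Matrix of term probabilities within topics (dim: k x V). Each row must be non-negative and sum to 1. Used to infer the numbers of topics (k) and terms (V). |
`alpha` | Single positive numeric value for the Dirichlet distribution parameter defining topics within documents. To define document topic probabilities directly, use `Theta`. |
`Theta` | Matrix of topic probabilities within documents (dim: M x k). Each row must be non-negative and sum to 1. An alternative to `alpha`. |
`seed` | Input to `set.seed`. |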
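When `alpha` is supplied instead of `Theta`, each document's topic probabilities come from a Dirichlet distribution. A minimal base-R sketch of drawing such rows, assuming a symmetric Dirichlet (the single `alpha` value reused for every topic) and using the standard construction of normalizing independent gamma draws. `draw_theta()` is a hypothetical name, and this is not necessarily how the package draws them.

```r
# Hypothetical helper: draw an M x k matrix whose rows are independent
# symmetric Dirichlet(alpha, ..., alpha) vectors.
draw_theta <- function(M, k, alpha) {
  stopifnot(length(alpha) == 1, alpha > 0)
  # Independent Gamma(alpha, 1) draws ...
  g <- matrix(rgamma(M * k, shape = alpha, rate = 1), nrow = M, ncol = k)
  # ... normalized within each row give Dirichlet-distributed rows
  g / rowSums(g)
}

set.seed(1)
Theta <- draw_theta(M = 4, k = 2, alpha = 1.2)
rowSums(Theta)  # every row sums to 1
```

Larger values of `alpha` make the rows more even across topics; smaller values concentrate each document on fewer topics.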
Returns a document-by-term matrix of counts (dim: M x V).
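To show how the returned count matrix relates to the inputs, here is a hypothetical base-R sketch of the usual LDA generative process: for each document, split its `N[d]` words among topics according to that document's topic probabilities, then split each topic's words among terms according to the matching row of `Beta`. `sim_lda_sketch()` is an illustrative stand-in with the same arguments, not the package's implementation, and its random draws will not match `sim_LDA_data()` for the same `seed`.

```r
# Hypothetical sketch of LDA-style count simulation (not the package's code).
sim_lda_sketch <- function(N, Beta, alpha = NULL, Theta = NULL, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  M <- length(N)   # documents
  k <- nrow(Beta)  # topics
  V <- ncol(Beta)  # terms
  if (is.null(Theta)) {
    stopifnot(!is.null(alpha), length(alpha) == 1, alpha > 0)
    # Assumption: symmetric Dirichlet(alpha) rows via normalized gamma draws
    g <- matrix(rgamma(M * k, shape = alpha), nrow = M, ncol = k)
    Theta <- g / rowSums(g)
  }
  stopifnot(nrow(Theta) == M, ncol(Theta) == k)
  counts <- matrix(0L, nrow = M, ncol = V)
  for (d in seq_len(M)) {
    # Number of words in document d assigned to each topic
    topic_counts <- rmultinom(1, size = N[d], prob = Theta[d, ])
    for (j in seq_len(k)) {
      # Spread topic j's words over the terms using row j of Beta
      counts[d, ] <- counts[d, ] +
        rmultinom(1, size = topic_counts[j], prob = Beta[j, ])[, 1]
    }
  }
  counts
}

N <- c(10, 22, 15, 31)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
x <- sim_lda_sketch(N, Beta, alpha = 1.2, seed = 1)
dim(x)      # 4 3: documents by terms
rowSums(x)  # 10 22 15 31, matching N
```

Because every word is assigned to exactly one topic and then to exactly one term, each row of the result sums to the corresponding entry of `N`.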
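
    N <- c(10, 22, 15, 31)
    alpha <- 1.2
    # 2 topics x 3 terms; each row sums to 1
    Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
    # Document topic probabilities drawn from a Dirichlet with parameter alpha
    sim_LDA_data(N, Beta, alpha = alpha)
    # 4 documents x 2 topics; each row sums to 1
    Theta <- matrix(c(0.2, 0.8, 0.8, 0.2, 0.5, 0.5, 0.9, 0.1), 4, 2, byrow = TRUE)
    # Document topic probabilities given directly
    sim_LDA_data(N, Beta, Theta = Theta)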