For a given set of covariates X; parameters Beta, Eta, rho, and err; and document-specific time stamps (tD) and lengths (N), simulate a document-by-term matrix. Additional structuring variables, namely the numbers of topics (k), terms (V), documents (M), segments (S), and covariates per segment (C), are inferred from the input objects.

sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err = 0, seed = NULL)

Arguments

N

A vector of document sizes (total word counts). Must be integer-conformable. Used to infer the total number of documents.

Beta

matrix of categorical distribution parameters defining terms within topics. Dimension: k x V (number of topics x number of terms). Used to infer both k and V. Entries must be non-negative, and each row (topic) must sum to 1.

X

matrix of covariates (a.k.a. the design matrix), dimension: M x C (number of documents x number of covariates, including the intercept).

Eta

matrix of regression parameters across the segments, dimension: SC (number of segments x number of covariates, including the intercept) x k (number of topics).

rho

Vector of integer-conformable time locations of changepoints, or NULL if there are no changepoints. Used to determine the number of segments (one more than the number of changepoints). Each changepoint must fall within the range of the document times, tD.

tD

Vector of integer-conformable times of the documents. Must be of length M (as determined by X).

err

Additive error on the link-scale. Must be a non-negative numeric value. Default value of 0 indicates no error.

seed

Input to set.seed.

Value

A document-by-term matrix of counts (dim: M x V).
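
To make the relationship between the inputs and the M x V output concrete, here is a minimal base-R sketch of one way such a simulation could proceed. It is not the package's implementation. The helper name sketch_sim is hypothetical, and the following are assumptions: a document belongs to segment 1 plus the number of changepoints strictly before its time, the rows of Eta are grouped by segment, err is the standard deviation of Gaussian noise on the link scale, and topic proportions come from a softmax.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the
# assumed generative steps behind a document-by-term simulation.
sketch_sim <- function(N, Beta, X, Eta, rho = NULL, tD, err = 0, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  M <- nrow(X); C <- ncol(X)        # documents, covariates per segment
  k <- nrow(Beta); V <- ncol(Beta)  # topics, terms
  S <- length(rho) + 1              # segments
  stopifnot(length(N) == M, length(tD) == M,
            nrow(Eta) == S * C, ncol(Eta) == k)

  # Assumption: segment index = 1 + number of changepoints before the document
  seg <- vapply(tD, function(time_m) 1L + sum(time_m > rho), integer(1))

  # Block design matrix: a document's covariates fill only its segment's columns
  X_seg <- matrix(0, M, S * C)
  for (m in seq_len(M)) {
    cols <- (seg[m] - 1) * C + seq_len(C)
    X_seg[m, cols] <- X[m, ]
  }

  # Link-scale predictor plus noise (assumed Gaussian with sd = err), then softmax
  link <- X_seg %*% Eta + matrix(rnorm(M * k, sd = err), M, k)
  Theta <- exp(link - apply(link, 1, max))  # subtract row maxima for stability
  Theta <- Theta / rowSums(Theta)           # M x k topic proportions

  # Term probabilities per document (M x V), then one multinomial draw per row
  P <- Theta %*% Beta
  t(vapply(seq_len(M),
           function(m) as.integer(rmultinom(1, N[m], P[m, ])),
           integer(V)))
}

# Usage with no changepoints (rho = NULL), so Eta is C x k
N    <- c(20, 30, 25)
tD   <- 1:3
X    <- cbind(1, c(0.1, 0.5, 0.9))
Eta  <- matrix(c(0.2, -0.4, 1.0, 0.3), 2, 2)
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
counts <- sketch_sim(N, Beta, X, Eta, rho = NULL, tD = tD, err = 0.5, seed = 1)
dim(counts)      # M x V
rowSums(counts)  # equals N
```

Each row of the result sums to the corresponding entry of N, which is why N must be integer-conformable.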

Examples

  N <- c(10, 22, 15, 31)
  tD <- c(1, 3, 4, 6)
  rho <- 3
  X <- cbind(rep(1, 4), 1:4)
  Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
  Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
  err <- 1
  sim_LDA_TS_data(N, Beta, X, Eta, rho, tD, err)
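
Before calling the function, it can help to confirm that the inputs agree on the inferred dimensions. The following is a small sketch of such a check, written from the constraints stated under Arguments. The helper check_sim_inputs is hypothetical and not part of the package, and treating the time bounds on rho as inclusive is an assumption. The example inputs are repeated so that the block runs on its own.

```r
# Hypothetical helper (not part of the package): checks the documented
# constraints and returns the inferred structuring variables.
check_sim_inputs <- function(N, Beta, X, Eta, rho, tD, err = 0) {
  M <- nrow(X); C <- ncol(X); k <- nrow(Beta)
  S <- length(rho) + 1  # one more segment than changepoints
  stopifnot(
    length(N) == M, all(N == round(N)),             # one integer size per document
    all(Beta >= 0),
    isTRUE(all.equal(unname(rowSums(Beta)), rep(1, k))),  # rows sum to 1
    nrow(Eta) == S * C, ncol(Eta) == k,             # Eta is SC x k
    length(tD) == M, all(tD == round(tD)),
    # Assumption: changepoints may coincide with the first or last time
    is.null(rho) || all(rho == round(rho) & rho >= min(tD) & rho <= max(tD)),
    is.numeric(err), length(err) == 1, err >= 0
  )
  c(M = M, V = ncol(Beta), k = k, S = S, C = C)
}

N <- c(10, 22, 15, 31)
tD <- c(1, 3, 4, 6)
rho <- 3
X <- cbind(rep(1, 4), 1:4)
Eta <- cbind(c(0.5, 0.3, 0.9, 0.5), c(1.2, 1.1, 0.1, 0.5))
Beta <- matrix(c(0.1, 0.1, 0.8, 0.2, 0.6, 0.2), 2, 3, byrow = TRUE)
dims <- check_sim_inputs(N, Beta, X, Eta, rho, tD, err = 1)
dims
```

With one changepoint and two covariates, Eta needs SC = 4 rows, which matches the 4 x 2 matrix in the example.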