For a given dataset consisting of counts of words across multiple documents in a corpus, fit multiple Latent Dirichlet Allocation (LDA) models (using the Variational Expectation Maximization (VEM) algorithm; Blei et al. 2003) to account for [1] uncertainty in the number of latent topics and [2] the impact of initial values in the estimation procedure.
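
To make the two sources of uncertainty concrete, the base R sketch below enumerates the combinations that would be fitted for a set of candidate topic counts and a number of replicate starts. The variable names and values are illustrative assumptions, and the code does not call the package; it only shows that one model is fitted per combination.

```r
topics <- 2:4   # candidate numbers of latent topics (illustrative values)
nseeds <- 3     # replicate starts per topic count (illustrative value)

# One row per model to fit: every topic count paired with every replicate
model_grid <- expand.grid(replicate = seq_len(nseeds), k = topics)

nrow(model_grid)   # equals length(topics) * nseeds
```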

LDA_set is a list wrapper of LDA in the topicmodels package (Grun and Hornik 2011).

check_LDA_set_inputs checks that all of the inputs are proper for LDA_set (that the table of observations is conformable to a matrix of integers, the number of topics is an integer, the number of seeds is an integer and the controls list is proper).
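
As a rough illustration of what "conformable to a matrix of integers" can mean in practice, the sketch below tests a table in base R. The helper name is_integer_conformable is hypothetical and is not part of the package; the checks actually performed by check_LDA_set_inputs may differ.

```r
# Hypothetical helper (not part of the package): returns TRUE when `x`
# can be converted to a numeric matrix whose entries are all whole numbers.
is_integer_conformable <- function(x) {
  # Only matrices and data frames are considered here
  if (!is.matrix(x) && !is.data.frame(x)) {
    return(FALSE)
  }
  m <- as.matrix(x)
  # Non-numeric content and missing values are rejected
  if (!is.numeric(m) || anyNA(m)) {
    return(FALSE)
  }
  # Whole-number test: every entry equals its rounded value
  all(m == round(m))
}

counts <- data.frame(sp1 = c(3, 0, 5), sp2 = c(1, 2, 0))
is_integer_conformable(counts)        # table of whole-number counts
is_integer_conformable(counts / 2)    # table containing fractional values
```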

LDA_set(document_term_table, topics = 2, nseeds = 1,
  control = list())

check_LDA_set_inputs(document_term_table, topics, nseeds, control)

Arguments

document_term_table

Table of observation count data (rows: documents, columns: terms). May be of class matrix or data.frame but must be conformable to a matrix of integers, as verified by check_document_term_table.

topics

Vector of the number of topics to evaluate for each model. Must be conformable to integer values.

nseeds

Number of seeds (replicate starts) to use for each value of topics. Must be conformable to an integer value.

control

A list of parameters to control the running and selecting of LDA models. Values not input assume default values set by LDA_set_control. Values for running the LDAs replace the defaults in LDAcontrol (see LDA); if seed is given, it will be overwritten, so use iseed instead.
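
A minimal sketch of how these arguments might be combined, assuming the functions documented here and the rodents data used in the Examples are provided by the LDATS package and that it is installed. The particular values are arbitrary; the starting seed is passed as iseed through control, as described above, rather than as seed.

```r
library(LDATS)  # assumption: LDA_set and the rodents data come from LDATS

data(rodents)
lda_data <- rodents$document_term_table

# Evaluate 2, 3, and 4 topics with 2 replicate starts each;
# the starting seed is set via iseed because seed would be overwritten
lda_models <- LDA_set(lda_data, topics = 2:4, nseeds = 2,
                      control = list(iseed = 4))

length(lda_models)  # one fitted model per topic count and replicate start
```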

Value

LDA_set: list (class: LDA_set) of LDA models (class: LDA_VEM). check_LDA_set_inputs: an error message is thrown if any input is improper, otherwise NULL.

References

Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.

Grun, B., and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.

Examples

  data(rodents)
  lda_data <- rodents$document_term_table
  r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)
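
The returned object can be inspected like any list. The sketch below repeats the example and then looks at the result; it assumes the LDATS package is installed and supplies LDA_set, the rodents data, and a logLik method for the fitted models, and it reads the number of topics from the k slot that topicmodels stores on each fit.

```r
library(LDATS)    # assumption: LDA_set and the rodents data come from LDATS
library(methods)  # for is() on the S4 model objects

data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)

class(r_LDA)    # the set is a list carrying the class LDA_set
length(r_LDA)   # topics = 2 with nseeds = 2 gives two fitted models

# Each element is an individual topicmodels fit
first_model <- r_LDA[[1]]
is(first_model, "LDA_VEM")

# Number of topics stored on each fit
sapply(r_LDA, function(m) m@k)

# Compare the replicate starts by their log-likelihoods
sapply(r_LDA, function(m) as.numeric(logLik(m)))
```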