For a given dataset consisting of counts of words across
multiple documents in a corpus, fit multiple Latent Dirichlet
Allocation (LDA) models (using the Variational Expectation
Maximization (VEM) algorithm; Blei et al. 2003) to account for [1]
uncertainty in the number of latent topics and [2] the impact of initial
values in the estimation procedure.
LDA_set is a list wrapper of LDA in the topicmodels package (Grun and Hornik 2011).
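The idea of fitting one model per combination of topic count and replicate start can be sketched roughly as follows. This is an illustration only, not the package source: `lda_grid` and `fit_lda_grid` are hypothetical names, the seed values (1, 2, ...) are an assumption, and the only real API used is `topicmodels::LDA` with `method = "VEM"`.

```r
# Hypothetical helper: one row per (number of topics, seed) combination.
lda_grid <- function(topics = 2, nseeds = 1) {
  expand.grid(seed = seq_len(nseeds), k = topics)
}

# Hypothetical sketch of a list wrapper around topicmodels::LDA().
# Assumes the topicmodels package is installed; the seed scheme is assumed.
fit_lda_grid <- function(document_term_table, topics = 2, nseeds = 1) {
  grid <- lda_grid(topics, nseeds)
  fits <- lapply(seq_len(nrow(grid)), function(i) {
    topicmodels::LDA(
      as.matrix(document_term_table),        # counts as an integer-valued matrix
      k = grid$k[i],                         # number of latent topics
      method = "VEM",                        # variational EM (Blei et al. 2003)
      control = list(seed = grid$seed[i])    # controls the initial values
    )
  })
  # Label each fit so the topic count and seed can be recovered later.
  names(fits) <- paste0("k: ", grid$k, ", seed: ", grid$seed)
  fits
}
```

Varying `k` addresses uncertainty in the number of topics, while varying the seed addresses sensitivity to starting values; keeping every fit in one named list makes later comparison straightforward.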
check_LDA_set_inputs checks that all of the inputs are proper for LDA_set (that the table of observations is conformable to a matrix of integers, the number of topics is an integer, the number of seeds is an integer, and the controls list is proper).
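The kinds of checks described above might look like the following base R sketch. The function names `is_whole` and `check_inputs_sketch` are hypothetical, the error messages are invented for illustration, and the actual checks in the package may be stricter or structured differently.

```r
# Hypothetical helper: TRUE if x is numeric, has no NAs, and is whole-valued.
is_whole <- function(x) {
  is.numeric(x) && all(!is.na(x)) && all(x %% 1 == 0)
}

# Hypothetical sketch of input validation: stops with an error message on
# the first improper input, otherwise returns NULL.
check_inputs_sketch <- function(document_term_table, topics = 2, nseeds = 1,
                                control = list()) {
  counts <- as.matrix(document_term_table)
  if (!is_whole(counts)) {
    stop("document_term_table is not conformable to a matrix of integers")
  }
  if (!is_whole(topics)) {
    stop("topics must be integer-valued")
  }
  if (length(nseeds) != 1 || !is_whole(nseeds)) {
    stop("nseeds must be a single integer")
  }
  if (!is.list(control)) {
    stop("control must be a list")
  }
  NULL
}
```

Checking for whole values rather than integer storage type means a table stored as doubles (for example `c(3, 0, 12)`) still passes, which is one reasonable reading of "conformable to a matrix of integers".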
LDA_set(document_term_table, topics = 2, nseeds = 1, control = list())
check_LDA_set_inputs(document_term_table, topics, nseeds, control)
document_term_table | Table of observation count data (rows: documents, columns: terms). May be a class |
---|---|
topics | Vector of the number of topics to evaluate for each model. Must be conformable to |
nseeds | Number of seeds (replicate starts) to use for each value of |
control | A |
LDA_set: list (class: LDA_set) of LDA models (class: LDA_VEM).
check_LDA_set_inputs: an error message is thrown if any input is improper, otherwise NULL.
Blei, D. M., A. Y. Ng, and M. I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research 3:993-1022.
Grun, B. and K. Hornik. 2011. topicmodels: An R Package for Fitting Topic Models. Journal of Statistical Software 40:13.
data(rodents)
lda_data <- rodents$document_term_table
r_LDA <- LDA_set(lda_data, topics = 2, nseeds = 2)