This is a wrapper function that expands the main Time Series analysis function (TS) across the LDA models (estimated using LDA or LDA_set) and the Time Series models, with respect to both continuous time formulas and the number of discrete changepoints. This function allows direct passage of the control parameters for the parallel tempering MCMC through to the main Time Series function, TS, via the control argument.

check_TS_on_LDA_inputs checks that the inputs to TS_on_LDA are of proper classes for a full analysis.

TS_on_LDA(LDA_models, document_covariate_table, formulas = ~1,
  nchangepoints = 0, timename = "time", weights = NULL,
  control = list())

check_TS_on_LDA_inputs(LDA_models, document_covariate_table,
  formulas = ~1, nchangepoints = 0, timename = "time",
  weights = NULL, control = list())

Arguments

LDA_models

List of LDA models (class LDA_set, produced by LDA_set) or a singular LDA model (class LDA, produced by LDA).

document_covariate_table

Document covariate table (rows: documents, columns: time index and covariate options). Every model needs a covariate that gives the time value for each document (in whatever units, and whose column name in the table is given by timename); this covariate dictates the application of the change points. In addition, all covariates named within specific models in formulas must be included. Must be conformable to a data table, as verified by check_document_covariate_table.

formulas

Vector of formula(s) for the continuous (non-change point) component of the time series models. Any predictor variable included in a formula must also be a column in the document_covariate_table. Each element (formula) in the vector is evaluated for each number of change points and each LDA model.
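The requirement that every predictor appears in the covariate table can be checked by hand before fitting. The following is a minimal base R sketch, not the package's own check; the table contents and the variables sin_year and temperature are hypothetical and used only for illustration.

```r
# Hypothetical covariate table, for illustration only
document_covariate_table <- data.frame(newmoon = 1:5, sin_year = sin(1:5))

# Hypothetical formulas; "temperature" is deliberately absent from the table
formulas <- c(~ 1, ~ newmoon, ~ sin_year + temperature)

# Collect every variable named in any formula (~ 1 contributes none)
needed <- unique(unlist(lapply(formulas, all.vars)))

# Variables that the formulas use but the table does not provide
missing_vars <- setdiff(needed, colnames(document_covariate_table))
missing_vars
```

An empty result means every formula can be evaluated against the table; here the sketch flags temperature.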

nchangepoints

Vector of integers corresponding to the number of change points to include in the time series models. 0 is a valid input corresponding to no change points (i.e., a singular time series model), and the current implementation can reasonably include up to 6 change points. Each element in the vector is the number of change points used to segment the data for each formula (entry in formulas) component of the TS model, for each selected LDA model.
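To see how many Time Series models a given call implies, the combinations can be enumerated in base R. This is only an illustrative sketch of the full crossing described above, assuming two selected LDA models; the object names are hypothetical and this is not the package's internal expansion code.

```r
# Hypothetical inputs, for illustration only
formulas <- c(~ 1, ~ newmoon)
nchangepoints <- 0:1
n_lda_models <- 2  # assumed number of selected LDA models

# Turn each formula into a single character string for tabulation
formula_labels <- vapply(
  formulas,
  function(f) paste(deparse(f), collapse = ""),
  character(1)
)

# One row per (LDA model, formula, number of change points) combination
model_grid <- expand.grid(
  lda_model = seq_len(n_lda_models),
  formula = formula_labels,
  nchangepoints = nchangepoints,
  stringsAsFactors = FALSE
)
nrow(model_grid)
```

With two LDA models, two formulas, and two change point counts, the grid has 2 x 2 x 2 = 8 rows, so run time grows multiplicatively with each vector.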

timename

Character element indicating the time variable used in the time series. Defaults to "time". The variable must be integer-conformable or a Date. If the variable named is a Date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.
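The effect of the Date conversion can be seen with base R alone. The sketch below assumes monthly sampling dates (hypothetical values) and shows one possible way to build an integer index with a one-month timestep instead; the month_index construction is an illustration, not something the package provides.

```r
# Hypothetical monthly sampling dates
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Integer conversion counts days since 1970-01-01, so steps are in days
day_index <- as.integer(dates)
day_steps <- diff(day_index)

# One alternative: an integer index counting months from the first year
first_year <- as.integer(format(dates[1], "%Y"))
month_index <- (as.integer(format(dates, "%Y")) - first_year) * 12L +
  as.integer(format(dates, "%m"))
```

Here day_steps is uneven (31 and 29 days) while month_index advances by exactly 1 per sample, which is usually closer to what a monthly series intends.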

weights

Optional class numeric vector of weights for each document. Defaults to NULL, translating to an equal weight for each document. When using multinom_TS in a standard LDATS analysis, it is advisable to weight the documents by their total size, as the result of LDA is a matrix of proportions, which does not account for size differences among documents. For most models, a scaling of the weights (so that the average is 1) is most appropriate, and this is accomplished using document_weights.
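The scaling described above can be sketched in base R. This assumes the weights are each document's total count divided by the mean total count; the small table is hypothetical, and document_weights remains the function to use in practice (it may also round the result).

```r
# Hypothetical document term table: rows are documents, columns are terms
document_term_table <- data.frame(sp1 = c(10, 40, 25), sp2 = c(5, 20, 50))

# Total size of each document
sizes <- rowSums(document_term_table)

# Scale so that the average weight is 1
weights <- sizes / mean(sizes)
mean(weights)
```

Larger documents receive weights above 1 and smaller ones below 1, while the average stays at 1.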

control

A list of parameters to control the fitting of the Time Series model including the parallel tempering Markov Chain Monte Carlo (ptMCMC) controls. Values not input assume defaults set by TS_control.
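The "values not input assume defaults" behavior can be illustrated with base R's modifyList. The default names and values below (nit, ntemps, quiet) are stand-ins chosen for this sketch and should be checked against the TS_control documentation; the merge shown is an illustration of the idea, not the package's implementation.

```r
# Hypothetical defaults standing in for those set by TS_control
defaults <- list(nit = 10000, ntemps = 6, quiet = FALSE)

# A partial control list supplied by the user
user_control <- list(nit = 1000)

# User-supplied entries override defaults; everything else is kept
control <- modifyList(defaults, user_control)
str(control)
```

Only the entries that differ from the defaults need to be listed in the control argument.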

Value

TS_on_LDA: TS_on_LDA-class list of results from TS applied for each Time Series model on each LDA model input.

check_TS_on_LDA_inputs: An error message is thrown if any input is not proper, else NULL.

Examples

# \donttest{
  data(rodents)
  document_term_table <- rodents$document_term_table
  document_covariate_table <- rodents$document_covariate_table
  LDAs <- LDA_set(document_term_table, topics = 2:3, nseeds = 2)
  LDA_models <- select_LDA(LDAs)
  weights <- document_weights(document_term_table)
  formulas <- c(~ 1, ~ newmoon)
  # Fit every formula x change point combination to each selected LDA model
  mods <- TS_on_LDA(LDA_models, document_covariate_table, formulas,
                    nchangepoints = 0:1, timename = "newmoon",
                    weights = weights)
# }