Fit a set of multinomial regression models (via multinom, Venables and Ripley 2002) to a time series of data divided into multiple segments (a.k.a. chunks) based on given locations for a set of change points.

check_multinom_TS_inputs checks that the inputs to multinom_TS are of proper classes for an analysis.

multinom_TS(data, formula, changepoints = NULL, timename = "time",
  weights = NULL, control = list())

check_multinom_TS_inputs(data, formula = gamma ~ 1,
  changepoints = NULL, timename = "time", weights = NULL,
  control = list())

Arguments

data

data.frame including [1] the time variable (indicated in timename), [2] the predictor variables (required by formula), and [3] the multinomial response variable (indicated in formula), as verified by check_timename and check_formula. Note that the response variables should be formatted as a data.frame object named as indicated by the response entry in the control list, such as gamma for a standard TS analysis on LDA output. See Examples.

formula

formula defining the regression relationship between the change points. Any predictor variable included must also be a column in data and any (multinomial) response variable must be a set of columns in data, as verified by check_formula.

changepoints

Numeric vector indicating locations of the change points. Must be conformable to integer values. Validity checked by check_changepoints and verify_changepoint_locations.
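To illustrate how change points divide a time series into chunks, the following base R sketch builds a table of chunk start and end times. The helper name chunk_bounds is hypothetical, and the convention that each change point is the last time step of the chunk it closes is an assumption made for illustration, not a statement of how multinom_TS handles chunks internally.

```r
# Hypothetical helper (not part of LDATS): split a time range into chunks.
# Assumes each change point is the final time step of the chunk it closes.
chunk_bounds <- function(times, changepoints = NULL) {
  if (!is.null(changepoints)) {
    changepoints <- sort(changepoints)
  }
  data.frame(start = c(min(times), changepoints + 1),
             end   = c(changepoints, max(times)))
}

times <- 1:100
bounds <- chunk_bounds(times, changepoints = c(20, 50))
print(bounds)   # three chunks: 1-20, 21-50, 51-100

# With no change points the whole series is a single chunk
single <- chunk_bounds(times)
```

Two change points therefore give three chunks, and each chunk would receive its own multinomial regression.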

timename

character element indicating the time variable used in the time series. Defaults to "time". The variable must be integer-conformable or a Date. If the variable named is a Date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.
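The effect of the Date conversion can be seen in a small base R sketch. The dates are made up, and the re-indexing shown at the end is only one possible workaround (an assumption for illustration, not something LDATS does for you).

```r
# Three hypothetical monthly sampling dates
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Integer conversion counts days since 1970-01-01, so monthly samples
# end up 31 and 29 time steps apart rather than 1 step apart
day_steps <- diff(as.integer(dates))
print(day_steps)

# One alternative (illustrative assumption): index samples by their order,
# so that consecutive samples are exactly one time step apart
sample_steps <- seq_along(dates)
print(sample_steps)
```

If a one-day time step is not intended, supplying an integer time variable such as a sample index avoids the implicit conversion.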

weights

Optional numeric vector of weights for each document. Defaults to NULL, translating to an equal weight for each document. When using multinom_TS in a standard LDATS analysis, it is advisable to weight the documents by their total size, as the result of LDA is a matrix of proportions, which does not account for size differences among documents. For most models, a scaling of the weights (so that the average is 1) is most appropriate, and this is accomplished using document_weights.
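The scaling described above can be sketched in base R as follows. The toy table and object names are hypothetical, and document_weights may differ in detail (for example, in rounding); this only shows the idea of size-based weights that average to 1.

```r
# Hypothetical document-term counts: 3 documents (rows), 2 terms (columns)
dtt_toy <- data.frame(term_a = c(10, 30, 5),
                      term_b = c(10, 30, 5))

# Total size of each document
sizes <- rowSums(dtt_toy)

# Scale so that the average weight is 1; larger documents get more weight
toy_weights <- sizes / mean(sizes)
print(toy_weights)
print(mean(toy_weights))
```

A document twice the average size thus counts twice as much in the fit as an average-sized one.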

control

A list of parameters to control the fitting of the Time Series model including the parallel tempering Markov Chain Monte Carlo (ptMCMC) controls. Values not input assume defaults set by TS_control.

Value

multinom_TS: Object of class multinom_TS_fit, which is a list of [1] chunk-level model fits ("chunk models"), [2] the total log likelihood combined across all chunks ("logLik"), and [3] a data.frame of chunk beginning and ending times ("chunks" with columns "start" and "end").

check_multinom_TS_inputs: an error message is thrown if any input is improper, otherwise NULL.

References

Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.

Examples

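  # Load the example rodent data and fit a two-topic LDA
  data(rodents)
  dtt <- rodents$document_term_table
  lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
  # Attach the topic proportions (gamma) to the covariate table as the response
  dct <- rodents$document_covariate_table
  dct$gamma <- lda[[1]]@gamma
  # Weight documents by their relative size
  weights <- document_weights(dtt)
  # Validate the inputs, then fit with change points at newmoons 20 and 50
  check_multinom_TS_inputs(dct, timename = "newmoon")
  mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20, 50),
                     timename = "newmoon", weights = weights)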