This is the main interface function for the LDATS application of Bayesian change point Time Series analyses (Christensen et al. 2018), which extends the model of Western and Kleykamp (2004; see also Ruggieri 2013) to multinomial (proportional) response data using softmax regression (Ripley 1996, Venables and Ripley 2002, Bishop 2006) within a generalized linear modeling approach (McCullagh and Nelder 1989). The models are fit using parallel tempering Markov Chain Monte Carlo (ptMCMC) methods (Earl and Deem 2005) to locate change points and neural networks (Ripley 1996, Venables and Ripley 2002, Bishop 2006) to estimate regressors.

check_TS_inputs checks that the inputs to TS are of proper classes for a full analysis.
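As a rough illustration of the softmax regression mentioned above, the sketch below maps a vector of linear predictors to proportions that sum to 1. It is not the LDATS implementation; the function name softmax and the predictor values are hypothetical, and fixing the first category at 0 as the reference is an assumed convention.

```r
# Minimal softmax sketch (illustrative only, not LDATS code).
softmax <- function(eta) {
  z <- exp(eta - max(eta))  # subtract the maximum for numerical stability
  z / sum(z)                # normalize so the values sum to 1
}

# Hypothetical linear predictors for three topics; the first category is
# treated as the reference and fixed at 0 (an assumed convention).
eta <- c(0, 1.2, -0.5)
p <- softmax(eta)

print(round(p, 3))
print(sum(p))
```

Because the outputs are positive and sum to 1, they can be compared directly with proportional response data such as the topic proportions from an LDA.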

TS(data, formula = gamma ~ 1, nchangepoints = 0, timename = "time",
  weights = NULL, control = list())

check_TS_inputs(data, formula = gamma ~ 1, nchangepoints = 0,
  timename = "time", weights = NULL, control = list())

Arguments

data

data.frame including [1] the time variable (indicated in timename), [2] the predictor variables (required by formula), and [3] the multinomial response variable (indicated in formula), as verified by check_timename and check_formula. Note that the response variables should be formatted as a data.frame object named as indicated by the response entry in the control list, such as gamma for a standard TS analysis on LDA output. See Examples.

formula

formula defining the regression relationship between the change points. Any predictor variable included must also be a column in data and any (multinomial) response variable must be a set of columns in data, as verified by check_formula.

nchangepoints

integer corresponding to the number of change points to include in the model. 0 is a valid input (corresponding to no change points, so a single time series model), and the current implementation can reasonably include up to 6 change points. The number of change points dictates the segmentation of the time series into chunks, each fit with a separate model defined by formula.

timename

character element indicating the time variable used in the time series. Defaults to "time". The variable must be integer-conformable or a Date. If the variable named is a Date, the input is converted to an integer, resulting in the timestep being 1 day, which is often not desired behavior.

weights

Optional class numeric vector of weights for each document. Defaults to NULL, translating to an equal weight for each document. When using multinom_TS in a standard LDATS analysis, it is advisable to weight the documents by their total size, as the result of LDA is a matrix of proportions, which does not account for size differences among documents. For most models, a scaling of the weights (so that the average is 1) is most appropriate, and this is accomplished using document_weights.

control

A list of parameters to control the fitting of the Time Series model including the parallel tempering Markov Chain Monte Carlo (ptMCMC) controls. Values not input assume defaults set by TS_control.
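The behavior of the nchangepoints, timename, and weights arguments described above can be sketched in base R. This is a minimal illustration under stated assumptions, not LDATS code: all object names and values are hypothetical, the weight scaling is a hand-rolled stand-in for what document_weights is described as doing, and the rule that a change point belongs to the earlier segment is an assumed convention.

```r
# Hypothetical sampling dates, roughly one per month.
dates <- as.Date(c("2020-01-15", "2020-02-14", "2020-03-15", "2020-04-13"))

# Converting a Date to integer gives days since 1970-01-01, so the implied
# time step is one day and consecutive samples are about 30 steps apart.
day_index <- as.integer(dates)
print(diff(day_index))

# If one step per sample is intended, an explicit integer index can be
# used as the time variable instead.
sample_index <- seq_along(dates)

# Hypothetical document-term counts: rows are documents, columns are terms.
counts <- matrix(c(10, 5,
                   20, 20,
                    3, 2,
                   40, 20), nrow = 4, byrow = TRUE)

# Weight each document by its total size, scaled so the average weight is 1.
doc_size <- rowSums(counts)
w <- doc_size / mean(doc_size)
print(w)
print(mean(w))

# One hypothetical change point at time 2 splits the series into two
# segments. ASSUMPTION: times at or before the change point fall in the
# earlier segment.
rho <- 2
segment <- findInterval(sample_index, rho, left.open = TRUE) + 1
print(segment)
```

With one change point the series is split into two chunks, and each chunk would be fit with its own model following formula.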

Value

TS: TS_fit-class list containing the following elements, many of which are hidden for printing, but are accessible:

data

data input to the function.

formula

formula input to the function.

nchangepoints

nchangepoints input to the function.

weights

weights input to the function.

control

control input to the function.

lls

Iteration-by-iteration logLik values for the full time series fit by multinom_TS.

rhos

Iteration-by-iteration change point estimates from est_changepoints.

etas

Iteration-by-iteration marginal regressor estimates from est_regressors, which have been unconditioned with respect to the change point locations.

ptMCMC_diagnostics

ptMCMC diagnostics; see diagnose_ptMCMC.

rho_summary

Summary table describing rhos (the change point locations), see summarize_rhos.

rho_vcov

Variance-covariance matrix for the estimates of rhos (the change point locations), see measure_rho_vcov.

eta_summary

Summary table describing etas (the regressors), see summarize_etas.

eta_vcov

Variance-covariance matrix for the estimates of etas (the regressors), see measure_eta_vcov.

logLik

Across-iteration average of log-likelihoods (lls).

nparams

Total number of parameters in the full model, including the change point locations and regressors.

deviance

Penalized negative log-likelihood, based on logLik and nparams.

check_TS_inputs: An error message is thrown if any input is not proper, else NULL.
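The relationship among the lls, logLik, nparams, and deviance elements described above can be sketched as follows. The values are hypothetical, and the AIC-style penalty of 2 per parameter is an assumption made for illustration; the documentation states only that deviance is a penalized negative log-likelihood based on logLik and nparams.

```r
# Hypothetical iteration-by-iteration log-likelihoods (a stand-in for lls).
lls <- c(-250.3, -249.8, -251.1, -250.0)

# logLik as the across-iteration average of the log-likelihoods.
logLik_mean <- mean(lls)

# Hypothetical parameter count: 1 change point location plus 2 regressors.
nparams <- 3

# ASSUMPTION: an AIC-style penalty of 2 per parameter added to -2 * logLik.
deviance_sketch <- -2 * logLik_mean + 2 * nparams
print(deviance_sketch)
```

Under this assumed penalty, adding a change point lowers the value only if the gain in average log-likelihood outweighs the cost of the extra parameters.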

References

Bishop, C. M. 2006. Pattern Recognition and Machine Learning. Springer, New York, NY, USA.

Christensen, E., D. J. Harris, and S. K. M. Ernest. 2018. Long-term community change through multiple rapid transitions in a desert rodent community. Ecology 99:1523-1529.

Earl, D. J. and M. W. Deem. 2005. Parallel tempering: theory, applications, and new perspectives. Physical Chemistry Chemical Physics 7:3910-3916.

McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models. 2nd Edition. Chapman and Hall, New York, NY, USA.

Ripley, B. D. 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, UK.

Ruggieri, E. 2013. A Bayesian approach to detecting change points in climatic records. International Journal of Climatology 33:520-528.

Venables, W. N. and B. D. Ripley. 2002. Modern and Applied Statistics with S. Fourth Edition. Springer, New York, NY, USA.

Western, B. and M. Kleykamp. 2004. A Bayesian change point model for historical time series analysis. Political Analysis 12:354-374.

Examples

  data(rodents)
  document_term_table <- rodents$document_term_table
  document_covariate_table <- rodents$document_covariate_table
  LDA_models <- LDA_set(document_term_table, topics = 2)[[1]]
  data <- document_covariate_table
  data$gamma <- LDA_models@gamma
  weights <- document_weights(document_term_table)
  TSmod <- TS(data, gamma ~ 1, nchangepoints = 1, "newmoon", weights)
  check_TS_inputs(data, timename = "newmoon")