Fit a set of multinomial regression models (via
multinom, Venables and Ripley 2002) to a time series
of data divided into multiple segments (a.k.a. chunks) based on given
locations for a set of change points.
check_multinom_TS_inputs checks that the inputs to
multinom_TS are of proper classes for an analysis.
Usage
multinom_TS(data, formula, changepoints = NULL, timename = "time",
  weights = NULL, control = list())
check_multinom_TS_inputs(data, formula = gamma ~ 1,
  changepoints = NULL, timename = "time", weights = NULL,
  control = list())
Arguments
| data |
data.frame including [1] the time variable (indicated
in timename), [2] the predictor variables (required by
formula), and [3] the multinomial response variable (indicated in
formula), as verified by check_timename and
check_formula. Note that the response variable should be
stored in data under the name indicated by the
response entry in the control list, such as gamma
for a standard TS analysis on LDA output. See Examples.
|
| formula |
formula defining the regression
relationship between the change points. Any
predictor variable included must also be a column in
data and any (multinomial) response variable must be a set of
columns in data, as verified by check_formula.
|
| changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to integer values. Validity
checked by check_changepoints and
verify_changepoint_locations. |
| timename |
character element indicating the time variable
used in the time series. Defaults to "time". The variable must be
integer-conformable or a Date. If the variable named
is a Date, the input is converted to an integer, resulting in the
timestep being 1 day, which is often not desired behavior.
|
| weights |
Optional numeric vector of weights for each
document. Defaults to NULL, translating to an equal weight for
each document. When using multinom_TS in a standard LDATS
analysis, it is advisable to weight the documents by their total size,
as the result of LDA is a matrix of
proportions, which does not account for size differences among documents.
For most models, a scaling of the weights (so that the average is 1) is
most appropriate, and this is accomplished using
document_weights. |
| control |
A list of parameters to control the fitting of the
Time Series model including the parallel tempering Markov Chain
Monte Carlo (ptMCMC) controls. Values not input assume defaults set by
TS_control. |
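The notes on timename and weights above can be illustrated with base R alone. This is a minimal sketch on invented toy data: `dates` and `toy_dtt` are hypothetical and not part of the package, and the weight scaling is written out by hand only to show the "average is 1" idea. It is not necessarily how document_weights is implemented.

```r
# Hypothetical monthly sampling dates (not from the rodents data set)
dates <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))

# Converting a Date to integer counts days since 1970-01-01, so
# consecutive monthly samples end up many time steps apart.
day_steps <- as.integer(dates)
diff(day_steps)

# One way to get a unit time step per sample: index the samples explicitly.
sample_steps <- seq_along(dates)

# Hypothetical document-term table: 3 documents (rows) by 2 terms (columns)
toy_dtt <- matrix(c(10, 30, 20, 5, 15, 10), nrow = 3)

# Scale the document sizes so that the weights average to 1
doc_sizes <- rowSums(toy_dtt)
toy_weights <- doc_sizes / mean(doc_sizes)
toy_weights
```

If the sampling interval is not daily, supplying an integer time-step column (as in `sample_steps`) avoids the implicit 1-day time step.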
Value
multinom_TS: Object of class multinom_TS_fit,
which is a list of [1] the chunk-level model fits ("chunk models"),
[2] the total log likelihood combined across all chunks ("logLik"),
and [3] a data.frame of chunk beginning and ending times ("chunks",
with columns "start" and "end").
check_multinom_TS_inputs: throws an error if any
input is improper; otherwise returns NULL.
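To make the chunk table and the combined log likelihood concrete, here is a rough base-R sketch. The helper `sketch_chunks` is hypothetical and not a package function, and it assumes that each change point is the last time step of its chunk, with the next chunk starting one step later. The chunk-level log likelihoods are invented numbers used only to show the summation.

```r
# Hypothetical helper: build a start/end table from change point locations.
# Assumes each change point closes a chunk and the next chunk begins at
# the following time step (an assumption, not confirmed by the text above).
sketch_chunks <- function(time, changepoints = NULL) {
  data.frame(
    start = c(min(time), changepoints + 1),
    end = c(changepoints, max(time))
  )
}

# Two change points split time steps 1 to 100 into three chunks
toy_chunks <- sketch_chunks(time = 1:100, changepoints = c(20, 50))
toy_chunks

# With no change points the whole series is a single chunk
sketch_chunks(time = 1:100)

# Invented chunk-level log likelihoods; the total is their sum
toy_chunk_logLik <- c(-12.5, -30.25, -41)
total_logLik <- sum(toy_chunk_logLik)
```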
References
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied
Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Examples
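# Load the example data and fit a single LDA model to get topic proportions
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
# Attach the LDA topic proportions (gamma) as the multinomial response
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
# Weight documents by size, scaled so that the average weight is 1
weights <- document_weights(dtt)
# Check the inputs, then fit with change points at time steps 20 and 50
check_multinom_TS_inputs(dct, timename = "newmoon")
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20, 50),
                   timename = "newmoon", weights = weights)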