Fit a set of multinomial regression models (via multinom, Venables and
Ripley 2002) to a time series of data divided into multiple segments
(a.k.a. chunks) based on given locations for a set of change points.
check_multinom_TS_inputs checks that the inputs to multinom_TS are of
proper classes for an analysis.
multinom_TS(data, formula, changepoints = NULL, timename = "time",
weights = NULL, control = list())
check_multinom_TS_inputs(data, formula = gamma ~ 1,
changepoints = NULL, timename = "time", weights = NULL,
control = list())
Arguments
data |
data.frame including [1] the time variable (indicated
in timename), [2] the predictor variables (required by
formula), and [3] the multinomial response variable (indicated in
formula), as verified by check_timename and
check_formula. Note that the response variables should be
formatted as a data.frame object named as indicated by the
response entry in the control list, such as gamma
for a standard TS analysis on LDA output. See Examples.
|
formula |
formula defining the regression relationship
between the change points. Any
predictor variable included must also be a column in
data, and any (multinomial) response variable must be a set of
columns in data, as verified by check_formula.
|
changepoints |
Numeric vector indicating locations of the change
points. Must be conformable to integer values. Validity
checked by check_changepoints and
verify_changepoint_locations . |
timename |
character element indicating the time variable
used in the time series. Defaults to "time". The variable must be
integer-conformable or a Date. If the variable named
is a Date, the input is converted to an integer, so that each
time step is 1 day, which is often not the desired behavior.
|
weights |
Optional numeric vector of weights for each
document. Defaults to NULL, translating to an equal weight for
each document. When using multinom_TS in a standard LDATS
analysis, it is advisable to weight the documents by their total size,
as the result of LDA is a matrix of
proportions, which does not account for size differences among documents.
For most models, a scaling of the weights (so that the average is 1) is
most appropriate, and this is accomplished using
document_weights. |
control |
A list of parameters to control the fitting of the
Time Series model including the parallel tempering Markov Chain
Monte Carlo (ptMCMC) controls. Values not input assume defaults set by
TS_control . |
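The notes on weights and on Date time variables above can be illustrated with a small base R sketch. It is only an illustration under stated assumptions: the count matrix, the dates, and the variable names (sizes, w, time_index) are made up here, and the rescaling shown is one plain reading of "so that the average is 1", not necessarily the exact computation inside document_weights.

```r
# Hypothetical document-term counts: 4 documents (rows), 3 terms (columns).
dtt <- matrix(c(10,  5,  5,
                40, 20, 20,
                 8,  1,  1,
                22, 30, 28),
              nrow = 4, byrow = TRUE)

# Total size of each document, rescaled so that the weights average 1.
sizes <- rowSums(dtt)
w <- sizes / mean(sizes)
mean(w)

# A Date coerced to integer counts days since 1970-01-01, so samples
# taken about a month apart end up about 30 time steps apart.
dates <- as.Date(c("2000-01-15", "2000-02-15", "2000-03-15"))
diff(as.integer(dates))  # 31 29

# One possible alternative (an assumption, suitable only when samples
# are meant to be equally spaced): index samples by their order.
time_index <- seq_along(dates)
```

Whether to keep daily steps or re-index depends on how change point locations should be interpreted; the sketch only makes the consequence of the Date conversion visible.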
Value
multinom_TS: Object of class multinom_TS_fit, which is a list of [1]
chunk-level model fits ("chunk models"), [2] the total log
likelihood combined across all chunks ("logLik"), and [3] a
data.frame of chunk beginning and ending times ("chunks"
with columns "start" and "end").
check_multinom_TS_inputs: an error message is thrown if any
input is improper, otherwise NULL.
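To make the chunk-wise idea concrete, the following is a minimal sketch of fitting one multinomial model per segment with multinom from the nnet package and summing the log likelihoods. It is not the LDATS implementation: the helper name fit_chunks, the simulated data, and the convention that a change point is the last time step of its chunk are all assumptions made for illustration.

```r
library(nnet)  # provides multinom()

set.seed(1)
n <- 60
dat <- data.frame(time = seq_len(n))
# Simulated two-topic proportions whose mean shifts after time step 30.
p1 <- ifelse(dat$time <= 30, 0.8, 0.3) + runif(n, -0.1, 0.1)
dat$gamma <- cbind(p1, 1 - p1)

# Hypothetical helper: fit an intercept-only model within each chunk.
fit_chunks <- function(data, changepoints, timename = "time") {
  times <- data[[timename]]
  # Assumed convention: each change point ends a chunk and the next
  # chunk starts one time step later.
  starts <- c(min(times), changepoints + 1)
  ends <- c(changepoints, max(times))
  fits <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    in_chunk <- times >= starts[i] & times <= ends[i]
    chunk <- data[in_chunk, , drop = FALSE]
    fits[[i]] <- multinom(gamma ~ 1, data = chunk, trace = FALSE)
  }
  # Total log likelihood is the sum over the independent chunk fits.
  total_ll <- sum(vapply(fits, function(f) as.numeric(logLik(f)),
                         numeric(1)))
  list("chunk models" = fits,
       "logLik" = total_ll,
       "chunks" = data.frame(start = starts, end = ends))
}

res <- fit_chunks(dat, changepoints = 30)
res$chunks
res$logLik
```

The returned list mirrors the three elements described above, which makes it easy to compare candidate change point locations by their total log likelihood.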
References
Venables, W. N. and B. D. Ripley. 2002. Modern and Applied
Statistics with S. Fourth Edition. Springer, New York, NY, USA.
Examples
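# Load the example rodent data and fit a 2-topic LDA model
data(rodents)
dtt <- rodents$document_term_table
lda <- LDA_set(dtt, 2, 1, list(quiet = TRUE))
# Attach the topic proportions (gamma) to the covariate table as the response
dct <- rodents$document_covariate_table
dct$gamma <- lda[[1]]@gamma
# Weight documents by their relative size
weights <- document_weights(dtt)
# Validate the inputs, then fit with change points at time steps 20 and 50
check_multinom_TS_inputs(dct, timename = "newmoon")
mts <- multinom_TS(dct, formula = gamma ~ 1, changepoints = c(20, 50),
                   timename = "newmoon", weights = weights)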