The LDATS package provides functionality for analyzing time series of high-dimensional data using a two-stage approach comprising Latent Dirichlet Allocation (LDA) and Bayesian time series (TS) analyses.
For a full description of the math underlying the LDATS package, see the technical document.
You can install the stable version of LDATS from CRAN with:
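A minimal sketch of that step, assuming a default R setup with a reachable CRAN mirror:

```r
# Install the released version of LDATS from CRAN
install.packages("LDATS")
```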
To obtain the current development version of LDATS from GitHub, install the devtools package and then use it to install LDATS:
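A sketch of the two steps follows. The repository path `weecology/LDATS` is an assumption that is not stated in this text; adjust it if the development sources live elsewhere.

```r
# Install devtools from CRAN (only needed once)
install.packages("devtools")

# Install the development version of LDATS from GitHub.
# "weecology/LDATS" is an assumed repository path.
devtools::install_github("weecology/LDATS")
```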
Here is an example of a full LDA-TS analysis using the Portal rodent data:
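The call described in the next paragraph would look roughly like the sketch below. It assumes the `rodents` dataset bundled with the package (a list holding the document term table and the document covariates table) and the `LDA_TS()` argument names from the CRAN release; the object name `r_LDATS` is chosen only for illustration.

```r
library(LDATS)

# Load the Portal rodent data shipped with the package (assumed name: rodents)
data(rodents)

# Fit the full two-stage analysis:
#   topics = 2:5          LDA models with two to five topics
#   nseeds = 2            two replicates per number of topics
#   formulas = ~ 1        intercept-only time series model
#   nchangepoints = 0:1   zero and one changepoint
#   timename = "newmoon"  column of the covariates table used as time
r_LDATS <- LDA_TS(data = rodents,
                  topics = 2:5,
                  nseeds = 2,
                  formulas = ~ 1,
                  nchangepoints = 0:1,
                  timename = "newmoon")
```

Fitting the time series models involves sampling, so the call may take some time to finish.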
This conducts two replicates (nseeds) for each of two to five topics in an LDA model using the document term table and selects the best of those by AIC. It then fits two time series models to the selected LDA model (an intercept-only model with 0 and with 1 changepoints), selects the best of them by AIC, and packages all the models together. The call uses the document term table to weight the samples by their sizes (number of words) and instructs the function to use the column named "newmoon" in the document covariates table as the time variable.
The resulting object is of class LDA_TS, which has a few basic routines available: print() prints the selected LDA and TS models, and plot() produces a four-panel figure of them, à la Figure 1 of Christensen et al. 2018.
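As a sketch of those routines in use, the block below refits the example so that it runs on its own and then inspects the result. The object name `r_LDATS` and the argument names are the same assumptions as in the example above.

```r
library(LDATS)
data(rodents)

# Refit the example analysis so this block is self-contained
r_LDATS <- LDA_TS(data = rodents, topics = 2:5, nseeds = 2,
                  formulas = ~ 1, nchangepoints = 0:1,
                  timename = "newmoon")

# Check the class of the returned object
class(r_LDATS)

# Print the selected LDA and TS models
print(r_LDATS)

# Draw the four-panel summary figure on the active graphics device
plot(r_LDATS)
```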
LDATS is based on initial work using LDA to analyze time-series data at Portal by Erica M. Christensen, David J. Harris, and S. K. Morgan Ernest, which has been published in Ecology.
The motivating study, the Portal Project, has been funded nearly continuously since 1977 by the National Science Foundation, most recently by DEB-1622425 to S. K. M. Ernest, which also supported (in part) E. Christensen’s time. Much of the computational work (including the time of J. Simonis, D. Harris, and H. Ye) was supported by the Gordon and Betty Moore Foundation’s Data-Driven Discovery Initiative through Grant GBMF4563 to E. P. White. R. Diaz was supported in part by a National Science Foundation Graduate Research Fellowship (Nos. DGE-1315138 and DGE-1842473).