MATSS is a package for conducting Macroecological Analyses of Time Series Structure. We designed it to help researchers quickly get started in analyses of ecological time series, and to reinforce and spread good practices in computational analyses.
We provide functionality to:
You can install MATSS from GitHub with:
# install.packages("remotes")
remotes::install_github("weecology/MATSS", build_opts = c("--no-resave-data", "--no-manual"))
And load the package in the typical fashion:
library(MATSS)
One of the best ways to get started is to create a research compendium. An auto-updating example is visible at https://github.com/weecology/MATSSdemo
To get started, identify the location and name for your compendium. For example, ~/MATSSdemo will put the compendium inside your home directory (the ~ location), with the package name "MATSSdemo". (Note that package names can only contain ASCII letters, numbers, and “.” and have to start with a letter.)
Running this code will perform the following operations:
* analysis folder to hold the analysis files
* .bib file to hold the MATSS reference linked to in the Rmarkdown report
* R folder to hold function definitions
After creating the new project, the readme will contain further instructions to run the code. We summarize briefly here:
* analysis/pipeline.R can be run to perform the analysis and generate the report.
* analysis/report.md can be viewed.
For further details about how the code within the template project works, see the guide below to interacting with the datasets, the drake workflow package, and our tools for building reproducible analyses.
Several datasets are included with this package; they can be loaded individually using dataset-specific functions and require no additional setup.
Other datasets require downloading. To facilitate this, we include functions to help configure a specific location on disk. To check your current setting:
and to configure this setting (and then follow the instructions therein):
To download individual datasets, call install_retriever_data() with the name of the dataset:
To download all the datasets that are currently supported (i.e. with associated code for importing and formatting):
We tap into several collections of datasets in MATSS, so it is useful to do some preprocessing to split the raw database files into separate datasets. These databases are:
* BBS (the North American Breeding Bird Survey)
* BioTIME (ecological assemblages from the BioTIME Consortium)
Processing these databases is necessary before loading individual datasets.
prepare_datasets() # wrapper function to prepare all datasets
# prepare_biotime_data()
# prepare_bbs_ts_data()
We designed MATSS to build off of the workflow package drake for computational analyses. Thus, it can be helpful to have a general understanding of how to use drake.
The basic approach to using drake is to define a plan and then call drake::make() to perform the work described in that plan.
We provide several functions to help construct plans:
* build_datasets_plan() constructs a plan for the datasets, with options to include downloaded datasets.
* build_analyses_plan() constructs a plan for a set of analyses that applies a method to each dataset. It takes as arguments a plan for the datasets and a plan for the methods.
* collect_analyses() combines the output objects from a single analysis applied to multiple datasets. This helps to achieve a consistent structure for the results, regardless of what individual analysis functions actually return.
* analysis_wrapper() wraps a method that applies to a single time series (such as calculating the slope of the linear trendline), so that the method can be applied to a whole dataset (resulting in outputs of the method applied to each individual time series in that dataset).
Usage of these functions is demonstrated in the template R script that is generated when you create a research compendium.
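To make the idea behind analysis_wrapper() more concrete, here is a minimal base R sketch of wrapping a single-time-series method so that it runs on every time series in a dataset. This is not the MATSS implementation: the helper names `wrap_per_series()` and `linear_slope()` are hypothetical, and the sketch assumes a dataset is a list with an `abundance` data frame whose columns are individual time series.

```r
# Hypothetical helper (NOT part of MATSS): turn a function that works on one
# time series into a function that works on a whole dataset.
wrap_per_series <- function(fun) {
  function(dataset) {
    # apply `fun` to each column (one time series per column)
    lapply(dataset$abundance, fun)
  }
}

# Example single-series method: slope of the linear trendline
linear_slope <- function(series) {
  time <- seq_along(series)
  unname(coef(lm(series ~ time))[2])
}

# Toy dataset (assumed structure, for illustration only)
toy_dataset <- list(
  abundance = data.frame(sp_a = c(1, 2, 3, 4),
                         sp_b = c(8, 6, 4, 2))
)

slope_per_series <- wrap_per_series(linear_slope)
slopes <- slope_per_series(toy_dataset)
```

The result is a named list with one slope per time series, so every dataset yields output with the same shape regardless of how many series it contains.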
library(drake)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

# define the plan
plan <- drake_plan(data_1 = mtcars,
                   data_2 = iris,
                   my_model = lm(mpg ~ disp, data = data_1),
                   my_summary = data_2 %>%
                       group_by(Species) %>%
                       summarize_all(mean))

# run the plan
make(plan)
#> ▶ target data_1
#> ▶ target data_2
#> ▶ target my_model
#> ▶ target my_summary

# check resulting objects
readd(my_model)
#>
#> Call:
#> lm(formula = mpg ~ disp, data = data_1)
#>
#> Coefficients:
#> (Intercept)         disp
#>    29.59985     -0.04122

readd(my_summary)
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa             5.01        3.43         1.46       0.246
#> 2 versicolor         5.94        2.77         4.26       1.33
#> 3 virginica          6.59        2.97         5.55       2.03
Drake plans are run by calling make(). This does several things. First, it checks the cache to see whether any targets need to be rebuilt. Then it builds the outdated targets in an order that accounts for the dependencies between targets (e.g. an analysis target depends on a dataset target, so the dataset target is built first).
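The notion of a build order that accounts for dependencies can be illustrated with a small base R sketch. This is only an illustration of the concept, not how drake is implemented: the `deps` list and the `build_order()` helper are hypothetical, and the target names simply mirror the example plan above.

```r
# Hypothetical illustration: each target lists the targets it depends on
deps <- list(
  data_1     = character(0),
  data_2     = character(0),
  my_model   = "data_1",
  my_summary = "data_2"
)

# Repeatedly pick the targets whose dependencies have all been built
build_order <- function(deps) {
  done <- character(0)
  remaining <- names(deps)
  while (length(remaining) > 0) {
    is_ready <- vapply(remaining,
                       function(target) all(deps[[target]] %in% done),
                       logical(1))
    ready <- remaining[is_ready]
    if (length(ready) == 0) {
      stop("circular dependency among targets")
    }
    done <- c(done, ready)
    remaining <- setdiff(remaining, ready)
  }
  done
}

target_order <- build_order(deps)
```

In the resulting order, every dataset target appears before the analysis target that uses it.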
Note that if there are file inputs, it is important that they are declared explicitly using e.g. file_in() (and file outputs using file_out()). This enables drake to check whether those files have changed and to rebuild the targets that depend on them if needed. Otherwise, drake will treat the file names as fixed strings.
plan <- drake_plan(data = read.csv("some_data.csv"))
make(plan)
# make some changes to `some_data.csv`
make(plan) # will NOT rebuild the `data` target
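By contrast, here is a minimal sketch of the same plan with the input file wrapped in file_in(), so that drake tracks the file's contents. It assumes the drake package is installed and that the working directory is writable; the contents written to `some_data.csv` are made up purely for illustration.

```r
library(drake)

# Create a small input file (hypothetical data, for illustration only)
write.csv(data.frame(x = 1:3), "some_data.csv", row.names = FALSE)

# Declare the file explicitly so that drake tracks it as a dependency
plan <- drake_plan(data = read.csv(file_in("some_data.csv")))
make(plan)
rows_before <- nrow(readd(data))

# Change the file, then run the plan again
write.csv(data.frame(x = 1:5), "some_data.csv", row.names = FALSE)
make(plan) # the file has changed, so the `data` target is rebuilt
rows_after <- nrow(readd(data))
```

Note that the path passed to file_in() is written as a literal string inside the plan, since drake detects file dependencies by inspecting the code of each target.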