MATSS is a package for conducting Macroecological Analyses of Time Series Structure. We designed it to help researchers quickly get started in analyses of ecological time series, and to reinforce and spread good practices in computational analyses.
We provide functionality to:
drake package

You can install MATSS from GitHub with:
# install.packages("remotes")
remotes::install_github("weecology/MATSS", build_opts = c("--no-resave-data", "--no-manual"))
And load the package in the typical fashion:
library(MATSS)
One of the best ways to get started is to create a research compendium. An auto-updating example is visible at https://github.com/weecology/MATSSdemo
To get started, identify the location and name for your compendium. For example, ~/MATSSdemo will put the compendium inside your home directory (the ~ location), with the package name "MATSSdemo". (Note that package names can only contain ASCII letters, numbers, and “.” and have to start with a letter.)
create_MATSS_compendium("<path>")
Running this code will perform the following operations:
* DESCRIPTION file
* analysis folder to hold the analysis files
* .bib file to hold the MATSS reference linked to in the Rmarkdown report
* R folder to hold function definitions
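
The naming rule noted earlier (ASCII letters, numbers, and “.”, starting with a letter) can be checked before creating a compendium. The following is a minimal sketch in base R; is_valid_compendium_name() is a hypothetical helper, not a MATSS function, and it covers only the rule as stated in this document.

```r
# Hypothetical helper (not part of MATSS): checks a candidate package name
# against the rule stated above (ASCII letters, numbers, and ".",
# starting with a letter).
is_valid_compendium_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*$", name)
}

# The package name is taken from the last component of the path
path <- "~/MATSSdemo"
is_valid_compendium_name(basename(path))
```

A name such as "my_pkg" would fail this check because of the underscore.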
After creating the new project, the readme will contain further instructions to run the code. We summarize briefly here:
* analysis/pipeline.R can be run to perform the analysis and generate the report.
* analysis/report.md can be viewed.

For further details about how the code within the template project works, see the guide below to interacting with the datasets, the drake workflow package, and our tools for building reproducible analyses.
Several datasets are included with this package. They can be loaded individually using dataset-specific functions and require no additional setup.
Other datasets require downloading. To facilitate this, we include functions to help configure a specific location on disk. To check your current setting:
and to configure this setting (and then follow the instructions therein):
use_default_data_path("<path>")
To download individual datasets, call install_retriever_data() with the name of the dataset:
install_retriever_data("veg-plots-sdl")
To download all the datasets that are currently supported (i.e. with associated code for importing and formatting):
We tap into several collections of datasets in MATSS, so it is useful to do some preprocessing to split the raw database files into separate datasets. These databases are:
* BBS (the North American Breeding Bird Survey)
* BioTIME (ecological assemblages from the BioTIME Consortium)

Processing these databases is necessary before loading the individual datasets.
prepare_datasets() # wrapper function to prepare all datasets
# prepare_biotime_data()
# prepare_bbs_ts_data()
We designed MATSS to build off of the workflow package drake for computational analyses. Thus, it can be helpful to have a general understanding of how to use drake.
The basic approach to using drake is:
* define drake plans
* call drake::make() to perform the work described in a drake plan

We provide several functions to help construct plans:
* build_datasets_plan() constructs a plan for the datasets, with options to include downloaded datasets.
* build_analyses_plan() constructs a plan for a set of analyses that applies a method to each dataset. It takes as arguments a plan for the datasets and a plan for the methods.
* collect_analyses() combines the output objects from a single analysis applied to multiple datasets. This helps to achieve a consistent structure for the results, regardless of what individual analysis functions actually return.
* analysis_wrapper() is a function that wraps a method that applies to a single time series (such as calculating the slope of the linear trendline), so that it can be applied to a dataset (resulting in outputs of the method applied to each individual time series in that dataset).

Usage of these functions is demonstrated in the template R script generated from create_MATSS_compendium().
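
To make the wrapping idea concrete, here is a minimal sketch in base R. It is not the MATSS implementation of analysis_wrapper(): wrap_per_series(), trend_slope(), and the toy dataset are hypothetical names made up for illustration, and the sketch assumes a dataset is simply a data frame with one time series per column.

```r
# Hypothetical helper illustrating the idea behind analysis_wrapper();
# this is NOT the MATSS implementation.
wrap_per_series <- function(fun) {
  function(dataset) {
    # apply `fun` to each column (assumed: one time series per column)
    lapply(dataset, fun)
  }
}

# A method for a single time series: slope of the linear trendline
trend_slope <- function(series) {
  time <- seq_along(series)
  coef(lm(series ~ time))[["time"]]
}

# Toy dataset (made up for illustration): two series stored as columns
toy_dataset <- data.frame(species_a = c(1, 2, 3, 4, 5),
                          species_b = c(10, 8, 6, 4, 2))

# The wrapped method now accepts a whole dataset
dataset_slopes <- wrap_per_series(trend_slope)
dataset_slopes(toy_dataset)
```

The wrapped function returns a list with one element per time series, which is the kind of consistent per-dataset structure that makes results easier to combine afterwards.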
library(drake)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union

# define the plan
plan <- drake_plan(data_1 = mtcars,
                   data_2 = iris,
                   my_model = lm(mpg ~ disp, data = data_1),
                   my_summary = data_2 %>%
                       group_by(Species) %>%
                       summarize_all(mean))

# run the plan
make(plan)
#> ▶ target data_1
#> ▶ target data_2
#> ▶ target my_model
#> ▶ target my_summary

# check resulting objects
readd(my_model)
#>
#> Call:
#> lm(formula = mpg ~ disp, data = data_1)
#>
#> Coefficients:
#> (Intercept)         disp
#>    29.59985     -0.04122

readd(my_summary)
#> # A tibble: 3 x 5
#>   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa             5.01        3.43         1.46       0.246
#> 2 versicolor         5.94        2.77         4.26       1.33
#> 3 virginica          6.59        2.97         5.55       2.03
Drake plans are run by calling make(). This does several things. First, it checks the cache to see which targets need to be rebuilt. It then builds those targets in an order that accounts for the dependencies between them (e.g. an analysis target is built only after the dataset target it depends on has been processed).
The manual has more information about how Drake stores its cache and how Drake decides to rebuild targets.
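
As a rough illustration of the cache check, drake's outdated() lists the targets that make() would build. This sketch assumes a recent drake version in which outdated() accepts the plan directly (older releases may expect a drake_config() object instead), and it writes a cache to the current working directory.

```r
library(drake)

# A small plan in which `my_model` depends on `data_1`
plan <- drake_plan(
  data_1 = mtcars,
  my_model = lm(mpg ~ disp, data = data_1)
)

outdated(plan)  # targets that make() would build at this point
make(plan)      # builds data_1 first, then my_model
outdated(plan)  # targets still out of date after the build
```

Calling make(plan) again without changing anything should leave the existing targets untouched.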
Note that if there are file inputs, it is important that they are declared explicitly using e.g. file_in(), knitr_in(), and file_out(). This enables Drake to check if those files are changed and to rebuild targets that depend on the files if needed. Otherwise Drake will treat them as fixed strings.
plan <- drake_plan(data = read.csv("some_data.csv"))
make(plan)
# make some changes to `some_data.csv`
make(plan) # will NOT rebuild the `data` target

plan <- drake_plan(data = read.csv(file_in("some_data.csv")))
make(plan)
# make some changes to `some_data.csv`
make(plan) # will rebuild the `data` target
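
Output files can be declared in the same way with file_out(). The sketch below is one possible arrangement rather than a prescribed pattern: the file name data_head.csv and the target name export are hypothetical, and the input file is created first only so that the example is self-contained.

```r
library(drake)

# Create a small input file so the sketch can run on its own
write.csv(mtcars, "some_data.csv", row.names = FALSE)

plan <- drake_plan(
  # declared input: drake tracks changes to this file
  data = read.csv(file_in("some_data.csv")),
  # declared output: drake tracks the file written by this target
  export = write.csv(head(data), file_out("data_head.csv"), row.names = FALSE)
)

make(plan)
file.exists("data_head.csv")
```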