The universal data structure we're going to use is a list with the following elements:

- abundance (required)
- covariates (optional)
- metadata (required)

If both abundance and covariates are present in the list, then the two data.frames must have the same number of rows.
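As a rough illustration of this layout, the sketch below assembles such a list in base R and checks the row-count rule. The species names, covariate values, and citation text are hypothetical placeholders, not data shipped with the package.

```r
# Hypothetical toy data; names and values are illustrative only.
abundance <- data.frame(
  species_a = c(3, 5, 2),
  species_b = c(0, 1, 4)
)
covariates <- data.frame(
  year = c(2001, 2002, 2003),
  temperature = c(11.2, 12.5, 10.9)
)
metadata <- list(
  is_community = TRUE,
  citation = "Hypothetical citation text"
)

# The dataset is a plain named list of the three elements.
dataset <- list(
  abundance = abundance,
  covariates = covariates,
  metadata = metadata
)

# abundance and covariates must have the same number of rows.
same_rows <- nrow(dataset$abundance) == nrow(dataset$covariates)
print(same_rows)
```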
In the abundance data.frame, the common usage is for each column to be a species or taxon, and each row to be an observed sample. In other words, each column is a time series, with the rows sorted so that time advances down the table (higher row indices correspond to later times).
In the covariates data.frame, the number of rows should match that of abundance, and each row of covariates should line up with the corresponding row of abundance (sampled either simultaneously or concurrently). Common covariates are date and time, temperature, treatments, etc.
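When raw samples arrive out of order, one way to satisfy both the time-ordering and the row-alignment requirements is to compute a single ordering from the time column and apply it to both tables. The following is a minimal base R sketch under that assumption; the column names and values are hypothetical.

```r
# Hypothetical raw samples, deliberately out of time order.
dates <- as.Date(c("2003-05-01", "2001-05-01", "2002-05-01"))
abundance <- data.frame(species_a = c(2, 3, 5), species_b = c(4, 0, 1))
covariates <- data.frame(date = dates, temperature = c(10.9, 11.2, 12.5))

# Apply one shared ordering so rows stay aligned across both tables.
row_order <- order(covariates$date)
abundance <- abundance[row_order, , drop = FALSE]
covariates <- covariates[row_order, , drop = FALSE]
rownames(abundance) <- NULL
rownames(covariates) <- NULL

# Time should now advance strictly down the rows.
time_increases <- all(diff(as.numeric(covariates$date)) > 0)
```

Sorting each table independently would risk pairing an abundance row with the wrong covariates row, which is why the same index vector is reused for both.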
In the metadata list:
- an is_community entry, which indicates whether the time series in abundance can be treated as components of a community with interactions and/or shared drivers in some way
- a citation entry that is a vector of text values for the reference to the dataset. There can be multiple values (e.g. in the case of a specific dataset pulled from a larger database).
- if there is a location entry, it must contain at least a latitude and longitude value (in decimal form). location itself can be a data.frame or a named vector.
- if there is a timename entry, it refers to a column in the covariates data.frame that gives a time index for the data
  - applying tidyr::full_seq to this column, along with the "period" entry (using 1 if missing), should produce the appropriate equi-timed spacing
- if there is a period entry, it must be compatible with tidyr::full_seq and the timename variable described above.
- if there is a species_table entry, it must have an id column that includes all the column names in abundance. This is intended to provide more information about the different variables in abundance.

Here is an example of a correctly formatted dataset with covariates and metadata:
    library(MATSS)
    data(dragons)
    str(dragons)
    #> List of 3
    #>  $ abundance :Classes 'tbl_df', 'tbl' and 'data.frame': 6 obs. of 3 variables:
    #>   ..$ Red Spotted Dragon : num [1:6] 2 6 0 5 4 4
    #>   ..$ Green Striped Dragon : num [1:6] 6 0 4 1 9 7
    #>   ..$ Blue Eyes White Dragon: num [1:6] 0 0 0 1 0 0
    #>  $ covariates:'data.frame': 6 obs. of 3 variables:
    #>   ..$ date : Date[1:6], format: "2014-06-28" "2015-06-28" ...
    #>   ..$ precipitation: int [1:6] 7 6 14 18 9 5
    #>   ..$ effort : num [1:6] 3 3 2 4 1 9
    #>  $ metadata :List of 7
    #>   ..$ timename : chr "date"
    #>   ..$ effort : chr "effort"
    #>   ..$ period : num 365
    #>   ..$ authors :List of 2
    #>   .. ..$ :Class 'person' hidden list of 1
    #>   .. .. ..$ :List of 5
    #>   .. .. .. ..$ given : chr "Ellen"
    #>   .. .. .. ..$ family : chr "Bledsoe"
    #>   .. .. .. ..$ role : chr "aut"
    #>   .. .. .. ..$ email : NULL
    #>   .. .. .. ..$ comment: Named chr "0000-0002-3629-7235"
    #>   .. .. .. .. ..- attr(*, "names")= chr "ORCID"
    #>   .. ..$ :Class 'person' hidden list of 1
    #>   .. .. ..$ :List of 5
    #>   .. .. .. ..$ given : chr "Hao"
    #>   .. .. .. ..$ family : chr "Ye"
    #>   .. .. .. ..$ role : chr "aut"
    #>   .. .. .. ..$ email : chr "hao.ye@weecology.org"
    #>   .. .. .. ..$ comment: Named chr "0000-0002-8630-1458"
    #>   .. .. .. .. ..- attr(*, "names")= chr "ORCID"
    #>   .. ..- attr(*, "class")= chr "person"
    #>   ..$ species_table:'data.frame': 4 obs. of 2 variables:
    #>   .. ..$ id : Factor w/ 4 levels "Blue Eyes White Dragon",..: 4 3 1 2
    #>   .. ..$ game: Factor w/ 2 levels "pokemon","yugioh": NA NA 2 1
    #>   ..$ citation : chr "Hao Ye, Ellen K. Bledsoe, Renata Diaz, S. K. Morgan Ernest, Juniper L. Simonis, Ethan P. White, & Glenda M. Yen"| __truncated__
    #>   ..$ is_community : logi TRUE
    #>  - attr(*, "class")= chr "matssdata"
We can view the abundance and covariates tables side by side:
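| Red Spotted Dragon | Green Striped Dragon | Blue Eyes White Dragon | date | precipitation | effort |
|---|---|---|---|---|---|
| 2 | 6 | 0 | 2014-06-28 | 7 | 3 |
| 6 | 0 | 0 | 2015-06-28 | 6 | 3 |
| 0 | 4 | 0 | … | 14 | 2 |
| 5 | 1 | 1 | … | 18 | 4 |
| 4 | 9 | 0 | … | 9 | 1 |
| 4 | 7 | 0 | … | 5 | 9 |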
We also provide a function for checking whether the data is formatted correctly:
    check_data_format(dragons)
    #> [1] TRUE
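To make the metadata rules more concrete, here is a hand-written base R sketch of a few of them. It is not the implementation of check_data_format: the function name check_metadata_sketch, the problem messages, and the assumption that location uses the names "latitude" and "longitude" are all hypothetical, and the period / tidyr::full_seq compatibility rule is left out.

```r
# Hypothetical helper: returns a character vector of problems found
# (empty if none). NOT the MATSS implementation of check_data_format.
check_metadata_sketch <- function(dataset) {
  md <- dataset[["metadata"]]
  problems <- character(0)

  if (is.null(md[["is_community"]])) {
    problems <- c(problems, "missing is_community")
  }
  if (!is.character(md[["citation"]])) {
    problems <- c(problems, "citation is not a character vector")
  }
  # Assumes the entries are literally named "latitude" and "longitude".
  if (!is.null(md[["location"]]) &&
      !all(c("latitude", "longitude") %in% names(md[["location"]]))) {
    problems <- c(problems, "location lacks latitude/longitude")
  }
  if (!is.null(md[["timename"]]) &&
      !(md[["timename"]] %in% names(dataset[["covariates"]]))) {
    problems <- c(problems, "timename is not a covariates column")
  }
  if (!is.null(md[["species_table"]]) &&
      !all(names(dataset[["abundance"]]) %in% md[["species_table"]][["id"]])) {
    problems <- c(problems, "species_table id misses abundance columns")
  }
  problems
}

# Hypothetical dataset that satisfies the rules checked above.
dataset <- list(
  abundance = data.frame(species_a = c(3, 5, 2), species_b = c(0, 1, 4)),
  covariates = data.frame(year = c(2001, 2002, 2003)),
  metadata = list(
    is_community = TRUE,
    citation = "Hypothetical citation text",
    location = c(latitude = 31.9, longitude = -109.1),
    timename = "year",
    species_table = data.frame(id = c("species_a", "species_b"))
  )
)
problems <- check_metadata_sketch(dataset)
```

Returning a vector of messages instead of a single TRUE/FALSE is a design choice made for this sketch, so that several violations can be reported at once.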