Skip to contents

Overview

The portalcasting package provides the ability to add functions both to a local copy of the repository for testing as well as to contribute to the base set of models provided within the package (and thus executed in the main repository). Similarly, users may often want or need to analyze the data in a slightly different configuration than what is already available. Here, we walk through the steps to add user-defined models and data sets to the directory for local use or for integration within the production pipeline.

For the purposes here, consider that you are interested in adding a model named “newmod” to the forecasting suite. While “newmod” can work on the existing data, you are are really interested in a slightly different configuration (perhaps only the long-term kangaroo rat exclosures) of the data you call “newdata”.

Here, we assume the user has already run through a basic installation, set up, and evaluation of the package, as covered in the Getting Started vignette.

Model requirements

Unlike previous versions of portalcasting, starting with v0.9.0, there are very few formal requirements for the output of a model. The requirements have been further relaxed starting with v0.51.0, where we now leverage the model controls lists to track and document most functionality.

The forecasting pipeline will work with basically any univariate model fitting function that will run in R, and produce an object that is then capable of being processed by the forecasting function to forecast the model’s predictions, so long as they can be processed via process_model_output(), which only requires mean, lower, and upper elements. The vast majority of the information saved about the forecasts is located in the metadata files and controls lists, resulting in the model itself not needing to produce much specific output to be valid.

Models can be based on either pre-existing functions (e.g., the AutoArima model uses auto.arima from the forecasting package) or specialized functions (such as those designed for the runjags models in portalcasting, which are collated into fit_runjags). Each model-dataset-species combination is run in cast() wrapped in a tryCatch() call, which softens any errors in implementation. The cast() call runs a do.call implementation of the fit and forecast functions from the model controls list and then passes the output of the two functions to process_model_output.

Set up the directory

To allow for a more flexible environment, here we use the setup_sandbox() function to make and fill our directory, which we house at "~/sandbox":

library(portalcasting)
main <- "~/sandbox"
setup_sandbox(main = main)

The controls list for the standard, prefabricated (“prefab”) models that come with the portalcasting package is automatically added to the directory’s models sub folder, as are the model script files for the runjags models.

Similarly, the prefabricated data sets that come with the portalcasting package are automatically added to the data subdirectory.

Adding a model

A new model is added to the pipeline via (at least) additional model controls (with additional files if needed). A starting template for model controls is provided in the portalcasting core files and accessible via model_controls_template():

model_controls_template()
#> $metadata
#> $metadata$name
#> [1] "model_name"
#> 
#> $metadata$print_name
#> [1] "model name"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fit
#> $fit$fun
#> NULL
#> 
#> $fit$args
#> NULL
#> 
#> 
#> $forecast
#> $forecast$fun
#> [1] "forecast"
#> 
#> $forecast$args
#> $forecast$args$object
#> [1] "model_fit"
#> 
#> $forecast$args$h
#> [1] "metadata$time$lead_time_newmoons"
#> 
#> $forecast$args$level
#> [1] "metadata$confidence_level"
#> 
#> 
#> 
#> $interpolate
#> $interpolate$needed
#> [1] FALSE
#> 
#> 
#> $datasets
#> $datasets$all
#> $datasets$all$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PL"    "PM"    "PP"    "RM"    "RO"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$controls
#> $datasets$controls$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PM"    "PP"    "RM"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$exclosures
#> $datasets$exclosures$species
#>  [1] "BA"    "NA"    "OL"    "OT"    "PB"    "PE"    "PF"    "PM"    "PP"   
#> [10] "RM"    "SF"    "SH"    "total"
#> 
#> 
#> 
#> $response
#> $response$link
#> NULL
#> 
#> $response$type
#> NULL
#> 
#> $response$scoring_family
#> NULL
#> 
#> 
#> $time
#> [1] "newmoon"

Metadata

One can create custom controls using a suite of new_model_<> functions, each of which wraps a call to a specific component of the model_controls_template inside an update_list() call:

new_model_controls()
#> $metadata
#> $metadata$name
#> [1] "model_name"
#> 
#> $metadata$print_name
#> [1] "model name"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fit
#> $fit$fun
#> NULL
#> 
#> $fit$args
#> NULL
#> 
#> 
#> $forecast
#> $forecast$fun
#> [1] "forecast"
#> 
#> $forecast$args
#> $forecast$args$object
#> [1] "model_fit"
#> 
#> $forecast$args$h
#> [1] "metadata$time$lead_time_newmoons"
#> 
#> $forecast$args$level
#> [1] "metadata$confidence_level"
#> 
#> 
#> 
#> $interpolate
#> $interpolate$needed
#> [1] FALSE
#> 
#> 
#> $datasets
#> $datasets$all
#> $datasets$all$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PL"    "PM"    "PP"    "RM"    "RO"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$controls
#> $datasets$controls$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PM"    "PP"    "RM"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$exclosures
#> $datasets$exclosures$species
#>  [1] "BA"    "NA"    "OL"    "OT"    "PB"    "PE"    "PF"    "PM"    "PP"   
#> [10] "RM"    "SF"    "SH"    "total"
#> 
#> 
#> 
#> $response
#> $response$link
#> NULL
#> 
#> $response$type
#> NULL
#> 
#> $response$scoring_family
#> NULL
#> 
#> 
#> $time
#> [1] "newmoon"

new_model_metadata()
#> $name
#> [1] "model_name"
#> 
#> $print_name
#> [1] "model name"
#> 
#> $tags
#> list()
#> 
#> $text
#> NULL

new_model_metadata(name = "newmod")
#> $name
#> [1] "newmod"
#> 
#> $print_name
#> [1] "model name"
#> 
#> $tags
#> list()
#> 
#> $text
#> NULL

new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"))
#> $metadata
#> $metadata$name
#> [1] "newmod"
#> 
#> $metadata$print_name
#> [1] "New Model"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fit
#> $fit$fun
#> NULL
#> 
#> $fit$args
#> NULL
#> 
#> 
#> $forecast
#> $forecast$fun
#> [1] "forecast"
#> 
#> $forecast$args
#> $forecast$args$object
#> [1] "model_fit"
#> 
#> $forecast$args$h
#> [1] "metadata$time$lead_time_newmoons"
#> 
#> $forecast$args$level
#> [1] "metadata$confidence_level"
#> 
#> 
#> 
#> $interpolate
#> $interpolate$needed
#> [1] FALSE
#> 
#> 
#> $datasets
#> $datasets$all
#> $datasets$all$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PL"    "PM"    "PP"    "RM"    "RO"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$controls
#> $datasets$controls$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PM"    "PP"    "RM"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$exclosures
#> $datasets$exclosures$species
#>  [1] "BA"    "NA"    "OL"    "OT"    "PB"    "PE"    "PF"    "PM"    "PP"   
#> [10] "RM"    "SF"    "SH"    "total"
#> 
#> 
#> 
#> $response
#> $response$link
#> NULL
#> 
#> $response$type
#> NULL
#> 
#> $response$scoring_family
#> NULL
#> 
#> 
#> $time
#> [1] "newmoon"

Fit and Cast Functions

At the very least, each new model will need a fitting function, which can be provided using the new_model_fit(), with elements for the function and arguments:

new_model_fit(fun = "arima", args = list(x = "abundance"))
#> $fun
#> [1] "arima"
#> 
#> $args
#> $args$x
#> [1] "abundance"

new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"),
                   fit      = new_model_fit(fun = "arima", args = list(x = "abundance")))
#> $metadata
#> $metadata$name
#> [1] "newmod"
#> 
#> $metadata$print_name
#> [1] "New Model"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fit
#> $fit$fun
#> [1] "arima"
#> 
#> $fit$args
#> $fit$args$x
#> [1] "abundance"
#> 
#> 
#> 
#> $forecast
#> $forecast$fun
#> [1] "forecast"
#> 
#> $forecast$args
#> $forecast$args$object
#> [1] "model_fit"
#> 
#> $forecast$args$h
#> [1] "metadata$time$lead_time_newmoons"
#> 
#> $forecast$args$level
#> [1] "metadata$confidence_level"
#> 
#> 
#> 
#> $interpolate
#> $interpolate$needed
#> [1] FALSE
#> 
#> 
#> $datasets
#> $datasets$all
#> $datasets$all$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PL"    "PM"    "PP"    "RM"    "RO"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$controls
#> $datasets$controls$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PM"    "PP"    "RM"    "SF"    "SH"    "total"
#> 
#> 
#> $datasets$exclosures
#> $datasets$exclosures$species
#>  [1] "BA"    "NA"    "OL"    "OT"    "PB"    "PE"    "PF"    "PM"    "PP"   
#> [10] "RM"    "SF"    "SH"    "total"
#> 
#> 
#> 
#> $response
#> $response$link
#> NULL
#> 
#> $response$type
#> NULL
#> 
#> $response$scoring_family
#> NULL
#> 
#> 
#> $time
#> [1] "newmoon"

Because the arima function already has a defined forecast method, we do not need to update the cast element of the model controls, as it is already pre-loaded:

new_model_forecast()
#> $fun
#> [1] "forecast"
#> 
#> $args
#> $args$object
#> [1] "model_fit"
#> 
#> $args$h
#> [1] "metadata$time$lead_time_newmoons"
#> 
#> $args$level
#> [1] "metadata$confidence_level"

Datasets

By default, any new model is set to run for the three prefab datasets (all, exclosures, controls), for each relevant species (e.g, not including the kangaroo rats in the exclosures):

new_model_datasets()
#> $all
#> $all$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PL"    "PM"    "PP"    "RM"    "RO"    "SF"    "SH"    "total"
#> 
#> 
#> $controls
#> $controls$species
#>  [1] "BA"    "DM"    "DO"    "DS"    "NA"    "OL"    "OT"    "PB"    "PE"   
#> [10] "PF"    "PM"    "PP"    "RM"    "SF"    "SH"    "total"
#> 
#> 
#> $exclosures
#> $exclosures$species
#>  [1] "BA"    "NA"    "OL"    "OT"    "PB"    "PE"    "PF"    "PM"    "PP"   
#> [10] "RM"    "SF"    "SH"    "total"

If the specific model needs to use interpolated data, the new_model_interpolate() function should be given an argument needed = TRUE, and then a function input, which allows for invoking specialized functions. A simple example that is used for the prefab models is round_na.interp(), which would be implemented as:

new_model_interpolate(needed = TRUE, fun = "round_na.interp")
#> $needed
#> [1] TRUE
#> 
#> $fun
#> [1] "round_na.interp"

which would just be added as the interpolate element in new_model_controls.

The default, however, is needed = FALSE:

new_model_interpolate()
#> $needed
#> [1] FALSE

Model Response

The response distribution for a model is a key aspect to scoring it properly. We store information about the response in the list for evaluation purposes:

new_model_controls()$response
#> $link
#> NULL
#> 
#> $type
#> NULL
#> 
#> $scoring_family
#> NULL

The options for each component are as follows:

link: normal, negative_binomial, poisson type: distribution, empirical scoring_family: normal, nbinom, poisson, sample

new_model_response(link = "normal", type = "distribution", scoring_family = "normal")
#> $link
#> [1] "normal"
#> 
#> $type
#> [1] "distribution"
#> 
#> $scoring_family
#> [1] "normal"

Adding the Model

The specific updates can be called together to generate the model controls list for newmod:

new_controls <- new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"),
                                   fit      = new_model_fit(fun = "arima", args = list(x = "abundance")),
                                   response = new_model_response(link = "normal", type = "distribution", scoring_family = "normal"))

Given that the directory is already established, one can use the add_new_model() function to add the model to the directory at main, via the controls list:

new_controls <- new_model_controls(metadata = new_model_metadata(name = "newmod", print_name = "New Model"),
                                   fit      = new_model_fit(fun = "arima", args = list(x = "abundance")),
                                   response = new_model_response(link = "normal", type = "distribution", scoring_family = "normal"))


added <- add_new_model(main = main, new_model_controls = new_controls)


names(read_models_controls(main = main))

#>  [1] "AutoArima"                            "sAutoArima"                           "ESSS"                                
#>  [4] "NaiveArima"                           "sNaiveArima"                          "nbGARCH"                             
#>  [7] "nbsGARCH"                             "pGARCH"                               "psGARCH"                             
#> [10] "pevGARCH"                             "jags_RW"                              "jags_logistic"                       
#> [13] "jags_logistic_covariates"             "jags_logistic_competition"            "jags_logistic_competition_covariates"
#> [16] "newmod"

And the model is directly ready to be portalcasted:

portalcast(main     = main, 
           models   = "newmod", 
           datasets = "all", 
           species  = c("DM", "PP", "total"))

#> ------------------------------------------------------------
#> Forecasting models...
#> ------------------------------------------------------------
#> This is portalcasting v0.51.0
#> ------------------------------------------------------------
#> 
#>   - newmod for all DM
#>     |++++| successful |++++|
#>   - newmod for all PP
#>     |++++| successful |++++|
#>   - newmod for all total
#>     |++++| successful |++++|
#> ------------------------------------------------------------
#> ...forecasting complete.
#> ------------------------------------------------------------

From the setup Stage

Incorporating the model in the establishment of the directory requires adding the controls list to the initial setup_<> call:

main2 <- "~/sandbox2"
setup_sandbox(main                = main2, 
              new_models_controls = list(newmod = new_controls), 
              models              = c(prefab_models(), "newmod"))

Adding a Dataset

A new dataset is added to the pipeline much in the same way as a new model, but with only a single generating function and fewer elements in the controls list:

dataset_controls_template()
#> $metadata
#> $metadata$name
#> [1] "dataset_name"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fun
#> [1] "prepare_dataset"
#> 
#> $args
#> $args$name
#> [1] "dataset_name"
#> 
#> $args$species
#>  [1] "BA" "DM" "DO" "DS" "NA" "OL" "OT" "PB" "PE" "PF" "PH" "PL" "PM" "PP" "RF"
#> [16] "RM" "RO" "SF" "SH" "SO"
#> 
#> $args$total
#> [1] TRUE
#> 
#> $args$clean
#> [1] FALSE
#> 
#> $args$type
#> [1] "Rodents"
#> 
#> $args$level
#> [1] "Site"
#> 
#> $args$plots
#> [1] "all"
#> 
#> $args$treatment
#> NULL
#> 
#> $args$min_plots
#> [1] 24
#> 
#> $args$min_traps
#> [1] 1
#> 
#> $args$output
#> [1] "abundance"
#> 
#> $args$fillweight
#> [1] FALSE
#> 
#> $args$unknowns
#> [1] FALSE
#> 
#> $args$time
#> [1] "newmoon"
#> 
#> $args$na_drop
#> [1] FALSE
#> 
#> $args$zero_drop
#> [1] FALSE
#> 
#> $args$effort
#> [1] TRUE
#> 
#> $args$filename
#> [1] "rodents_dataset_name.csv"

Metadata

One can create custom controls using a suite of new_data_<> functions, each of which wraps a call to a specific component of the data_controls_template inside an update_list() call:

new_dataset_controls()
#> $metadata
#> $metadata$name
#> [1] "dataset_name"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fun
#> [1] "prepare_dataset"
#> 
#> $args
#> $args$name
#> [1] "dataset_name"
#> 
#> $args$species
#>  [1] "BA" "DM" "DO" "DS" "NA" "OL" "OT" "PB" "PE" "PF" "PH" "PL" "PM" "PP" "RF"
#> [16] "RM" "RO" "SF" "SH" "SO"
#> 
#> $args$total
#> [1] TRUE
#> 
#> $args$clean
#> [1] FALSE
#> 
#> $args$type
#> [1] "Rodents"
#> 
#> $args$level
#> [1] "Site"
#> 
#> $args$plots
#> [1] "all"
#> 
#> $args$treatment
#> NULL
#> 
#> $args$min_plots
#> [1] 24
#> 
#> $args$min_traps
#> [1] 1
#> 
#> $args$output
#> [1] "abundance"
#> 
#> $args$fillweight
#> [1] FALSE
#> 
#> $args$unknowns
#> [1] FALSE
#> 
#> $args$time
#> [1] "newmoon"
#> 
#> $args$na_drop
#> [1] FALSE
#> 
#> $args$zero_drop
#> [1] FALSE
#> 
#> $args$effort
#> [1] TRUE
#> 
#> $args$filename
#> [1] "rodents_dataset_name.csv"

new_dataset_metadata()
#> $name
#> [1] "dataset_name"
#> 
#> $tags
#> list()
#> 
#> $text
#> NULL

new_dataset_metadata(name = "newdata")
#> $name
#> [1] "newdata"
#> 
#> $tags
#> list()
#> 
#> $text
#> NULL

new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"))
#> $metadata
#> $metadata$name
#> [1] "newdata"
#> 
#> $metadata$tags
#> list()
#> 
#> $metadata$text
#> NULL
#> 
#> 
#> $fun
#> [1] "prepare_dataset"
#> 
#> $args
#> $args$name
#> [1] "dataset_name"
#> 
#> $args$species
#>  [1] "BA" "DM" "DO" "DS" "NA" "OL" "OT" "PB" "PE" "PF" "PH" "PL" "PM" "PP" "RF"
#> [16] "RM" "RO" "SF" "SH" "SO"
#> 
#> $args$total
#> [1] TRUE
#> 
#> $args$clean
#> [1] FALSE
#> 
#> $args$type
#> [1] "Rodents"
#> 
#> $args$level
#> [1] "Site"
#> 
#> $args$plots
#> [1] "all"
#> 
#> $args$treatment
#> NULL
#> 
#> $args$min_plots
#> [1] 24
#> 
#> $args$min_traps
#> [1] 1
#> 
#> $args$output
#> [1] "abundance"
#> 
#> $args$fillweight
#> [1] FALSE
#> 
#> $args$unknowns
#> [1] FALSE
#> 
#> $args$time
#> [1] "newmoon"
#> 
#> $args$na_drop
#> [1] FALSE
#> 
#> $args$zero_drop
#> [1] FALSE
#> 
#> $args$effort
#> [1] TRUE
#> 
#> $args$filename
#> [1] "rodents_dataset_name.csv"

Generating Function and Its Arguments

All of the existing datasets use the same generating function prepare_dataset(), which is quite flexible and ports arguments directly to the generalized summarize_rodent_data() from the portalr package. We therefore include this function as the default in new dataset controls:

new_dataset_fun()
#> [1] "prepare_dataset"

although that can be changed to whatever generating function a user may want to implement.

The arguments to the function can be updated via the new_dataset_args() function:

new_dataset_args(name = "newdata")
#> $name
#> [1] "newdata"
#> 
#> $species
#>  [1] "BA" "DM" "DO" "DS" "NA" "OL" "OT" "PB" "PE" "PF" "PH" "PL" "PM" "PP" "RF"
#> [16] "RM" "RO" "SF" "SH" "SO"
#> 
#> $total
#> [1] TRUE
#> 
#> $clean
#> [1] FALSE
#> 
#> $type
#> [1] "Rodents"
#> 
#> $level
#> [1] "Site"
#> 
#> $plots
#> [1] "all"
#> 
#> $treatment
#> NULL
#> 
#> $min_plots
#> [1] 24
#> 
#> $min_traps
#> [1] 1
#> 
#> $output
#> [1] "abundance"
#> 
#> $fillweight
#> [1] FALSE
#> 
#> $unknowns
#> [1] FALSE
#> 
#> $time
#> [1] "newmoon"
#> 
#> $na_drop
#> [1] FALSE
#> 
#> $zero_drop
#> [1] FALSE
#> 
#> $effort
#> [1] TRUE
#> 
#> $filename
#> [1] "rodents_dataset_name.csv"

Adding the Dataset

These can all be wrapped up together in a call to new_dataset_controls() to create the controls list that is then passed into add_new_dataset():

new_controls <- new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"),
                                     args     = new_dataset_args(name     = "newdata", 
                                                                 filename = "rodents_newdata.csv"))

Given that the directory is already established, one can use the add_new_dataset() function to add the dataset controls to the directory at main, via the controls list, noting which existing models should have the new dataset added to their controls list:


new_controls <- new_dataset_controls(metadata = new_dataset_metadata(name = "newdata"),
                                     args     = new_dataset_args(name     = "newdata", 
                                                                 filename = "rodents_newdata.csv"))

added <- add_new_dataset(main                 = main, 
                         new_dataset_controls = new_controls, 
                         models               = "AutoArima"))


names(read_datasets_controls(main = main))

#>  [1] "all"        "controls"   "exclosures" "newdata"

And the dataset can then be forecast with:

portalcast(main     = main, 
           models   = "AutoArima", 
           datasets = "newdata", 
           species  = c("DM", "PP", "total"))

#> ------------------------------------------------------------
#> Forecasting models...
#> ------------------------------------------------------------
#> This is portalcasting v0.51.0
#> ------------------------------------------------------------
#> 
#>   - AutoArima for newdata DM
#>     |++++| successful |++++|
#>   - AutoArima for newdata PP
#>     |++++| successful |++++|
#>   - AutoArima for newdata total
#>     |++++| successful |++++|
#> ------------------------------------------------------------
#> ...forecasting complete.
#> ------------------------------------------------------------

From the setup Stage

Incorporating the dataset in the establishment of the directory requires adding the controls list to the initial setup_<> call:

main3 <- "~/sandbox3"
setup_sandbox(main                  = main3, 
              new_datasets_controls = list(newdata = new_controls), 
              datasets              = c(prefab_datasets(), "newdata"))