This vignette outlines the codebase and functionality of the portalcasting package (v0.45.0), which underlies the automated iterative forecasting within the Portal Predictions production pipeline. portalcasting has utilities for setting up local versions of the pipeline for developing and testing new models, which are covered in detail in other vignettes.

Installation

To install the most recent version of portalcasting from GitHub:

install.packages("devtools")
devtools::install_github("weecology/portalcasting")

Directory Structure

The package uses a directory tree with two levels to organize the project:

  • main: project folder encompassing all subfolders
  • subs: specific subfolders that organize the project files

structured as

main
│
└──raw
│   <stable version of raw components used to populate other folders>
└──models
│   <model scripts>
└──data
│   <data used for a specific run of models>
└──casts
│   <previous and current model casts>
└──fits
│   <previous and current model fits>
└──dir_config.yaml

To group the project subfolders into a multi-leveled folder, simply add structure to the main input, such as main = "~/project_folder".

Instantiating a Directory

Setting up a fully functional directory for a production or sandbox pipeline consists of two steps: creating (instantiating folders that are missing) and filling (adding files to the folders). These steps can be executed separately or in combination via a general setup_dir function or via specialized versions of setup_dir: setup_sandbox (for creating a pipeline with defaults to facilitate sandboxing) and setup_production (for creating a production pipeline). These functions are general and flexible, but are designed to work well under default settings. To alter the directory configurations in setup_<> and create_dir, use the settings argument, which takes a list of inputs, condensed and detailed in directory_settings.

Creating

The directory is established using create_dir, which takes main as an argument and in sequence creates each of the levels’ folders if they do not already exist. A typical user is likely to want to change the main input (to locate the forecasting directory where they would like it), but general users should not alter the subs structure, and so that option is not easily available. If needed, the subs can be altered via the directory_settings controls.

create_dir also initializes the dir_config.yaml file, which is held within main and contains metadata about the directory setting up process.

Filling

The directory is filled (loaded with files for forecasting) using a series of subdirectory-specific functions that are combined in the overall fill_dir function:

  • fill_raw downloads each of the raw components for the directory, which presently include the source data (rodents), covariate data (weather, NDVI), and previous forecasts’ archive. Upon completion of the downloads, fill_raw updates dir_config.yaml with download versions.
  • fill_casts moves the existing model cast output files from the raw subdirectory to the casts subdirectory.
  • fill_fits moves the existing model fit files from the raw subdirectory to the fits subdirectory.
  • fill_models writes the model scripts into the models subdirectory.
  • fill_data prepares the forecasting data files from the raw downloaded data files and moves them into the data subdirectory.
    • prep_moons prepares and formats the temporal (lunar) data from the raw data.
    • prep_rodents prepares multiple structures of the rodents data for analyses from the raw data.
    • prep_covariates downloads and forecasts covariates data.
    • prep_metadata creates and saves out a YAML metadata list for the forecasting.

Each of these components can be run individually, as well. In particular, fill_data is used to set up the complete set of data for a given model run, and to reset the data to the most up-to-date version after model completion.

Running models

Models are run using a function pipeline similar to the creation and filling function pipelines, with flexible controls through a variety of arguments, but robust operation under default settings.

  • portalcast is the overarching function that controls casting of the Portal data
    • read_moons brings the lunar data in to the function and last_newmoon determines the most recently passed newmoon, which is used to set the forecast origin (end_moons, note that here the plural in end_moons indicates that multiple forecast origins can be input to portalcast) if it wasn’t set by the user.
    • For each end_moons value, portalcast runs:
      • fill_data which ensures that the data files in the data subdirectory are up-to-date for the specifics requested.
      • cast runs (“casts”) each of the requested models for the data
        • models_to_cast collects the file paths to the scripts in the models subdirectory, which are then run using source.

Utilities

To facilitate tidy and easy-to-follow code, we introduce a few important utility functions, which are put to use throughout the codebase.

File paths

file_ext determines the file extension, based on the separating character (sep_char), which facilitates use with generalized URL APIs. path_no_ext provides extension-removing services.

Data IO

portalcasting has a generalized read_data function that allows for toggling among read_rodents, read_rodents_table, read_covariates, read_historical_covariates, read_forecast_covariates, read_moons, and read_metadata, which each have specific loading procedures in place. Similar to the read_data functions, read_casts provides a simple user interface for reading the cast files into the R session.

For saving out, write_data provides a simple means for interfacing with potentially pre-existing data files, with logical inputs for saving generally and overwriting a pre-existing file specifically, and flexible file naming. The type of data saved out is currently restricted to .csv and .yaml, which is extracted from the filename given.

The directory configuration file is a special file, and has its own IO functions separate from the rest: write_directory_config creates the file (from within create_dir, update_directory_config adds downloads information (from inside fill_raw) read_directory_config brings the information from the file into the R session.

messageq

messageq provides a simple wrapper on message that also has a logical input for quieting. This helps switch messaging off as desired while localizing the actual boolean operator code to one spot.