This vignette outlines the codebase and functionality of the portalcasting package (v0.45.0), which underlies the automated iterative forecasting within the Portal Predictions production pipeline. portalcasting has utilities for setting up local versions of the pipeline for developing and testing new models, which are covered in detail in other vignettes.
To install the most recent version of portalcasting from GitHub:
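A minimal sketch of the installation, assuming the remotes package as the installer and weecology/portalcasting as the GitHub repository (neither is named in the text above):

```r
# Assumption: remotes is used as the installer; devtools would work similarly.
install.packages("remotes")

# Assumption: the package source lives in the weecology/portalcasting repository.
remotes::install_github("weecology/portalcasting")
```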
The package uses a directory tree with two levels to organize the project:
main: project folder encompassing all subfolders
subs: specific subfolders that organize the project files
main
│
└──raw
│    <stable version of raw components used to populate other folders>
└──models
│    <model scripts>
└──data
│    <data used for a specific run of models>
└──casts
│    <previous and current model casts>
└──fits
│    <previous and current model fits>
└──dir_config.yaml
To group the project subfolders into a multi-leveled folder, simply add structure to the main input, such as main = "~/project_folder".
Setting up a fully functional directory for a production or sandbox pipeline consists of two steps: creating (instantiating folders that are missing) and filling (adding files to the folders). These steps can be executed separately or in combination via the general setup_dir function or via its specialized versions, setup_sandbox (for creating a pipeline with defaults to facilitate sandboxing) and setup_production (for creating a production pipeline). These functions are general and flexible, but are designed to work well under default settings. To alter the directory configurations in create_dir, use the settings argument, which takes a list of inputs, condensed and detailed in
The directory is established using create_dir, which takes main as an argument and creates each level's folders in sequence if they do not already exist. A typical user is likely to want to change the main input (to locate the forecasting directory where they would like it), but general users should not alter the subs structure, and so that option is not easily available. If needed, the subs can be altered via the
create_dir also initializes the dir_config.yaml file, which is held within main and contains metadata about the directory setup process.
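The creation step described above can be sketched in base R. This is an illustration of the "create only what is missing" behavior, not the package's implementation: sketch_create_dir is a hypothetical helper, the subfolder names are taken from the directory tree shown earlier, and the single setup_date entry written to dir_config.yaml is a made-up placeholder for the real metadata.

```r
# Hypothetical stand-in for the creation step; not portalcasting's own code.
sketch_create_dir <- function(main,
                              subs = c("raw", "models", "data", "casts", "fits")) {
  paths <- c(main, file.path(main, subs))

  # Create each folder only if it is missing; report which ones were created.
  created <- vapply(paths, function(p) {
    if (dir.exists(p)) FALSE else dir.create(p, recursive = TRUE)
  }, logical(1))

  # Initialize a placeholder configuration file (contents are an assumption).
  config <- file.path(main, "dir_config.yaml")
  if (!file.exists(config)) {
    writeLines(paste0("setup_date: '", format(Sys.Date()), "'"), config)
  }
  invisible(created)
}

main   <- tempfile("project_folder")  # throwaway location for the example
first  <- sketch_create_dir(main)     # creates main and every subfolder
second <- sketch_create_dir(main)     # finds everything in place, creates nothing
```

Running the helper twice shows why the step is safe to repeat: the second call leaves the existing folders and configuration file untouched.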
The directory is filled (loaded with files for forecasting) using a series of subdirectory-specific functions that are combined in the overall
fill_raw downloads each of the raw components for the directory, which presently include the source data (rodents), covariate data (weather, NDVI), and the archive of previous forecasts. Upon completion of the downloads, dir_config.yaml is updated with download versions.
fill_casts moves the existing model cast output files from the raw subdirectory to the
fill_fits moves the existing model fit files from the raw subdirectory to the
fill_models writes the model scripts into the
fill_data prepares the forecasting data files from the raw downloaded data files and moves them into the
  prep_moons prepares and formats the temporal (lunar) data from the raw data.
  prep_rodents prepares multiple structures of the rodents data for analyses from the raw data.
  prep_covariates downloads and forecasts covariates data.
  prep_metadata creates and saves out a YAML metadata list for the forecasting.
Each of these components can be run individually, as well. In particular, fill_data is used to set up the complete set of data for a given model run, and to reset the data to the most up-to-date version after model completion.
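As a rough illustration of what one filling step involves, the base R sketch below transfers archived cast files out of the raw subdirectory. Everything named here is hypothetical: sketch_fill_casts is not a package function, the archive_casts folder and the file name are invented, and the destination is assumed to be the casts subfolder from the directory tree. The sketch copies rather than moves the files, which keeps the raw archive intact.

```r
# Build a throwaway directory with one placeholder cast file in the raw archive.
main      <- tempfile("project_folder")
raw_casts <- file.path(main, "raw", "archive_casts")  # hypothetical location
casts     <- file.path(main, "casts")                 # assumed destination
dir.create(raw_casts, recursive = TRUE)
dir.create(casts)
writeLines("placeholder", file.path(raw_casts, "cast_id_1.csv"))

# Hypothetical stand-in for a filling step: copy every file in `from` into `to`.
sketch_fill_casts <- function(from, to) {
  files  <- list.files(from, full.names = TRUE)
  copied <- file.copy(files, to, overwrite = TRUE)
  setNames(copied, basename(files))  # TRUE for each file that was copied
}

result <- sketch_fill_casts(raw_casts, casts)
```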
Models are run using a function pipeline similar to the creation and filling function pipelines, with flexible controls through a variety of arguments, but robust operation under default settings.
portalcast is the overarching function that controls casting of the Portal data.
  read_moons brings the lunar data into the function, and last_newmoon determines the most recently passed newmoon, which is used to set the forecast origin (end_moons) if it wasn't set by the user. Note that the plural in end_moons indicates that multiple forecast origins can be input to portalcast.
  fill_data ensures that the data files in the data subdirectory are up-to-date for the specifics requested.
  cast runs ("casts") each of the requested models for the data.
    models_to_cast collects the file paths to the scripts in the models subdirectory, which are then run using
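The forecast-origin step in the pipeline above (finding the most recently passed newmoon) can be sketched in base R. The helper sketch_last_moon is hypothetical, the column names newmoonnumber and newmoondate are assumptions about the lunar table, and the numbers and dates are made-up values for illustration only.

```r
# Hypothetical lunar table; column names and values are assumptions.
moons <- data.frame(
  newmoonnumber = 520:523,
  newmoondate   = as.Date(c("2019-11-26", "2019-12-26",
                            "2020-01-24", "2020-02-23"))
)

# Hypothetical stand-in: the largest newmoon number dated on or before `date`.
sketch_last_moon <- function(moons, date = Sys.Date()) {
  passed <- moons$newmoonnumber[moons$newmoondate <= as.Date(date)]
  if (length(passed) == 0) NA_integer_ else max(passed)
}

# With a reference date of 2020-02-01, the last passed newmoon is number 522.
origin <- sketch_last_moon(moons, date = "2020-02-01")
```

Passing a vector of such origins, rather than a single value, corresponds to the multiple forecast origins that end_moons allows.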
To facilitate tidy and easy-to-follow code, we introduce a few important utility functions, which are put to use throughout the codebase.
file_ext determines the file extension, based on the separating character (sep_char), which facilitates use with generalized URL APIs.
path_no_ext removes the extension from a file path.
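To make the role of the separating character concrete, here is a base R sketch of the two ideas. The functions sketch_file_ext and sketch_path_no_ext are hypothetical and are not the package's implementations; they handle one path at a time, and the URL is an invented example.

```r
# Hypothetical stand-in: return whatever follows the last `sep_char` in `path`.
sketch_file_ext <- function(path, sep_char = ".") {
  pieces <- strsplit(path, sep_char, fixed = TRUE)[[1]]
  if (length(pieces) < 2) "" else pieces[length(pieces)]
}

# Hypothetical stand-in: drop a trailing ".<alphanumeric>" extension, if any.
sketch_path_no_ext <- function(path) {
  sub("\\.[[:alnum:]]+$", "", path)
}

ext_file <- sketch_file_ext("data/rodents.csv")
ext_url  <- sketch_file_ext("https://example.org/download?filename=data.csv",
                            sep_char = "=")
stem     <- sketch_path_no_ext("data/rodents.csv")
```

With the default separator the first call picks out the conventional extension, while sep_char = "=" picks out the file name embedded in a URL query, which is the kind of generalized URL use the text refers to.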
portalcasting has a generalized read_data function that allows for toggling among
read_metadata, which each have specific loading procedures in place. Similar to the
read_casts provides a simple user interface for reading the cast files into the R session.
For saving out, write_data provides a simple means for interfacing with potentially pre-existing data files, with logical inputs for saving generally and overwriting a pre-existing file specifically, and flexible file naming. The type of data saved out is currently restricted to
.yaml, which is extracted from the filename given.
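The interplay of the two logical inputs can be sketched in base R. The helper sketch_write_yaml is hypothetical, not the package's write_data. It writes a flat named list as simple "key: value" lines and skips the step of extracting the file type from the filename, and the origin and lead entries are made-up values.

```r
# Hypothetical stand-in for a guarded save: write only if saving is requested
# and either the file is new or overwriting is allowed.
sketch_write_yaml <- function(x, path, save = TRUE, overwrite = TRUE) {
  if (!save) return(invisible(FALSE))
  if (file.exists(path) && !overwrite) return(invisible(FALSE))
  writeLines(paste0(names(x), ": ", unlist(x)), path)
  invisible(TRUE)
}

out <- tempfile(fileext = ".yaml")

# The first call creates the file; the second is blocked by overwrite = FALSE.
wrote_first <- sketch_write_yaml(list(origin = 522, lead = 12), out)
wrote_again <- sketch_write_yaml(list(origin = 523, lead = 12), out,
                                 overwrite = FALSE)
kept <- readLines(out)
```

Separating save from overwrite lets a caller run the same code path whether or not output is wanted, while still protecting files that already exist.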
The directory configuration file is a special file, and has its own IO functions separate from the rest:
write_directory_config creates the file (from within
update_directory_config adds downloads information (from inside
read_directory_config brings the information from the file into the R session.