Data Retriever

The Data Retriever is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it.

“Thanks to the Data Retriever I went from idea to results in 30 minutes, and to a submitted manuscript in two months.” – Jean Philippe Gibert

Quick Start

The Data Retriever is written in Python and is run from a command line interface or through an associated R package. It installs publicly available data into a variety of databases (MySQL, PostgreSQL, SQLite, MS Access) and file formats (csv, json, xml).

Installation

If you have Python installed, use pip from the terminal (see the Install section for additional instructions):

pip install retriever

or install using conda:

conda install retriever -c conda-forge

To install the associated R package:

install.packages('rdataretriever')

Command line interface

List available datasets:

retriever ls

Install the Portal dataset into csv files:

retriever install csv portal

Install the iris dataset into an SQLite database named iris.sqlite:

retriever install sqlite iris -f iris.sqlite

Available install formats are: mysql, postgres, sqlite, access, csv, json, and xml.
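
When the command line interface is driven from a script, the install commands above can be assembled programmatically. The following is a minimal sketch and not part of the Data Retriever itself: the helper names build_install_command and run_install are hypothetical, and the sketch relies only on the install subcommand, the format list, and the -f option shown above. Because -f appears above only with sqlite, the sketch passes it only for that format.

```python
import shutil
import subprocess

# Install formats listed in the Quick Start above.
INSTALL_FORMATS = ("mysql", "postgres", "sqlite", "access", "csv", "json", "xml")


def build_install_command(fmt, dataset, file=None):
    """Return the argument list for `retriever install <fmt> <dataset>`.

    Hypothetical helper for illustration; not part of the retriever package.
    """
    if fmt not in INSTALL_FORMATS:
        raise ValueError(f"unknown install format: {fmt!r}")
    command = ["retriever", "install", fmt, dataset]
    if file is not None:
        # Conservative assumption: only the sqlite example above shows -f,
        # so this sketch refuses to pass it for any other format.
        if fmt != "sqlite":
            raise ValueError("this sketch only passes -f for the sqlite format")
        command += ["-f", file]
    return command


def run_install(fmt, dataset, file=None):
    """Run the install command, but only if `retriever` is on the PATH."""
    if shutil.which("retriever") is None:
        raise RuntimeError("retriever executable not found on the PATH")
    subprocess.run(build_install_command(fmt, dataset, file), check=True)
```

For example, run_install("sqlite", "iris", file="iris.sqlite") would issue the same command as the SQLite example above.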

Python interface

Import:

import retriever as rt

List available datasets:

rt.dataset_names()

Install the iris dataset into SQLite:

rt.install_sqlite('iris')
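
After an SQLite install, it can be useful to check which tables were created. The sketch below uses only Python's standard sqlite3 module; the helper name list_tables is hypothetical, and the database path is an assumption (for instance, iris.sqlite if the file was named as in the command line example above). The table names the Data Retriever chooses are not stated here, so the sketch discovers them instead of guessing.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite database at db_path."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        connection.close()
    return [name for (name,) in rows]
```

Calling list_tables("iris.sqlite") returns a list of table names. Note that sqlite3.connect creates an empty database file if the path does not exist, in which case the list is empty.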

R interface

List available datasets:

rdataretriever::datasets()

Install the iris dataset into SQLite:

rdataretriever::install('iris', 'sqlite')

Download and load data on forest fires directly into R:

forest_fires <- rdataretriever::fetch('forest-fires-portugal')

See the documentation for more commands, details, and datasets.

Install

This section provides full installation instructions for the Data Retriever. In addition to the approaches described in the Quick Start, it covers standalone installers that don’t require a separate Python installation and installing from source.

Method 1: Python

The easiest way to install the Data Retriever is using Python (as described in the Quick Start). If you have Python installed run:

pip install retriever

or if your Python installation is based on conda or Anaconda use:

conda install retriever -c conda-forge

If you don’t have Python installed, we recommend installing it with the Anaconda installer and then running the conda install command above. NOTE: When installing Anaconda through the installer, make sure that the ‘add to PATH’ option is checked.

Method 2: Installers

If you don’t want to install Python, you can download installers for Windows, OS X, and Ubuntu/Debian Linux.

Windows

  • Click here to download the installer
  • Run the downloaded .exe file
  • Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
  • The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.

OS X

  • Click here to download the app
  • Unzip the .zip file for the most recent release
  • Move the .app file into Applications
  • Run the following commands from the Terminal to add the Data Retriever to your path:
echo "/Applications/retriever.app/Contents/MacOS" > retrieverapp
sudo mkdir -p /etc/paths.d
sudo mv retrieverapp /etc/paths.d
  • Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
  • The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.

Ubuntu/Debian

  • Download and run the .deb file
  • Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
  • The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.
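
Each installer route above depends on the retriever executable being on the path. For those who later install Python, one way to check this from a script is sketched below using the standard library's shutil.which. The helper names are hypothetical. A Python process inherits its PATH from the terminal that launched it, so the check should be run from a newly opened terminal, as the steps above describe.

```python
import shutil


def find_executable(name="retriever"):
    """Return the full path of `name` if it is on the PATH, otherwise None."""
    return shutil.which(name)


def describe_path_status(name="retriever"):
    """Return a short human-readable message about whether `name` is on the PATH."""
    location = find_executable(name)
    if location is None:
        return f"{name} is not on the PATH; open a new terminal or check the install"
    return f"{name} found at {location}"
```

Running print(describe_path_status()) reports where the executable was found, or that it is missing.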

Method 3: Installing from Source

If you want to use the current development version of the software either use pip to install directly from GitHub:

pip install git+https://git@github.com/weecology/retriever.git

or:

  1. Clone the repository
  2. cd into the repository directory and run pip install . (you may need to include sudo at the beginning of the command depending on your system).
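
After a pip or conda install (release or development), it can help to confirm which version is active in the current Python environment. The sketch below is one way to do that with the standard library's importlib.metadata (Python 3.8 or later). The helper name installed_version is hypothetical, and the sketch assumes the installed distribution is named retriever, as in the pip command above. It does not apply to the standalone installers, which do not install into a Python environment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="retriever"):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in this Python environment.
        return None
```

Calling installed_version() returns a version string when the package is installed in the active environment and None otherwise.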

More extensive documentation for those interested in developing the Data Retriever can be found here.

R Package Installation

The R package wraps the command line interface, so the core Data Retriever needs to be installed first by following the instructions above. Then install the R package:

install.packages('rdataretriever')

This is the same method described in the Quick Start.

If you want to install the development version of the package use devtools to install directly from GitHub (if you don’t have devtools installed run install.packages('devtools') first):

devtools::install_github('ropensci/rdataretriever')

Contribute

The Data Retriever is an open source project and we welcome contributions of all shapes and sizes. Resources for those interested in getting involved include: