Data Retriever
The Data Retriever is a package manager for data. It downloads, cleans, and stores publicly available data, so that analysts spend less time cleaning and managing data, and more time analyzing it.
“Thanks to the Data Retriever I went from idea to results in 30 minutes, and to a submitted manuscript in two months.” – Jean Philippe Gibert
Quick Start
The Data Retriever is written in Python and can be run from the command line, from Python, or through an associated R package. It installs publicly available data into a variety of databases (MySQL, PostgreSQL, SQLite, MS Access) and file formats (csv, json, xml).
Installation
If you have Python installed, use pip from the terminal (see the Install section below for additional instructions):
pip install retriever
or install using conda:
conda install retriever -c conda-forge
To install the associated R package:
install.packages('rdataretriever')
Command line interface
List available datasets:
retriever ls
Install the Portal dataset into csv files:
retriever install csv portal
Install the iris dataset into an SQLite database named iris.sqlite:
retriever install sqlite iris -f iris.sqlite
Available install formats are: mysql, postgres, sqlite, access, csv, json, and xml.
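For scripted workflows, the commands above can be assembled programmatically. The following is a minimal sketch and not part of the Data Retriever itself: build_install_command and run_install are hypothetical helper names, and the sketch relies only on the retriever install <format> <dataset> form and the -f option shown above for SQLite. Whether -f applies to other formats is not covered here, so the sketch rejects it for them.

```python
import subprocess

# Install formats listed in this README.
FORMATS = ("mysql", "postgres", "sqlite", "access", "csv", "json", "xml")


def build_install_command(fmt, dataset, sqlite_file=None):
    """Return the argument list for `retriever install <fmt> <dataset>`.

    Hypothetical helper for illustration; not part of retriever.
    """
    if fmt not in FORMATS:
        raise ValueError(f"unknown install format: {fmt!r}")
    command = ["retriever", "install", fmt, dataset]
    if sqlite_file is not None:
        # This README only shows -f together with the sqlite format.
        if fmt != "sqlite":
            raise ValueError("sqlite_file is only handled for sqlite in this sketch")
        command += ["-f", sqlite_file]
    return command


def run_install(fmt, dataset, sqlite_file=None):
    """Run the install; requires the retriever command on the PATH."""
    subprocess.run(build_install_command(fmt, dataset, sqlite_file), check=True)


# Show the command without running it.
print(" ".join(build_install_command("sqlite", "iris", "iris.sqlite")))
# prints: retriever install sqlite iris -f iris.sqlite
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with unusual file names.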
Python interface
Import:
import retriever as rt
List available datasets:
rt.dataset_names()
Install the iris dataset into SQLite:
rt.install_sqlite('iris')
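The calls above can be combined into a small script. This is a sketch under stated assumptions: available_names and install_iris are hypothetical helpers, and the return value of rt.dataset_names() is assumed to be either a plain list of names or a dictionary of lists, depending on the installed version, so the helper accepts both shapes. Only rt.dataset_names() and rt.install_sqlite() from this section are used.

```python
def available_names(listing):
    """Flatten a dataset listing into a sorted list of unique names.

    Assumption: `listing` is either a list of names or a dict whose
    values are lists of names. Hypothetical helper, not part of retriever.
    """
    if isinstance(listing, dict):
        names = [name for group in listing.values() for name in group]
    else:
        names = list(listing)
    return sorted(set(names))


def install_iris():
    """Install the iris dataset into SQLite if it is listed."""
    # Imported here so the helper above works without retriever installed.
    import retriever as rt

    if "iris" not in available_names(rt.dataset_names()):
        raise LookupError("iris is not in the list of available datasets")
    rt.install_sqlite("iris")
```

Calling install_iris() requires the retriever package to be installed.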
R interface
List available datasets:
rdataretriever::datasets()
Install the iris dataset into SQLite:
rdataretriever::install('iris', 'sqlite')
Download and load data on forest fires directly into R:
forest_fires <- rdataretriever::fetch('forest-fires-portugal')
See the documentation for more commands, details, and datasets.
Install
This section provides full installation instructions for the Data Retriever. In addition to the approaches described in the Quick Start, it covers installers that don’t require installing Python yourself, and installation from source.
Method 1: Python
The easiest way to install the Data Retriever is using Python (as described in the Quick Start). If you have Python installed run:
pip install retriever
or if your Python installation is based on conda or Anaconda use:
conda install retriever -c conda-forge
If you don’t have Python installed we recommend installing it using the Anaconda installer then running the conda install command above. NOTE: When installing Anaconda through the installer, make sure that the ‘add to PATH’ option is checked.
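After installing, it can help to confirm that the retriever command is visible on the PATH before going further. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the command is named retriever, as in the examples in this README.

```python
import shutil


def find_retriever(command="retriever"):
    """Return the full path of `command` if it is on the PATH, else None."""
    return shutil.which(command)


def describe_install(command="retriever"):
    """Return a one-line, human-readable status for `command`."""
    path = find_retriever(command)
    if path is None:
        return f"{command} not found on PATH; open a new terminal or re-check the install"
    return f"{command} found at {path}"


print(describe_install())
```

A "not found" result right after installing often just means the terminal was opened before the PATH was updated.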
Method 2: Installers
If you don’t want to install Python you can download installers for Windows, OS X and Ubuntu/Debian Linux.
Windows
- Click here to download the installer
- Run the downloaded .exe file
- Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
- The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.
OS X
- Click here to download the app
- Unzip the .zip file for the most recent release
- Move the .app file into Applications
- Run the following commands from the Terminal to add the Data Retriever to your path:
echo "/Applications/retriever.app/Contents/MacOS" > retrieverapp
sudo mkdir -p /etc/paths.d
sudo mv retrieverapp /etc/paths.d
- Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
- The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.
Ubuntu/Debian
- Download and run the .deb file
- Open a new terminal and type retriever update to download the most recent scripts. The terminal needs to be new so that the Data Retriever is on the path.
- The retriever should now run from the terminal. Type retriever and press enter to see the basic documentation.
Method 3: Installing from Source
If you want to use the current development version of the software either use pip to install directly from GitHub:
pip install git+https://git@github.com/weecology/retriever.git
or:
- Clone the repository
- cd into the repository directory and run pip install . (you may need to include sudo at the beginning of the command depending on your system).
More extensive documentation for those who are interested in developing can be found here.
R Package Installation
The R package wraps the command line interface, so the core Data Retriever needs to be installed first by following the instructions above. Then install the R package:
install.packages('rdataretriever')
This is the same method described in the Quick Start.
If you want to install the development version of the package, use devtools to install directly from GitHub (if you don’t have devtools installed, run install.packages('devtools') first):
devtools::install_github('ropensci/rdataretriever')
Docs
Full documentation is available on Read the Docs and includes details on:
- The Python interface
- The command line interface
- Available datasets
- Adding datasets to the Data Retriever
- The contributor Code of Conduct
- The Developer’s guide for people interested in contributing to the project
R package documentation is available on CRAN.
Contribute
The Data Retriever is an open source project and we welcome contributions of all shapes and sizes. Resources for those interested in getting involved include: