### Your first graph in R

### Presentation of the survey data
## download.file("https://ndownloader.figshare.com/files/2292169",
##               "data/portal_data_joined.csv")
surveys <- read.csv(file = "data/portal_data_joined.csv")
head(surveys)
library(ggplot2)
ggplot(surveys) +
  geom_point(aes(x = hindfoot_length, y = weight, colour = species_id))
ggplot(surveys) +
  geom_boxplot(aes(x = species_id, y = weight))

### Creating objects in R

### Vectors and data types
## ## We’ve seen that atomic vectors can be of type character, numeric, integer, and
## ## logical. But what happens if we try to mix these types in a single
## ## vector?
##
## ## What will happen in each of these examples? (hint: use `class()` to
## ## check the data type of your object)
## num_char <- c(1, 2, 3, 'a')
##
## num_logical <- c(1, 2, 3, TRUE)
##
## char_logical <- c('a', 'b', 'c', TRUE)
##
## tricky <- c(1, 2, 3, '4')
##
## ## Why do you think it happens?
##
## ## Can you draw a diagram that represents the hierarchy of the data
## ## types?

## * Can you figure out why `"four" > "five"` returns `TRUE`?

## Challenge
## Based on the output of `str(surveys)`, can you answer the following questions?
## * What is the class of the object `surveys`?
## * How many rows and how many columns are in this object?
## * How many species have been recorded during these surveys?

### Factors
sex <- factor(c("male", "female", "female", "male"))
food <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))
levels(food)
food <- factor(food, levels = c("low", "medium", "high"))
levels(food)
min(food) ## doesn't work: min() is not meaningful for an unordered factor
food <- factor(food, levels = c("low", "medium", "high"), ordered = TRUE)
levels(food)
min(food) ## works!
f <- factor(c(1, 5, 10, 2))
as.numeric(f)               ## wrong! and there is no warning...
as.numeric(as.character(f)) ## works...
as.numeric(levels(f))[f]    ## The recommended way.

## The function `plot()` can be used to quickly create a bar plot of a factor.
## For instance, for a factor
exprmt <- factor(c("treat1", "treat2", "treat1", "treat3", "treat1", "control",
                   "treat1", "treat2", "treat3"))
## the code `plot(exprmt)`
## gives you a bar plot of the number of observations in each level,
## as produced by the `plot(exprmt)` call below.
## * What determines the order in which the treatments are listed in the plot?
## * How can you recreate this plot with "control" listed last instead
##   of first?
plot(exprmt)

## The data.frame class
## Compare the output of these two examples and note the difference between
## the data being read as `factor` (first call) and as `character` (second call).
## Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so the first call
## sets `stringsAsFactors = TRUE` explicitly.
example_data <- data.frame(animal = c("dog", "cat", "sea cucumber", "sea urchin"),
                           feel = c("furry", "furry", "squishy", "spiny"),
                           weight = c(45, 8, 1.1, 0.8),
                           stringsAsFactors = TRUE)
str(example_data)
example_data <- data.frame(animal = c("dog", "cat", "sea cucumber", "sea urchin"),
                           feel = c("furry", "furry", "squishy", "spiny"),
                           weight = c(45, 8, 1.1, 0.8),
                           stringsAsFactors = FALSE)
str(example_data)

## ## Challenge
## ##  There are a few mistakes in this hand-crafted `data.frame`;
## ##  can you spot and fix them? Don't hesitate to experiment!
## author_book <- data.frame(author_first=c("Charles", "Ernst", "Theodosius"),
##                           author_last=c(Darwin, Mayr, Dobzhansky),
##                           year=c(1942, 1970))

## ## Challenge:
## ##  Can you predict the class for each of the columns in the following
## ##  example?
## ##  Check your guesses using `str(country_climate)`:
## ##  * Are they what you expected? Why? Why not?
## ##  * What would have been different if we had added `stringsAsFactors = FALSE`
## ##    to this call?
## ##  * What would you need to change to ensure that each column had the
## ##    accurate data type?
## country_climate <- data.frame(country=c("Canada", "Panama", "South Africa", "Australia"),
##                               climate=c("cold", "hot", "temperate", "hot/temperate"),
##                               temperature=c(10, 30, 18, "15"),
##                               northern_hemisphere=c(TRUE, TRUE, FALSE, "FALSE"),
##                               has_kangaroo=c(FALSE, FALSE, FALSE, 1))

## Sequences and Subsetting data frames
### 1. The function `nrow()` on a `data.frame` returns the number of
### rows. Use it, in conjunction with `seq()`, to create a new
### `data.frame` called `surveys_by_10` that includes every 10th row
### of the `surveys` data frame starting at row 10 (10, 20, 30, ...)
###
### 2. Create a `data.frame` containing only the observation from row 1999 of the
### `surveys` dataset.
###
### 3. Notice how `nrow()` gave you the number of rows in a `data.frame`? Use `nrow()`
### instead of a row number to make a `data.frame` with observations from only the last
### row of the `surveys` dataset.
###
### 4. Now that you've seen how `nrow()` can be used to stand in for a row index, let's combine
### that behavior with the `-` (negative index) notation to reproduce the behavior of
### `head(surveys)` by excluding the 7th through final rows of the `surveys` dataset.
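
### A minimal sketch of the indexing ideas used in the exercises above, shown
### on a small made-up data frame so the exercises themselves stay unsolved.
### `toy` and its columns `id` and `value` are hypothetical and are not part
### of the surveys data.
toy <- data.frame(id = 1:12, value = seq(0.5, 6, by = 0.5))
## seq() builds the row indices: every 3rd row, starting at row 3
toy_by_3 <- toy[seq(3, nrow(toy), by = 3), ]
## nrow() used as a row index selects the last row
toy_last <- toy[nrow(toy), ]
## negative indices drop rows: remove row 5 through the last row
toy_first4 <- toy[-(5:nrow(toy)), ]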