The Tidyverse

Tidy data

Data can be represented in many different ways across multiple tables, but the tidyverse packages are designed to work with tidy datasets. Tidy data meets the following criteria:

Each variable has its own column
Each row is a single observation
Each value has its own cell

This structure works well with R's vectorised, column-oriented functions and keeps your datasets consistent with one another. Getting your data into R and wrangling it into this format is usually the first step in an analysis. Fortunately, the tidyr package contains many functions for tidying a dataset.
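As a minimal sketch of what tidying can look like, the example below reshapes a small made-up table with tidyr's pivot_longer function. The table counts_wide, its values and the column name n_captures are hypothetical and are not part of the surveys dataset.

```r
library(tibble)
library(tidyr)

# Hypothetical untidy table: the "year" variable is spread across
# the column names instead of having its own column.
counts_wide <- tibble(
  plot_id = c(1, 2),
  `1978`  = c(10, 4),
  `1979`  = c(12, 7)
)

# Gather the year columns into a single "year" variable,
# giving one row per plot and year (one observation per row).
counts_tidy <- pivot_longer(
  counts_wide,
  cols      = c(`1978`, `1979`),
  names_to  = "year",
  values_to = "n_captures"
)

counts_tidy
```

After reshaping, each variable (plot_id, year, n_captures) has its own column and each row is a single observation.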

We will start by reading in a dataset. The readr package has functions for importing data as tibbles. Tibbles are the tidyverse version of an R data frame. They print more compactly, behave more strictly than base data frames, and support grouping by variables, as we will see in the next section.

library(tidyverse)

# If you have already downloaded the data to your computer, you can read it from a file:
surveys <- read_csv("data/surveys_complete.csv")

# Otherwise, you can read it from a URL:
surveys <- read_csv("http://bifx-core3.bio.ed.ac.uk/training/R_dplyr_and_ggplot2/data/surveys_complete.csv")
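By default read_csv guesses each column's type from the data. The sketch below shows one way to declare the types yourself with the col_types argument. It reads a two-row, made-up extract passed as literal text with I(), which assumes readr 2.0.0 or later; the object names csv_text and surveys_small are hypothetical.

```r
library(readr)

# Hypothetical two-row extract, supplied as literal text rather than a file.
# Wrapping the string in I() tells readr it is data, not a path
# (assumes readr >= 2.0.0).
csv_text <- I("record_id,species_id,weight\n845,NL,204\n1164,NL,199\n")

# Declare the column types explicitly instead of letting readr guess them.
surveys_small <- read_csv(
  csv_text,
  col_types = cols(
    record_id  = col_integer(),
    species_id = col_character(),
    weight     = col_double()
  )
)

surveys_small
```

Setting col_types can be useful when a guessed type is not the one you want, for example when an identifier should be stored as an integer rather than a double.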

Discussion

Look at the options available in the read_csv function and compare them with those of the read.table function we saw earlier.
What other readr functions can you find?

This dataset contains observations from a field survey of different organisms at different sites (plots). Let’s inspect the data.

# Type an R object's name to print its contents
surveys

# Use the View function to open the data in a spreadsheet-style viewer
View(surveys)

# We can look at the structure of the dataset
str(surveys)

## spc_tbl_ [30,463 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ record_id      : num [1:30463] 845 1164 1261 1756 1818 ...
##  $ month          : num [1:30463] 5 8 9 4 5 7 10 11 1 5 ...
##  $ day            : num [1:30463] 6 5 4 29 30 4 25 17 16 18 ...
##  $ year           : num [1:30463] 1978 1978 1978 1979 1979 ...
##  $ plot_id        : num [1:30463] 2 2 2 2 2 2 2 2 2 2 ...
##  $ species_id     : chr [1:30463] "NL" "NL" "NL" "NL" ...
##  $ sex            : chr [1:30463] "M" "M" "M" "M" ...
##  $ hindfoot_length: num [1:30463] 32 34 32 33 32 32 33 30 33 31 ...
##  $ weight         : num [1:30463] 204 199 197 166 184 206 274 186 184 87 ...
##  $ genus          : chr [1:30463] "Neotoma" "Neotoma" "Neotoma" "Neotoma" ...
##  $ species        : chr [1:30463] "albigula" "albigula" "albigula" "albigula" ...
##  $ taxa           : chr [1:30463] "Rodent" "Rodent" "Rodent" "Rodent" ...
##  $ plot_type      : chr [1:30463] "Control" "Control" "Control" "Control" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   record_id = col_double(),
##   ..   month = col_double(),
##   ..   day = col_double(),
##   ..   year = col_double(),
##   ..   plot_id = col_double(),
##   ..   species_id = col_character(),
##   ..   sex = col_character(),
##   ..   hindfoot_length = col_double(),
##   ..   weight = col_double(),
##   ..   genus = col_character(),
##   ..   species = col_character(),
##   ..   taxa = col_character(),
##   ..   plot_type = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
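For a tibble read with readr, str also prints the column specification attribute, which makes the output long. As an alternative, the sketch below uses glimpse, which the tibble package exports, to show one line per column. The small tibble surveys_demo is made up for illustration so that the example runs without the data file.

```r
library(tibble)

# Hypothetical three-row stand-in for the surveys data.
surveys_demo <- tibble(
  record_id  = c(845, 1164, 1261),
  species_id = c("NL", "NL", "NL"),
  weight     = c(204, 199, 197)
)

# Print one line per column: name, type and the first few values.
glimpse(surveys_demo)

# Base R helpers also work on tibbles.
dim(surveys_demo)    # number of rows and columns
names(surveys_demo)  # column names
```

The same calls can be applied to the full surveys object once it has been read in.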

Further Resources

There are cheatsheets available for many tidyverse and RStudio packages that will help you choose the correct functions.
Take a look at these slides or www.tidyverse.org for more information on the tidyverse.

Key points

The tidyverse is a suite of R packages
Stick to the principles and philosophy of tidy data
Use the readr package to import data as tibbles
Use further tidyverse packages to tidy, re-format and visualise data
