The tidyverse is a suite of packages that includes libraries such as dplyr and ggplot2. These packages are designed for data science and share underlying principles, grammar and data structures. There are many ways to do the same thing in R, but following the philosophy of tidy data and using the tidyverse packages will keep your datasets organised and make analysis easier in the long run.
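Data can be represented in many different ways across multiple tables, but the tidyverse packages are specifically designed to work with tidy datasets. Tidy data conforms to the following criteria:
- Each variable has its own column.
- Each observation has its own row.
- Each value has its own cell.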
This is the optimal structure when working in R and provides consistency amongst your datasets. Getting your data into R and wrangling it into the correct format is always the first step in your analysis. Fortunately, the tidyr package contains many functions to tidy up your dataset.
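As a minimal sketch of what tidying can look like, the example below reshapes a small untidy table with pivot_longer() from tidyr. The table counts_wide and its values are hypothetical and are not part of the survey data used in this tutorial.

```r
library(tidyverse)

# Hypothetical untidy table: there is one column per year, so the variable
# "year" is stored in the column names rather than in a column of its own.
counts_wide <- tibble(
  species_id = c("NL", "DM"),
  `1978` = c(12, 40),
  `1979` = c(15, 38)
)

# pivot_longer() gathers the year columns into two new columns:
# "year" (taken from the old column names) and "count" (the cell values).
counts_tidy <- pivot_longer(
  counts_wide,
  cols = c(`1978`, `1979`),
  names_to = "year",
  values_to = "count"
)

counts_tidy
```

After reshaping, each row holds one observation (one species in one year) and each variable has its own column. Note that year is stored as text here because it came from column names.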
We will start by reading in a dataset. The readr package has functions for importing data as tibbles. Tibbles are the tidyverse-compatible version of an R data frame. They have stricter formatting and allow you to perform grouping of variables, as we will see in the next section.
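To illustrate what "stricter" can mean in practice, here is a small sketch comparing a base data frame with a tibble. The objects df and tb are hypothetical toy data, and the two behaviours shown are only examples of the differences, not a complete list.

```r
library(tidyverse)

# Hypothetical toy data, not taken from the surveys file
df <- data.frame(genus = c("Neotoma", "Dipodomys"), weight = c(204, 45))
tb <- as_tibble(df)

# Selecting a single column with [ , ] simplifies a data frame to a vector,
# whereas a tibble always stays a tibble.
class(df[, "weight"])
class(tb[, "weight"])

# $ partially matches column names on a data frame; a tibble does not,
# and returns NULL with a warning instead.
df$wei
tb$wei
```

The stricter behaviour makes mistakes such as a mistyped column name easier to spot.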
library(tidyverse)
#If you already have the data on your computer, you can read it from a file:
surveys <- read_csv("data/surveys_complete.csv")
#Otherwise, you can read it from a URL:
surveys <- read_csv("http://bifx-core3.bio.ed.ac.uk/training/R_dplyr_and_ggplot2/data/surveys_complete.csv")
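By default read_csv() guesses the type of each column. If you would rather state the types yourself, one option is the col_types argument. The sketch below assumes a tiny hypothetical CSV written to a temporary file so that it runs without the survey data; the object name toy and the file contents are made up for illustration.

```r
library(tidyverse)

# Write a tiny hypothetical CSV to a temporary file so the example is self-contained
tmp <- tempfile(fileext = ".csv")
writeLines(c("record_id,species_id,weight", "1,NL,204", "2,DM,NA"), tmp)

# Declare the column types up front instead of letting read_csv() guess them
toy <- read_csv(
  tmp,
  col_types = cols(
    record_id = col_integer(),
    species_id = col_character(),
    weight = col_double()
  )
)

# spec() reports the column specification that was used for the import
spec(toy)
```

Stating the types explicitly can help when a guess would be wrong, for example when an identifier column should be kept as text.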
Compare read_csv with the read.table function we saw earlier.
What other readr functions can you find?

This dataset contains observations from a field survey of different organisms at different sites (plots). Let’s inspect the data.
#Type an R object's name to print its contents
surveys
#Use the View function
View(surveys)
#We can look at the structure of the dataset
str(surveys)
## spc_tbl_ [30,463 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ record_id : num [1:30463] 845 1164 1261 1756 1818 ...
## $ month : num [1:30463] 5 8 9 4 5 7 10 11 1 5 ...
## $ day : num [1:30463] 6 5 4 29 30 4 25 17 16 18 ...
## $ year : num [1:30463] 1978 1978 1978 1979 1979 ...
## $ plot_id : num [1:30463] 2 2 2 2 2 2 2 2 2 2 ...
## $ species_id : chr [1:30463] "NL" "NL" "NL" "NL" ...
## $ sex : chr [1:30463] "M" "M" "M" "M" ...
## $ hindfoot_length: num [1:30463] 32 34 32 33 32 32 33 30 33 31 ...
## $ weight : num [1:30463] 204 199 197 166 184 206 274 186 184 87 ...
## $ genus : chr [1:30463] "Neotoma" "Neotoma" "Neotoma" "Neotoma" ...
## $ species : chr [1:30463] "albigula" "albigula" "albigula" "albigula" ...
## $ taxa : chr [1:30463] "Rodent" "Rodent" "Rodent" "Rodent" ...
## $ plot_type : chr [1:30463] "Control" "Control" "Control" "Control" ...
## - attr(*, "spec")=
## .. cols(
## .. record_id = col_double(),
## .. month = col_double(),
## .. day = col_double(),
## .. year = col_double(),
## .. plot_id = col_double(),
## .. species_id = col_character(),
## .. sex = col_character(),
## .. hindfoot_length = col_double(),
## .. weight = col_double(),
## .. genus = col_character(),
## .. species = col_character(),
## .. taxa = col_character(),
## .. plot_type = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
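The output of str() can be long for a tibble created by readr. A few other functions give a more compact overview; the sketch below uses a hypothetical three-row stand-in called toy (with values copied from the first rows shown above) so that it runs without the full file. The same calls should work on surveys.

```r
library(tidyverse)

# Hypothetical stand-in for `surveys`, using the first values printed by str()
toy <- tibble(
  record_id = c(845, 1164, 1261),
  species_id = c("NL", "NL", "NL"),
  weight = c(204, 199, 197)
)

glimpse(toy)   # one line per column, similar to str() but more compact
dim(toy)       # number of rows and columns
names(toy)     # column names
head(toy, 2)   # first two rows
```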
Use the readr package to import data as tibbles.