Chapter 6 Tidyverse
tidyverse is a collection of packages for data analysis. The tidyverse package itself is designed to make it easy to install and load multiple tidyverse packages in a single step. The following packages are included in the core tidyverse: ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats, lubridate.
The tidyverse also includes many other packages with more specialized usage. They are not loaded automatically with library(tidyverse), so you’ll need to load each one with its own call to library().
tibble Package
Create a tibble the same way as a data.frame, only without row names.
tibble() does much less than data.frame(): it never changes the type of the inputs (e.g. it never converts strings to factors!), it never changes the names of variables, it only recycles inputs of length 1, and it never creates row.names().
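The differences listed above can be seen in a small comparison. This is a minimal sketch, assuming the tibble package is installed; the column names and values are made up for illustration.

```r
library(tibble)

# data.frame() rewrites non-syntactic names by default (check.names = TRUE)
df <- data.frame(`my var` = 1:2, x = c("a", "b"))
names(df)

# tibble() keeps the names exactly as supplied
tb <- tibble(`my var` = 1:2, x = c("a", "b"))
names(tb)

# Length-1 inputs are recycled to the common length
recycled <- tibble(id = 1:4, flag = TRUE)

# Longer inputs are not recycled: tibble() raises an error here,
# whereas data.frame(id = 1:4, pair = 1:2) would recycle `pair`
mismatch <- tryCatch(
  tibble(id = 1:4, pair = 1:2),
  error = function(e) "error"
)
```

The tryCatch() wrapper is only there so the script keeps running after the expected error.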
as_tibble() vs tibble():
as_tibble() turns an existing object, such as a data frame or matrix, into a so-called tibble, a data frame with class tbl_df. This is in contrast with tibble(), which builds a tibble from individual columns. If a whole data frame is passed to tibble() as a named argument, the result is a one-column tibble in which the column contains the data frame.
tibble columns are versatile: they can be lists, matrices, tibbles, etc.
tibble(
a = list(
c = "three",
d = list(4:5)
)
)
#> # A tibble: 2 × 1
#> a
#> <named list>
#> 1 <chr [1]>
#> 2 <list [1]>
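To make the contrast between as_tibble() and tibble() concrete, the sketch below converts a small data frame both ways. The data frame df and its columns are hypothetical, and the data frame is passed to tibble() under an explicit name so that it is stored as a single column.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# as_tibble() converts the existing object: columns x and y are kept
converted <- as_tibble(df)
dim(converted)

# tibble() builds from columns: here the whole data frame becomes one column
nested <- tibble(df = df)
dim(nested)
names(nested$df)
```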
Print tibbles
tbl_df %>% print(n = Inf) prints all rows. Calling print() on a tbl_df explicitly is useful for setting arguments such as n and width.
n: print the first n rows. n = Inf prints all rows.
width: width of the text output to generate. This defaults to NULL, which means the width set in options() is used. width = Inf prints all columns.
Use ?print.tbl_df to show the help page.
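Alternatively, use tbl_df %>% data.frame() to print the whole table. data.frame printing follows base R's digits option (7 significant digits by default), whereas tibbles usually abbreviate numbers to a few significant digits (3 by default, controlled by the pillar.sigfig option).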
print(as_tibble(mtcars), n = 3) first converts the data frame to a tibble, then specifies the number of rows to print.
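The printing options described above can be tried on the built-in mtcars data. This is a minimal sketch; the variable name cars_tbl is only illustrative.

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)

# Print only the first 3 rows
print(cars_tbl, n = 3)

# Print every row and every column, regardless of console width
print(cars_tbl, n = Inf, width = Inf)

# Fall back to base data frame printing for the whole table
print(as.data.frame(cars_tbl))
```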
The data.table package has nice table print settings. You can preview the head and tail at the same time. It doesn’t give you column details, such as data type, but it gives you a feeling for the data structure without calling head and tail separately.
The data.table R package is used in fields such as finance and genomics and is especially useful for those working with large data sets (for example, 1GB to 100GB in RAM).
data.table Cheatsheet: https://www.datacamp.com/cheat-sheet/the-datatable-r-package-cheat-sheet
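As a rough illustration of the head-and-tail preview, the sketch below prints a table that is long enough for data.table to truncate. It assumes the data.table package is installed; the table dt and its columns are made up.

```r
library(data.table)

# A table with more rows than data.table prints in full by default
dt <- data.table(id = 1:200, value = sqrt(1:200))

# Default printing shows rows from the top and the bottom of the table
print(dt)

# topn controls how many rows are shown from each end
print(dt, topn = 3)
```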
Data Frame and Vector Conversion
reframe() can return an arbitrary number of rows per group, while summarise() reduces each group down to a single row and mutate() returns the same number of rows as the input.
reframe() always returns an ungrouped data frame.
reframe() is theoretically connected to two functions in tibble, tibble::enframe() and tibble::deframe():
enframe(): vector → data frame
deframe(): data frame → vector
reframe(): data frame → data frame, with an arbitrary number of rows per group.
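The row-count behaviour of the three verbs can be compared on a toy data set. This is a sketch assuming dplyr 1.1.0 or later (where reframe() is available); the data frame df and its columns g and x are hypothetical.

```r
library(dplyr)

df <- tibble(
  g = c("a", "a", "b", "b", "b"),
  x = c(1, 5, 2, 4, 9)
)

# summarise(): one row per group
means <- df %>% group_by(g) %>% summarise(mean_x = mean(x))

# mutate(): same number of rows as the input
with_means <- df %>% group_by(g) %>% mutate(mean_x = mean(x))

# reframe(): any number of rows per group (two here), result is ungrouped
ranges <- df %>%
  group_by(g) %>%
  reframe(stat = c("min", "max"), value = range(x))
```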
enframe() and deframe() convert vectors to tibbles and vice versa.
Example Usage:
enframe(1:3)
#> # A tibble: 3 × 2
#> name value
#> <int> <int>
#> 1 1 1
#> 2 2 2
#> 3 3 3
enframe(c(a = 5, b = 7))
#> # A tibble: 2 × 2
#> name value
#> <chr> <dbl>
#> 1 a 5
#> 2 b 7
enframe(list(one = 1, two = 2:3, three = 4:6))
#> # A tibble: 3 × 2
#> name value
#> <chr> <list>
#> 1 one <dbl [1]>
#> 2 two <int [2]>
#> 3 three <int [3]>
deframe(enframe(3:1))
#> 1 2 3
#> 3 2 1
deframe(tibble(a = 1:3))
#> [1] 1 2 3
deframe(tibble(a = as.list(1:3)))
#> [[1]]
#> [1] 1
#>
#> [[2]]
#> [1] 2
#>
#> [[3]]
#> [1] 3
Concatenate list elements into a table
Use magrittr’s pipe operator (%>%), which supports the dot placeholder.
But using the new base/native pipe (|>) leads to errors performing the same operation:
myList |> do.call("rbind", .)
#> Error in do.call(myList, "rbind", .) :
#> second argument must be a list
The error happens because |> always inserts into the first argument and does NOT support the dot. A workaround is to use named arguments.
Use bind_rows from dplyr or rbindlist from data.table.
Use reduce from purrr.
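The approaches named in this section can be sketched side by side. The list myList below is a made-up example of two single-row data frames, and the calls are one plausible way to write each option rather than the only one. The native pipe requires R 4.1 or later.

```r
library(magrittr)

# Hypothetical input: a list of data frames with matching columns
myList <- list(
  data.frame(id = 1, label = "a"),
  data.frame(id = 2, label = "b")
)

# magrittr pipe: the dot marks where the list is inserted
by_magrittr <- myList %>% do.call("rbind", .)

# Native pipe: name `what`, so the piped list is matched to `args`
by_native <- myList |> do.call(what = "rbind")

# Dedicated helpers from dplyr and data.table
by_dplyr <- dplyr::bind_rows(myList)
by_dt <- data.table::rbindlist(myList)

# purrr: fold the list pairwise with rbind
by_purrr <- purrr::reduce(myList, rbind)
```

Note that rbindlist() returns a data.table, while the other calls return a data frame.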
one row/column tibble
as_tibble_row(x) and as_tibble_col(x, column_name = "value") convert a vector to a one-row or one-column tibble.
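A short sketch of both conversions follows, assuming the tibble package is installed. The named vector x is illustrative; a named vector is used because as_tibble_row() takes its column names from the vector's names.

```r
library(tibble)

x <- c(a = 1, b = 2, c = 3)

# One row, one column per element (columns a, b, c)
row_tbl <- as_tibble_row(x)
dim(row_tbl)

# One column named "value", one row per element
col_tbl <- as_tibble_col(x, column_name = "value")
dim(col_tbl)
```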
as_tibble(data, rownames = "new_col_name") converts a data frame (df) to a tibble. It is flexible about the format of the input data, which can be of a range of classes.
data: a data frame, list, matrix, or other object that could reasonably be coerced to a tibble.
rownames: the name of a new column. Existing row names are transferred into this column. If NULL, the row names are removed.
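The effect of the rownames argument can be checked on mtcars, whose row names hold the car models. This is a minimal sketch; the column name "model" is chosen only for illustration.

```r
library(tibble)

# Move the row names into a new first column called "model"
cars_tbl <- as_tibble(mtcars, rownames = "model")
names(cars_tbl)[1]

# Drop the row names entirely
no_names <- as_tibble(mtcars, rownames = NULL)
ncol(no_names)
```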
rownames_to_column(.data, var = "new colname") and column_to_rownames(.data, var = "col to use as rownames") convert row names to a column, or use one column as row names.
.data: needs to be a data frame; strict about the input data type.
var:
- in rownames_to_column: new column name for the original row names in the data.frame, or
- in column_to_rownames: convert tibble to data frame, and specify which column to use as row names.
- in