5.1 Data Input & Output

5.1.1 Read Data

Read Fortran

read.fortran(file, format, ..., as.is = TRUE, colClasses = NA)

format Character vector or list of vectors.

Read dta

haven::read_dta() read Stata data file.

data <- read_dta("climate_health_2406yl.dta")
# retrieve variable labels/definitions
var_dict <- tibble(
  "name" = colnames(data),
  "label" = sapply(data, function(x) attr(x, "label")) %>% 
  as.character()
  )
var_dict
                   
var_label(data$gor) # get variable label
val_labels(data$gor) # get value labels

Read fixed width text files

5.1.1.1 Base R functions

read.fwf(file, widths)

widths integer vector, giving the widths of the fixed-width fields (of one line), or list of integer vectors giving widths for multiline records.

read.table(f_name, header=FALSE, row.names, col.names, sep="", na.strings = "NA") a very versatile function. Can be used to read .csv or .txt files.

f_name path to data.
header=FALSE defaults to FALSE, assumes there is no header row in the file unless specified otherwise.
- If there is a header in the first row, should specify header=TRUE.
row.names a vector of row names. This can be
- a vector giving the actual row names, or
- a single number giving the column of the table which contains the row names, or
- character string giving the name of the table column containing the row names.
col.names a vector of optional names for the variables. The default is to use "V" followed by the column number.
sep use white space as delimiter.
- if it is a csv file, use sep=',' to specify comma as delimiter
na.strings = "NA" a character vector of strings which are to be interpreted as NA values.
- A useful setting: na.strings = c("", "NA", "NULL")

read.csv(f_name, header = TRUE, sep = ",", na.strings = "..", dec=".")

header = TRUE whether the file contains the names of the variables as its first line.
sep the field separator string. Values within each row of x are separated by this string.
na the string to use for missing values in the data.
dec the string to use for decimal points in numeric or complex columns: must be a single character.
fileEncoding UTF-8

When reading data from github, you need to pass in the raw version of the data in read.csv(),

R cannot read the display version.

You can get the URL for the raw version by clicking on the Raw button displayed above the data.

read.table(filename, header=FALSE, sep="") is more versatile than read.csv. Useful when you have a data file saved as txt. Default separator is “white space” for read.table, i.e., one or more spaces, tabs, newlines or carriage returns.

# read.table can be used to read txt and csv. Need to specify sep=',' when reading csv.
data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.txt", header=TRUE)
data

data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv", header=TRUE, sep=",")

# Alternatively, can use read_csv or read.csv directly
data <- read_csv("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv")
data

5.1.1.2 `readr`

The major difference of readr is that it returns a tibble instead of a data frame.

read_delim(f_name, delim = ";", col_names = TRUE, skip = 0) allows you to specify the delimeter as ;.

col_names = TRUE whether the first row contains column names.
skip = 0 number of lines to skip before reading the data. Default is 0, meaning no lines are skipped.

read_delim(f_name, delim = "\t") read tab separated values.

read_tsv() read tab separated values.

Read comma separated values.

readr::read_csv(
  f_name,
  na = c("..", NA, ""),
  locale = locale(encoding = "UTF-8"),
  col_types = cols(Date = col_date(format = "%m/%d/%y"))
)

col_types specify column types. Could be created by list() or cols().

read_csv will automatically guess, if you don’t explicitly specify column types. You can override column types by providing the argument col_types. You don’t need to provide all column types, just the ones you want to override.
By default, reading a file without a column specification will print a message showing what readr guessed they were. To remove this message,
- set show_col_types = FALSE for one time setting, or
- set options(readr.show_col_types = FALSE) for the current sessions’ global options setting. If want to change permanently everytime when R starts, put options(readr.show_col_types = FALSE) in .Rprofile as global options.

read_csv2(f_name, na = c("..", NA, "")) use semicolon ; to separate values; and use comma , for the decimal point. This is common in some European countries.

locale The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
locale(date_names = "en", date_format = "%AD", time_format = "%AT", decimal_mark = ".", grouping_mark = ",", tz = "UTC", encoding = "UTF-8", asciify = FALSE)
- decimal_mark indicate the decimal place, can only be , or .
- encoding This only affects how the file is read - readr always converts the output to UTF-8.

5.1.2 Write Data

Save data in uft8 encoding with special language characters

write_excel_csv() include a UTF-8 Byte order mark which indicates to Excel the csv is UTF-8 encoded.

write.csv(x, f_name, row.names=TRUE, fileEncoding ="UTF-8")

x a matrix or data frame. If not one of the types, it is attempted to coerce x to a data frame.
- write_csv(x) x can only be data frame or tibble. Doesn’t support matrix.
```
mat %>% as_tibble(rownames = "rowname") %>% write_csv("mat.csv")
mat %>% write.csv("mat.csv")
```
row.names whether to write row names of x. Defaults to TRUE.

`flextable`

flextable package create tables for reporting and publications.

The main function is flextable which takes a data.frame as argument and returns a flextable. If you are using RStudio or another R GUI, the table will be displayed in the Viewer panel or in your default browser.

The package provides a set of functions to easily create some tables from others objects.

The as_flextable() function is used to transform specific objects into flextable objects. For example, you can transform a crosstab produced with the ‘tables’ package into a flextable which can then be formatted, annotated or augmented with footnotes.