5.1 Data Input & Output
5.1.1 Read Data
Read Fortran
read.fortran(file, format, ..., as.is = TRUE, colClasses = NA)
format
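Character vector or list of vectors of Fortran-style field descriptors, e.g. "I2" (integer), "F3.1" (numeric with 1 decimal place), "A5" (character), "X2" (skip 2 columns). For multiline records, give a list whose components are the formats for each line.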
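A minimal sketch of read.fortran(), assuming a made-up two-line file written to a temporary path. "F2.1" reads two columns as a number with one implied decimal place, "F2.0" reads two columns as a plain number, and "I2" reads two columns as an integer.

```r
# Toy fixed-format records (hypothetical data), written to a temp file
ff <- tempfile()
cat(file = ff, "123456", "987654", sep = "\n")

# First record: "12" -> 1.2 (F2.1), "34" -> 34 (F2.0), "56" -> 56 (I2)
df <- read.fortran(ff, c("F2.1", "F2.0", "I2"))
print(df)
```

Columns get the default names V1, V2, V3, as with read.table.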
Read dta
haven::read_dta()
reads a Stata data file (.dta).
library(haven)     # read_dta()
library(dplyr)     # tibble() and %>%
library(labelled)  # var_label(), val_labels()
data <- read_dta("climate_health_2406yl.dta")
# retrieve variable labels/definitions
var_dict <- tibble("name" = colnames(data),
                   "label" = sapply(data, function(x) attr(x, "label")) %>%
                     as.character()
)
var_dict
var_label(data$gor)   # get variable label
val_labels(data$gor)  # get value labels
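The dictionary above depends on a local .dta file. The sketch below is self-contained: it writes a small hypothetical data set (the column names `temp` and `gor` and their labels are made up) to a temporary .dta file with haven::write_dta(), reads it back, and builds the same kind of name/label table. It uses vapply() so that columns without a label come out as NA instead of "NULL", then haven::as_factor() to turn value labels into factor levels.

```r
library(haven)

# Hypothetical toy data: `gor` is a labelled region code, `temp` is plain numeric
toy <- data.frame(temp = c(14.2, 15.8, 13.9))
toy$gor <- labelled(c(1, 2, 1), labels = c(North = 1, South = 2), label = "Region")
attr(toy$temp, "label") <- "Mean temperature"

# Write a .dta file so the example does not depend on an external file
path <- tempfile(fileext = ".dta")
write_dta(toy, path)
data <- read_dta(path)

# Variable dictionary; unlabelled columns get NA rather than "NULL"
var_dict <- data.frame(
  name  = names(data),
  label = vapply(data, function(x) {
    lab <- attr(x, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else lab
  }, character(1)),
  row.names = NULL
)
print(var_dict)

# Convert the value labels of `gor` into factor levels
region <- as_factor(data$gor)
print(region)
```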
Read fixed width text files
read.fwf(file, widths)
widths
integer vector, giving the widths of the fixed-width fields (of one line), or list of integer vectors giving widths for multiline records.
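A small sketch of read.fwf(), assuming a made-up file where each line holds a 2-character id, a 3-digit code and a 3-character value. The widths vector c(2, 3, 3) cuts every line at those positions; col.names is passed on to read.table.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("AB1234.5", "CD6789.0"), tmp)   # hypothetical fixed-width data

# Widths 2, 3, 3 split "AB1234.5" into "AB" | "123" | "4.5"
df <- read.fwf(tmp, widths = c(2, 3, 3), col.names = c("id", "code", "value"))
print(df)
```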
read.table(f_name, header=FALSE, row.names, col.names, sep="", na.strings = "NA")
A very versatile function. Can be used to read .csv or .txt files.
f_name
path to data.
header=FALSE
defaults to FALSE, i.e., assumes there is no header row in the file unless specified otherwise.
- If there is a header in the first row, should specify header=TRUE.
row.names
a vector of row names. This can be
- a vector giving the actual row names, or
- a single number giving the column of the table which contains the row names, or
- a character string giving the name of the table column containing the row names.
col.names
a vector of optional names for the variables. The default is to use "V" followed by the column number.
sep
the default sep="" uses white space as delimiter.
- If it is a csv file, use sep=',' to specify comma as delimiter.
na.strings = "NA"
a character vector of strings which are to be interpreted as NA values.
- A useful setting: na.strings = c("", "NA", "NULL")
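A minimal sketch of the arguments above, assuming a made-up whitespace-separated file in which "NULL" marks a missing value. The first call uses the header and takes row names from the id column; the second skips the header line and supplies col.names instead.

```r
tmp <- tempfile(fileext = ".txt")
writeLines(c("id height weight",
             "a 1.70 65",
             "b NULL 72",
             "c 1.82 NA"), tmp)   # hypothetical data

# header = TRUE: first line holds names; row.names = "id": use that column as row names
# na.strings: treat "", "NA" and "NULL" as missing
df <- read.table(tmp, header = TRUE, row.names = "id",
                 na.strings = c("", "NA", "NULL"))
print(df)

# No header used: skip the first line and name the columns ourselves
df2 <- read.table(tmp, header = FALSE, skip = 1,
                  col.names = c("id", "h", "w"), na.strings = c("NA", "NULL"))
print(df2)
```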
read.csv(f_name, header = TRUE, sep = ",", na.strings = "..", dec=".")
header = TRUE
whether the file contains the names of the variables as its first line.
sep
the field separator character. Values on each line of the file are separated by this character.
na.strings
a character vector of strings to be interpreted as missing (NA) values in the data, e.g. ".." here.
dec
the character used in the file for decimal points; must be a single character.
fileEncoding = "UTF-8"
the encoding of the file. Use "UTF-8" when the file contains non-ASCII characters (e.g., special language characters).
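A short sketch of these arguments, assuming a made-up csv file in which ".." marks a missing value (a convention used by some statistical agencies). With na.strings = "..", the gdp column is read as numeric instead of character.

```r
tmp <- tempfile(fileext = ".csv")
writeLines(c("country,gdp", "A,1.5", "B,..", "C,2.25"), tmp)   # hypothetical data

df <- read.csv(tmp, header = TRUE, sep = ",", na.strings = "..", dec = ".",
               fileEncoding = "UTF-8")
str(df)
```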
When reading data from GitHub, you need to pass the URL of the raw version of the data to read.csv(); R cannot read the display version (the HTML page).
You can get the URL for the raw version by clicking the Raw button displayed above the data.
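Instead of clicking Raw, the display URL can also be rewritten by hand. The helper below, github_raw_url(), is hypothetical (not part of any package) and assumes the usual github.com/<user>/<repo>/blob/<branch>/<path> page layout; the page URL is assumed to be the display version of the bonedensity.csv file used below. The raw.githubusercontent.com/<user>/<repo>/<branch>/<path> form points at the same file as the refs/heads/<branch> form.

```r
# Hypothetical helper: turn a GitHub "blob" page URL into a raw-file URL
github_raw_url <- function(url) {
  sub("^https://github\\.com/([^/]+)/([^/]+)/blob/",
      "https://raw.githubusercontent.com/\\1/\\2/", url)
}

page <- "https://github.com/my1396/course_dataset/blob/main/bonedensity.csv"
raw  <- github_raw_url(page)
print(raw)
# data <- read.csv(raw)   # needs network access
```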
read.table(filename, header=FALSE, sep="") is more versatile than read.csv. Useful when your data file is saved as .txt.
The default separator for read.table is “white space”, i.e., one or more spaces, tabs, newlines or carriage returns.
# read.table can be used to read txt and csv. Need to specify sep=',' when reading csv.
data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.txt", header=TRUE)
data
data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv", header=TRUE, sep=",")
# Alternatively, use read_csv (from readr) or read.csv directly
library(readr)
data <- read_csv("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv")
data
read_delim(f_name, delim=";")
allows you to specify the delimiter, here ;.
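A minimal sketch of read_delim() with a semicolon delimiter, assuming a made-up file with toy values. show_col_types = FALSE only suppresses the column-type message.

```r
library(readr)

tmp <- tempfile(fileext = ".txt")
writeLines(c("city;pop", "Oslo;709000", "Bergen;291000"), tmp)   # toy values

# delim = ";" tells readr that fields are separated by semicolons
df <- read_delim(tmp, delim = ";", show_col_types = FALSE)
print(df)
```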
readr::read_csv(f_name, na = c("..", NA, ""),
locale = locale(encoding = "UTF-8"),
col_types = cols(Date = col_date(format = "%m/%d/%y")) )
read comma separated values.
col_types
specify column types. Could be created by list() or cols().
read_csv will automatically guess column types if you don't explicitly specify them. You can override the guesses by providing the argument col_types. You don't need to provide all column types, just the ones you want to override.
By default, reading a file without a column specification will print a message showing what types readr guessed. To remove this message,
- set show_col_types = FALSE for a one-time setting, or
- set options(readr.show_col_types = FALSE) for the current session's global options. To change it permanently every time R starts, put options(readr.show_col_types = FALSE) in .Rprofile as a global option.
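A sketch of read_csv() with a partial column specification, assuming a made-up file. Only Date and id are specified in cols(); value is still guessed. Keeping id as character preserves its leading zeros. The na vector here uses the string "NA" rather than NA, which is an assumption about the intended behaviour; spec() prints the full specification that was actually used.

```r
library(readr)

tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,id,value",
             "01/15/24,007,3.2",
             "02/20/24,010,..",
             "03/05/24,012,"), tmp)   # hypothetical data

df <- read_csv(
  tmp,
  na = c("..", "NA", ""),                    # ".." and empty cells become NA
  locale = locale(encoding = "UTF-8"),
  col_types = cols(
    Date = col_date(format = "%m/%d/%y"),    # parse US-style month/day/year dates
    id   = col_character()                   # keep leading zeros
  ),                                         # "value" is still guessed
  show_col_types = FALSE
)
spec(df)
```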
read_tsv()
read tab separated values.
read_csv2(f_name, na = c("..", NA, ""))
uses semicolon ; to separate values and comma , for the decimal point. This is common in some European countries.
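A minimal sketch of read_csv2(), assuming a made-up file in the European convention. The prices "1,5" and "2,25" are parsed as 1.5 and 2.25; readr may print a note about the decimal and grouping marks it uses.

```r
library(readr)

tmp <- tempfile(fileext = ".csv")
writeLines(c("item;price", "a;1,5", "b;2,25"), tmp)   # hypothetical data

# read_csv2: ";" separates fields, "," is the decimal mark
df <- read_csv2(tmp, show_col_types = FALSE)
print(df)
```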
locale
The locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
locale(date_names = "en", date_format = "%AD", time_format = "%AT", decimal_mark = ".", grouping_mark = ",", tz = "UTC", encoding = "UTF-8", asciify = FALSE)
decimal_mark
indicates the decimal mark; can only be , or .
encoding
the encoding of the input file. This only affects how the file is read; readr always converts the output to UTF-8.
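A sketch of two locale() settings, assuming made-up inputs. The first part parses a European-style number with a comma decimal mark and a dot grouping mark. The second writes a small file in Latin-1 (ISO-8859-1) bytes and reads it with locale(encoding = "latin1"); the returned text is UTF-8, as described above.

```r
library(readr)

# European-style number: "," as decimal mark, "." as grouping mark
eu <- locale(decimal_mark = ",", grouping_mark = ".")
x <- parse_number("1.234,5", locale = eu)
print(x)

# A hypothetical Latin-1 encoded file containing "café"
tmp <- tempfile(fileext = ".csv")
latin1_lines <- iconv(c("name", "caf\u00e9"), from = "UTF-8", to = "latin1")
writeLines(latin1_lines, tmp, useBytes = TRUE)

# encoding tells readr how to decode the bytes; the result is UTF-8
df <- read_csv(tmp, locale = locale(encoding = "latin1"), show_col_types = FALSE)
print(df)
```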
5.1.2 Write Data
Save data with special language characters in UTF-8 encoding
write_excel_csv()
includes a UTF-8 byte order mark (BOM), which tells Excel that the csv is UTF-8 encoded.
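A small check of what write_excel_csv() adds, assuming a made-up data frame with non-ASCII city names. The first three bytes of the file should be the UTF-8 BOM (EF BB BF). Reading the file back with read_csv() should give the original names, since readr skips the BOM.

```r
library(readr)

df <- data.frame(city = c("Troms\u00f8", "K\u00f6ln"), n = 1:2)   # hypothetical data

path <- tempfile(fileext = ".csv")
write_excel_csv(df, path)

# Inspect the first three bytes of the file
bom <- readBin(path, what = "raw", n = 3)
print(bom)

back <- read_csv(path, show_col_types = FALSE)
print(back)
```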
write.csv(x, f_name, row.names=TRUE, fileEncoding ="UTF-8")
x
a matrix or data frame. If it is not one of these types, R attempts to coerce x to a data frame.
- For readr's write_csv(x), x can only be a data frame or tibble; matrices are not supported.
row.names
whether to write the row names of x. Defaults to TRUE.
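A sketch of the difference row.names makes, using a small made-up matrix. write.csv() accepts the matrix directly. With row.names = TRUE the header gets an empty first field for the row-name column. readr::write_csv() needs the matrix converted to a data frame first, and never writes row names.

```r
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))

p1 <- tempfile(fileext = ".csv")
write.csv(m, p1, row.names = TRUE, fileEncoding = "UTF-8")    # matrix is coerced
p2 <- tempfile(fileext = ".csv")
write.csv(m, p2, row.names = FALSE, fileEncoding = "UTF-8")   # drop row names

# readr::write_csv() needs a data frame, so convert the matrix first
p3 <- tempfile(fileext = ".csv")
readr::write_csv(as.data.frame(m), p3)

lines1 <- readLines(p1)
lines2 <- readLines(p2)
lines3 <- readLines(p3)
print(lines1)
```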
5.1.2.1 flextable
The flextable package creates tables for reporting and publications.
The main function is flextable, which takes a data.frame as argument and returns a flextable. If you are using RStudio or another R GUI, the table will be displayed in the Viewer panel or in your default browser.
The package provides a set of functions to easily create tables from other objects.
The as_flextable() function is used to transform specific objects into flextable objects. For example, you can transform a crosstab produced with the ‘tables’ package into a flextable which can then be formatted, annotated or augmented with footnotes.
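A minimal sketch of the basic flextable workflow, using the built-in mtcars data as a stand-in. It builds a table with flextable(), then styles it with a few formatting functions: a predefined theme, a bold header, a caption, and automatic column widths. Printing is left to interactive sessions, where it opens the Viewer or a browser.

```r
library(flextable)

# Build a flextable from the first rows of a built-in data set
ft <- flextable(head(mtcars[, c("mpg", "cyl", "hp")]))
ft <- theme_vanilla(ft)                    # a simple predefined theme
ft <- bold(ft, part = "header")            # bold the header row
ft <- set_caption(ft, caption = "First rows of mtcars")
ft <- autofit(ft)                          # adjust column widths to content

if (interactive()) print(ft)               # shows the table in the Viewer/browser
```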