5.1 Data Input & Output
5.1.1 Read Data
Read Fortran
read.fortran(file, format, ..., as.is = TRUE, colClasses = NA)
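format: character vector or list of vectors of Fortran-style format codes such as "F2.1" or "I2". A list describes multiline records, one component per line.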
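The following minimal sketch follows the pattern in the read.fortran() help page. It writes two made-up fixed-format records to a temporary file and reads them back with Fortran-style codes. "F2.1" reads a 2-character field with one implied decimal place, "F2.0" reads a 2-character real, and "I2" reads a 2-character integer.

```r
# read.fortran() is in the utils package, which R attaches by default
ff <- tempfile()
# Two made-up fixed-format records, one per line
cat(file = ff, "123456", "987654", sep = "\n")
# F2.1: width 2 with 1 implied decimal; F2.0: width-2 real; I2: width-2 integer
df <- read.fortran(ff, c("F2.1", "F2.0", "I2"))
df
```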
Read dta
haven::read_dta() reads a Stata data file (.dta).
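The .dta file used below is not bundled with these notes. This self-contained sketch shows where haven stores the labels it reads. It builds a hypothetical one-column dataset (`gor`, coded 1/2, with made-up labels), writes it with write_dta(), and reads it back. The variable label ends up in the "label" attribute and the value labels in the "labels" attribute.

```r
library(haven)

# Hypothetical toy data: an integer region code with value labels and a variable label
toy <- tibble::tibble(
  gor = labelled(c(1L, 2L, 1L), labels = c(North = 1L, South = 2L), label = "Region")
)
f <- tempfile(fileext = ".dta")
write_dta(toy, f)

data <- read_dta(f)
attr(data$gor, "label")   # variable label
attr(data$gor, "labels")  # value labels (named vector)
```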
library(haven)     # read_dta()
library(dplyr)     # tibble() and the %>% pipe
library(labelled)  # var_label(), val_labels()
data <- read_dta("climate_health_2406yl.dta")
# retrieve variable labels/definitions
var_dict <- tibble(
  "name" = colnames(data),
  "label" = sapply(data, function(x) attr(x, "label")) %>%
    as.character()
)
var_dict
var_label(data$gor)  # get variable label
val_labels(data$gor) # get value labels
5.1.1.1 Base R functions
Read fixed width text files
read.fwf(file, widths)
widths: integer vector giving the widths of the fixed-width fields (of one line), or a list of integer vectors giving widths for multiline records.
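This minimal sketch uses two made-up lines in a temporary file. It splits each line into fields by width. A negative width skips that many characters instead of reading them.

```r
ff <- tempfile()
cat(file = ff, "123456", "987654", sep = "\n")
# Fields of width 1, 2 and 3 on each line
df <- read.fwf(ff, widths = c(1, 2, 3))
df
# A negative width skips characters: keep the first char and the last 3 only
df_skip <- read.fwf(ff, widths = c(1, -2, 3))
df_skip
```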
read.table(f_name, header=FALSE, row.names, col.names, sep="", na.strings = "NA") is a very versatile function. It can read .csv or .txt files.
f_name: path to the data file.
header = FALSE: defaults to FALSE, i.e., assumes there is no header row in the file unless specified otherwise.
- If there is a header in the first row, specify header = TRUE.
row.names: a vector of row names. This can be
- a vector giving the actual row names, or
- a single number giving the column of the table which contains the row names, or
- a character string giving the name of the table column containing the row names.
col.names: a vector of optional names for the variables. The default is "V" followed by the column number.
sep = "": use white space as the delimiter.
- If it is a csv file, use sep = ',' to specify comma as the delimiter.
na.strings = "NA": a character vector of strings to be interpreted as NA values.
- A useful setting: na.strings = c("", "NA", "NULL")
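The arguments above can be tried on inline text through read.table()'s `text` argument, so no file is needed. The small table in this sketch is made up. `row.names = "id"` turns the id column into row names. `na.strings` makes both the empty field and "NULL" become NA, which keeps `score` numeric.

```r
txt <- "id,age,score
a,25,3.5
b,,NULL
c,40,2.0"
df <- read.table(text = txt, header = TRUE, sep = ",",
                 row.names = "id", na.strings = c("", "NA", "NULL"))
df

# Without a header row, columns default to V1, V2, ... unless col.names is given
df2 <- read.table(text = "1 2\n3 4")
df3 <- read.table(text = "1 2\n3 4", col.names = c("x", "y"))
```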
read.csv(f_name, header = TRUE, sep = ",", na.strings = "..", dec=".")
header = TRUE: whether the file contains the names of the variables as its first line.
sep: the field separator character. Values on each line of the file are separated by this character.
na.strings: a character vector of strings to be interpreted as NA values; here ".." marks missing data.
dec: the character used for decimal points in numeric or complex columns; must be a single character.
fileEncoding: the encoding of the file, e.g., "UTF-8"; set it when the file contains non-ASCII characters.
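This sketch reads a made-up two-row table through read.csv()'s `text` argument. It shows why `na.strings` matters. With the default setting, the ".." placeholder forces the whole column to character. With `na.strings = ".."` the column is numeric and ".." becomes NA.

```r
txt <- "country,gdp\nNorway,..\nSweden,5.9"
df_raw <- read.csv(text = txt)
class(df_raw$gdp)    # ".." is not recognised as missing, so the column is character
df <- read.csv(text = txt, na.strings = "..")
df$gdp               # NA 5.9
```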
When reading data from GitHub, pass the URL of the raw version of the file to read.csv().
R cannot read the display version, which is an HTML page.
You can get the URL for the raw version by clicking on the Raw button displayed above the data.
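Instead of clicking Raw, the raw URL can often be built from the display URL. `github_raw_url()` below is a hypothetical helper, not part of any package. It assumes the standard form `https://github.com/<user>/<repo>/blob/<branch>/<path>`. It swaps the host for raw.githubusercontent.com and drops `/blob`. The branch-only form it produces and the `refs/heads/<branch>` form used elsewhere in these notes should both resolve.

```r
# Hypothetical helper: convert a GitHub "blob" page URL into its raw-file URL.
# Assumes https://github.com/<user>/<repo>/blob/<branch>/<path>
github_raw_url <- function(url) {
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  sub("/blob/", "/", url, fixed = TRUE)
}

github_raw_url("https://github.com/my1396/course_dataset/blob/main/bonedensity.csv")
```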
read.table(filename, header=FALSE, sep="") is more versatile than read.csv() and is useful for data files saved as .txt.
Default separator is “white space” for read.table, i.e., one or more spaces, tabs, newlines or carriage returns.
# read.table can be used to read txt and csv. Need to specify sep=',' when reading csv.
data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.txt", header=TRUE)
data
data <- read.table("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv", header=TRUE, sep=",")
# Alternatively, use readr::read_csv() or read.csv() directly
library(readr)
data <- read_csv("https://raw.githubusercontent.com/my1396/course_dataset/refs/heads/main/bonedensity.csv")
data
5.1.1.2 readr
The major difference of readr is that its functions return a tibble instead of a plain data frame.
read_delim(f_name, delim = ";", col_names = TRUE, skip = 0) lets you specify the delimiter, here ;.
col_names = TRUE: whether the first row contains column names.
skip = 0: number of lines to skip before reading the data. The default is 0, meaning no lines are skipped.
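This minimal sketch uses a made-up file with one note line above a semicolon-delimited table. `skip = 1` drops the note, and the next line becomes the header. The result is a tibble. `show_col_types = FALSE` only silences the column-type message.

```r
library(readr)
f <- tempfile(fileext = ".txt")
# Made-up file: one note line, then a ";"-delimited table with a header row
writeLines(c("exported from a spreadsheet", "id;value", "1;2.5", "2;3.0"), f)
df <- read_delim(f, delim = ";", col_names = TRUE, skip = 1, show_col_types = FALSE)
df
```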
read_delim(f_name, delim = "\t") reads tab-separated values.
read_tsv() is a shortcut for reading tab-separated values.
Read comma separated values.
readr::read_csv(
  f_name,
  na = c("..", "NA", ""),
  locale = readr::locale(encoding = "UTF-8"),
  col_types = readr::cols(Date = readr::col_date(format = "%m/%d/%y"))
)
col_types: specify column types, created by list() or cols(). If you don't specify column types explicitly, read_csv() guesses them. You can override the guesses with the col_types argument. You only need to list the columns you want to override, not all of them.
By default, reading a file without a column specification prints a message showing the column types readr guessed. To suppress this message,
- set show_col_types = FALSE for a one-time setting, or
- set options(readr.show_col_types = FALSE) as a global option for the current session. To make this permanent every time R starts, put options(readr.show_col_types = FALSE) in your .Rprofile.
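This sketch uses a small made-up csv file. Only the Date column gets an explicit type (month/day/two-digit year). readr still guesses the type of `value`. Both ".." and the empty field become NA, so `value` is read as a number.

```r
library(readr)
f <- tempfile(fileext = ".csv")
writeLines(c("Date,value", "01/31/24,1.5", "02/29/24,..", "03/31/24,"), f)
df <- read_csv(
  f,
  na = c("..", "NA", ""),
  col_types = cols(Date = col_date(format = "%m/%d/%y"))  # override Date only
)
spec(df)  # shows the full column specification, including the guessed value column
df
```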
read_csv2(f_name, na = c("..", "NA", "")) uses a semicolon ; to separate values and a comma , as the decimal point. This convention is common in some European countries.
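This minimal sketch uses a made-up semicolon-separated file. read_csv2() treats "1,5" as the number 1.5, and the ".." placeholder becomes NA.

```r
library(readr)
f <- tempfile(fileext = ".csv")
writeLines(c("country;gdp", "A;1,5", "B;.."), f)
df <- read_csv2(f, na = c("..", "NA", ""), show_col_types = FALSE)
df$gdp  # 1.5 NA
```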
locale: the locale controls defaults that vary from place to place. The default locale is US-centric (like R), but you can use locale() to create your own locale that controls things like the default time zone, encoding, decimal mark, big mark, and day/month names.
locale(date_names = "en", date_format = "%AD", time_format = "%AT", decimal_mark = ".", grouping_mark = ",", tz = "UTC", encoding = "UTF-8", asciify = FALSE)
decimal_mark: indicates the decimal place; can only be , or .
encoding: only affects how the file is read; readr always converts the output to UTF-8.
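A locale can be tested without a file by passing it to readr's parse_*() functions. This sketch builds a European-style locale with "." as the grouping mark and "," as the decimal mark. It also decodes a Latin-1 byte string. The output is UTF-8 whatever the input encoding, as described above.

```r
library(readr)
# European-style numbers: "." groups thousands, "," marks decimals
eu <- locale(decimal_mark = ",", grouping_mark = ".")
x <- parse_double("3,14", locale = eu)       # 3.14
y <- parse_number("1.234,5", locale = eu)    # 1234.5

# encoding only affects how input is decoded; the result is UTF-8
z <- parse_character("caf\xe9", locale = locale(encoding = "latin1"))
```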
5.1.2 Write Data
Save data with special language characters in UTF-8 encoding
readr::write_excel_csv() includes a UTF-8 byte order mark (BOM), which tells Excel that the csv file is UTF-8 encoded.
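This sketch checks the BOM directly. It writes the same made-up data frame, which contains non-ASCII city names written as Unicode escapes, with write_excel_csv() and with write_csv(). It then inspects the first three bytes of each file. Only the Excel variant starts with the UTF-8 BOM bytes EF BB BF.

```r
library(readr)
df <- data.frame(city = c("Z\u00fcrich", "Troms\u00f8"), temp = c(9.3, 2.6))
f_excel <- tempfile(fileext = ".csv")
f_plain <- tempfile(fileext = ".csv")
write_excel_csv(df, f_excel)
write_csv(df, f_plain)
readBin(f_excel, what = "raw", n = 3)  # ef bb bf: the UTF-8 BOM
readBin(f_plain, what = "raw", n = 3)  # the file starts directly with the header
```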
write.csv(x, f_name, row.names=TRUE, fileEncoding ="UTF-8")
x: a matrix or data frame. If it is not one of these types, write.csv() attempts to coerce x to a data frame.
row.names: whether to write the row names of x. Defaults to TRUE.
write_csv(x): x can only be a data frame or tibble; matrices are not supported.
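Because row.names defaults to TRUE, write.csv() adds an unnamed first column. read.csv() later brings it back as a column named "X". This sketch uses a made-up data frame to show the difference.

```r
df <- data.frame(city = c("Oslo", "Bergen"), temp = c(6.3, 8.0))
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
write.csv(df, f1, fileEncoding = "UTF-8")                    # row.names = TRUE (default)
write.csv(df, f2, row.names = FALSE, fileEncoding = "UTF-8")
names(read.csv(f1))  # "X" "city" "temp": the row names came back as an extra column
names(read.csv(f2))  # "city" "temp"
```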
flextable
The flextable package creates tables for reporting and publications.
The main function is flextable(), which takes a data.frame as its argument and returns a flextable. If you are using RStudio or another R GUI, the table will be displayed in the Viewer panel or in your default browser.
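This minimal sketch uses the built-in mtcars data. It builds a flextable from the first rows and columns and lets autofit() size the columns. Printing is wrapped in interactive() so that a non-interactive run does not try to open a viewer.

```r
library(flextable)
ft <- flextable(head(mtcars[, 1:4]))
ft <- autofit(ft)  # adjust column widths to their content
if (interactive()) print(ft)  # shows the table in the Viewer or browser
```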
The package provides a set of functions to easily create tables from other objects.
The as_flextable() function is used to transform specific objects into flextable objects. For example, you can transform a crosstab produced with the ‘tables’ package into a flextable which can then be formatted, annotated or augmented with footnotes.
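This sketch converts a crosstab from the 'tables' package. It assumes both the tables and flextable packages are installed, and that as_flextable() has a method for tables::tabular() objects. The summary itself (mean and sd of Sepal.Length by Species in iris) is only an illustration.

```r
library(flextable)
library(tables)
# Rows: Species; columns: mean and sd of Sepal.Length
tab <- tabular(Species ~ Sepal.Length * (mean + sd), data = iris)
ft <- as_flextable(tab)
if (interactive()) print(ft)
```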