8.2 Process Data
Download stock data using quantmod::getSymbols
getSymbols
loads data in the current environment using the symbol/tick of the asset.
Returned data will have column names symbol.Open
, symbol.High
, symbol.Low
, symbol.Close
, symbol.Volume
, and symbol.Adjusted
.
# Stock prices
stock_prices <- getSymbols(c("SPY", "AAPL"),
src = 'yahoo',
from="2014-12-01", # add an extra month before the start date such that
to="2023-12-31", # the 1st month has non-NA return data;
# Note: need to start from "12-01";
auto.assign=TRUE # return 2 "xts" object in the current env: "SPY" and "AAPL"
)
> stock_prices # returned value is a vector of tickers
[1] "SPY" "AAPL"
# Treasury Bill
Tbill_1m <- getSymbols("DGS1MO", src="FRED", auto.assign=FALSE)
# Market index: OSEBX
mkt_idx_prices <- getSymbols("OSEBX.OL",
from="2014-12-01",
to="2023-12-31",
src="yahoo",
auto.assign = FALSE)
mkt_idx_prices
# from xts to tibble or data.frame
mkt_idx_prices <- data.frame(mkt_idx_prices) %>%
rownames_to_column(var = "Date") %>% # this converts Date to a string
mutate(Date = ymd(Date)) # convert back to Date type
mkt_idx_prices %>% str()
mkt_idx_prices %>% head()
mkt_idx_prices %>% tail()
# for index, adjusted is equal to closing prices
with(mkt_idx_prices, all.equal(OSEBX.OL.Adjusted, OSEBX.OL.Close))
f_name <- "data/market/OSEBX_prices_2015-2023_daily.csv"
write_csv(mkt_idx_prices, f_name)
mkt_idx_prices <- read_csv(f_name)
src
data sources. Options:yahoo
,google
,MySQL
,FRED
,csv
,RData
, andOanda
. Defaults toyahoo
.yahoo
returns 6 columns: OHLC, adjusted price, and volume.google
returns OHLC and volume.FRED
interest rates and other economic series data for US, including• CPIAUCSL (CPI) • POP (Population) • DNDGRA3M086SBEA (Real Consumption) • INDPRO (Industrial Production) • OILPRICE • BAA • DTB3 (3 month T-bills) • DGS10 (10 year Treasuries)
• UNRATE (unemployment rate)
To find the series name, refer to FRED’s website for all available indicators.
Oanda
The Currency Site (FX and Metals)Before doing any analysis you must always check the data to ensure quality.
Do not assume that because you are getting it from a source such as Yahoo! or Google that it is clean.
auto.assgin
Defaults toTrue
, data are loaded silently to the current environment, i.e., the workspace.- If
FALSE
, need to assign the returned results to a variable. Note that only one symbol at a time may be requested when auto assignment is disabled. - Objects loaded by
getSymbols
withauto.assign=TRUE
can be viewed withshowSymbols
and removed by a call toremoveSymbols
.
- If
env = globalenv()
where to create objects. Defaults to the global environment.Setting
env=NULL
is equal toauto.assign=FALSE
.Alternatively, you can create a separate environment to store the downloaded data.
# create a new env called `sp500` sp500 <- new.env() # save the S&P 500 (symbol:^GSPC) to `sp500` getSymbols("^GSPC", env = sp500, src = "yahoo", from = as.Date("1960-01-04"), to = as.Date("2009-01-01"))
To load the variable GSPC from the environment sp500 to a variable in the global environment (also known as the workspace), three options:
periodicity="daily"
periodicity of data to query and return. Defaults to “daily”.Must be one of “daily”, “weekly”, “monthly”.
Download index components data
download a csv file containing all company symbols and names.
Download data
By using the command
tryCatch
we handle unusual conditions, including errors and warnings.In this case, if the data from a company are not available from yahoo finance, the message
Symbol ... not downloadable!
is given.(For simplicity, we only download the symbols starting with
A
.)nasdaq <- new.env() for(i in nasdaq100$Symbol[startsWith(nasdaq100$Symbol, "A")]) { cat("Downloading time series for symbol '", i, "' ...\n", sep = "") status <- tryCatch(getSymbols(i, env = nasdaq, src = "yahoo", from = as.Date("2000-01-01")), error = identity) if(inherits(status, "error")) cat("Symbol '", i, "' not downloadable!\n", sep = "") } # check AAPL time series with(nasdaq, head(AAPL)) # visualize chartSeries(nasdaq$AAPL)
Have a look at the
quantmod
homepage for further examples.See the manual of the
quantmod
package for the whole list of available plot and visualization functions.
Download only the close price
Use getSymbols()[,4]
to subset the close price column.
tickers <- c("0011.HK", "1299.HK", "1083.HK", "0823.HK", "0669.HK", "0992.HK")
portfolioPrices <- NULL
for (Ticker in tickers)
portfolioPrices <- cbind(portfolioPrices,
getSymbols(Ticker, from = "2012-09-01",
to = "2022-08-31",
periodicity = "weekly",
auto.assign=FALSE)[, 4])
colnames(portfolioPrices) <- c("HSBC", "AIA", "TG", "LinkReit", "Techronic", "Lenovo")
portfolioPrices
Technical Indicators
https://bookdown.org/kochiuyu/technical-analysis-with-r-second-edition2/technical-indicators.html
8.2.1 tidyquant
Useful resources:
https://www.tidy-pm.com/s-2data
Nice thing about tidyquant
is that it works directly with tibble
, making it work seamlessly with tidyverse
. This means we can:
- Seamlessly scale data retrieval and mutations
- Use the pipe (
%>%
) for chaining operations - Use
dplyr
andtidyr
:select
,filter
,group_by
,nest
/unnest
,spread
/gather
, etc - Use
purrr
: mapping functions withmap
tq_get(x, get, from, to)
get trading data, such as OHLC, and return as tibble
.
x
A single character string, a character vector or tibble representing a single (or multiple) stock symbol, metal symbol, currency combination, FRED code, etc.get
A character string representing the type of data to get forx
. Possible options:"stock.prices"
: Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getSymbols()
."dividends"
: Get the dividends for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getDividends()
."splits"
: Get the split ratio for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getSplits()
.tq_get_options()
returns a list of validget
options you can choose from.
- Use
from
andto
to specify the period of interest.
tq_get("AAPL",get = "stock.prices")
# A tibble: 2,687 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2014-01-02 19.8 19.9 19.7 19.8 234684800 17.3
2 AAPL 2014-01-03 19.7 19.8 19.3 19.3 392467600 16.9
3 AAPL 2014-01-06 19.2 19.5 19.1 19.4 412610800 17.0
4 AAPL 2014-01-07 19.4 19.5 19.2 19.3 317209200 16.8
5 AAPL 2014-01-08 19.2 19.5 19.2 19.4 258529600 17.0
6 AAPL 2014-01-09 19.5 19.5 19.1 19.2 279148800 16.7
7 AAPL 2014-01-10 19.3 19.3 19.0 19.0 304976000 16.6
8 AAPL 2014-01-13 18.9 19.4 18.9 19.1 378492800 16.7
9 AAPL 2014-01-14 19.2 19.5 19.2 19.5 332561600 17.0
10 AAPL 2014-01-15 19.8 20.0 19.7 19.9 391638800 17.4
# ℹ 2,677 more rows
# ℹ Use `print(n = ...)` to see more rows
# get Facebook data for the past five years
from = today() - years(5)
Stocks <- tq_get("FB", get = "stock.prices", from = from)
Stocks
Mutiple stocks
# get historical data for multiple stocks. e.g. GAFA
tq_get(c("GOOGL","AMZN","FB","AAPL"), get="stock.prices")
tq_index(x)
returns the stock symbols and various attributes for every stock in an index or exchange. Eighteen indexes and three exchanges are available.
tq_index_options()
returns a list of stock indexes you can choose from.
tq_exchange(x)
Get all stocks in a stock exchange in tibble
format.
tq_exchange_options()
returns a list of stock exchanges you can choose from. The options are AMEX
, NASDAQ
and NYSE
.
tq_index("SP500")
# A tibble: 504 × 8
symbol company identifier sedol weight sector shares_held local_currency
<chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
1 AAPL APPLE INC 037833100 2046251 0.0686 - 171416583 USD
2 MSFT MICROSOFT CORP 594918104 2588173 0.0654 - 88389092 USD
3 NVDA NVIDIA CORP 67066G104 2379504 0.0563 - 292528720 USD
4 AMZN AMAZON.COM INC 023135106 2000019 0.0342 - 108870351 USD
5 META META PLATFORMS INC CLASS A 30303M102 B7TL820 0.0242 - 26074570 USD
6 GOOGL ALPHABET INC CL A 02079K305 BYVY8G0 0.0198 - 69889995 USD
7 BRK-B BERKSHIRE HATHAWAY INC CL B 084670702 2073390 0.0187 - 21540450 USD
8 GOOG ALPHABET INC CL C 02079K107 BYY88Y7 0.0166 - 58143623 USD
9 LLY ELI LILLY + CO 532457108 2516152 0.0163 - 9488573 USD
10 AVGO BROADCOM INC 11135F101 BDZ78H9 0.0145 - 51827325 USD
# ℹ 494 more rows
# ℹ Use `print(n = ...)` to see more rows
## This takes forever to run ...
sp_500 <- tq_index("SP500") %>%
tq_get(get = "stock.prices")
sp_500
> dim(sp_500)
[1] 1310963 15
# tq_index loads data for the last 10 years
sp_500$date %>% unique() %>% head()
[1] "2014-01-02" "2014-01-03" "2014-01-06" "2014-01-07" "2014-01-08" "2014-01-09"
sp_500$date %>% unique() %>% tail()
[1] "2024-08-28" "2024-08-29" "2024-08-30" "2024-09-03" "2024-09-04" "2024-09-05"
tq_transmute_fun_options()
to see which functions are available
Calculate monthly return.
# calculate monthly return of single stock
tq_get(c("GOOGL"), get="stock.prices") %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "monthly",
col_rename = "monthly_return")
Plot closing price.
Group by and perform operations on individual stocks
from daily to monthly data
calculate monthly returns.
stats::ts(data, start, frequency)
base R function for time series.
# quarterly data
> ts(1:10, start = c(1959, 2), frequency = 4) # 2nd Quarter of 1959
Qtr1 Qtr2 Qtr3 Qtr4
1959 1 2 3
1960 4 5 6 7
1961 8 9 10
# monthly data
> ts(cumsum(1 + round(rnorm(18), 2)), start = c(1954, 7), frequency = 12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1954 1.63 2.67 3.21 4.42 4.80 6.56
1955 7.51 8.53 10.12 10.81 12.23 12.52 13.89 13.90 14.75 15.58 16.37 19.34
Difference with zoo
ts
objects are regularly spaced and have numeric times and are good for months and quarters whereaszoo
objects can be irregularly spaced and can use most common index classes.
8.2.2 From tibble to xts
xts(x, order.by)
x
can only contain values in a matrix or atomic vector which places constraints on the values that can be used (generally numeric, but necessarily all of a single mode, i.e., not a mix of numeric and character values)
If Date
is the 1st column
From xts
to tibble
using
data.frame
.Date
will be converted tochr
first, then need to convert back toDate
.using
as_tibble()
.Date
will remain to beDate
.fortify.zoo
takes a zoo object and converts it into a data frame.
Fill missing values
It is very common for daily prices to have missing values. The common practice is to fill missing values using the last non-NA observation.
Subset a period
window(x, start='YYYY-MM-DD', end='YYYY-MM-DD')
extract a subperiod window.
x[between(index(x), start=ymd('YYYY-MM-DD'), end=('YYYY-MM-DD') ), ]
can also be used to subset a subperiod.
To get all observations in March 1970:
It is also possible to specify a range of timestamps using ‘/’ as the range separator, where both endpoints are optional: e.g.,
Change column names
This can be done easily using either names
, colnames
, or setNames
.
my_xts <- with(reg_data, xts(AdjustedPrice, order.by = Date))
my_xts
colnames(my_xts) <- "AdjustedPrice" # opt1
names(my_xts) <- "AdjustedPrice" # opt2
my_xts <- my_xts %>% setNames("AdjustedPrice") # opt3 can be used in a pipe sequence
my_xts
apply.monthly(x, FUN=colSums)
apply one function periodically per column.
- Note that
colSums
is used when calculating the sum of observations per period. - If
FUN=sum
, thensum
is not only applied to each time window, but also the sum of all columns is calculated.
Add new column to xts
: xts$new_column <- col_data
.
Not convenient to do operations on xts
. First convert to data.frame
, then do operations as usual.
to.monthly(x, indexAt='yearmon', name=NULL, OHLC = TRUE, ...)
indexAt
Convert final index to new class or date. Can be set to one of the following:Option Meaning yearmon
The final index will then be yearmon
yearqtr
The final index will then be yearqtr
firstof
the first time of the period lastof
the last time of the period startof
the starting time in the data for that period endof
the ending time in the data for that period OHLC
If an OHLC object should be returned.
prices <- prices %>% na.locf() # fill missing values in daily prices
prices_monthly <- prices %>% to.monthly(indexAt = "last", OHLC = FALSE)
# alternative ways to achieve the same results as the last line
prices_monthly <- prices %>% xts::apply.monthly(last)
prices_monthly <- prices[xts::endpoints(prices, on="months"), ]
head(prices_monthly)
SPY EFA IJS EEM AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-31 134.2744 50.00364 78.82265 39.51723 97.37608
2013-02-28 135.9876 49.35931 80.10801 38.61464 97.95142
2013-03-28 141.1512 50.00364 83.39879 38.22143 98.04794
2013-04-30 143.8630 52.51315 83.50081 38.68614 98.99760
2013-05-31 147.2596 50.92775 87.08048 36.81840 97.01658
- Note that
to.monthly
removes rows with missing values; be cautious with that.
endpoints(prices, on="months")
Extract index locations for an xts
object that correspond to the last observation in each period specified by on
.
- Alternatively, one could use
prices %>% apply.monthly(last)
which takes the last day of each month in the time series. Data for all months is returned including those with NA in some of the time series.
period.apply(samplexts, INDEX = endpoints(samplexts, on = "months"), FUN = mean, ...)
Apply a function periodically.
...
Additional arguments forFUN
.
apply.monthly(samplexts, mean)
This has the same results as the code above.
quantmod::monthlyReturn(x, subset=NULL, type='arithmetic', leading=TRUE, ...)
Given a set of prices, return periodic returns.
subset
an xts/ISO8601 style subset string.type
type of returns: arithmetic (discrete) or log (continuous).leading
should incomplete leading period returns be returned
Now we’ll call PerformanceAnalytics::Return.calculate(prices_monthly, method = "log")
to convert to returns and save as an object called asset_returns_xts
.
Note this will give us log returns by the
method = "log"
argument, \(z_t = \Delta \ln P_t = \ln P_t-\ln P_{t-1}=\ln\frac{P_t}{P_{t-1}}\).We could have used
method = "discrete"
to get simple returns, \(r_t = \frac{P_t}{P_{t-1}}-1\). This is the default value.Relationship between \(z_t\) and \(r_t\): \[ \begin{align*} \ln(1+r_t)=z_t \end{align*} \]
asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))
head(asset_returns_xts)
SPY EFA IJS EEM AGG
2013-01-31 0.04992311 0.03660641 0.052133484 -0.002935494 -0.0062309021
2013-02-28 0.01267821 -0.01296938 0.016175381 -0.023105250 0.0058910464
2013-03-28 0.03726766 0.01296938 0.040257940 -0.010235048 0.0009849727
2013-04-30 0.01903006 0.04896773 0.001222544 0.012085043 0.0096390038
2013-05-31 0.02333571 -0.03065563 0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232
prices[endpoints(prices, on="months"), ]
converts daily to monthly prices.
endpoints(x, on="month")
x
anxts
objecton
retrieve the last observation of each period. Supported periods include: “us” (microseconds), “microseconds”, “ms” (milliseconds), “milliseconds”, “secs” (seconds), “seconds”, “mins” (minutes), “minutes”, “hours”, “days”, “weeks”, “months”, “quarters”, and “years”.
Calculate returns by tidyverse
lag <- dplyr::lag # have to use dplyr::lag, base R lag has problems
returns <- prices_monthly %>%
group_by(ISIN) %>%
mutate(delta.P = c(NA, diff(AdjustedPrice)), # fill the first obs with NA
lag.P = lag(AdjustedPrice),
Return = delta.P/lag.P,
Return2 = AdjustedPrice/lag(AdjustedPrice)-1,
Return_log = log(AdjustedPrice)-lag(log(AdjustedPrice))
)
Note:
- Base R
lag
anddiff
works perfect withxts
, but not ideal for groupped tibbles in tidyverse. - Be careful whenever call
lag
, better to print check if you get the correct lag as the function from different packages has differing features and output.