8.2 Process Data
Download stock data using quantmod::getSymbols
getSymbols loads data in the current environment using the symbol/tick of the asset.
Returned data will have column names symbol.Open, symbol.High, symbol.Low, symbol.Close, symbol.Volume, and symbol.Adjusted.
# Stock prices
stock_prices <- getSymbols(c("SPY", "AAPL"),
src = 'yahoo',
from="2014-12-01", # add an extra month before the start date such that
to="2023-12-31", # the 1st month has non-NA return data;
# Note: need to start from "12-01";
auto.assign=TRUE # return 2 "xts" object in the current env: "SPY" and "AAPL"
)
> stock_prices # returned value is a vector of tickers
[1] "SPY" "AAPL"
# Treasury Bill
Tbill_1m <- getSymbols("DGS1MO", src="FRED", auto.assign=FALSE)
# Market index: OSEBX
mkt_idx_prices <- getSymbols("OSEBX.OL",
from="2014-12-01",
to="2023-12-31",
src="yahoo",
auto.assign = FALSE)
mkt_idx_prices
# from xts to tibble or data.frame
mkt_idx_prices <- data.frame(mkt_idx_prices) %>%
rownames_to_column(var = "Date") %>% # this converts Date to a string
mutate(Date = ymd(Date)) # convert back to Date type
mkt_idx_prices %>% str()
mkt_idx_prices %>% head()
mkt_idx_prices %>% tail()
# for index, adjusted is equal to closing prices
with(mkt_idx_prices, all.equal(OSEBX.OL.Adjusted, OSEBX.OL.Close))
f_name <- "data/market/OSEBX_prices_2015-2023_daily.csv"
write_csv(mkt_idx_prices, f_name)
mkt_idx_prices <- read_csv(f_name)srcdata sources. Options:yahoo,google,MySQL,FRED,csv,RData, andOanda. Defaults toyahoo.yahooreturns 6 columns: OHLC, adjusted price, and volume.googlereturns OHLC and volume.FREDinterest rates and other economic series data for US, including• CPIAUCSL (CPI) • POP (Population) • DNDGRA3M086SBEA (Real Consumption) • INDPRO (Industrial Production) • OILPRICE • BAA • DTB3 (3 month T-bills) • DGS10 (10 year Treasuries)
• UNRATE (unemployment rate)
To find the series name, refer to FRED’s website for all available indicators.
OandaThe Currency Site (FX and Metals)Before doing any analysis you must always check the data to ensure quality.
Do not assume that because you are getting it from a source such as Yahoo! or Google that it is clean.
auto.assginDefaults toTrue, data are loaded silently to the current environment, i.e., the workspace.- If
FALSE, need to assign the returned results to a variable. Note that only one symbol at a time may be requested when auto assignment is disabled. - Objects loaded by
getSymbolswithauto.assign=TRUEcan be viewed withshowSymbolsand removed by a call toremoveSymbols.
- If
env = globalenv()where to create objects. Defaults to the global environment.Setting
env=NULLis equal toauto.assign=FALSE.Alternatively, you can create a separate environment to store the downloaded data.
# create a new env called `sp500` sp500 <- new.env() # save the S&P 500 (symbol:^GSPC) to `sp500` getSymbols("^GSPC", env = sp500, src = "yahoo", from = as.Date("1960-01-04"), to = as.Date("2009-01-01"))To load the variable GSPC from the environment sp500 to a variable in the global environment (also known as the workspace), three options:
periodicity="daily"periodicity of data to query and return. Defaults to “daily”.Must be one of “daily”, “weekly”, “monthly”.
Download index components data
download a csv file containing all company symbols and names.
Download data
By using the command
tryCatchwe handle unusual conditions, including errors and warnings.In this case, if the data from a company are not available from yahoo finance, the message
Symbol ... not downloadable!is given.(For simplicity, we only download the symbols starting with
A.)nasdaq <- new.env() for(i in nasdaq100$Symbol[startsWith(nasdaq100$Symbol, "A")]) { cat("Downloading time series for symbol '", i, "' ...\n", sep = "") status <- tryCatch(getSymbols(i, env = nasdaq, src = "yahoo", from = as.Date("2000-01-01")), error = identity) if(inherits(status, "error")) cat("Symbol '", i, "' not downloadable!\n", sep = "") } # check AAPL time series with(nasdaq, head(AAPL)) # visualize chartSeries(nasdaq$AAPL)Have a look at the
quantmodhomepage for further examples.See the manual of the
quantmodpackage for the whole list of available plot and visualization functions.
Download only the close price
Use getSymbols()[,4] to subset the close price column.
tickers <- c("0011.HK", "1299.HK", "1083.HK", "0823.HK", "0669.HK", "0992.HK")
portfolioPrices <- NULL
for (Ticker in tickers)
portfolioPrices <- cbind(portfolioPrices,
getSymbols(Ticker, from = "2012-09-01",
to = "2022-08-31",
periodicity = "weekly",
auto.assign=FALSE)[, 4])
colnames(portfolioPrices) <- c("HSBC", "AIA", "TG", "LinkReit", "Techronic", "Lenovo")
portfolioPricesTechnical Indicators
https://bookdown.org/kochiuyu/technical-analysis-with-r-second-edition2/technical-indicators.html
8.2.1 tidyquant
Useful resources:
https://www.tidy-pm.com/s-2data
Nice thing about tidyquant is that it works directly with tibble, making it work seamlessly with tidyverse. This means we can:
- Seamlessly scale data retrieval and mutations
- Use the pipe (
%>%) for chaining operations - Use
dplyrandtidyr:select,filter,group_by,nest/unnest,spread/gather, etc - Use
purrr: mapping functions withmap
tq_get(x, get, from, to) get trading data, such as OHLC, and return as tibble.
xA single character string, a character vector or tibble representing a single (or multiple) stock symbol, metal symbol, currency combination, FRED code, etc.getA character string representing the type of data to get forx. Possible options:"stock.prices": Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getSymbols()."dividends": Get the dividends for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getDividends()."splits": Get the split ratio for a stock symbol from Yahoo Finance (https://finance.yahoo.com/). Wrapper forquantmod::getSplits().tq_get_options()returns a list of validgetoptions you can choose from.
- Use
fromandtoto specify the period of interest.
tq_get("AAPL",get = "stock.prices")
# A tibble: 2,687 × 8
symbol date open high low close volume adjusted
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 AAPL 2014-01-02 19.8 19.9 19.7 19.8 234684800 17.3
2 AAPL 2014-01-03 19.7 19.8 19.3 19.3 392467600 16.9
3 AAPL 2014-01-06 19.2 19.5 19.1 19.4 412610800 17.0
4 AAPL 2014-01-07 19.4 19.5 19.2 19.3 317209200 16.8
5 AAPL 2014-01-08 19.2 19.5 19.2 19.4 258529600 17.0
6 AAPL 2014-01-09 19.5 19.5 19.1 19.2 279148800 16.7
7 AAPL 2014-01-10 19.3 19.3 19.0 19.0 304976000 16.6
8 AAPL 2014-01-13 18.9 19.4 18.9 19.1 378492800 16.7
9 AAPL 2014-01-14 19.2 19.5 19.2 19.5 332561600 17.0
10 AAPL 2014-01-15 19.8 20.0 19.7 19.9 391638800 17.4
# ℹ 2,677 more rows
# ℹ Use `print(n = ...)` to see more rows
# get Facebook data for the past five years
from = today() - years(5)
Stocks <- tq_get("FB", get = "stock.prices", from = from)
StocksMutiple stocks
# get historical data for multiple stocks. e.g. GAFA
tq_get(c("GOOGL","AMZN","FB","AAPL"), get="stock.prices")tq_index(x) returns the stock symbols and various attributes for every stock in an index or exchange. Eighteen indexes and three exchanges are available.
tq_index_options() returns a list of stock indexes you can choose from.
tq_exchange(x) Get all stocks in a stock exchange in tibble format.
tq_exchange_options() returns a list of stock exchanges you can choose from. The options are AMEX, NASDAQ and NYSE.
tq_index("SP500")
# A tibble: 504 × 8
symbol company identifier sedol weight sector shares_held local_currency
<chr> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
1 AAPL APPLE INC 037833100 2046251 0.0686 - 171416583 USD
2 MSFT MICROSOFT CORP 594918104 2588173 0.0654 - 88389092 USD
3 NVDA NVIDIA CORP 67066G104 2379504 0.0563 - 292528720 USD
4 AMZN AMAZON.COM INC 023135106 2000019 0.0342 - 108870351 USD
5 META META PLATFORMS INC CLASS A 30303M102 B7TL820 0.0242 - 26074570 USD
6 GOOGL ALPHABET INC CL A 02079K305 BYVY8G0 0.0198 - 69889995 USD
7 BRK-B BERKSHIRE HATHAWAY INC CL B 084670702 2073390 0.0187 - 21540450 USD
8 GOOG ALPHABET INC CL C 02079K107 BYY88Y7 0.0166 - 58143623 USD
9 LLY ELI LILLY + CO 532457108 2516152 0.0163 - 9488573 USD
10 AVGO BROADCOM INC 11135F101 BDZ78H9 0.0145 - 51827325 USD
# ℹ 494 more rows
# ℹ Use `print(n = ...)` to see more rows
## This takes forever to run ...
sp_500 <- tq_index("SP500") %>%
tq_get(get = "stock.prices")
sp_500
> dim(sp_500)
[1] 1310963 15
# tq_index loads data for the last 10 years
sp_500$date %>% unique() %>% head()
[1] "2014-01-02" "2014-01-03" "2014-01-06" "2014-01-07" "2014-01-08" "2014-01-09"
sp_500$date %>% unique() %>% tail()
[1] "2024-08-28" "2024-08-29" "2024-08-30" "2024-09-03" "2024-09-04" "2024-09-05"tq_transmute_fun_options() to see which functions are available
Calculate monthly return.
# calculate monthly return of single stock
tq_get(c("GOOGL"), get="stock.prices") %>%
tq_transmute(select = adjusted,
mutate_fun = periodReturn,
period = "monthly",
col_rename = "monthly_return")Plot closing price.
Group by and perform operations on individual stocks
from daily to monthly data
calculate monthly returns.
stats::ts(data, start, frequency) base R function for time series.
# quarterly data
> ts(1:10, start = c(1959, 2), frequency = 4) # 2nd Quarter of 1959
Qtr1 Qtr2 Qtr3 Qtr4
1959 1 2 3
1960 4 5 6 7
1961 8 9 10
# monthly data
> ts(cumsum(1 + round(rnorm(18), 2)), start = c(1954, 7), frequency = 12)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1954 1.63 2.67 3.21 4.42 4.80 6.56
1955 7.51 8.53 10.12 10.81 12.23 12.52 13.89 13.90 14.75 15.58 16.37 19.34Difference with zoo
tsobjects are regularly spaced and have numeric times and are good for months and quarters whereaszooobjects can be irregularly spaced and can use most common index classes.
8.2.2 From tibble to xts
xts(x, order.by) x can only contain values in a matrix or atomic vector which places constraints on the values that can be used (generally numeric, but necessarily all of a single mode, i.e., not a mix of numeric and character values)
If Date is the 1st column
From xts to tibble
using
data.frame.Datewill be converted tochrfirst, then need to convert back toDate.using
as_tibble().Datewill remain to beDate.fortify.zootakes a zoo object and converts it into a data frame.
Fill missing values
It is very common for daily prices to have missing values. The common practice is to fill missing values using the last non-NA observation.
Subset a period
window(x, start='YYYY-MM-DD', end='YYYY-MM-DD') extract a subperiod window.
x[between(index(x), start=ymd('YYYY-MM-DD'), end=('YYYY-MM-DD') ), ] can also be used to subset a subperiod.
To get all observations in March 1970:
It is also possible to specify a range of timestamps using ‘/’ as the range separator, where both endpoints are optional: e.g.,
Change column names
This can be done easily using either names , colnames , or setNames .
my_xts <- with(reg_data, xts(AdjustedPrice, order.by = Date))
my_xts
colnames(my_xts) <- "AdjustedPrice" # opt1
names(my_xts) <- "AdjustedPrice" # opt2
my_xts <- my_xts %>% setNames("AdjustedPrice") # opt3 can be used in a pipe sequence
my_xtsapply.monthly(x, FUN=colSums) apply one function periodically per column.
- Note that
colSumsis used when calculating the sum of observations per period. - If
FUN=sum, thensumis not only applied to each time window, but also the sum of all columns is calculated.
Add new column to xts: xts$new_column <- col_data.
Not convenient to do operations on xts. First convert to data.frame, then do operations as usual.
to.monthly(x, indexAt='yearmon', name=NULL, OHLC = TRUE, ...)
indexAtConvert final index to new class or date. Can be set to one of the following:Option Meaning yearmonThe final index will then be yearmonyearqtrThe final index will then be yearqtrfirstofthe first time of the period lastofthe last time of the period startofthe starting time in the data for that period endofthe ending time in the data for that period OHLCIf an OHLC object should be returned.
prices <- prices %>% na.locf() # fill missing values in daily prices
prices_monthly <- prices %>% to.monthly(indexAt = "last", OHLC = FALSE)
# alternative ways to achieve the same results as the last line
prices_monthly <- prices %>% xts::apply.monthly(last)
prices_monthly <- prices[xts::endpoints(prices, on="months"), ]
head(prices_monthly)
SPY EFA IJS EEM AGG
2012-12-31 127.7356 48.20629 74.81863 39.63340 97.98471
2013-01-31 134.2744 50.00364 78.82265 39.51723 97.37608
2013-02-28 135.9876 49.35931 80.10801 38.61464 97.95142
2013-03-28 141.1512 50.00364 83.39879 38.22143 98.04794
2013-04-30 143.8630 52.51315 83.50081 38.68614 98.99760
2013-05-31 147.2596 50.92775 87.08048 36.81840 97.01658- Note that
to.monthlyremoves rows with missing values; be cautious with that.
endpoints(prices, on="months") Extract index locations for an xts object that correspond to the last observation in each period specified by on.
- Alternatively, one could use
prices %>% apply.monthly(last)which takes the last day of each month in the time series. Data for all months is returned including those with NA in some of the time series.
period.apply(samplexts, INDEX = endpoints(samplexts, on = "months"), FUN = mean, ...) Apply a function periodically.
...Additional arguments forFUN.
apply.monthly(samplexts, mean) This has the same results as the code above.
quantmod::monthlyReturn(x, subset=NULL, type='arithmetic', leading=TRUE, ...) Given a set of prices, return periodic returns.
subsetan xts/ISO8601 style subset string.typetype of returns: arithmetic (discrete) or log (continuous).leadingshould incomplete leading period returns be returned
Now we’ll call PerformanceAnalytics::Return.calculate(prices_monthly, method = "log") to convert to returns and save as an object called asset_returns_xts.
Note this will give us log returns by the
method = "log"argument, \(z_t = \Delta \ln P_t = \ln P_t-\ln P_{t-1}=\ln\frac{P_t}{P_{t-1}}\).We could have used
method = "discrete"to get simple returns, \(r_t = \frac{P_t}{P_{t-1}}-1\). This is the default value.Relationship between \(z_t\) and \(r_t\): \[ \begin{align*} \ln(1+r_t)=z_t \end{align*} \]
asset_returns_xts <- na.omit(Return.calculate(prices_monthly, method = "log"))
head(asset_returns_xts)
SPY EFA IJS EEM AGG
2013-01-31 0.04992311 0.03660641 0.052133484 -0.002935494 -0.0062309021
2013-02-28 0.01267821 -0.01296938 0.016175381 -0.023105250 0.0058910464
2013-03-28 0.03726766 0.01296938 0.040257940 -0.010235048 0.0009849727
2013-04-30 0.01903006 0.04896773 0.001222544 0.012085043 0.0096390038
2013-05-31 0.02333571 -0.03065563 0.041976371 -0.049483592 -0.0202136957
2013-06-28 -0.01343432 -0.02715331 -0.001402974 -0.054739116 -0.0157787232prices[endpoints(prices, on="months"), ] converts daily to monthly prices.
endpoints(x, on="month")
xanxtsobjectonretrieve the last observation of each period. Supported periods include: “us” (microseconds), “microseconds”, “ms” (milliseconds), “milliseconds”, “secs” (seconds), “seconds”, “mins” (minutes), “minutes”, “hours”, “days”, “weeks”, “months”, “quarters”, and “years”.
Calculate returns by tidyverse
lag <- dplyr::lag # have to use dplyr::lag, base R lag has problems
returns <- prices_monthly %>%
group_by(ISIN) %>%
mutate(delta.P = c(NA, diff(AdjustedPrice)), # fill the first obs with NA
lag.P = lag(AdjustedPrice),
Return = delta.P/lag.P,
Return2 = AdjustedPrice/lag(AdjustedPrice)-1,
Return_log = log(AdjustedPrice)-lag(log(AdjustedPrice))
) Note:
- Base R
laganddiffworks perfect withxts, but not ideal for groupped tibbles in tidyverse. - Be careful whenever call
lag, better to print check if you get the correct lag as the function from different packages has differing features and output.