Thursday, September 20, 2012

Cointegration in R

Essentially, it seeks to find stationary linear combinations of the two vectors
test two series for integration and return the p-value indicating the likelihood of correlation

This runs an augmented Dickey-Fuller test and will return a p-value indicating whether the series are mean-reverting or not. You can use the typical p-value as a test of significance if you like(ie, a p-value below .05 indicates a mean-reverting spread), or you can use an alternate value. This assumes that your two series were observed at the same time points.

gld <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6947.KL&ignore=.csv", stringsAsFactors=F)
gdx <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6012.KL&ignore=.csv", stringsAsFactors=F)

Take the intersection of the two zoo objects.  That will create one zoo object with the observations common to both datasets.

#The seventh column contains the adjusted close.
#The first column contains dates.

gld <- zoo(gld[,7], as.Date(gld[,1]))
gdx <- zoo(gdx[,7], as.Date(gdx[,1]))

The merge function can combine two zoo objects. so we merge them.
t.zoo <- merge(gld, gdx, all=FALSE)

# At this point, t.zoo is a zoo object with two columns: gld and gdx.
# Most statistical functions expect a data frame for input,
# so we create a data frame here.
#
t <- as.data.frame(t.zoo)


First we construct the spread, then we test the spread for a unit root. 

It the spread has a root inside the unit circle, the underlying securities are cointegrated.

The lm function builds linear regression models using  ordinary least squares(OLS).
# We build the linear model, m, forcing a zero intercept,
# then we extract the model's first regression coefficient.
m <- lm(gld ~ gdx + 0, data=t)
beta <- coef(m)[1]
cat("Assumed hedge ratio is", beta, "\n")

sprd <- t$gld - beta*t$gdx

The Augmented Dickey-Fuller test is a basic statistical test for a unit root, and several R packages implement that test.  Here, we will use the adf.test function which is implemented in the tseries package.  The function returns an object which contains the test results. In particular, it contains the p-value that we want.
library(tseries)            # Load the tseries package

# Setting alternative="stationary" chooses the appropriate test.
# Setting k=0 forces a basic (not augmented) test. 
ht <- adf.test(sprd, alternative="stationary", k=0)
cat("ADF p-value is", ht$p-value, "\n")

The adf.test function essentially detrends your data before testing for stationarity.
If your data contains a strong trend consider the  fUnitRoots package which contains the adfTest function

if (ht$p.value < 0.05) {
    cat("The spread is likely mean-reverting.\n")
} else {
    cat("The spread is not mean-reverting.\n")
}

gld <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6947.KL&ignore=.csv", stringsAsFactors=F)

SUMMARY


gld <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6012.KL&ignore=.csv", stringsAsFactors=F)
gdx <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=4863.KL&ignore=.csv", stringsAsFactors=F)
gld <- zoo(gld[,7], as.Date(gld[,1]))
gdx <- zoo(gdx[,7], as.Date(gdx[,1]))
t.zoo <- merge(gld, gdx, all=FALSE)
t <- as.data.frame(t.zoo)
m <- lm(gld ~ gdx + 0, data=t)
beta <- coef(m)[1]
sprd <- t$gld - beta*t$gdx
ht <- adf.test(sprd, alternative="stationary", k=0)

if (ht$p.value < 0.05) {
     cat("The spread is likely mean-reverting.\n")
 } else {
     cat("The spread is not mean-reverting.\n")
 }(sprd, alternative="stationary", k=0)

No comments:

Post a Comment