Tuesday, September 25, 2012

Intelligent stock market trading using Piotroski F-Score

The Basics of the Piotroski F-Score
The F-Score is a binary scoring system built from 9 items taken from a company's financial statements.
These items are not always easy to find: many companies don't publish them, or if they do you have to pay to see them or do a lot of googling.

I'm using this website: http://quotes.wsj.com

Simply search for a quote to open its page. I'm searching for DIGI (Malaysia), which gives the following result page: http://quotes.wsj.com/MY/XKLS/DIGI?mod=DNH_S_cq

I'm looking at the Financials tab:


These are the items you need to find before scoring:
A. Net Profit or Net Income

B. Operating Cash Flow

C. Total Assets (also used for Asset Turnover)

D. Total Current Assets

E. Long-Term Debt

F. Gross Margin

G. Current Liabilities


So you have all the data you need and I've shown you where to find it. Now for the 9 items.

NOTE (REMEMBER): the F-Score is a BINARY scoring system, so each item scores either 1 or 0.


1. ROA: Net income before extraordinary items, one point awarded if positive, zero otherwise.
Item A

2. CFO: Cash Flow from Operations, one point awarded if positive, zero otherwise.
Item B

3. ΔROA: ROA is the net profit divided by the total assets. ΔROA is the current year's ROA less the prior year's ROA. One point awarded if positive, zero otherwise.
Item A / Item C

4. ACCRUAL: Current year's net income before extraordinary items less cash flow from operations. The indicator variable F-ACCRUAL equals one if CFO > net income, zero otherwise.
Item B > Item A

5. ΔLEVER: Change in leverage. If the long-term debt divided by the average total assets is lower this year than in the prior year, score 1, zero otherwise.
Item E / Average Item C

6. ΔLIQUID: This variable measures the change in the firm's current ratio (current assets divided by current liabilities) between the current and prior year. If this year's figure is greater than last year's, score 1, zero otherwise.
Item D / Item G

7. EQ-OFFER: This indicator will equal one if the firm did not issue common equity in the year preceding the current year, zero otherwise.
(I think this is what it means.)

8. ΔMARGIN: The firm's current gross margin ratio less the prior year's gross margin ratio. An improvement in margins signifies a potential improvement in factor costs, a reduction of inventory costs, or a rise in the price of the firm's product. F-ΔMARGIN will equal one if ΔMARGIN is positive, zero otherwise.
Item F

9. ΔTURN: Asset turnover is sales divided by total assets. If asset turnover this year is greater than asset turnover last year, score 1, zero otherwise.

 

          YTL   DIGI
Item 1    1     1
Item 2    1     1
Item 3    1     1
Item 4    0     0
Item 5    1     1
Item 6    1     1
Item 7    1     1
Item 8    0     0
Item 9    1     1

Total     7     7


The final score is just the sum of the 9 items.

After summing the results for all 9 tests, stocks with higher scores are considered financially stronger. Piotroski's stock-picking strategy involves picking the cheapest stocks in the market (those with a high book-to-market ratio), and buying those with an F-Score of 8 or 9.
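
The nine tests can be written down as a small R function. This is only a sketch: the function name, the argument names and the numbers in the example call are all made up for illustration, and "sales" is an extra input (not in my A to G list) that is needed for the asset turnover item.

```r
# Hypothetical helper: names and inputs are assumptions for illustration only.
# Each numeric argument is a length-2 vector c(prior_year, current_year).
# avg_assets holds the average total assets used for each year's leverage ratio.
f_score <- function(net_income, cfo, total_assets, avg_assets,
                    long_term_debt, current_assets, current_liabilities,
                    gross_margin, sales, issued_equity) {
  roa      <- net_income / total_assets
  leverage <- long_term_debt / avg_assets
  liquid   <- current_assets / current_liabilities
  turnover <- sales / total_assets
  tests <- c(
    ROA      = net_income[2] > 0,
    CFO      = cfo[2] > 0,
    dROA     = roa[2] > roa[1],
    ACCRUAL  = cfo[2] > net_income[2],
    dLEVER   = leverage[2] < leverage[1],
    dLIQUID  = liquid[2] > liquid[1],
    EQ_OFFER = !issued_equity,
    dMARGIN  = gross_margin[2] > gross_margin[1],
    dTURN    = turnover[2] > turnover[1]
  )
  items <- as.integer(tests)
  names(items) <- names(tests)
  list(items = items, score = sum(items))
}

# Made-up figures, NOT taken from YTL or DIGI
res <- f_score(net_income = c(100, 120), cfo = c(110, 115),
               total_assets = c(1000, 1100), avg_assets = c(950, 1050),
               long_term_debt = c(300, 280), current_assets = c(400, 450),
               current_liabilities = c(200, 210), gross_margin = c(0.40, 0.38),
               sales = c(800, 900), issued_equity = FALSE)
res$items
res$score
```

With these invented inputs the ACCRUAL and ΔMARGIN items score 0 and the rest score 1.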

Saturday, September 22, 2012

Philippines Quarterly Gross National Income 2011 onwards


Data Source : 
http://www.nscb.gov.ph/sna/default.asp 
http://www.nscb.gov.ph/sna/rev_qrtrly_GNI/1981_2010_NAP_Linked_Series%20(rev_column).xls
 
 --------------------
Gross National Income
 
--------------------- 
What the CSV looks like:

 


Using R Statistics 

 
 
library(forecast) # provides forecast(), auto.arima() and tbats()
gross_national_income <- read.csv("C:/Users/user/Desktop/gross_national_income.csv", header=FALSE)
 
z = ts(gross_national_income$V3, start = c(1981, 1), frequency = 4) #frequency 4 quarters
 
#ETS
> plot(forecast(z))
> forecast(z)
 
#ARIMA
> fit <- auto.arima(z)
> fcast <- forecast(fit)
> plot(fcast)
> fcast

#TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components)
> fit <- tbats(z)
> fcast <- forecast(fit)
> plot(fcast)
> fcast
 
 Prediction vs Actual Results
The Prediction column is my forecast, the Actual column is the figure from the site, and the last column is the difference (prediction minus actual).
Values are in Philippine pesos (for 2011 Q1 I missed by 6193 PHP).

Quarter    Prediction   Actual     Difference
2011 Q1    3022107      3015914    6193
2011 Q2    3283005      3225451    57554
2011 Q3    3245533      3094875    150658
2011 Q4    3676553      3541886    134667
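
The differences in the table above can be recomputed in R, together with a percentage error per quarter. The data frame and column names below are my own choice; the numbers are the ones from the table.

```r
# Prediction vs actual GNI for 2011, values copied from the table above
gni <- data.frame(
  quarter   = c("2011 Q1", "2011 Q2", "2011 Q3", "2011 Q4"),
  predicted = c(3022107, 3283005, 3245533, 3676553),
  actual    = c(3015914, 3225451, 3094875, 3541886)
)

gni$error     <- gni$predicted - gni$actual                 # prediction minus actual
gni$pct_error <- round(100 * gni$error / gni$actual, 2)     # error as % of actual
print(gni)

# Mean absolute percentage error over the four quarters
mape <- mean(abs(gni$error / gni$actual)) * 100
mape
```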
 
 
http://nscb.gov.ph/secstat/d_accounts.asp
 

Introduction to Quantitative Methods:Colorado.edu (Slide Summary)

Introduction to Quantitative Methods
Instructor: Elisabeth D. Root


Important notes on Time series analysis

Slides:http://www.colorado.edu/geography/class_homepages/geog_4023_s11/lectures.html

Since “noise” is not understandable, all the useful information is in the trend, seasonality, etc.
-Construct a series from simple assumptions about each of the individual components

Three typical steps in the “reduction-to-noise'' process:
-A data transformation such as taking logarithms of the data
-Removing seasonality and trend to obtain a stationary process.
-Fit a standard time series model

The “reduction-to-noise” procedure does not always proceed in a linear fashion
-One will usually jump around from one attempt after another of trying to develop each of the three components

Steps in a classical time series analysis
1.Do a time plot of the time series
2.Describe the variability of the series seen in the plot:
-Is there a trend? Is the trend in mean and variance? Or only one of them?
-Is there a seasonal pattern? What is the period?
-Is there any additional irregular variability?
3.Use time series plots to determine whether transformations are necessary
4.Transform the data if necessary
-Log or square root transforms
5.Use time plots and test statistics to determine if the series is stationary (constant mean and/or variance)
6.Make the series stationary if it is not
7.Fit TS model to series and analyze residuals
8.When a good model is found, forecast the future

Terminology
-Dependence: Correlation of observations of one variable at one point in time with observations of the same variable at prior points in time
 Serial correlation or autocorrelation
-Stationarity: The mean value of the series remains constant over the time series (e.g., no systematic change in the mean, no trend)
 Also, variance should remain constant
-Differencing: data pre-processing step which de-trends the data to achieve stationarity
 Subtract from each data point in a series its predecessor
 Most methods in TS analysis are concerned with stationary time series
-Specification: using diagnostic tests, specifying the type of time series model to apply to the series
 Auto-regressive (AR), Moving average (MA), ARMA (combined) or ARIMA (combined integrated)
 Also could have non-linear models

A trend is a long term change in the mean and/or variance of the series
 e.g., if you computed the mean of the series at several different intervals, the mean would be different in each

If the trend is not immediately apparent (usually due to a large error component) we can identify it using a smoothing process
Once we have identified the trend we can model it: Fit a linear regression model to the data

We can remove a trend by detrending or by differencing
-Detrending: fit a linear/quadratic/polynomial function to the trend and subtract the fitted values from each observation
-Differencing: subtract the previous observation from each observation (Xt - Xt-1)

Often the whole point of modeling a trend is to create a residual series that is used for time series analysis

The simplest trend model for a linear trend
Xt = alpha + beta t + white noise

alpha + beta t = mean of the series at time t

If alpha and beta are assumed constants, the trend is called deterministic
If alpha and beta are assumed random, then the trend is stochastic


Once we have found a trend model, we can use the model to predict future values, or to detrend the data
Detrended data = Xt-fitted trend (residuals)
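
A minimal sketch of the linear trend model above, on simulated data. The values alpha = 10, beta = 0.5 and the noise level are arbitrary choices of mine, not from the slides.

```r
# Simulate Xt = alpha + beta * t + white noise, then fit and remove the trend
set.seed(42)
t_idx <- 1:200
x <- 10 + 0.5 * t_idx + rnorm(200, sd = 2)

trend_fit <- lm(x ~ t_idx)      # estimates alpha (intercept) and beta (slope)
coef(trend_fit)

detrended <- resid(trend_fit)   # detrended data = Xt - fitted trend
mean(detrended)                 # residuals of a model with intercept average to ~0
```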

Seasonality: variation (increase or decrease in the series) that is annual in period
In most cases, we estimate the seasonality not to include it in the forecasting model, but simply to remove it
A logarithmic transformation converts multiplicative seasonality to additive seasonality
Smooth the series

Once we detrend and remove seasonality, all that is left is the random or error component

To make a series X stationary
1.Check if there is variance that changes with time
-Make variance constant with log or square root transformation
-Call the transformed data X*
2.Remove the trend in mean with regular differencing or fitting a trend line
-Call the new series X**
-The correlogram of X** should only have a few significant spikes at small lags
3.If there is a seasonal cycle left in the data, we must seasonally difference the series too
-Call the new series X***
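
The three steps can be sketched on the built-in AirPassengers series. The variable names mirror the X*, X**, X*** labels above and are my own; whether each step is actually needed should be judged from the plots and tests.

```r
# Step 1: stabilise the variance with a log transform (X*)
x_star <- log(AirPassengers)

# Step 2: remove the trend in mean with regular (lag 1) differencing (X**)
x_star2 <- diff(x_star)

# Step 3: remove the remaining seasonal cycle with seasonal (lag 12) differencing (X***)
x_star3 <- diff(x_star2, lag = 12)

length(AirPassengers)   # 144 monthly observations
length(x_star3)         # 144 - 1 - 12 observations remain

# Correlogram of the final series, without drawing the plot
acf(x_star3, plot = FALSE)
```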

Data:
year,month,mean,interp
1958,3,315.71,315.71
.
.
.
2010,1,388.63,388.63

mloa<-read.table("http://www.colorado.edu/geography/class_homepages/geog_4023_s11/monaloa.txt", header=T, sep=",")

names(mloa) #shows the column names
mlco2<-ts(mloa$interp, st=c(1958,3), end=c(2010,1), fr=12)

plot(mlco2, ylab="Mean CO2 (PPM)")
ts.plot(mloa$interp, ylab="Mean CO2 (PPM)")

One way to look at seasonality
> boxplot(mlco2~cycle(mlco2))

Classical decomposition
mlco2.dec<-decompose(mlco2, type="mult")
plot(mlco2.dec)


The original AirPassengers series has non-stationary variance (step 1 of making a series stationary)

library("fpp")
logAP<-log(AirPassengers)
plot(logAP, ylab="Air Passengers (1000s)")

Transform it using the logarithmic function. Notice the Y axis in both plots.

What is a stationary time series?
“…a time series is said to be stationary if there is no systematic change in mean (no trend), if there is no systematic change in variance and if strictly periodic variations have been removed.” (Chatfield: 13)

Correlogram has very few significant spikes at very small lags and cuts off drastically/dies down quickly (at 2 or 3 lags)

Stochastic vs deterministic: two ways to describe the “quality” of nonstationarity of a series

Stochastic = inexplicable changes in direction
-Often found in economic processes, sometimes climate
-“Random walk” process
*Use differencing and autoregressive models

Deterministic = plausible physical explanation for a trend or seasonal cycle
-Increase in population, orbit of the earth
*Use regression models

Checking for stationarity

logAP<-log(AirPassengers)
kpss.test(logAP, null = "Trend")
kpss.test(AirPassengers, null = "Trend")

adf.test(logAP, alternative = "stationary")

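adf.test(AirPassengers, alternative = "stationary")

Rule of thumb: compare the p-value with 0.05
p-value < 0.05: reject the null hypothesis
p-value > 0.05: cannot reject the null hypothesis

KPSS (null: the series is stationary): p-value 0.01 < 0.05, so the null is rejected and the series is non-stationary
ADF (null: the series has a unit root): p-value 0.8 > 0.05, so the null cannot be rejected and the series is treated as non-stationary

KPSS and ADF use opposite null hypotheses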
Example:
> kpss.test(logAP, null = "Trend")

    KPSS Test for Trend Stationarity

data:  logAP
KPSS Trend = 0.121, Truncation lag parameter = 2,
p-value = 0.09626

Is 0.09626 > 0.05?
Yes (0.09626 - 0.05 = 0.04626), so the null hypothesis cannot be rejected
This means the series is treated as trend-stationary
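
Because the two tests have opposite null hypotheses, I find it easy to mix up the reading of the p-value. Here is a small helper that spells it out. The function name and the wording of the messages are my own, and 0.05 is just the usual threshold.

```r
# Hypothetical helper: turns a p-value from kpss.test or adf.test into words
interpret_stationarity <- function(p_value, test = c("kpss", "adf"), alpha = 0.05) {
  test <- match.arg(test)
  reject <- p_value < alpha
  if (test == "kpss") {
    # KPSS null hypothesis: the series is (level or trend) stationary
    if (reject) "reject stationarity: treat as non-stationary"
    else "cannot reject stationarity: treat as stationary"
  } else {
    # ADF null hypothesis: the series has a unit root (non-stationary)
    if (reject) "reject unit root: treat as stationary"
    else "cannot reject unit root: treat as non-stationary"
  }
}

interpret_stationarity(0.09626, "kpss")   # the logAP example above
interpret_stationarity(0.8, "adf")
interpret_stationarity(0.01, "kpss")
```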

Smoothing is often used to remove an underlying signal or trend (such as a seasonal cycle)
Common method is the centered moving average
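
A centered moving average can be computed with the base R filter function. This is a sketch assuming monthly data, where a 13-term window with half weights at both ends is used so that the window is centered on a month; the choice of log(AirPassengers) as the example series is mine.

```r
x <- log(AirPassengers)

# 2x12 centered moving average: 13 weights summing to 1, half weight at each end
w <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(x, filter = w, sides = 2)

sum(w)              # the weights sum to 1
sum(is.na(trend))   # 6 values are lost at each end of the series

# Subtracting the smoothed series leaves the seasonal cycle plus noise
remainder <- x - trend
```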

> acf(gwn, main = "ACF")
> qqnorm(gwn)
> pacf(gwn, main = "PACF")

>tsdisplay(z.ts)  #provides a time plot along with an ACF and PACF for a TS object

Autoregressive (AR) and moving average (MA)
The AR model includes lagged terms of the time series itself
The MA model includes lagged terms on the noise or residuals
How do we decide which to use?
-ACF and PACF

The autocorrelation function (ACF) is a set of correlation coefficients between the series and lags of itself over time
The partial autocorrelation function (PACF) is the partial correlation coefficients between the series and lags of itself over time

If the PACF displays a sharp cutoff while the ACF decays more slowly (i.e., has significant spikes at higher lags), we say that the series displays an "AR signature"

NOTE:*The lag at which the PACF cuts off is the indicated number of AR terms

In Simple terms
> library("fpp")
> livestock
> tsdisplay(livestock)
If the ACF decreases gradually as the lag increases
AND
the PACF cuts off sharply after a few lags, this is an AR signature

The diagnostic patterns of ACF and PACF for an AR(1) model are:
-ACF: declines in geometric progression from its highest value at lag 1
-PACF: cuts off abruptly after lag 1

The opposite types of patterns apply to an MA(1) process:
-ACF: cuts off abruptly after lag 1
-PACF: declines in geometric progression from its highest value at lag 1

For the ARMA(1,1), both the ACF and the PACF exponentially decrease
Much of fitting ARMA models is guesswork and trial-and-error!

In most cases, the best model turns out to be a model that uses either only AR terms or only MA terms

ARIMA Modeling
What is an ARIMA model?
Type of ARMA model that can be used with some kinds of non-stationary data
-Useful for series with stochastic trends
-First order or “simple” differencing
-Series with deterministic trends should be differenced first then an ARMA model applied

The “I” in ARIMA stands for integrated, which basically means you’re differencing

R Code for choosing the best ARIMA model

get.best.arima <- function(x.ts, maxord = c(1,1,1))
{
  best.aic <- 1e8
  n <- length(x.ts)
  # try every (p,d,q) combination up to the orders given in maxord
  for (p in 0:maxord[1]) for (d in 0:maxord[2]) for (q in 0:maxord[3])
  {
    fit <- arima(x.ts, order = c(p,d,q))
    # AIC-type criterion with a penalty of (log(n) + 1) per coefficient
    fit.aic <- -2 * fit$loglik + (log(n) + 1) * length(fit$coef)
    if (fit.aic < best.aic)
    {
      best.aic <- fit.aic
      best.fit <- fit
      best.model <- c(p,d,q)
    }
  }
  list(best.aic, best.fit, best.model)
}


then enter the following into R
> get.best.arima(TS, maxord=c(2,2,2))

See the result for the best ARIMA fit

When fitting ARIMA models with R, an intercept term is NOT included in the model if there is any differencing

gas<-read.table("http://www.colorado.edu/geography/class_homepages/geog_4023_s11/gas.dat", header=F, sep=",")
plot(gas, xlim=c(1973,1991))

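nobs=nrow(gas) # number of observations (length() on a data frame counts columns)
# gas.arima is assumed to be an ARIMA model fitted earlier; the fitting step is not shown in these notes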
gas.pred <- predict(gas.arima, n.ahead=36, newxreg=(nobs+1):(nobs+36))
lines(exp(gas.pred$pred), col="red")

Detecting the trend with an ARIMA model is implicit
-Can’t calculate the exact slope of the trend line

-If the autocorrelation at the seasonal period is positive, consider adding an SAR term to the model
-If the autocorrelation at the seasonal period is negative, consider adding an SMA term to the model

Cross-Correlation
How can we study the relationship between 2 or more time series?

The cross correlation function (CCF) is helpful for identifying lags of the x-variable that might be useful predictors of yt

*When one or more xt+h are predictors of yt, and h (the significant lag) is negative, it is sometimes said that x leads y
*When one or more xt+h are predictors of yt, and h is positive, it is sometimes said that x lags y

In some problems, the goal may be to identify which variable is leading and which is lagging
We will want to use values of the x-variable to predict future values of y

ccf(x-variable name, y-variable name)

If you wish to specify how many lags to show, add that number as an argument of the command
ccf(x,y, 50)
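
To see the lead/lag reading in practice, here is a sketch with simulated data where y is built from x three steps earlier, so x leads y by 3. The series, the delay of 3 and the noise level are my own choices.

```r
set.seed(1)
n <- 200
x_full <- rnorm(n + 3)

x <- x_full[4:(n + 3)]                 # x[t]
y <- x_full[1:n] + rnorm(n, sd = 0.2)  # y[t] = x[t-3] + noise

# ccf(x, y) estimates the correlation between x[t+k] and y[t] at each lag k
cc <- ccf(x, y, lag.max = 10, plot = FALSE)

# The lag with the largest absolute cross-correlation
best_lag <- cc$lag[which.max(abs(cc$acf))]
best_lag   # negative lag: x leads y
```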
-----------------------------------------------------

STL is a very versatile and robust method for decomposing time series. STL is an acronym for “Seasonal and Trend decomposition using Loess”, while Loess is a method for estimating nonlinear relationships.

SUMMARY

gld = read.csv("http://ichart.finance.yahoo.com/table.csv?s=6947.KL&ignore=.csv", stringsAsFactors=F)
z.ts = ts(gld$Adj.Close, st=2009)

tstkr <- ts(as.numeric(z.ts), deltat=1/12) #convert to univariate series
library("tseries")

fit <- stl(tstkr, t.window=15, s.window="periodic", robust=TRUE)
library("forecast")
fcast <- forecast(fit, method="naive")
plot(fcast, ylab="Adjusted close")

Thursday, September 20, 2012

Cointegration in R

Essentially, cointegration seeks to find stationary linear combinations of the two vectors.
Here we test two series for cointegration and return a p-value indicating whether their spread is likely to be mean-reverting.

This runs an augmented Dickey-Fuller test and will return a p-value indicating whether the series are mean-reverting or not. You can use the typical p-value as a test of significance if you like (i.e., a p-value below .05 indicates a mean-reverting spread), or you can use an alternate value. This assumes that your two series were observed at the same time points.

gld <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6947.KL&ignore=.csv", stringsAsFactors=F)
gdx <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6012.KL&ignore=.csv", stringsAsFactors=F)

Take the intersection of the two zoo objects.  That will create one zoo object with the observations common to both datasets.

library(zoo)                # Load the zoo package
#The seventh column contains the adjusted close.
#The first column contains dates.

gld <- zoo(gld[,7], as.Date(gld[,1]))
gdx <- zoo(gdx[,7], as.Date(gdx[,1]))

The merge function can combine two zoo objects, so we merge them. Setting all=FALSE keeps only the dates present in both.
t.zoo <- merge(gld, gdx, all=FALSE)

# At this point, t.zoo is a zoo object with two columns: gld and gdx.
# Most statistical functions expect a data frame for input,
# so we create a data frame here.
#
t <- as.data.frame(t.zoo)


First we construct the spread, then we test the spread for a unit root. 

If the spread has no unit root (that is, it is stationary), the underlying securities are cointegrated.

The lm function builds linear regression models using  ordinary least squares(OLS).
# We build the linear model, m, forcing a zero intercept,
# then we extract the model's first regression coefficient.
m <- lm(gld ~ gdx + 0, data=t)
beta <- coef(m)[1]
cat("Assumed hedge ratio is", beta, "\n")

sprd <- t$gld - beta*t$gdx

The Augmented Dickey-Fuller test is a basic statistical test for a unit root, and several R packages implement that test.  Here, we will use the adf.test function which is implemented in the tseries package.  The function returns an object which contains the test results. In particular, it contains the p-value that we want.
library(tseries)            # Load the tseries package

# Setting alternative="stationary" chooses the appropriate test.
# Setting k=0 forces a basic (not augmented) test. 
ht <- adf.test(sprd, alternative="stationary", k=0)
cat("ADF p-value is", ht$p.value, "\n")

The adf.test function essentially detrends your data before testing for stationarity.
If your data contains a strong trend consider the  fUnitRoots package which contains the adfTest function

if (ht$p.value < 0.05) {
    cat("The spread is likely mean-reverting.\n")
} else {
    cat("The spread is not mean-reverting.\n")
}
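
Since the data download depends on an external site, here is a self-contained sketch of the same idea on simulated data, using base R only. Everything in it is an assumption for illustration: the two series are simulated, the true hedge ratio is set to 2, and the unit-root check is a plain Dickey-Fuller regression with a constant, which is simpler than what adf.test does (adf.test also includes a linear trend).

```r
set.seed(123)
n <- 500
x <- cumsum(rnorm(n))                               # a random walk (non-stationary)
u <- as.numeric(arima.sim(list(ar = 0.5), n = n))   # stationary AR(1) noise
y <- 2 * x + u                                      # y is cointegrated with x

# Hedge ratio from a zero-intercept regression, then the spread
beta <- unname(coef(lm(y ~ x + 0))[1])
sprd <- y - beta * x

# Dickey-Fuller regression: change in spread on its lagged level
d_sprd   <- diff(sprd)
lag_sprd <- sprd[-n]
df_fit   <- lm(d_sprd ~ lag_sprd)
df_stat  <- summary(df_fit)$coefficients["lag_sprd", "t value"]

beta      # close to the true ratio of 2
df_stat   # strongly negative when the spread is mean-reverting
```

The t-statistic has to be compared with Dickey-Fuller critical values rather than the usual t tables, which is what adf.test takes care of when it reports a p-value.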


SUMMARY


library(zoo)       # zoo objects and merge
library(tseries)   # adf.test
gld <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=6012.KL&ignore=.csv", stringsAsFactors=F)
gdx <- read.csv("http://ichart.finance.yahoo.com/table.csv?s=4863.KL&ignore=.csv", stringsAsFactors=F)
gld <- zoo(gld[,7], as.Date(gld[,1]))
gdx <- zoo(gdx[,7], as.Date(gdx[,1]))
t.zoo <- merge(gld, gdx, all=FALSE)
t <- as.data.frame(t.zoo)
m <- lm(gld ~ gdx + 0, data=t)
beta <- coef(m)[1]
sprd <- t$gld - beta*t$gdx
ht <- adf.test(sprd, alternative="stationary", k=0)

if (ht$p.value < 0.05) {
    cat("The spread is likely mean-reverting.\n")
} else {
    cat("The spread is not mean-reverting.\n")
}

Wednesday, September 19, 2012

Singular spectrum analysis basics in R

Using the following R package
 http://cran.r-project.org/web/packages/Rssa/index.html

and my time series as the variable: z.ts

Summary: Singular spectrum analysis for time series
Anatoly Zhigljavsky

Singular Spectrum Analysis (SSA) is a technique of time series analysis and forecasting.
The aim is to decompose the original series into a sum of a small number of interpretable components such as
a slowly varying trend, oscillatory components and structureless noise.

It is based on the singular value decomposition (SVD) of a specific matrix constructed from the time series.

SSA is a model-free technique because neither a parametric model nor a stationarity-type condition is required.

Basic SSA
X = the trajectory matrix, constructed from the lagged vectors of the series
This matrix is a Hankel matrix: all elements along each anti-diagonal are equal

The SVD of the matrix XX(transpose) yields a collection of L eigenvalues and eigenvectors.

Basic SSA can be used for smoothing, filtration, noise reduction, extraction of trends of different resolution, extraction of periodicities in the form of modulated harmonics, gap-filling

One of the requirements of SSA is a continuous time series with no Gaps.
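
The first step of Basic SSA can be sketched in base R without the Rssa package. The toy series and the window length L = 3 are arbitrary choices of mine, just to show the shape of the trajectory matrix and its SVD.

```r
x <- c(1, 3, 2, 5, 4, 6, 8, 7)   # toy series with no gaps
N <- length(x)
L <- 3                            # window length (assumed for illustration)
K <- N - L + 1                    # number of lagged vectors

# Trajectory matrix: column j holds the lagged vector x[j], ..., x[j + L - 1]
X <- outer(1:L, 1:K, function(i, j) x[i + j - 1])
X

# Hankel structure: entries with the same i + j (an anti-diagonal) are equal
X[1, 3] == X[2, 2] && X[2, 2] == X[3, 1]

# SVD of X; the squared singular values are the eigenvalues of X X^T
dec <- svd(X)
dec$d^2
eigen(X %*% t(X))$values

# Elementary matrix built from the first eigentriple
X1 <- dec$d[1] * dec$u[, 1] %*% t(dec$v[, 1])
```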





In R
library(Rssa)
s <- new.ssa(z.ts) # in newer versions of Rssa this function is named ssa()

A suitable grouping of the elementary time series is required, found by looking at the eigenplots of the decomposition

plot(s, type = "series", groups = list(1:4)) #Plot the first 4 reconstructed components
plot(s, type = "values") #Plot the eigenvalues

examine the so-called w-correlation matrix
# Calculate the w-correlation matrix between first 10 series
w <- wcor(s, groups = 1:10)
print(w)
plot(w)

reconstruction of the time-series using the selected grouping
# Reconstruct the series, grouping elementary series 2, 3 and 4, 5.
r <- reconstruct(s, groups = list(1, c(2,3), c(4,5)))
plot(r$F1, col = "black")
lines(r$F1 + r$F2, col = "red")
lines(r$F1 + r$F2 + r$F3, col = "blue")

Summary of steps to get the graph

s <- new.ssa(z.ts) # Perform the decomposition using the default window length
summary(s)        # Show various information about the decomposition
plot(s)           # Show the plot of the eigenvalues
f <- reconstruct(s, groups = list(1, c(2, 3), 4)) # Reconstruct into 3 series
plot(z.ts)         # Plot the original series
lines(f$F1, col = "blue")            # Extract the trend
lines(f$F1+f$F2, col = "red")        # Add the periodicity
lines(f$F1+f$F2+f$F3, col = "green") # Add slow-varying component

Forecast
# Produce 5 forecasted values and confidence bounds of the series using
# the first 3 eigentriples as a base space for the forecast.
bforecast(s, group = 1:3, len = 5)

Monday, September 17, 2012

Introductory Time Series with R

---------------Notes and shortcuts------------
> data(AirPassengers)
> AP = AirPassengers
> AP

> www ="http://www.massey.ac.nz/~pscowper/data/pounds_nz.dat"
> z = read.table(www, header=T)
> z.ts = ts(z, st=1991, fr=4)

Reading Columns of a series
plot(YTL[,3]) # 3rd column of the series

 
gld = read.csv("http://ichart.finance.yahoo.com/table.csv?s=6012.KL&ignore=.csv", stringsAsFactors=F) 
 


colnames(gld)
[1] "Date"      "Open"      "High"     
[4] "Low"       "Close"     "Volume"   
[7] "Adj.Close"
 
gld$Open          #Access the Open column 
gld$Adj.Close     #Access last column from CSV file
 
 

ts.plot(z.ts, predict( HoltWinters(z.ts),n.ahead=4*12), lty=1:2)

---------CHAPTER 1----------


When a variable is measured sequentially in time over a fixed interval (or sampling interval ) the data form a time series.

Observations that have been collected in the past over fixed sampling intervals
form an historical time series.

A sequence of random variables taken over fixed sampling intervals is
sometimes referred to as a discrete-time stochastic process

> data(AirPassengers)
> AP = AirPassengers


Seasonal effects are removed by aggregating the data to annual totals with the aggregate function
The cycle function extracts the season for each item of data

> plot(aggregate(AP), ylab="Annual passengers/1000’s")
> boxplot(AP ~ cycle(AP), names=month.abb)
> layout(1:2)
> plot(aggregate(AP)); boxplot(AP ~ cycle(AP))

Read external data
> www = "http://www.massey.ac.nz/~pscowper/data/cbe.dat"
> cbe = read.table(www, header=T)
> cbe[1:4,]

To create a time series:
ts(1:120, start=c(1990, 1), end=c(1993, 8), freq=12)

For a data file like above
> elec.ts = ts(cbe[,3], start=1958, freq=12)
> beer.ts = ts(cbe[,2], start=1958, freq=12)
> choc.ts = ts(cbe[,1], start=1958, freq=12)

plot to see each data frame converted to a time series
plot(cbind(elec.ts, beer.ts, choc.ts),
main="Chocolate, Beer, and Electricity Production: 1958-1990")

The three series constitute a multiple time series. There are many functions
in R for handling more than one series. to obtain the intersection of two series which overlap in time use ts.intersect

ap.elec = ts.intersect(AP, elec.ts)


The term non-stationary is used to describe a time series that has underlying
trends or seasonal effects

covariance
a measure of linear association between two variables
sum((x - mean(x))*(y - mean(y)))/(n - 1)
cov(x,y)
if y tends to decrease as x increases, the covariance will tend to be negative
if x and y tend to increase together, the covariance will tend to be positive

Correlation measures the linear association between a pair of variables (x, y),
cor(x, y)
A value of +1 or -1 indicates an exact linear association
a value of 0 indicating no linear association.
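
A quick check that the covariance formula above matches R's built-in functions. The simulated x and y are my own example data.

```r
set.seed(7)
x <- rnorm(50)
y <- 2 * x + rnorm(50)   # y tends to increase with x
n <- length(x)

# Sample covariance from the formula, and correlation as the scaled covariance
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
cor_manual <- cov_manual / (sd(x) * sd(y))

all.equal(cov_manual, cov(x, y))
all.equal(cor_manual, cor(x, y))
```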

In R the lag k autocorrelation can be obtained from the autocorrelation
function acf. By default it also plots the autocorrelations against lag; this plot is called the correlogram.
acf(x)

For example, the lag 1 autocorrelation for x is:
> acf(x)$acf[2]

pairs of values taken at a time difference of lag 1 will tend to have a high correlation when there are trends in the data.

Correlation is dimensionless, so there are no units for the y-axis
The dotted lines represent the 5% significance level for the statistical test: ρk = 0.
Any correlations that fall outside these lines are ‘significantly’ different from zero.

**Usually a trend in the data will show in the correlogram as a slow decay in the autocorrelations.

mean and variance -> ‘centre’ and the ‘spread’

Stationary time series {xt}: if the mean and autocovariance do not change with time, then {xt} is called a second-order stationary process. This means the variance and autocorrelation also do not change with time.

A simple additive decomposition model also called a classical decomposition model.
xt = mt + st + et

xt is the observed series, mt is the trend, st is the seasonal effect, and et is the remainder or residual series

In R the functions decompose and stl estimate trends and seasonal effects using methods based on moving averages and polynomials respectively.

plot(decompose(elec.ts))
Nesting the functions within plot, using plot(decompose()) or plot(stl()), produces a single figure showing the original series xt and the decomposed series: mt, st and et

If the seasonal effect tends to increase as the trend increases a multiplicative model may be more appropriate
If the residual series is also multiplicative the data can be transformed using the natural logarithm to produce an additive model


In forecasting, one approach is to use a predictive model based on past values that gives more weight to the more recent observations. This approach is used in a model based on exponential smoothing.

Holt-Winters procedure provides a prediction for xn+1 by updating the trend and seasonal effect
> z.hw = HoltWinters(z.ts, beta=0, gamma=0)
> z.hw$alpha
> plot(z.hw)

In general, exponential smoothing provides good predictions when the underlying process is a random walk or a random walk with moving average error terms.
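
The "more weight to recent observations" idea is easiest to see by writing simple exponential smoothing out by hand. This is a sketch only: the function name is mine, and starting the level at the first observation is an assumption about the initialisation.

```r
# Simple exponential smoothing: level[t] = alpha * x[t] + (1 - alpha) * level[t-1]
ses <- function(x, alpha) {
  level <- numeric(length(x))
  level[1] <- x[1]                      # assumed starting value
  for (t in seq_along(x)[-1]) {
    level[t] <- alpha * x[t] + (1 - alpha) * level[t - 1]
  }
  level
}

smoothed <- ses(c(10, 12, 14), alpha = 0.5)
smoothed   # 10.0 11.0 12.5

# The last level is the one-step-ahead prediction for the next value
tail(smoothed, 1)
```

A larger alpha follows the latest observations more closely, while a smaller alpha gives a smoother level.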

The predict function in R can be used with the fitted model to make forecasts into the future
> AP.hw = HoltWinters(AP, seasonal="mult")
> plot(AP.hw)
> AP.predict = predict(AP.hw, n.ahead=4*12)
> ts.plot(AP, AP.predict, lty=1:2)


SUMMARY
ts produces a time series object
ts.plot produces a time plot for one or more series
ts.intersect creates the intersection of one or more time series
cycle returns the season for each value in a series
time returns the times for each value in a series
acf returns the correlogram
decompose decomposes a series into the components:trend, seasonal effect and residual
HoltWinters estimates the parameters of the Holt-Winters or exponential smoothing model
predict forecasts future values



---------CHAPTER 2---------
A residual error et is the difference between an observed value and a model predicted value at time t. If the model fits well, the residual series should look like white noise.

Time series simulated using a model are sometimes called synthetic series to distinguish them from an observed historical series.

rnorm(100) is used to simulate 100 independent standard Normal variables, which is equivalent to simulating a Gaussian white noise series of length 100
> set.seed(1)
> e = rnorm(100)
> plot(e, type="l")

set.seed is used to provide a starting point (or seed) in the simulations, thus ensuring that the simulations can be reproduced

Histogram
> x = seq(-3,3, len=1000)
> hist(rnorm(100), prob=T); points(x, dnorm(x), type="l")

Correlogram of a simulated white noise series. The underlying autocorrelations are all zero (except at lag 0); the statistically significant value at lag 7 is due to sampling variation
> acf(rnorm(100))


Differencing adjacent terms of a series can transform a non-stationary series to a stationary series

The following commands can be used to simulate random walk data for x.
> x = e = rnorm(1000)
> for (t in 2:1000) x[t] = x[t-1] + e[t]
> plot(x, type="l")

The ‘for’ loop then generates the random walk using Equation (2.3)
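
As a check on the simulation above, the random walk can also be built with cumsum, and differencing it gives back the white noise that generated it. The seed and variable names below are my own.

```r
set.seed(2)
e <- rnorm(1000)
x <- cumsum(e)          # same as x[1] = e[1], then x[t] = x[t-1] + e[t]

d <- diff(x)            # first-order differences
all.equal(d, e[-1])     # differencing recovers the white noise terms

# Lag 1 autocorrelation: close to 1 for the walk, close to 0 for its differences
r1_walk <- acf(x, plot = FALSE)$acf[2]
r1_diff <- acf(d, plot = FALSE)$acf[2]
c(r1_walk, r1_diff)
```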

**The first order differences of a random walk are a white noise series,
so the correlogram of the series of differences can be used to assess whether a given series is a random walk.
> acf(diff(x))

> z.hw = HoltWinters(z.ts, gamma=0)
> acf(resid(z.hw))

The partial autocorrelation at lag k is the correlation that results after removing the effect of any correlations due to previous terms.

Hence, a plot of the partial autocorrelations can be useful when determining the order of an underlying AR process. In R, the function pacf can be used to calculate the partial autocorrelations of a time series and produce a plot of the partial
autocorrelations against lag (the ‘partial-correlogram’).
> set.seed(1)
> x = e = rnorm(100)
> for (t in 2:100) x[t] = 0.7*x[t-1] + e[t]
> plot(x); acf(x); pacf(x)

An AR(p) model can be fitted to data in R using the ar function
> x.ar = ar(x, method="mle")

The method “mle” used in the fitting procedure above is based on maximising the likelihood function

Akaike Information Criteria (AIC; Akaike (1974)), which penalises models with too many parameters
AIC = -2 × log-likelihood + 2 × number of parameters
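
The AIC formula can be checked against what R reports for a fitted model. A sketch on a simulated AR(1) series; the series and seed are my own, and the parameter count assumes that the innovations variance is counted as a parameter alongside the coefficients.

```r
set.seed(1)
x <- arima.sim(list(ar = 0.7), n = 200)
fit <- arima(x, order = c(1, 0, 0))

# Parameters: the AR coefficient, the mean ("intercept") and the innovations variance
k <- length(coef(fit)) + 1
aic_manual <- -2 * fit$loglik + 2 * k

all.equal(aic_manual, fit$aic)
all.equal(aic_manual, AIC(fit))
```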

acf(z.ar$res[-1])
"-1" is used in the vector of residuals to remove the first item from the residual series. (For a fitted AR(1) model the first item has no predicted value, because there is no observation at t = 0.)

> global.ar = ar(aggregate(global.ts, FUN=mean), method="mle")
> mean(aggregate(global.ts, FUN=mean))
> global.ar$order; global.ar$ar
> acf(global.ar$res[-(1:global.ar$order)], lag=50)


set.seed sets a seed for the random number generator enabling a simulation to be reproduced
rnorm simulates a white noise series
diff creates a series of first-order differences
ar gets the best fitting AR(p) model
pacf extracts partial autocorrelations and partial-correlogram


-----------CHAPTER 3 --------------
A trend is stochastic(random) when the underlying cause is not understood, and can only be attributed to high serial correlation with random error. Trends of this type, which are common in *financial series, can be simulated in R using models such as the *random walk or *autoregressive process (Chapter 2)

Deterministic trends and seasonal variation can be modelled using regression(so-called because there is a greater understanding of the underlying cause of the trend.)

linear model will often provide a good approximation even when the underlying process is non-linear

differencing turns out to be a very useful general procedure as it can ‘remove’ both stochastic and deterministic trends

Linear models are usually fitted by minimising the sum of squared errors, which can be achieved in R using the function lm.
> set.seed(1)
> z = e = rnorm(100, sd=20)
> for (t in 2:100) z[t] = 0.8*z[t-1] + e[t]
> t = 1:100
> x = 50 + 3*t + z
> x.lm = lm(x ~ t)
> coef(x.lm)
> confint(x.lm)

confint extracts an approximate 95% confidence interval for the estimated parameters.
The function summary can be used to obtain a standard regression output; to see it, type summary(x.lm) after entering the commands above.

> summary(x.lm)$coef
> summary(x.lm)$sigma
> summary(x.lm)$r.sq

the coefficients and their standard errors with respective t-tests, the residual standard deviation, and R2 can all be extracted

> acf(resid(lm(temp ~ time(temp))))
If the autocorrelation in the residual series is not accounted for, then the standard errors used to create confidence intervals, such as those above, or any t-ratios, will be under-estimated.

> library(nlme)
> x.gls = gls(x ~ t, cor = corAR1(0.8))
The primary use of GLS is therefore in the estimation of the standard errors and adjusting the confidence intervals for the parameters.
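As a rough check of that point, the sketch below refits the simulated series from earlier in this chapter with both lm and gls and prints the two sets of confidence intervals. It assumes the nlme package is installed (it normally ships with R); the loop index i is my own choice to avoid reusing t.

```r
library(nlme)

# Same simulation as above: linear trend plus AR(1) errors with coefficient 0.8
set.seed(1)
z <- e <- rnorm(100, sd = 20)
for (i in 2:100) z[i] <- 0.8 * z[i - 1] + e[i]
t <- 1:100
x <- 50 + 3 * t + z

# Ordinary least squares ignores the autocorrelation in the errors
x.lm <- lm(x ~ t)

# GLS models the errors as AR(1); 0.8 is only the starting value for the estimate
x.gls <- gls(x ~ t, correlation = corAR1(0.8))

# Compare the 95% confidence intervals for the intercept and slope
confint(x.lm)
confint(x.gls)
```

The point estimates should be similar, while the GLS intervals should come out noticeably wider because they allow for the autocorrelation.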


TO BE CONTINUED PAGE 64


SUMMARY
lm fits a linear (regression) model
coef extracts the parameter estimates from a fitted model
confint returns a (95%) confidence interval for the parameters of a fitted model
summary gets basic summary information
gls fits a linear model using generalised least squares (allowing for autocorrelated residuals); located in the nlme library
factor returns variables in the form of ‘factors’ or indicator variables
predict returns predictions (or forecasts) from a fitted model
nls fits a non-linear model by least squares

Saturday, September 8, 2012

Indicators I use in ChartNexus

OBV (On-Balance Volume): buying and selling pressure
price tends to move higher if OBV is rising = buy pressure
price tends to move lower if OBV is falling = sell pressure
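For reference, OBV can be computed by hand in a few lines of R. This is a minimal sketch: the function name obv and the sample prices are made up, and starting the running total at 0 is one common convention (some tools start at the first day's volume instead).

```r
# On-Balance Volume: add the day's volume when the close rises,
# subtract it when the close falls, leave the total unchanged when it is flat
obv <- function(close, volume) {
  stopifnot(length(close) == length(volume))
  direction <- c(0, sign(diff(close)))  # first day has no previous close
  cumsum(direction * volume)
}

# Hypothetical closing prices and volumes for five days
close  <- c(10.0, 10.5, 10.2, 10.2, 10.8)
volume <- c(1000, 1500, 1200,  800, 2000)

obv(close, volume)
# 0 1500 300 300 2300
```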




MFI (Money Flow Index): uses price and volume to measure buying and selling pressure
below 20 is oversold
above 80 is overbought
above 50 is bull
below 50 is bear

RSI (Relative Strength Index): speed and change of price movement
overbought above 70
oversold below 30
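The 70/30 rule can be tried out with a hand-rolled RSI. The sketch below uses plain averages of gains and losses over n days, which is a simplification I am assuming here; charting packages usually use Wilder's smoothed averages, so the numbers will not match ChartNexus exactly. The function names are made up.

```r
# Simple-average RSI over the last n price changes
rsi_simple <- function(close, n = 14) {
  stopifnot(length(close) > n)
  change <- diff(close)
  gain <- pmax(change, 0)    # upward moves, 0 otherwise
  loss <- pmax(-change, 0)   # downward moves as positive numbers
  out <- rep(NA_real_, length(close))
  for (i in n:length(change)) {
    avg_gain <- mean(gain[(i - n + 1):i])
    avg_loss <- mean(loss[(i - n + 1):i])
    # avg_loss of 0 gives RSI 100; no movement at all gives NaN
    out[i + 1] <- 100 - 100 / (1 + avg_gain / avg_loss)
  }
  out
}

# Label each value using the thresholds listed above
rsi_label <- function(r) {
  ifelse(is.na(r), NA,
         ifelse(r > 70, "overbought",
                ifelse(r < 30, "oversold", "neutral")))
}

# A steadily rising series: every change is a gain, so RSI is 100
r <- rsi_simple(1:16)
rsi_label(r)
```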

Stochastic: follows the speed and momentum of price
below 20% is oversold
above 80% is overbought

MACD: Moving Average Convergence-Divergence (momentum indicator)
positive MACD means upside momentum is increasing
negative MACD means downside momentum is increasing

Ichimoku:
Trend is up when prices are above the cloud
Trend is down when prices are below the cloud
Trend is flat when prices are in the cloud
Uptrend is strong if Span A is rising and above the leading Span B (green cloud)
Downtrend is strong when Span A is falling and below the leading Span B (red cloud)

ADX (Average Directional Index): strength or weakness of a trend (not direction)
strong bull if +DI is greater than -DI (buy signal)
strong bear if -DI is greater than +DI
strong trend when ADX is above 25
no trend when ADX is below 20

Chaikin Money Flow (CMF)
above 0 shows buying pressure
below 0 shows selling pressure

Rate of Change (ROC): price change over time
prices are rising while ROC remains positive
prices are falling while ROC is negative

Price Channels: show the 20-day high and low.
stock near its highest high of the past 20 days may go down (overbought)
stock near its lowest low of the past 20 days may go up (oversold)
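The channel itself is just a rolling maximum of the highs and a rolling minimum of the lows. Here is a minimal base R sketch; the function name price_channel is made up, and including the current day in the window is my assumption (some charting tools only look at the previous n days).

```r
# Rolling n-day highest high and lowest low, current day included
price_channel <- function(high, low, n = 20) {
  stopifnot(length(high) == length(low), length(high) >= n)
  idx <- n:length(high)
  upper <- sapply(idx, function(i) max(high[(i - n + 1):i]))
  lower <- sapply(idx, function(i) min(low[(i - n + 1):i]))
  data.frame(day = idx, upper = upper, lower = lower)
}

# Tiny made-up example with a 3-day window so the result is easy to check
high <- c(5, 6, 7, 6, 5)
low  <- c(1, 2, 3, 2, 1)
price_channel(high, low, n = 3)
#   day upper lower
# 1   3     7     1
# 2   4     7     2
# 3   5     7     1
```

With data loaded from a CSV that has High and Low columns, the same call would be price_channel(data$High, data$Low), using the default 20-day window.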

Using R as a Tool and getting your dataset

Download R
statistical computing and graphics tool.
www.r-project.org/

Download R Packages from http://cran.r-project.org/web/package/available_packages_by_name.html
You need the forecast package, which has these dependencies:
tseries, fracdiff, zoo, Rcpp (≥ 0.9.10), RcppArmadillo (≥ 0.2.35)

Download RStudio
http://rstudio.org/
An IDE for R. Use the RStudio interface to download and install the packages (see image).


Find your Stock Symbol
http://finance.yahoo.com/
Search by entering the name and click "Get Quotes"


Download your Dataset using this URL (change to your desired stock symbol)
http://ichart.finance.yahoo.com/table.csv?s=4677.KL&a=0&b=1&c=2000&d=08&e=9&f=2012&g=d&ignore=.csv

s=4677.KL : my stock code
After s= goes the ticker symbol, after a= the start month (minus 1), after b= the start day, after c= the start year, and so on. The final g= parameter lets you choose between daily, weekly, or monthly historical stock information.
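If you change symbols or dates often, the URL can be assembled in R instead of edited by hand. This is a sketch only: the function name yahoo_csv_url is made up, and treating d=, e= and f= as the end month (minus 1), end day and end year is my reading of the example URL above. It only builds the string; it does not check that the service responds.

```r
# Build the download URL from a ticker symbol and a date range
yahoo_csv_url <- function(symbol, from, to, interval = "d") {
  from <- as.Date(from)
  to <- as.Date(to)
  part <- function(d, fmt) as.integer(format(d, fmt))
  sprintf(
    paste0("http://ichart.finance.yahoo.com/table.csv",
           "?s=%s&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d&g=%s&ignore=.csv"),
    symbol,
    part(from, "%m") - 1L, part(from, "%d"), part(from, "%Y"),  # months count from 0
    part(to, "%m") - 1L, part(to, "%d"), part(to, "%Y"),
    interval
  )
}

# Same range as the example above (the month is written as d=8 rather than d=08)
yahoo_csv_url("4677.KL", "2000-01-01", "2012-09-09")
```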

You have your tool and your data. Let's try it (variables are CASE SENSITIVE):

Rename the downloaded CSV file to the name of the stock, in my case table.csv -> YTL.csv
Import the data to RStudio
YTL <- read.csv("C:/Users/user/Desktop/YTL.csv")

Convert the dataset to a timeseries
YTL <- ts(YTL[,-1], start=2000, frequency=4)

start is the time of the first observation (here the year 2000).
frequency is the number of observations per unit of time (here 4 per year).
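The effect of start and frequency is easier to see on a tiny series. The numbers below are invented for illustration and are not the YTL data. Note that with frequency=4 R labels every row as one quarter, whatever the real spacing of the rows in the file is.

```r
# Eight made-up observations, four per year, starting in the first quarter of 2000
q <- ts(c(5, 7, 6, 8, 6, 8, 7, 9), start = c(2000, 1), frequency = 4)

time(q)    # time of each observation: 2000.00, 2000.25, 2000.50, ...
cycle(q)   # position within the year: 1, 2, 3, 4, 1, 2, 3, 4
end(q)     # last observation falls in 2001, quarter 4

# Keep only the observations from 2001 onwards
window(q, start = c(2001, 1))
```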

Plot and View the Summary
plot(YTL)
summary(YTL)

Is there any correlation between Volume and Closing Price?
cor.test(YTL[,"Volume"],YTL[,"Close"])

Scatterplot matrix of the variables:
pairs(as.data.frame(YTL))

Monthly Plot:
monthplot(YTL[,"Close"])
monthplot(YTL[,"Volume"])