S&P 500 Exponential GARCH Volatility Model Using R

In this post we are going to discuss the  S&P 500 Exponential GARCH Asset Volatility model. The volatility model that we will develop in this post for S&P 500 can also be used for other indices like Dow Jones, Nasdaq, FTSE 100, DAX , CAC 40, Hang Seng etc as well as stocks like Apple, Google, Facebook etc. There are many day traders who love to trade S&P 500 index as well as the other stock indices that have been mentioned above. Did you read the book How To Make More Money Than God-Hedge Funds and Making of New Elite? Asset volatility is an important subject in quantitative finance as it has got very important applications. If you want a job as a quant at a hedge fund than you should know in intimate detail the different asset volatility models. The most important application of asset volatility models is in options pricing. Most hedge funds love to trade options. Asset volatility models are also very important in calculating Value at Risk for managing the risk of a portfolio. Most hedge funds use this Value at Risk measure a lot in quantifying the risk to their asset portfolio.

You should read this post if you are thinking of becoming an options trader. This is something very important for you to understand. We talk of asset volatility most of the the time. But we cannot observe it directly. We can only observe the price of the asset and its derivative which will most of the time be options. So you should keep this in mind We cannot measure asset volatility directly. We can only calculate it or to be more precise estimate it with the price of the asset and the options contract on that asset. Read this post on how to develop your artificial intelligence machine learning stock trading system.

The first asset volatility model was the Autoregressive Conditional Heteroscedastic ARCH Model. This model had some limitations so the General Autoregressive Conditional Heteroscedastic GARCH Model was developed. You should read this post that explains the ARIMA and GARCH models. Most of the time you will hear about the GARCH model. In this post however we will talk of a modification of this GARCH model which is known as the Exponential GARCH model.The exponential GARCH model is also known as the EGARCH model. There are many more asset volatility models that have been developed that includes IGARCH, TGARCH and NGARCH.

Characterisitics of Asset Volatility

Now as said above we cannot observe volatility directly. But we can observe some of its characteristics. The most important ones are the volatility clusters. Volatility clusters is a characteristic that means volatility is high for some periods of time and low for other periods of time. You can observe this thing at the time of economic news release when the currencies and stocks become highly volatility. This high volatility lasts for sometime. The stocks become volatile when there is an important announcement by the FED. This volatility persists for sometime. Market tries to absorb the new information which is the reason for this high volatility. FOMC Meeting minutes release time is when stock market can become highly volatile. Now quantitative finance revolves around return and volatility. Read this post on how to predict stock return using support vector machines.

The second think that you should know is the there are no volatility jumps.Volatility evolves over time in a continuous manner.  It is not like that suddenly the market becomes highly volatile. So there are no volatility jumps.Thirdly volatility cannot become infinite. Volatility stays within a certain range. Now this is something important for you to understand.Volatility reacts differently to a big price increase as compared to a big price decrease. Big price drop has a much higher volatility as compared to a big price increase. ARCH and GARCH models cannot cater for this leverage effect. ARCH and GARCH have the same volatility for the big price increase as well as big price decrease. Due to this reason EGARCH and TGARCH models were developed as they introduce an asymmetric behavior in asset volatility. As said before we cannot observe volatility directly. We cannot only estimate it with the stock price and the options contract prices on that stock. Before we continue there are three types of volatility that we can calculate.

We will be using R in doing the modelling. R is a powerful data science scripting language that is very popular with the quants. You should have R and RStudio installed on your computer. You should master R if you want to become a quant. Read this post on how to do algorithmic trading with R. When it comes to time series analysis R is very powerful. Python cannot beat R when it comes to time series analysis.

Types of Volatility Measures

There are three types of volatility measure that we use:

1. Volatility as condition standard deviation of the stock daily returns. This is what we will estimate in this post. Volatility of a stock return is estimated on an annualized basis. Since there are 252 trading days in a year we can calculate the annualized volatility by multiplying the daily volatility by sqr(252) where sqr is the square root.

2. Implied volatility is estimated through the stock options pricing formula like the Black Scholes. You should read this post in which I have posted videos that explain this Black Scholes formula in detail. Read this post as it discusses how this formula is derived, what are the assumptions made while deriving the formula and what are the limitations. Since it is formula dependent there are certain limitations to this approach of calculating the implied volatility. You can read this post in which I have given the python code for calculating Black Scholes Options Pricing formula.

3. We can also calculate daily volatility from the high frequency trading data intraday returns. For example we can use the 5 minute time interval to estimate the daily volatility. This is known as Realized Volatility. In the next post we can discuss in detail how we are going to calculate the realized volatility from the high frequency trading data.

S&P 500 eGARCH(2,1) Volatility Model Using rugarch R package

This is how we are going to build the model. First we need to download S&P 500 daily price data from Yahoo Finance.

> #load library
> library(quantmod)
> #create an environment
> SP500 <- new.env()
> #first we download the daily time series data from Yahoo Finance
> getSymbols("^GSPC", env = SP500, src = "yahoo",
+            from = as.Date("2012-01-04"), to = as.Date("2017-02-01"))
[1] "GSPC"
> sp500 <- SP500$GSPC
> head(sp500)
           GSPC.Open GSPC.High GSPC.Low GSPC.Close GSPC.Volume GSPC.Adjusted
2012-01-04   1277.03   1278.73  1268.10    1277.30  3592580000       1277.30
2012-01-05   1277.30   1283.05  1265.26    1281.06  4315950000       1281.06
2012-01-06   1280.93   1281.84  1273.34    1277.81  3656830000       1277.81
2012-01-09   1277.83   1281.99  1274.55    1280.70  3371600000       1280.70
2012-01-10   1280.77   1296.46  1280.77    1292.08  4221960000       1292.08
2012-01-11   1292.02   1293.80  1285.41    1292.48  3968120000       1292.48
> class(sp500)
[1] "xts" "zoo"
> dim(sp500)
[1] 1278    6
> # Plot the daily price series and volume
> chartSeries(sp500,theme="white")

Below is the plot for the S&P 500 daily time series.

S&P 500 eGARCH Model

Now we take the Box Ljung test of the daily log returns of S&P 500 daily time series of the closing price.

> #calculate the log returns
> #calculate the log returns
> sp500lr <-log(sp500$GSPC.Close+1)
> #Box Ljung Test for serial correlation of log returns
> Box.test(sp500lr,lag=1,type="Ljung")

	Box-Ljung test

data:  sp500lr
X-squared = 1271.6, df = 1, p-value < 2.2e-16

You can see from the above Box Ljung test that the daily log returns of S&P 500 daily time series are not serially correlated. Now we will be using rugarch package. This is a powerful package. You should practice with this R package a lot if you are planning to calculate the volatility of stock indices on a daily basis. You can download rugarch package 107 page PDF from R cran site. It contains full details on the different methods that you can call using this package. We can model different types of GARCH models with this package. First we load this package than we create spec.

> #load rugarch library
> library(rugarch)
> 
> #Fit the EGARCH model
> 
> # create a cluster object to be used as part of this demonstration
> cluster <- makePSOCKcluster(15)
> #ask R to show specs
> spec <- ugarchspec()
> show(spec)

*---------------------------------*
*       GARCH Model Spec          *
*---------------------------------*

Conditional Variance Dynamics 	
------------------------------------
GARCH Model		: sGARCH(1,1)
Variance Targeting	: FALSE 

Conditional Mean Dynamics
------------------------------------
Mean Model		: ARFIMA(1,0,1)
Include Mean		: TRUE 
GARCH-in-Mean		: FALSE 

Conditional Distribution
------------------------------------
Distribution	:  norm 
Includes Skew	:  FALSE 
Includes Shape	:  FALSE 
Includes Lambda	:  FALSE

You can see from above that the package is suggesting a standard GARCH(1,1) package with ARFIMA(1,0,1). Let’s see how many models that are possible.

> #number of possible models
> nrow(expand.grid(GARCH = 1:14, VEX = 0:1, VT = 0:1, 
+                  Mean = 0:1, ARCHM = 0:2, 
+                  ARFIMA = 0:1, MEX = 0:1, DISTR = 1:10))
[1] 13440

You can see with the above specifications there are 13440 models that we can fit. Below we fit a eGARCH model with order(2,1) meaning it is second order autoregressive and first order moving average. Most of the time order(1,1) is sufficient for modelling purpose. Since this is a post meant for training and exploration, we choose order(2,1).

> #fit the rugarch eGarch model with student t distribution
> spec = ugarchspec(variance.model = list(model = 'eGARCH', 
+                                         garchOrder = c(2, 1)), 
+                   distribution = 'std')
> setstart(spec) <- list(shape = 5)
> egarch1<- ugarchfit(spec, sp500lr[1:1000, , drop = FALSE], solver = 'hybrid')
> egarch1

*---------------------------------*
*          GARCH Model Fit        *
*---------------------------------*

Conditional Variance Dynamics 	
-----------------------------------
GARCH Model	: eGARCH(2,1)
Mean Model	: ARFIMA(1,0,1)
Distribution	: std 

Optimal Parameters
------------------------------------
        Estimate  Std. Error     t value Pr(>|t|)
mu      7.153286    0.012530  5.7091e+02 0.000000
ar1     0.999666    0.000000  4.0443e+07 0.000000
ma1     0.000532    0.025793  2.0638e-02 0.983535
omega  -0.938738    0.006589 -1.4247e+02 0.000000
alpha1 -0.396606    0.024005 -1.6522e+01 0.000000
alpha2  0.051018    0.024828  2.0548e+00 0.039897
beta1   0.902905    0.000105  8.5776e+03 0.000000
gamma1 -0.064610    0.054744 -1.1802e+00 0.237911
gamma2  0.131327    0.046940  2.7978e+00 0.005145
shape  13.351651    4.896861  2.7266e+00 0.006400

Robust Standard Errors:
        Estimate  Std. Error     t value Pr(>|t|)
mu      7.153286    0.001466  4.8796e+03 0.000000
ar1     0.999666    0.000000  3.3707e+07 0.000000
ma1     0.000532    0.023802  2.2364e-02 0.982158
omega  -0.938738    0.006467 -1.4516e+02 0.000000
alpha1 -0.396606    0.021293 -1.8626e+01 0.000000
alpha2  0.051018    0.023691  2.1535e+00 0.031280
beta1   0.902905    0.000105  8.5584e+03 0.000000
gamma1 -0.064610    0.053296 -1.2123e+00 0.225408
gamma2  0.131327    0.046005  2.8546e+00 0.004309
shape  13.351651    4.401599  3.0334e+00 0.002418

LogLikelihood : 3527.377 

Information Criteria
------------------------------------
                    
Akaike       -7.0348
Bayes        -6.9857
Shibata      -7.0350
Hannan-Quinn -7.0161

Weighted Ljung-Box Test on Standardized Residuals
------------------------------------
                        statistic p-value
Lag[1]                     0.1285  0.7200
Lag[2*(p+q)+(p+q)-1][5]    1.6689  0.9931
Lag[4*(p+q)+(p+q)-1][9]    3.7400  0.7503
d.o.f=2
H0 : No serial correlation

Weighted Ljung-Box Test on Standardized Squared Residuals
------------------------------------
                         statistic p-value
Lag[1]                       0.021  0.8848
Lag[2*(p+q)+(p+q)-1][8]      2.104  0.8425
Lag[4*(p+q)+(p+q)-1][14]     5.127  0.7536
d.o.f=3

Weighted ARCH LM Tests
------------------------------------
            Statistic Shape Scale P-Value
ARCH Lag[4]     1.065 0.500 2.000  0.3021
ARCH Lag[6]     1.963 1.461 1.711  0.4990
ARCH Lag[8]     4.395 2.368 1.583  0.3205

Nyblom stability test
------------------------------------
Joint Statistic:  2.5149
Individual Statistics:             
mu     0.5614
ar1    0.1489
ma1    0.1131
omega  0.2158
alpha1 0.4151
alpha2 0.3839
beta1  0.2224
gamma1 0.5211
gamma2 0.3963
shape  0.3781

Asymptotic Critical Values (10% 5% 1%)
Joint Statistic:     	 2.29 2.54 3.05
Individual Statistic:	 0.35 0.47 0.75

Sign Bias Test
------------------------------------
                   t-value    prob sig
Sign Bias            1.975 0.04850  **
Negative Sign Bias   1.512 0.13092    
Positive Sign Bias   1.749 0.08066   *
Joint Effect         5.516 0.13767    


Adjusted Pearson Goodness-of-Fit Test:
------------------------------------
  group statistic p-value(g-1)
1    20     48.96    0.0001861
2    30     59.90    0.0006356
3    40     69.84    0.0017471
4    50     71.60    0.0192722


Elapsed time : 24.87401

Above you can see the full details of the eGARCH(2,1) model that we have fitted to S&P 500 daily log return series. R took around 25 seconds to do all the calculations which is pretty fast. Below are some plots of this eGARCH(2,1) model. This is the news impact curve for the egarch1 model.

S&P 500 eGARCH Model

This is the auto correlation function of the squared standardized residuals of egarch1.

S&P 500 eGARCH Model

This is the auto correlation function plot of standardized residuals of egarch1 model.

S&P 500 eGARCH Model

This is the standard QQ plot of the residuals of egarch1. You can see the residuals are still flat tailed and not strictly normal, So our egarch1 model may not be very good.

S&P 500 eGARCH Model

This is the plot of the empirical density of the standardized residuals of egarch1 model.

S&P 500 eGARCH Model

We try to build 2 more models as more streaming data comes in.

> spec <-getspec(egarch1)
> setfixed(spec) <- as.list(coef(egarch1))
> #filter new data that has streamed in
> egarch2 <- ugarchfilter(spec, sp500lr[1:1250, ], n.old = 1000)
> egarch3 <- ugarchfilter(spec, sp500lr[1001:1250, ])

Now we can also do simulation on our fitted egarch1 model.

> #simulating GARCH models
> sim <- ugarchsim(egarch1, n.sim = 1000, m.sim = 25, rseed = 1:25)
> show(sim)

*------------------------------------*
*       GARCH Model Simulation       *
*------------------------------------*
Model : eGARCH
Horizon:  1000
Simulations: 25
              Seed Sigma2.Mean Sigma2.Min Sigma2.Max Series.Mean Series.Min
sim1             1    9.13e-05   5.97e-06   0.000964        7.27       7.04
sim2             2    9.24e-05   5.25e-06   0.000646        7.05       6.86
sim3             3    9.84e-05   3.39e-06   0.001159        7.01       6.85
sim4             4    8.90e-05   9.27e-06   0.000599        7.06       6.96
sim5             5    8.42e-05   3.01e-06   0.000666        7.06       6.86
sim6             6    9.08e-05   3.60e-06   0.000782        7.16       7.00
sim7             7    8.90e-05   2.42e-06   0.001390        7.20       6.89
sim8             8    8.36e-05   5.22e-06   0.000903        7.30       7.06
sim9             9    7.44e-05   6.62e-06   0.000446        7.35       7.13
sim10           10    1.00e-04   6.28e-06   0.001035        7.22       6.81
sim11           11    9.40e-05   6.81e-06   0.000688        6.98       6.76
sim12           12    7.96e-05   4.16e-06   0.000845        7.21       7.02
sim13           13    7.70e-05   4.42e-06   0.000658        7.21       6.99
sim14           14    1.03e-04   4.41e-06   0.000839        7.04       6.77
sim15           15    7.82e-05   3.96e-06   0.000768        7.40       7.15
sim16           16    8.41e-05   2.65e-06   0.001233        7.27       7.04
sim17           17    7.57e-05   1.89e-06   0.000778        7.36       7.07
sim18           18    1.20e-04   6.17e-06   0.002533        6.94       6.54
sim19           19    9.14e-05   3.87e-06   0.001020        7.04       6.88
sim20           20    8.09e-05   4.41e-06   0.000627        7.25       7.07
sim21           21    7.24e-05   3.16e-06   0.000532        7.31       7.03
sim22           22    9.12e-05   6.72e-06   0.000951        7.12       6.86
sim23           23    1.01e-04   5.22e-06   0.000879        7.08       6.83
sim24           24    7.90e-05   8.37e-06   0.000641        7.26       7.04
sim25           25    9.80e-05   4.06e-06   0.001675        7.21       6.85
Mean(All)        0    8.87e-05   4.85e-06   0.000930        7.17       6.93
Actual           0    6.83e-05   4.69e-06   0.000782        7.46       7.15
Unconditional    0    6.33e-05         NA         NA        7.15         NA
              Series.Max
sim1                7.50
sim2                7.29
sim3                7.16
sim4                7.18
sim5                7.27
sim6                7.28
sim7                7.38
sim8                7.50
sim9                7.50
sim10               7.42
sim11               7.19
sim12               7.40
sim13               7.39
sim14               7.23
sim15               7.58
sim16               7.48
sim17               7.56
sim18               7.18
sim19               7.18
sim20               7.45
sim21               7.55
sim22               7.37
sim23               7.32
sim24               7.45
sim25               7.42
Mean(All)           7.37
Actual              7.66
Unconditional         NA

Below is the plot of this simulation.

S&P 500 eGARCH Model

We can do forecasting as well. Volatility calculations are very important in options price calculations. In a future post I will show you how we are going to use our volatility model in pricing an options contract. Now this was a short introduction to how to make volatility models for stock indices like S&P 500. I have developed a few courses on R that you can take a look at. First course is R for Traders. In this course I give you an introduction to how you are going to use R for statistical analysis that you can use in your trading. The second course is Machine Learning Using R for Traders. In this course I build upon the first course R for Traders. I explains some important machine learning algorithms and then show you how to build predictive models using those machine learning algorithms using R. The third course is Algorithmic Trading With R. This course build upon the first 2 courses. I show you how we are going to build algorithmic trading systems using R and APIs provided by brokers.

By admin

I have done masters from Harvard University. I am interested in day trading currencies, options and stocks.