In a previous post, I analyzed how PCA can be used to
identify market characteristics. In this post we will take a bottom-up approach
to identifying pair trading opportunities. Any pair trading model has two
components. The first is the identification of good pairs. The
second is the identification of divergence in these good pairs to initiate a
trade. We will see how PCA can be used to perform both steps.
Methodology:
The following steps are applied to all possible intra-sector pairs:
- Demeaned daily returns of the stocks in each pair are calculated. A matrix of size 400 x 2 is constructed, where 400 is the number of observations (days) and 2 is the number of stocks (a pair). PCA is performed on this matrix to get the principal components.
- The variance explained by the first principal component is the first short-listing parameter. The higher this variance, the more closely related the stocks are. A first component that explains 80% or more of the variance is generally considered good.
- The next step is to calculate the deviation of each day's returns from the first component. This is called the daily error. The auto-correlation of this error is the second parameter. Values less than -0.1 are favorable. Negative auto-correlation signifies that the error is mean reverting.
- To check for divergence, we look at the sum of the daily error over the last N days. N=4 is generally good. If this N-day sum is above a threshold, then it is a good entry point.
- Profit-booking, stop-loss and maximum-holding-period criteria are applied to exit a pair once it has been entered.
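The short-listing steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact code behind the back test: the function name `pair_pca_stats` is made up, the daily error is taken to be the projection of the demeaned returns onto the second principal component (the part not captured by the first), and the auto-correlation is measured at lag 1. The demo uses synthetic returns, not market data.

```python
import numpy as np

def pair_pca_stats(returns):
    """returns: array of shape (n_days, 2) holding daily returns of the two stocks."""
    x = returns - returns.mean(axis=0)        # demean each stock's returns
    cov = np.cov(x, rowvar=False)             # 2 x 2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    explained = eigvals[-1] / eigvals.sum()   # share of variance on the first component
    # Assumption: daily error = score on the second component.
    # Its sign is arbitrary because eigenvector signs are arbitrary.
    daily_error = x @ eigvecs[:, 0]
    e = daily_error - daily_error.mean()
    lag1_autocorr = (e[:-1] @ e[1:]) / (e @ e)  # assumption: lag-1 auto-correlation
    return explained, daily_error, lag1_autocorr

# Synthetic demo: two series driven by one common factor plus small noise.
rng = np.random.default_rng(0)
common = rng.normal(0.0, 0.02, 400)
pair = np.column_stack([
    common + rng.normal(0.0, 0.005, 400),
    common + rng.normal(0.0, 0.005, 400),
])
explained, daily_error, lag1_autocorr = pair_pca_stats(pair)
print(f"variance explained: {explained:.2%}, lag-1 autocorr: {lag1_autocorr:.3f}")
```

With real data, a pair would pass the two filters described above when `explained` is at least 0.8 and `lag1_autocorr` is below -0.1.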
Example:
I have taken the ICICIBANK-AXISBANK pair to
illustrate the method. The data runs from October 2012 to May 2014, a total of 400 days. The following is the plot of
cumulative returns of these stocks since October 2012. We can see that these stocks tend
to move together.
The following is the plot of the normalized difference in the cumulative
returns of these two stocks:
This spread looks mean reverting. An ADF test indicates
that the spread is stationary at the 99% confidence level.
Now we apply PCA to this pair. Demeaned daily returns of these stocks are calculated and the principal components are estimated. The following is the plot of the principal components for the given pair:
We see that the variance explained by the first principal component is around 86%, which is
a high value. The auto-correlation of the daily error is around -0.1 (significant at
the 95% level). This means that the error is oscillating in nature. We can conclude that these stocks form a good pair.
To identify trade entry points we look at the distribution of returns around the primary principal component:
Whenever the cumulative error over the last four days (shown in red) goes beyond a threshold (shown in green), the corresponding mean-reverting position is established in the pair.
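A minimal sketch of this entry rule is given below. The post does not say how the threshold is chosen, so the sketch assumes a symmetric band of `k` standard deviations of the rolling sum; `entry_signals`, `k` and the sign convention for the position are all illustrative choices.

```python
import numpy as np

def entry_signals(daily_error, n=4, k=2.0):
    """Rolling n-day sum of the daily error and a simple threshold rule."""
    e = np.asarray(daily_error, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(e)))
    rolling = csum[n:] - csum[:-n]       # rolling[i] = sum of e[i : i + n]
    # Assumption: threshold = k standard deviations of the rolling sum.
    # This uses the whole sample; a back test should use past data only.
    threshold = k * rolling.std()
    signal = np.zeros(rolling.shape, dtype=int)
    signal[rolling > threshold] = -1     # error stretched upward: bet on a fall
    signal[rolling < -threshold] = 1     # error stretched downward: bet on a rise
    return rolling, threshold, signal

# Toy input, chosen so the rolling sums are easy to check by hand.
toy_error = [1, 1, 1, 1, -1, -1, -1, -1]
rolling, threshold, signal = entry_signals(toy_error, n=4, k=1.0)
print(rolling, threshold, signal)
```

Which stock ends up long and which short for a given signal depends on the sign of the component used to define the error, so that mapping has to be fixed once per pair.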
In back testing, the above algorithm seems promising. Still, there are some things to keep in mind that can undermine the accuracy of the trading model:
- As PCA ignores the mean value of returns, we might end up trading on a non-stationary spread. This can be handled by ignoring pairs in which the constituent stocks have significantly different average returns over the look-back period.
- This approach looks only at short-term divergences. It ignores the traditional long-term divergences on which many co-integration-based pair trading models are built. This can be partially tackled by using multiple look-back periods (longer and shorter) for error identification.
- The correlation of the spread with the market needs to be taken into account before entering any position.
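As an illustration of the first caveat, one possible filter is a paired t-statistic on the difference in daily returns over the look-back period. This is an assumed test, not one named in the post; the function name `mean_return_gap` and the cutoff of 2.0 are hypothetical.

```python
import numpy as np

def mean_return_gap(returns_a, returns_b, t_cutoff=2.0):
    """Flag pairs whose average daily returns differ significantly."""
    d = np.asarray(returns_a, dtype=float) - np.asarray(returns_b, dtype=float)
    # Paired t-statistic of the mean return difference (assumed test).
    t_stat = d.mean() / (d.std(ddof=1) / np.sqrt(d.size))
    return t_stat, abs(t_stat) > t_cutoff

# Synthetic demo: one pair with no drift gap, one with a persistent gap.
rng = np.random.default_rng(1)
base = rng.normal(0.0, 0.02, 400)
wiggle = np.tile([0.01, -0.01], 200)    # zero-mean difference
t_same, drop_same = mean_return_gap(base + wiggle, base)
t_gap, drop_gap = mean_return_gap(base + wiggle, base - 0.005)
print(t_same, drop_same, t_gap, drop_gap)
```

Pairs flagged `True` would be skipped before the PCA step, since their spread is likely to trend rather than revert.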




