Wednesday, May 7, 2014

K-means clustering based stock classification

K-means clustering is one of the simplest techniques used for classification. It partitions n observations into k clusters S1, ..., Sk such that each observation belongs to the cluster with the nearest center. Mathematically, K-means clustering tries to find the set of centers μ = {μ1, ..., μk} that minimizes the following expression:

$$\min_{\mu} \sum_{i=1}^{k} \sum_{x \in S_i} d(x, \mu_i)$$

Here d(x,y) is the distance function. Typical distance functions are squared Euclidean distance, sum of absolute differences, and correlation. μi is the center (mean or median, depending on the distance function) of the observations in Si.
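
To make the objective concrete, here is a minimal sketch of one common way to minimize it: Lloyd's algorithm with squared Euclidean distance, where each center is the mean of its cluster. This is an illustration only. The post does not say which algorithm or software produced the results, and the function names (kmeans, squared_euclidean, mean_vector) and the toy data are assumptions made for the example.

```python
import random


def squared_euclidean(x, y):
    """Squared Euclidean distance d(x, y) between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_vector(points):
    """Component-wise mean, the center that minimizes squared Euclidean distance."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm (illustrative): alternate assignment and update steps."""
    rng = random.Random(seed)
    # Assumption: initialize the centers with k distinct observations.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each observation joins the cluster with the nearest center.
        new_labels = [
            min(range(k), key=lambda i: squared_euclidean(p, centers[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centers are stable too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster ends up empty
                centers[i] = mean_vector(members)
    # Value of the objective: sum over clusters of d(x, mu_i).
    cost = sum(squared_euclidean(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, cost


# Toy data (not market data): two well-separated groups of three points each.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels, centers, cost = kmeans(points, k=2)
print(labels, cost)
```

On this toy data the first three points end up in one cluster and the last three in the other. In general Lloyd's algorithm only finds a local minimum of the objective, so results can depend on the initial centers.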

In line with my previous post on factor analysis based stock classification, we will attempt to classify stocks into groups to uncover hidden trends, if any exist.

Classification of LIX15 stocks:
LIX15 is an Indian equity market index that consists of 15 highly liquid stocks traded on the NSE. The observation matrix consists of normalized daily returns of these 15 stocks, sampled from February to November 2013. K-means clustering is applied to the data using the squared Euclidean distance function. Following is the result of a two-cluster classification:

Cluster 1     Cluster 2
AXISBANK      CAIRN
BANKBARODA    MCDOWELL-N
HINDALCO      TATAMOTORS
IDFC
JINDALSTEL
JPASSOCIAT
JSWSTEEL
MARUTI
RCOM
SBIN
TATASTEEL
YESBANK
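
For reference, the observation matrix behind this clustering (normalized daily returns, one row per stock) could be assembled along the following lines. This is a sketch under assumptions: the post does not state how the returns were normalized, so z-scoring each stock's return series is assumed here, and the tickers and prices are hypothetical placeholders, not market data.

```python
import math


def daily_returns(prices):
    """Simple daily returns r_t = p_t / p_(t-1) - 1 from a list of closing prices."""
    return [curr / prev - 1.0 for prev, curr in zip(prices, prices[1:])]


def zscore(values):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("a constant series cannot be normalized")
    return [(v - mean) / std for v in values]


# Hypothetical closing prices for three placeholder tickers (NOT real data).
prices = {
    "STOCK_A": [100.0, 101.0, 99.5, 102.0, 103.5],
    "STOCK_B": [50.0, 50.5, 49.8, 51.0, 51.6],
    "STOCK_C": [200.0, 198.0, 203.0, 199.0, 204.0],
}

# One row per stock (the observation), one column per trading day (the feature).
observations = {ticker: zscore(daily_returns(p)) for ticker, p in prices.items()}

for ticker, row in observations.items():
    print(ticker, [round(v, 3) for v in row])
```

Treating each stock as one observation and each trading day as one feature means that stocks whose normalized returns move together day by day sit close to each other under squared Euclidean distance.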


The resulting clusters are disproportionate in size and have no obvious interpretation. Interestingly, the stocks in cluster 2 are the ones that did not show any significant loading on the factors in the factor analysis. Hence, prima facie, k-means has classified the LIX15 constituents into two groups: one that moved with the broad market and another that exhibited heavy idiosyncratic movements during the analysis period. Following is the outcome of a three-cluster classification:

Cluster 1     Cluster 2     Cluster 3
CAIRN         AXISBANK      MCDOWELL-N
HINDALCO      BANKBARODA    TATAMOTORS
JINDALSTEL    IDFC
JPASSOCIAT    MARUTI
JSWSTEEL      SBIN
RCOM          YESBANK
TATASTEEL

The clusters roughly correspond to sectoral themes.


Cluster       Fundamental theme
Cluster 1     Metal stocks
Cluster 2     Financial services stocks
Cluster 3     Erratic/heavily idiosyncratic stocks

Classification of BANKNIFTY stocks:
As with the LIX15 analysis, a two-cluster classification is performed on the BANKNIFTY constituents. Following are the resulting clusters:

Cluster 1     Cluster 2
AXISBANK      BANKBARODA
HDFCBANK      BANKINDIA
ICICIBANK     CANBK
INDUSINDBK    PNB
KOTAKBANK     SBIN
YESBANK       UNIONBANK

The fundamental interpretation of the resulting clusters is quite clear. 


Cluster       Fundamental theme
Cluster 1     Private sector banks
Cluster 2     Public sector banks

Conclusion:
Using clustering techniques, we have been able to group stocks. These groupings tend to convey a particular fundamental meaning. Among the LIX15 constituents, the major classification is along sectoral lines. Among the BANKNIFTY constituents, the classification lies along public vs. private ownership lines. These conclusions are in line with those obtained from the factor analysis based classification of stocks.

2 comments:

  1. Nice blog!

    btw...What is the distance function in your analysis?

  2. Thanks.
    I have used squared euclidean distance function for this analysis.
