K-means clustering is one of the simplest techniques used for (unsupervised) classification. It partitions n observations into k clusters so that each observation belongs to the cluster with the nearest center. Mathematically, K-means clustering seeks the partition S = {S1, ..., Sk} of the observations, with cluster centers μ1, ..., μk, that minimizes the following expression:

$$\sum_{i=1}^{k} \sum_{x \in S_i} d(x, \mu_i)$$

Here d(x, y) is the distance function. Typical distance functions are squared Euclidean distance, the sum of absolute differences and correlation distance. μi is the center of the observations in Si, for example the mean for squared Euclidean distance and the median for the sum of absolute differences.
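To make the objective concrete, here is a minimal pure-Python sketch of Lloyd's algorithm, the standard heuristic for K-means, using squared Euclidean distance with mean centers. It is an illustration, not the software used for this analysis (the post does not say which tool was used). The function names, the random initialization and the fixed `seed` are assumptions chosen for reproducibility.

```python
import random


def squared_euclidean(x, y):
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def mean_point(points):
    """Coordinate-wise mean, the center minimizing total squared Euclidean distance."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def nearest(x, centers):
    """Index of the center closest to x."""
    return min(range(len(centers)), key=lambda i: squared_euclidean(x, centers[i]))


def kmeans(data, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-means with squared Euclidean distance (hypothetical helper).

    Returns (labels, centers, objective), where objective is the sum over
    clusters i of d(x, mu_i) for every observation x assigned to cluster i.
    """
    rng = random.Random(seed)
    # Initialize the centers with k observations picked at random.
    centers = [list(p) for p in rng.sample(data, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each observation joins the cluster with the nearest center.
        new_labels = [nearest(x, centers) for x in data]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are already the cluster means
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for i in range(k):
            members = [x for x, lab in zip(data, labels) if lab == i]
            if members:  # an empty cluster keeps its previous center
                centers[i] = mean_point(members)
    objective = sum(squared_euclidean(x, centers[lab]) for x, lab in zip(data, labels))
    return labels, centers, objective


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    print(kmeans(toy, k=2))
```

On the toy data the two tight pairs end up in separate clusters with an objective of 1.0 (each point lies 0.5 from its cluster mean). Lloyd's algorithm only finds a local minimum, so in practice it is usually rerun from several random starts and the lowest-objective solution is kept.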
Following my previous post on factor-analysis-based stock classification, we will attempt to classify stocks into groups to uncover hidden trends, if any exist.
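Both analyses below cluster stocks on normalized daily returns, with each stock treated as one observation, so K-means groups the rows of a stocks × days matrix. The following is a minimal sketch of that preprocessing, assuming simple (not log) returns and z-score normalization with the population standard deviation; the post does not say exactly which normalization was used. All function names and the toy prices in the test are hypothetical. The sketch also illustrates why squared Euclidean distance is a sensible choice here: for two z-scored series of length n with correlation r, the squared Euclidean distance equals 2n(1 − r), so clustering normalized returns this way behaves like clustering on correlation distance.

```python
import math


def daily_returns(prices):
    """Simple daily returns p_t / p_{t-1} - 1 (assumption: log returns would also work)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def zscore(series):
    """Normalize to zero mean and unit variance (population standard deviation)."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in series) / n)
    return [(v - mu) / sd for v in series]


def observation_matrix(price_table):
    """One row (observation) per stock, one column per trading day."""
    return {ticker: zscore(daily_returns(prices)) for ticker, prices in price_table.items()}


def pearson(x, y):
    """Pearson correlation of two equal-length series."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def squared_euclidean(x, y):
    """Squared Euclidean distance between two rows."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


if __name__ == "__main__":
    # Made-up prices for two hypothetical tickers (not real NSE data).
    prices = {
        "TOY_A": [100.0, 101.0, 99.0, 102.0, 103.0, 101.0],
        "TOY_B": [50.0, 50.4, 49.8, 51.0, 51.2, 50.1],
    }
    rows = observation_matrix(prices)
    n = len(rows["TOY_A"])
    r = pearson(daily_returns(prices["TOY_A"]), daily_returns(prices["TOY_B"]))
    # The two printed numbers agree: squared distance equals 2n(1 - r).
    print(squared_euclidean(rows["TOY_A"], rows["TOY_B"]), 2 * n * (1 - r))
```

With the sample standard deviation (dividing by n − 1) the identity becomes 2(n − 1)(1 − r); either way the distance is a monotone function of correlation.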
Classification of LIX15 stocks:
LIX15 is an Indian equity market index consisting of 15 highly liquid stocks traded on the NSE. The observation matrix consists of the normalized daily returns of these 15 stocks, sampled from February to November 2013. K-means clustering is applied to the data using the squared Euclidean distance function. The result of a two-cluster classification is as follows:
| Cluster 1 | Cluster 2 |
|---|---|
| AXISBANK | CAIRN |
| BANKBARODA | MCDOWELL-N |
| HINDALCO | TATAMOTORS |
| IDFC | |
| JINDALSTEL | |
| JPASSOCIAT | |
| JSWSTEEL | |
| MARUTI | |
| RCOM | |
| SBIN | |
| TATASTEEL | |
| YESBANK | |
The resulting clusters are of very unequal size and have no obvious interpretation. Interestingly, the stocks in cluster 2 are the ones that did not show any significant loading on the factors in the factor analysis. Hence, prima facie, k-means has split the LIX15 constituents into two groups: one that moved with the broad market and one that exhibited heavy idiosyncratic movements during the analysis period. The outcome of a three-cluster classification is as follows:
| Cluster 1 | Cluster 2 | Cluster 3 |
|---|---|---|
| CAIRN | AXISBANK | MCDOWELL-N |
| HINDALCO | BANKBARODA | TATAMOTORS |
| JINDALSTEL | IDFC | |
| JPASSOCIAT | MARUTI | |
| JSWSTEEL | SBIN | |
| RCOM | YESBANK | |
| TATASTEEL | | |
The clusters roughly correspond to sectoral themes:

| Cluster | Fundamental theme |
|---|---|
| Cluster 1 | Metal stocks |
| Cluster 2 | Financial services stocks |
| Cluster 3 | Erratic/heavily idiosyncratic stocks |
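To see how the two LIX15 solutions relate, the memberships from the two-cluster and three-cluster tables can be cross-tabulated. The sketch below only transcribes those tables; the variable and helper names are hypothetical.

```python
from collections import Counter

# Memberships transcribed from the LIX15 result tables.
LIX15_K2 = {
    1: ["AXISBANK", "BANKBARODA", "HINDALCO", "IDFC", "JINDALSTEL", "JPASSOCIAT",
        "JSWSTEEL", "MARUTI", "RCOM", "SBIN", "TATASTEEL", "YESBANK"],
    2: ["CAIRN", "MCDOWELL-N", "TATAMOTORS"],
}
LIX15_K3 = {
    1: ["CAIRN", "HINDALCO", "JINDALSTEL", "JPASSOCIAT", "JSWSTEEL", "RCOM", "TATASTEEL"],
    2: ["AXISBANK", "BANKBARODA", "IDFC", "MARUTI", "SBIN", "YESBANK"],
    3: ["MCDOWELL-N", "TATAMOTORS"],
}


def label_map(clusters):
    """Invert {cluster: [tickers]} into {ticker: cluster}."""
    return {ticker: c for c, tickers in clusters.items() for ticker in tickers}


def crosstab(a, b):
    """Count stocks for each (cluster in partition a, cluster in partition b) pair."""
    la, lb = label_map(a), label_map(b)
    if la.keys() != lb.keys():
        raise ValueError("both partitions must cover the same stocks")
    return Counter((la[t], lb[t]) for t in la)


if __name__ == "__main__":
    for (c2, c3), count in sorted(crosstab(LIX15_K2, LIX15_K3).items()):
        print(f"k=2 cluster {c2} -> k=3 cluster {c3}: {count}")
```

The cross-tabulation shows the 12-stock cluster of the two-cluster solution splitting 6/6 into the metals and financial-services groups, while CAIRN moves from the erratic group into the metals cluster and MCDOWELL-N and TATAMOTORS stay together.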
Classification of BANKNIFTY stocks:
As with the LIX15 analysis, a two-cluster classification is performed on the constituents of BANKNIFTY, the NSE banking index. Following are the resulting clusters:
| Cluster 1 | Cluster 2 |
|---|---|
| AXISBANK | BANKBARODA |
| HDFCBANK | BANKINDIA |
| ICICIBANK | CANBK |
| INDUSINDBK | PNB |
| KOTAKBANK | SBIN |
| YESBANK | UNIONBANK |
The fundamental interpretation of the resulting clusters is quite clear:

| Cluster | Fundamental theme |
|---|---|
| Cluster 1 | Private sector banks |
| Cluster 2 | Public sector banks |
Conclusion:
Using clustering techniques, we have been able to group stocks, and these groupings tend to convey a particular fundamental meaning. Among the LIX15 constituents, the major classification is along sectoral lines. Among the BANKNIFTY constituents, the classification lies along public versus private ownership lines. These conclusions are in line with those obtained from the factor-analysis-based classification of stocks.
Nice blog!
btw...What is the distance function in your analysis?
Thanks.
I have used squared euclidean distance function for this analysis.