The present study aims mainly at clustering pre-monsoon thunderstorm (TS) and non-thunderstorm (NTS) days over Kolkata (22°32′ N, 88°20′ E), India, using the hard c-means (k-means) technique, a backward selection procedure and the fuzzy c-means (FCM) algorithm. The study uses the values of the parameters observed at 0000 UTC and is performed in two stages. In the first stage, the hard c-means technique is applied to cluster the days of a semi-supervised data set into the two categories mentioned above, and the backward selection procedure is used to find the best combination of the theoretically influential atmospheric parameters that play the dominant role in the categorization, on the basis of a performance score (PC). Although the FCM technique is usually applied to supervised data sets, in the second stage of this study it is applied to the semi-supervised data set of parameters to corroborate the result obtained in the first stage.
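The first-stage procedure can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. Hard c-means with c = 2 is ordinary k-means. The centers are seeded from one labelled TS day and one labelled NTS day, which is one assumed way of exploiting the semi-supervised labels. The performance score is taken to be the fraction of days placed in their observed class, because the abstract does not define PC. The `min_params` stopping rule is hypothetical. The parameter names (`wmax`, `p_minus_plcl`, `noise`) and the eight toy days are invented for illustration, with `noise` standing in for an uninformative parameter.

```python
import math

def standardize(rows):
    """Z-score each column so parameters with different units weigh equally."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def hard_c_means(points, seed_nts, seed_ts, max_iter=100):
    """Hard c-means (k-means) with c = 2; cluster 1 = TS, cluster 0 = NTS.

    Centers are seeded from one known NTS day and one known TS day
    (an assumed use of the semi-supervised labels)."""
    centers = [list(points[seed_nts]), list(points[seed_ts])]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each day joins its nearest center.
        new_labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its member days.
        for j in (0, 1):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def percent_correct(labels, truth):
    """Hypothetical performance score: fraction of days put in their observed class."""
    return sum(l == t for l, t in zip(labels, truth)) / len(truth)

def backward_selection(rows, truth, names, min_params=2):
    """Repeatedly drop the parameter whose removal keeps the score highest,
    stopping when every removal lowers the score or min_params remain."""
    z = standardize(rows)
    seed_ts, seed_nts = truth.index(1), truth.index(0)

    def score(cols):
        pts = [[r[c] for c in cols] for r in z]
        return percent_correct(hard_c_means(pts, seed_nts, seed_ts), truth)

    keep = list(range(len(names)))
    best = score(keep)
    while len(keep) > min_params:
        # Ties are broken in favour of dropping the later-listed parameter.
        trial, drop = max((score([c for c in keep if c != d]), d) for d in keep)
        if trial < best:
            break
        best = trial
        keep.remove(drop)
    return [names[c] for c in keep], best

# Hypothetical toy data: columns are wmax, p_minus_plcl, noise.
names = ["wmax", "p_minus_plcl", "noise"]
truth = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = TS day, 0 = NTS day
rows = [[5.0, 60, 1], [5.5, 50, -1], [6.0, 40, -1], [5.5, 50, 1],
        [1.0, 150, 1], [1.5, 160, -1], [2.0, 170, -1], [1.5, 160, 1]]
print(backward_selection(rows, truth, names))  # → (['wmax', 'p_minus_plcl'], 1.0)
```

The parameters are z-scored first because hard c-means relies on Euclidean distance, so a parameter measured in hPa would otherwise swamp one measured in m s⁻¹.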
The final iteration of the first stage shows that the combination of maximum vertical velocity and P-PLCL at the 1000 hPa level performs best in detecting thunderstorm days, as far as the present data set is concerned. Here P and PLCL represent, respectively, the pressure at the reference level and the pressure at the corresponding lifting condensation level (LCL), which is also taken as the cloud base. Notably, this finding is also supported by FCM in the second stage of the study: in the final iteration, the center of the cluster of thunderstorm days moves closer to these two parameters, maximum vertical velocity and P-PLCL at 1000 hPa, than the center of the other cluster, which contains the non-thunderstorm days.
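Unlike hard c-means, FCM gives each day a graded membership in every cluster, and the cluster centers are membership-weighted means. The sketch below follows the standard Bezdek update equations. It assumes a fuzzifier m = 2 (the abstract does not state m) and centers seeded from one NTS and one TS day. The two toy columns are hypothetical stand-ins for standardized maximum vertical velocity and P-PLCL. Comparing the final centers coordinate by coordinate is one way to see which parameters separate the two clusters.

```python
import math

def fuzzy_c_means(points, init_centers, m=2.0, tol=1e-6, max_iter=300):
    """Standard FCM: alternate membership and center updates until the
    centers stop moving. Returns (memberships u[i][j], centers v[j])."""
    centers = [list(c) for c in init_centers]
    c = len(centers)
    expo = 2.0 / (m - 1.0)
    u = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            if min(d) == 0.0:
                # A day lying exactly on a center belongs to it fully.
                hits = [1.0 if dj == 0.0 else 0.0 for dj in d]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d[j] / d[k]) ** expo for k in range(c))
                       for j in range(c)]
            u.append(row)
        # Center update: v_j = sum_i u_ij^m x_i / sum_i u_ij^m.
        new_centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]
            tot = sum(w)
            new_centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / tot
                                for k in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return u, centers

# Hypothetical standardized toy days: [wmax, p_minus_plcl].
pts = [[1.0, -1.0], [1.2, -0.8], [0.8, -1.2],    # toy TS days
       [-1.0, 1.0], [-1.2, 0.8], [-0.8, 1.2]]    # toy NTS days
u, v = fuzzy_c_means(pts, init_centers=[pts[3], pts[0]])  # cluster 1 = TS
print([[round(x, 3) for x in center] for center in v])
```

Each row of `u` sums to one, so a day near the boundary between the clusters shows up as a membership close to 0.5 rather than being forced into one class.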