Clustering with fuzzy supervised algorithm

In the Gustafson-Kessel (GK) algorithm, a modified Mahalanobis distance with preserved volume was used. However, the fuzzy covariance matrices added to its distance measure were not directly derived from the objective function. A Fuzzy C-Means algorithm based on Mahalanobis distance (FCM-M) was proposed to overcome these limitations of the Gath-Geva (GG) and GK algorithms, but it is not stable enough when some of its covariance matrices are not equal. In this paper, an improved supervised clustering algorithm based on FCM is proposed; it not only replaces the common covariance matrix with the correlation matrix in the objective function but also adopts a new threshold value and a new convergent process. The experimental results on real data sets show that the proposed algorithm has the best performance among the algorithms compared.


Introduction
To overcome the drawback due to Euclidean distance, we could try to extend the distance measure to the Mahalanobis distance (MD). However, Krishnapuram and Kim (1999) pointed out that the Mahalanobis distance cannot be used directly in a clustering algorithm. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters. In the GK algorithm, a modified Mahalanobis distance with preserved volume was used. However, the fuzzy covariance matrices added to its distance measure were not directly derived from the objective function. In the GG algorithm, the Gaussian distance can only be used for data with a multivariate normal distribution. By adding a regulating factor of each covariance matrix to each class in the objective function and deleting the constraint on the determinants of the covariance matrices in the GK algorithm, the Fuzzy C-Means algorithm based on Mahalanobis distance (FCM-M) was proposed.
Fuzzy partition clustering is a branch of cluster analysis and is widely used in pattern recognition. Among many well-known fuzzy partition clustering algorithms, Bezdek's Fuzzy C-Means (FCM) (1981), Pal, Pal and Bezdek's Possibility C-Means (PCM) (1993), and Pal, Pal and Bezdek's Fuzzy Possibility C-Means (FPCM) (1997) are all based on the Euclidean distance measure for clustering. Hence, those fuzzy partition clustering algorithms can only be used for data sets in which every class has the same hyperspherical shape. Instead of using the Euclidean distance measure, Gustafson and Kessel (1979) proposed the GK algorithm, which employs the Mahalanobis distance. It is a fuzzy partition clustering algorithm that can be used for classes with different geometrical shapes in the data set. However, without prior information on the shape volume of each class, the GK algorithm can only be utilized for classes with the same volume. Moreover, if the dimension of a class is greater than the number of samples in the class, the estimated covariance matrix of the class may not be of full rank. Hence, the algorithm runs into a singularity problem when the covariance matrix is inverted. This is an important issue that needs to be addressed when we use the GK algorithm for clustering.
To overcome these issues, a new solution is proposed. A regulating factor of the covariance matrix is added to each class in the objective function, and the constraint on the determinant of the covariance matrices defined in the GK algorithm is removed. Furthermore, the FCM-AM algorithms include two algorithms, FCM-M and FCM-CM, proposed in our previous works (Hsiang-Chuan Liu, Jeng-Ming Yih, Shin-Wu Liu, 2007, 2008).
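The singularity problem described above can be seen in a small numerical sketch. The data below are synthetic, and the ridge term is only a generic remedy assumed for illustration; it is not the regulating factor that this paper adds to the objective function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical class with fewer samples (5) than features (8).
n_samples, n_features = 5, 8
X = rng.normal(size=(n_samples, n_features))

# Sample covariance of the class (rows are observations).
cov = np.cov(X, rowvar=False)

# Centering removes one degree of freedom, so rank(cov) <= n_samples - 1.
# The matrix is therefore rank deficient and has no ordinary inverse.
rank = np.linalg.matrix_rank(cov)

# Assumed remedy, for illustration only: add a small ridge to the diagonal.
ridge = 1e-3
cov_reg = cov + ridge * np.eye(n_features)
rank_reg = np.linalg.matrix_rank(cov_reg)

print(f"rank without ridge: {rank} of {n_features}")
print(f"rank with ridge:    {rank_reg} of {n_features}")
```

Any distance that requires the inverse covariance matrix of such a class is undefined until the rank deficiency is handled in some way.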

Fuzzy Partition Clustering Algorithms Based on Euclidean Distance
The popular fuzzy c-means algorithm based on the Euclidean distance function converges to a local minimum of the objective function and can only be used to detect spherical structural clusters. The Gustafson-Kessel and Gath-Geva clustering algorithms were developed to detect non-spherical structural clusters. However, the Gustafson-Kessel clustering algorithm needs an added constraint on the fuzzy covariance matrix, and the Gath-Geva clustering algorithm can only be used for data with a multivariate Gaussian distribution. The objective of a fuzzy clustering algorithm is to partition the data into clusters so that the similarity of data objects within each cluster is maximized and the similarity of data objects among clusters is minimized. In objective-function-based methods, the objective function is a function of the data matrix, the membership matrix, and the prototypes of the clusters. It measures the overall dissimilarity of data objects within each cluster. Hence, by minimizing the objective function, we can obtain the best partition of the data set.
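The minimization described above can be sketched with the standard alternating updates of Euclidean fuzzy c-means. This is a minimal sketch under stated assumptions, not the implementation used in this paper: the function names, the random initialization, the stopping rule, and the synthetic two-blob data are all illustrative choices.

```python
import numpy as np

def sq_dists(X, V):
    """Squared Euclidean distances, shape (c, n): prototypes by samples."""
    return ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

def fcm(X, c, m=2.0, tol=1e-6, max_iter=200, seed=0):
    """Euclidean fuzzy c-means; returns memberships U (c x n), prototypes V, objective J."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized so each sample's column sums to 1.
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        W = U ** m
        # Prototype update: membership-weighted mean of the samples.
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # Membership update: u_ij proportional to d_ij^(-2/(m-1)).
        d2 = np.maximum(sq_dists(X, V), 1e-12)  # guard against division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Objective value at the returned (U, V).
    J = float(((U ** m) * d2).sum())
    return U, V, J

# Synthetic example: two well-separated spherical blobs of 20 points each.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),
               rng.normal(5.0, 0.3, size=(20, 2))])
U, V, J = fcm(X, c=2)
labels = U.argmax(axis=0)
```

Because the distance is Euclidean, this sketch favors spherical clusters, which is the limitation the Mahalanobis-based variants are meant to address.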

Fuzzy C-Means Algorithm
To overcome the drawback due to Euclidean distance, we could try to extend the distance measure to the Mahalanobis distance (MD). However, Krishnapuram and Kim (1999) pointed out that the Mahalanobis distance cannot be used directly in a clustering algorithm. The Fuzzy C-Means algorithm (FCM) is defined through its objective function.
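For reference, the FCM objective function in its usual textbook form, following Bezdek (1981), can be written as below. The notation is an assumption introduced here and may differ from the authors' own: x_j is the j-th of n samples, v_i the prototype of the i-th of c clusters, u_ij the membership of sample j in cluster i, and m > 1 the fuzzifier.

```latex
\[
J_{\mathrm{FCM}}(U, V; X) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} \, \lVert x_j - v_i \rVert^{2},
\qquad
\text{subject to } \sum_{i=1}^{c} u_{ij} = 1, \quad u_{ij} \in [0, 1].
\]

\[
v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m} \, x_j}{\sum_{j=1}^{n} u_{ij}^{m}},
\qquad
u_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_k \rVert} \right)^{\frac{2}{m-1}} \right]^{-1}.
\]
```

The second line gives the two update rules obtained by setting the derivatives of the constrained objective to zero; iterating them alternately decreases J until a local minimum is reached.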

Fuzzy Clustering
Clustering techniques play an important role in data analysis and interpretation. Fuzzy clustering is a branch of cluster analysis and is widely used in the pattern recognition field. Fuzzy clustering algorithms based on the Euclidean distance can only detect data classes with the same hyperspherical shape. To overcome the drawback due to Euclidean distance, we could try to extend the distance measure to the Mahalanobis distance (MD). However, Krishnapuram and Kim (1999) pointed out that the Mahalanobis distance cannot be used directly in a clustering algorithm. The Gustafson-Kessel (GK) and Gath-Geva (GG) clustering algorithms were developed to detect non-spherical structural clusters. In the GK algorithm, a modified Mahalanobis distance with preserved volume was used. However, the fuzzy covariance matrices added to its distance measure were not directly derived from the objective function. In the GG algorithm, the Gaussian distance can only be used for data with a multivariate normal distribution. By adding a regulating factor of each covariance matrix to each class in the objective function and deleting the constraint on the determinants of the covariance matrices in the GK algorithm, the Fuzzy C-Means algorithm based on Mahalanobis distance (FCM-M) was proposed. Then, to improve the stability of the FCM-M clustering results, all of the covariance matrices in the objective function of the FCM-M algorithm were replaced with the same common covariance matrix.
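The distances discussed above can be compared in a short sketch. It uses the commonly stated form of the GK norm, in which the inverse fuzzy covariance matrix is scaled so that the norm-inducing matrix has determinant rho. The function names and the parameter rho are assumptions introduced for illustration, and the covariance matrix in the example is hypothetical.

```python
import numpy as np

def fuzzy_covariance(X, u, v, m=2.0):
    """Fuzzy covariance of one cluster: sum_j u_j^m (x_j - v)(x_j - v)^T / sum_j u_j^m."""
    w = u ** m
    diff = X - v
    return (w[:, None] * diff).T @ diff / w.sum()

def mahalanobis_sq(x, v, cov):
    """Squared Mahalanobis distance (x - v)^T cov^{-1} (x - v)."""
    diff = x - v
    return float(diff @ np.linalg.solve(cov, diff))

def gk_distance_sq(x, v, cov, rho=1.0):
    """Squared GK distance with A = (rho * det(cov))^(1/p) * cov^{-1}, so det(A) = rho."""
    p = cov.shape[0]
    scale = (rho * np.linalg.det(cov)) ** (1.0 / p)
    return scale * mahalanobis_sq(x, v, cov)

# Hypothetical elongated cluster centred at the origin: variance 4 along x, 1 along y.
cov = np.array([[4.0, 0.0], [0.0, 1.0]])
v = np.zeros(2)
x = np.array([2.0, 0.0])

d_euclid = float(((x - v) ** 2).sum())   # 4.0: ignores the cluster shape
d_mahal = mahalanobis_sq(x, v, cov)      # 1.0: one standard deviation along x
d_gk = gk_distance_sq(x, v, cov)         # 2.0: Mahalanobis scaled by det(cov)^(1/2)
```

With unit memberships and the sample mean as prototype, `fuzzy_covariance` reduces to the ordinary (biased) sample covariance, which is a convenient sanity check.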

The FCM-AM Algorithm and Its Special Cases
Using the Liu-algorithm, we can obtain the objective function of the Fuzzy C-Means algorithm based on Alternative Mahalanobis distances (FCM-AM) as follows.

FCM-NM Algorithm
In this study, a fuzzy clustering algorithm called the Fuzzy C-Means algorithm based on normalized Mahalanobis distance (FCM-NM) is used. It improves on the FCM-CM algorithm by normalizing each feature in the objective function and by replacing the threshold. We can obtain the objective function of FCM-NM as follows:
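The effect of normalizing each feature can be illustrated with a general identity: the Mahalanobis distance computed with a covariance matrix on raw features equals the distance computed with the correlation matrix on standardized features. The sketch below only demonstrates that identity on synthetic data. Whether FCM-NM uses exactly this form is an assumption, since its objective function is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with very different feature scales and some correlation.
X = rng.normal(size=(200, 3)) * np.array([1.0, 10.0, 100.0])
X[:, 1] += 5.0 * X[:, 0]

mu = X.mean(axis=0)
cov = np.cov(X, rowvar=False)
std = np.sqrt(np.diag(cov))

# Correlation matrix: covariance rescaled by the feature standard deviations.
corr = cov / np.outer(std, std)

x = X[0]
z = (x - mu) / std  # each feature normalized to unit variance

# Distance with the covariance matrix on raw features ...
d2_cov = float((x - mu) @ np.linalg.solve(cov, x - mu))
# ... and with the correlation matrix on normalized features.
d2_corr = float(z @ np.linalg.solve(corr, z))
```

The two values agree up to rounding, so working with the correlation matrix changes the conditioning and scaling of the problem rather than the distance itself.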

Experiment with Real Data
In this study, a linear algebra test for university students was designed by the author. The instrument consists of 19 dichotomous items that measure 6 concepts, whose contents are shown in Table 1. The concepts include vector space and the property of R^n, eigenvalue and eigenvector (concept 5), and geometry of linear algebra (concept 6). The data set used in the experimental study is educational data from 231 university students in Taiwan who took this test.
Applying the fuzzy clustering algorithms mentioned above, the clustering performance of each algorithm is calculated with the same fuzzifier m = 2, and the clustering accuracies are compared in Table 2. From this table, we can see that the FCM algorithm performs worse than FCM-AM on this data set, and that the FCM-NM algorithm performs better than the FCM-AM algorithm. In other words, our two proposed algorithms, FCM-NM and FCM-AM, are better than the FCM algorithm, and the new algorithm, FCM-NM, has the best performance.
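The paper does not state how clustering accuracy was computed. One common approach, sketched here as an assumption, is to harden the fuzzy memberships by taking the largest membership per sample and then score the best one-to-one matching between cluster indices and true class labels. The function name and the toy membership matrix are hypothetical.

```python
from itertools import permutations

import numpy as np

def clustering_accuracy(U, y_true):
    """Best-match accuracy of hard labels derived from a membership matrix U (c x n).

    Assumes the true labels are integers in 0..c-1. All c! label permutations
    are tried, which is only practical for a small number of clusters.
    """
    y_pred = U.argmax(axis=0)
    c = U.shape[0]
    best = 0.0
    for perm in permutations(range(c)):
        # Relabel cluster k as class perm[k] and count agreements.
        mapped = np.array(perm)[y_pred]
        best = max(best, float((mapped == y_true).mean()))
    return best

# Toy example: two clusters, five samples.
U = np.array([[0.9, 0.8, 0.3, 0.1, 0.6],
              [0.1, 0.2, 0.7, 0.9, 0.4]])
y_true = np.array([1, 1, 0, 0, 0])
acc = clustering_accuracy(U, y_true)  # 0.8: four of five samples match after relabeling
```

The relabeling step matters because cluster indices produced by a clustering algorithm are arbitrary and need not coincide with the class labels.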

Conclusion
Clustering techniques play an important role in data analysis and interpretation. They group data into clusters so that the data objects within a cluster are highly similar to one another but very dissimilar to the data objects in other clusters. The well-known FCM is based on the Euclidean distance function, which can only be used to detect spherical structural clusters. The GK and GG algorithms were developed to detect non-spherical structural clusters. However, the former needs an added constraint on the fuzzy covariance matrix, and the latter can only be used for data with a multivariate Gaussian distribution. Three improved Fuzzy C-Means algorithms based on different Mahalanobis distances, called FCM-M, FCM-CM, and FCM-NM, were proposed in our previous works. In this paper, a further improved Fuzzy C-Means algorithm based on a normalized Mahalanobis distance (FCM-NM), which takes a new convergent process, is proposed. The experimental result on the real data set shows that the proposed new algorithm has the best performance.

Table 1. The content of Concepts

Table 2. The content of Concepts