Context Quantization based on Minimum Description Length and Hierarchical Clustering

The code length of a source can be reduced effectively by using conditional probability distributions in a context model. However, the larger the context model, the more difficult it becomes to estimate its conditional probability distributions from the counting statistics of the source symbols. To deal with this problem, a hierarchical clustering based context quantization algorithm is used to combine the conditional probability distributions in the context model so as to minimize the description length. The simulation results show that it is a good method for quantizing the context model. Moreover, the initial cluster centers and the number of classes no longer need to be determined in advance. Thus, the algorithm greatly simplifies quantizer design for the context quantization problem.


Introduction
Entropy coding based on the probability distribution of an information source is used in the lossless compression of the source. The set of probability distributions of the current symbol conditioned on past observations is called the context model. Clearly, it is the model that determines the rate at which we can encode the symbol sequence. According to information theory [1], the conditional entropy is less than or equal to the unconditional entropy, and conditioning on more variables may further lower the conditional entropy:

$H(X) \ge H(X \mid Y) \ge H(X \mid Y, Z)$

Reducing the entropy of a source, which is the lower bound of the average code length, increases the achievable compression of the source. However, too large a modeling context spreads the counting statistics too thin among all possible contexts to achieve a good conditional probability estimate. This phenomenon is commonly called "context dilution" [2]. Context quantization is an approach to tackle this problem, in which the total number of conditional probability distributions is reduced by combining similar distributions with some kind of clustering algorithm. In [3], the authors designed a vector-quantization-like context quantization algorithm called the minimum conditional entropy context quantization (MCECQ) algorithm, in which the conditional probability distributions are merged with the K-means clustering algorithm. A similar method is proposed in [4], in which the optimization objective is the maximum mutual information (MMI) between the current symbol and the contexts. In these algorithms, K-means clustering is used to implement the context quantization. However, the number of classes and the initial cluster centers must be set in advance, and the algorithm is easily trapped in local optima. In [5], Forchhammer et al. proposed the minimum adaptive code length context quantization (MCLCQ) algorithm, which is implemented with dynamic programming. The most important advantage of this algorithm is that it does not require the number of classes or the initial cluster centers to be determined in advance. Moreover, its optimality is ensured by the use of dynamic programming. Another way to implement this context quantization technique is to use the shortest path algorithm [6]. However, these two context quantization methods can only be applied to binary source coding.
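To make the entropy relation stated above concrete, the following Python sketch estimates $H(X)$ and $H(X \mid C)$ by counting on a toy binary sequence, with the context $C$ taken to be the previous symbol. The sequence and the first-order context are illustrative assumptions, not data from the paper. Because both quantities are plug-in estimates from the same counts, the conditional estimate never exceeds the unconditional one.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a distribution given by raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def conditional_entropy(pairs):
    """H(X | C) in bits, estimated by counting (context, symbol) pairs."""
    by_context = {}
    for ctx, sym in pairs:
        by_context.setdefault(ctx, Counter())[sym] += 1
    n = len(pairs)
    # Weight each context's entropy by how often that context occurs.
    return sum(sum(cnt.values()) / n * entropy(list(cnt.values()))
               for cnt in by_context.values())

# Toy binary sequence; the context is the previous symbol (first-order model).
seq = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
pairs = list(zip(seq[:-1], seq[1:]))
h_x = entropy(list(Counter(sym for _, sym in pairs).values()))
h_x_given_c = conditional_entropy(pairs)
print(f"H(X) = {h_x:.3f} bits, H(X|C) = {h_x_given_c:.3f} bits")
```

With longer contexts the same counts are split over more context values, which is exactly where the estimates start to suffer from context dilution.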
In light of the above observations, we seek a clustering algorithm that does not need to know the optimal number of classes or the initial cluster centers in advance. Clustering algorithms can be divided into partitional clustering algorithms and hierarchical clustering algorithms. The K-means algorithm is the best-known partitional clustering algorithm, in which the K initial cluster centers can be determined by selecting K objects at random. An iterative procedure then assigns each object to its most appropriate class according to a distance measure between the object and the cluster centers, until a given criterion is met. The computational complexity of K-means is moderate, and generally the similarity among the resulting classes is low. Hierarchical clustering algorithms can be divided into agglomerative and divisive (splitting) hierarchical clustering algorithms [7], of which the agglomerative algorithms are the most commonly used. In this kind of algorithm, each object is initially treated as a class. The classes are then merged one by one according to some distance measure, until all objects are merged into one class or a given termination condition stops the algorithm. Agglomerative hierarchical clustering was initially used in statistical classification with large sample sets. In [8], it is used in the segmentation of image pixels, and in [9] it is applied to fingerprint recognition. In this paper, a hierarchical clustering based context quantization algorithm is proposed. The optimal number of classes and the initial cluster centers do not need to be determined in advance, and good context quantization results can still be obtained.
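The dependence of K-means on a preset K and on the initial centers can be illustrated with a minimal one-dimensional, Lloyd-style sketch in Python. The data points and the `kmeans` helper are hypothetical. With well-spread initial centers the three groups are recovered, while crowded initial centers converge to a poorer local optimum.

```python
def kmeans(points, init_centers, iterations=100):
    """Plain Lloyd's K-means on 1-D points; K is implied by len(init_centers)."""
    centers = list(init_centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            k = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[k].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its old center).
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break  # converged: assignments no longer change the centers
        centers = new_centers
    return centers, clusters
```

For `[0, 1, 2, 10, 11, 12, 20, 21, 22]`, starting from `[0, 10, 20]` gives centers `[1.0, 11.0, 21.0]`, whereas starting from `[0, 1, 2]` stops at `[0.0, 1.5, 16.0]`, merging two natural groups into one.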

Hierarchical Clustering
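The agglomerative hierarchical clustering algorithm is the most popular form of hierarchical clustering. Various similarity measures between classes can be applied in practice; the four most commonly used are as follows:
(1) The minimum distance: $d_{\min}(c_i, c_j) = \min_{p \in c_i,\, p' \in c_j} \lVert p - p' \rVert$
(2) The maximum distance: $d_{\max}(c_i, c_j) = \max_{p \in c_i,\, p' \in c_j} \lVert p - p' \rVert$
(3) The mean distance: $d_{\mathrm{mean}}(c_i, c_j) = \lVert m_i - m_j \rVert$
(4) The average distance: $d_{\mathrm{avg}}(c_i, c_j) = \frac{1}{n_i n_j} \sum_{p \in c_i} \sum_{p' \in c_j} \lVert p - p' \rVert$
The most important advantage of hierarchical clustering is that the number of classes and the corresponding clusters are obtained directly at each level of the tree-structured clustering procedure. However, the computational complexity of hierarchical clustering is higher than that of partitional clustering.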
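The four standard inter-class distances (minimum, maximum, mean and average distance) can each be written as a function of two classes of points. This Python sketch assumes Euclidean distance between objects (via `math.dist`, Python 3.8+) and represents each class as a list of coordinate tuples; the function names are illustrative only.

```python
import math
from itertools import product

def class_mean(cls):
    """Componentwise mean m_i of a class given as a list of coordinate tuples."""
    dim = len(cls[0])
    return tuple(sum(p[d] for p in cls) / len(cls) for d in range(dim))

def d_min(ci, cj):
    """Minimum distance: closest pair of objects across the two classes."""
    return min(math.dist(p, q) for p, q in product(ci, cj))

def d_max(ci, cj):
    """Maximum distance: farthest pair of objects across the two classes."""
    return max(math.dist(p, q) for p, q in product(ci, cj))

def d_mean(ci, cj):
    """Mean distance: distance between the two class means."""
    return math.dist(class_mean(ci), class_mean(cj))

def d_avg(ci, cj):
    """Average distance: mean over all n_i * n_j cross-class pairs."""
    return sum(math.dist(p, q) for p, q in product(ci, cj)) / (len(ci) * len(cj))
```

For `a = [(0, 0), (2, 0)]` and `b = [(5, 0), (7, 0)]` the four measures are 3, 7, 5 and 5, which shows how the choice of measure changes which classes look "closest".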
A basic hierarchical clustering algorithm eventually combines all objects into one class. However, once the optimal number of classes is reached, the hierarchical clustering procedure must be terminated by a reasonable criterion. The description length introduced by Rissanen in [10] can serve as this criterion, since it reflects not only the complexity of a statistical model but also the average code length of a source sequence coded with that model.

Description Length
In fact, (7) can be calculated in factorial form as (8). However, evaluating (8) directly is inefficient because of the high cost of computing factorials. Instead, Stirling's formula (9) is used in our work to approximate the factorials in (8), which reduces the computational complexity.
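Stirling's formula is cheap to evaluate because it replaces the product in $n!$ with a closed form. The sketch below uses one standard form, $\ln n! \approx n \ln n - n + \tfrac{1}{2}\ln(2\pi n)$, which may differ in detail from the paper's (9), and compares it with Python's `math.lgamma(n + 1)`, which returns $\ln n!$ directly. `lgamma` is a swapped-in alternative for checking, not the paper's method. Working in logarithms also avoids overflow, since only $\log_2 n!$ enters a code length.

```python
import math

LN2 = math.log(2)

def log2_factorial_exact(n):
    """log2(n!) via lgamma(n + 1) = ln(n!), accurate to floating-point precision."""
    return math.lgamma(n + 1) / LN2

def log2_factorial_stirling(n):
    """log2(n!) via Stirling's formula ln n! ~ n ln n - n + 0.5 ln(2 pi n)."""
    if n == 0:
        return 0.0  # 0! = 1, and the formula is undefined at n = 0
    return (n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)) / LN2

# Compare both estimates for a few count sizes.
for n in (1, 10, 100, 10_000):
    print(n, log2_factorial_exact(n), log2_factorial_stirling(n))
```

This form of Stirling's approximation underestimates $\ln n!$ by less than $1/(12n)$ nats, so even for a context that has seen a single symbol the error is a small fraction of a bit.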
The total description length of the sequence $x_1, \ldots, x_t$ can then be calculated by adding together the description lengths for all possible context events:

$L = \sum_{n} L_n \qquad (10)$

where $L_n$ is the description length of the subsequence of symbols that occur with context event $c_n$. From [10], the cost for transmitting the specific subsequence in the source sequence can be determined. From the above analysis, it is apparent that the cost for encoding a subsequence consists of two parts, the first being the cost for encoding the symbols in the subsequence:
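Since the exact forms of (7) and (8) are not reproduced in this text, the following Python sketch assumes one common form of Rissanen's adaptive description length for a context whose count vector is $(N_1, \ldots, N_M)$ with total $N$: $\log_2 \frac{(N+M-1)!}{(M-1)!\,\prod_i N_i!}$. This equals the code length of sequential coding with the Laplace estimator $(N_i + 1)/(N + M)$, and it splits into a model part $\log_2 \binom{N+M-1}{M-1}$ and a data part $\log_2 \frac{N!}{\prod_i N_i!}$. The function names and the dictionary-of-count-vectors layout are hypothetical.

```python
import math

LN2 = math.log(2)

def log2_fact(n):
    """log2(n!) computed through lgamma to avoid huge integers."""
    return math.lgamma(n + 1) / LN2

def description_length(counts):
    """Assumed adaptive code length (bits) of one context's subsequence."""
    m = len(counts)   # alphabet size M
    n = sum(counts)   # N: number of symbols seen in this context
    # Model part: log2 C(N+M-1, M-1), the cost of the count vector itself.
    model_cost = log2_fact(n + m - 1) - log2_fact(m - 1) - log2_fact(n)
    # Data part: log2 of the multinomial N! / prod(N_i!), the cost of the symbols.
    data_cost = log2_fact(n) - sum(log2_fact(c) for c in counts)
    return model_cost + data_cost

def total_description_length(context_counts):
    """Total description length: sum of per-context lengths over all c_n."""
    return sum(description_length(c) for c in context_counts.values())
```

For example, a binary context that has seen one 0 and one 1 costs $\log_2 6 \approx 2.585$ bits, the same as coding the two symbols sequentially with probabilities 1/2 and 1/3.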

Context Quantization based on the Minimum Description Length
As stated in Section II, a suitable similarity measure between classes must be defined to enable hierarchical clustering. Here, we propose a similarity measure based on the description length to fulfill this requirement.
In fact, this increment reflects the difference in description length before and after the two probability distributions are merged. According to (10), this increment can be represented by

$\Delta L_{mn} = L_{m \cup n} - (L_m + L_n)$

where $L_m$ and $L_n$ are the description lengths of the subsequences coded with context events $c_m$ and $c_n$, and $L_{m \cup n}$ is the description length of the merged distribution.

Thirdly, it may take a negative value. This feature means that, in the context quantization procedure for a given source sequence, the optimal number of classes can be obtained by minimizing the total description length of the sequence: when evaluating whether two probability distributions should be merged, a negative increment means the merge will reduce the total description length, while a positive one means it will not. In this work, the context quantization is implemented by merging the conditional probability distributions. The clustering operation is therefore stopped when no merge of two distributions yields a negative $\Delta L_{mn}$. At that point, the total description length of the remaining conditional probability distributions reaches its minimum. The remaining classes characterize the final clustering result, and the optimal number of classes is found accordingly. The context quantization based on hierarchical clustering and the minimum description length is listed as follows:

Step4. If no candidate pair is found, stop the algorithm and output the quantization result. Otherwise, go back to Step 2 for the next iteration.
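The merging procedure can be sketched in Python, assuming the description length of a count vector is the Laplace adaptive code length $\log_2 \frac{(N+M-1)!}{(M-1)!\prod_i N_i!}$ and computing $\Delta L_{mn}$ as the description length of the summed count vector minus the two separate description lengths. At each pass the sketch merges the pair with the most negative increment and stops when no pair has a negative one. This is our reading of the stopping rule in the text (Steps 1 to 3 of the listed procedure are not reproduced here), and the helper names are hypothetical.

```python
import math
from itertools import combinations

def log2_fact(n):
    return math.lgamma(n + 1) / math.log(2)

def description_length(counts):
    """Assumed form: log2((N+M-1)! / ((M-1)! * prod N_i!))."""
    m, n = len(counts), sum(counts)
    return log2_fact(n + m - 1) - log2_fact(m - 1) - sum(log2_fact(c) for c in counts)

def merge_cost(a, b):
    """Delta L_mn: description length after merging minus the sum before."""
    merged = [x + y for x, y in zip(a, b)]
    return description_length(merged) - description_length(a) - description_length(b)

def mdl_context_quantization(context_counts):
    """Greedy agglomerative merging of count vectors; stops when no merge lowers total DL.

    context_counts maps each context event to its symbol-count vector.
    Returns a list of (member_contexts, merged_counts) clusters.
    """
    clusters = [([ctx], list(c)) for ctx, c in context_counts.items()]
    while len(clusters) > 1:
        # Evaluate Delta L_mn for every pair and keep the most negative one.
        (i, j), delta = min(
            (((i, j), merge_cost(clusters[i][1], clusters[j][1]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if delta >= 0:
            break  # no remaining merge reduces the total description length
        members = clusters[i][0] + clusters[j][0]
        counts = [x + y for x, y in zip(clusters[i][1], clusters[j][1])]
        del clusters[j]  # j > i, so index i is still valid after deletion
        clusters[i] = (members, counts)
    return clusters
```

With two contexts that have each seen fifty 0s and one that has seen fifty 1s, the first two are merged (their increment is about −4.7 bits) and the procedure stops with two classes, without the number of classes being given in advance.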

Simulation
In our simulations, seven 256 × 256 gray-scale images with 8 bits per pixel are quantized to images of the same size with 3 bits per pixel (8 gray levels), and the quantized images are used as the source sequences. Using quantized images as source sequences simplifies the simulation. Among the 7 quantized images, two (Girl and Barb) are used as training sequences to estimate the second-order conditional probabilities, whose two conditioning symbols have 8 × 8 = 64 possible combinations. Context quantization is performed with the proposed algorithm to merge these 64 conditional probability distributions. After context quantization, we obtain a mapping scheme in which each conditional probability distribution is mapped to its corresponding cluster. This mapping scheme is then used in coding the test images. Three images (Lena, Woman and Baby) are used as test sequences, and the conditional probability distributions in the quantized context model drive an arithmetic encoder to perform the compression. The resulting total numbers of bits for the test images are listed in Table I. For comparison, the coding results based on context quantization implemented with the K-means clustering algorithm (MCECQ) are listed in Table II.
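The data preparation described above can be sketched as follows. The sketch assumes the two conditioning symbols are the west and north neighbours of each pixel, which the text does not specify, and reduces 8-bit pixels to 8 levels by keeping the top three bits; first-row and first-column pixels are skipped. `ideal_code_length` is a hypothetical stand-in for the arithmetic coder: it sums $-\log_2 p$ with Laplace-smoothed probabilities pooled over each cluster of the context mapping, which approximates, but is not, the coder's actual output.

```python
import math

def quantize_to_3_bits(image):
    """Reduce 8-bit gray levels (0..255) to 8 levels (0..7) by keeping the top 3 bits."""
    return [[pixel >> 5 for pixel in row] for row in image]

def context_counts(image, levels=8):
    """Count symbols under an assumed 2nd-order context (west, north): 8 * 8 = 64 contexts.

    Pixels in the first row or column lack a neighbour and are skipped in this sketch.
    """
    counts = {(w, n): [0] * levels for w in range(levels) for n in range(levels)}
    for y in range(1, len(image)):
        for x in range(1, len(image[y])):
            ctx = (image[y][x - 1], image[y - 1][x])
            counts[ctx][image[y][x]] += 1
    return counts

def ideal_code_length(image, train_counts, mapping, levels=8):
    """Hypothetical stand-in for the arithmetic coder: sum of -log2 p in bits.

    mapping sends each of the 64 contexts to a cluster id; training counts are
    pooled per cluster and Laplace-smoothed so unseen symbols keep p > 0.
    """
    pooled = {}
    for ctx, cls in mapping.items():
        vec = pooled.setdefault(cls, [0] * levels)
        for s in range(levels):
            vec[s] += train_counts[ctx][s]
    bits = 0.0
    for y in range(1, len(image)):
        for x in range(1, len(image[y])):
            vec = pooled[mapping[(image[y][x - 1], image[y - 1][x])]]
            p = (vec[image[y][x]] + 1) / (sum(vec) + levels)
            bits -= math.log2(p)
    return bits
```

In this setup, training would call `context_counts` on each quantized training image and add the vectors per context before clustering; the resulting context-to-cluster mapping is then passed to `ideal_code_length` for each test image.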

Conclusion
Context quantization in high-order entropy coding is being applied more and more widely. By employing hierarchical clustering and the description length, the proposed context quantization algorithm successfully addresses a problem faced by the MCECQ algorithm, namely that the optimal number of classes and the initial cluster centers have to be given in advance. In addition, simulation results show that better coding results can be obtained with the context quantizer designed by the proposed algorithm.
Here $\lVert p - p' \rVert$ is the distance between the objects $p$ and $p'$; $m_i$ is the mean of class $c_i$ and $m_j$ is the mean of class $c_j$; $n_i$ is the number of objects in class $c_i$ and $n_j$ is the number of objects in class $c_j$. With a properly defined distance measure between two classes, the agglomerative hierarchical clustering algorithm, which merges the pair of classes with the minimum distance at each level, can be implemented as follows:
Step1. Initialize each object as a class and calculate the distance between each pair of objects.
Step2. Merge the two classes with the minimum distance into a new class.
Step3. Recalculate the distances between the new class and all other classes.
Step4. If more than one class remains, go back to Step 2 for the next iteration.
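The four steps of the basic agglomerative algorithm translate into a short loop. This Python sketch works on 1-D points and uses the mean (centroid) distance as an illustrative choice; it recomputes all pairwise class distances on every pass, which keeps the code close to Steps 2 to 4 at the cost of cubic time, and records the partition at every level of the tree. The function name is hypothetical.

```python
def agglomerative(points):
    """Basic agglomerative clustering on 1-D points with the mean (centroid) distance.

    Returns the list of partitions, one per level, from n singleton classes down to 1.
    """
    classes = [[p] for p in points]            # Step 1: every object starts as a class
    levels = [[list(c) for c in classes]]
    while len(classes) > 1:                    # Step 4: repeat until one class remains
        # Steps 2-3: (re)compute class distances and pick the closest pair.
        best = None
        for i in range(len(classes)):
            for j in range(i + 1, len(classes)):
                mi = sum(classes[i]) / len(classes[i])
                mj = sum(classes[j]) / len(classes[j])
                d = abs(mi - mj)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        classes[i] = classes[i] + classes[j]   # merge class j into class i
        del classes[j]
        levels.append([list(c) for c in classes])
    return levels
```

For points `[0, 1, 10, 12]` the levels run from four singletons to `[[0, 1], [10], [12]]`, then `[[0, 1], [10, 12]]`, then a single class; without a stopping criterion the procedure always ends with one class.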

Here $N_i$ denotes the number of symbols in the sequence $x_1, \ldots, x_t$ that occur with context event $c_n$ and take on value $i$. Clearly, $N_t$ is the sum of the counts $N_1, N_2, \ldots$ in the count vector.

Let $L_{m \cup n}$ denote the description length of the merged probability distribution, estimated from the count vector obtained by adding together the two count vectors. An increment in the description length can be observed when these two conditional probability distributions are merged. The conditional probability distributions (one for each possible $c_n$) are merged by the hierarchical clustering algorithm, and every conditional probability distribution can be viewed as a class at the beginning of the clustering. Thus, we use the above-mentioned increment $\Delta L_{mn}$ as the distance measure between two conditional probability distributions in the hierarchical clustering operations. During the clustering procedure, a probability distribution is finally merged with the one that minimizes $\Delta L_{mn}$.

Table 1. The results for the test images by hierarchical

Table 2.

For the given training sequences, the optimal number of classes of the context quantizer designed by the proposed algorithm is 27. In Table II, various numbers of classes are tested, and the corresponding context quantizers are designed with the MCECQ algorithm. The best coding results for all test images are also obtained when the number of classes is set to 27 for the same training sequences, and these coding results for the three test images are almost the same as those in Table I. However, because the initial cluster centers are selected randomly, the MCECQ context quantizers are designed by executing the algorithm multiple times for each given number of classes and choosing the one with the best coding results. The proposed algorithm needs only one execution, although its computational complexity is higher than that of the MCECQ algorithm. Consequently, the MCECQ algorithm takes longer than the proposed algorithm to design a good context quantizer. Therefore, the good coding results indicate that the design objective is achieved by the proposed hierarchical clustering based context quantization algorithm.