A Novel Technique for Shape Feature Extraction Using Content Based Image Retrieval

With the advance of technology and multimedia information, the number of digital images is growing rapidly, and various techniques are being developed to search and retrieve the information contained in them. Traditional text-based image retrieval is not sufficient: it is time-consuming because it requires manual image annotation, and the annotation differs from person to person. An alternative is the Content Based Image Retrieval (CBIR) system, which retrieves images using their content rather than text or keywords. Much work has been done in CBIR with various feature extraction techniques. Shape is a significant image feature because it reflects human perception. Moreover, shape is simpler for the user to use when defining an object in an image than other features such as color or texture. However, no descriptor gives fruitful results when applied alone; by combining it with an improved classifier, one can use the strengths of both the descriptor and the classifier. An attempt is therefore made to establish an algorithm for accurate shape feature extraction in CBIR. The main objectives of this project are: (a) to propose an algorithm for shape feature extraction using CBIR, (b) to evaluate the performance of the proposed algorithm, and (c) to compare the proposed algorithm with state-of-the-art techniques.


Introduction
Content based image retrieval is an important task in systems built on digital image processing. An image database grows as images are added during scanning, for example in schools, colleges, municipalities and registrar offices, because government offices are digitizing all their records for fast retrieval. The result is a large stock of images in the database. The problem arises when images matching a query property are sought from the database, which is a very tedious task for a large database. Here arises the need for a content based image retrieval (CBIR) system. The image database can be organized so as to flag the images with keywords, which may later serve as an index for CBIR systems. In the presented approach, however, the CBIR system consists of an engine that itself determines the contents of the image from the input query and from the database images. The database images whose feature sets closely resemble that of the query image are retrieved, and this process is termed content based image retrieval.

Proposed Algorithm
The proposed work consists of the following steps:
Step I: Image Database Generation and Image Acquisition in Matlab Environment
The image database is prepared by taking images of different kinds and rotating them at different angles for validation and testing of the algorithm. The images in the proposed database differ from those in the base database: they are rotated at different angles and their size is increased or decreased. The proposed database also includes various other types of images in addition to those of the base database. The reason for this is to take scaling, translation and orientation invariance into account. The images are read in JPEG format. The following figure shows a snapshot of the database used.

Figure 1. All database images
Step II: Database Images Path Setting
In this step, the paths for the database images and the text files are set. The path for the database images is set so that the images can be fetched from the database, and text files are created to store the various results. The DIR command is then used with the size command to calculate the total number of images in the image database. Arrays are created to store the corresponding values during program execution. As the program extracts 22 features, the values of the different parameters are also set.
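The bookkeeping in Step II can be sketched as follows. This is a minimal illustration in Python rather than the Matlab used in the paper; the helper `list_database_images`, the file names and the temporary folder are hypothetical and stand in for the actual database layout, which the paper does not specify.

```python
from pathlib import Path
import tempfile

NUM_FEATURES = 22  # number of features stated in the paper


def list_database_images(folder):
    """Return the sorted JPEG paths found directly inside `folder`."""
    extensions = {".jpg", ".jpeg"}
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in extensions
    )


# Build a throwaway folder so the sketch runs anywhere (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1.jpg", "2.jpg", "3.jpeg", "results.txt"):
        (Path(tmp) / name).touch()

    images = list_database_images(tmp)
    total_images = len(images)  # plays the role of DIR + size in Matlab

    # One feature row per image, to be filled in during feature extraction.
    features = [[0.0] * NUM_FEATURES for _ in range(total_images)]

print(total_images, "images,", NUM_FEATURES, "features each")
```

The text file in the folder is ignored, so only the image files contribute to the count.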
Step III: Enter the Query Image, Margin and Threshold
The query image number, margin and threshold are entered dynamically by the user. The image database consists of a total of 45 different images. The margin ranges from 0 to 100 and the threshold from 1 to 256. The user enters the values of his or her choice.
Step IV: Convert RGB to Gray Scale.
In this step the original image, whether RGB or gray scale, is converted into a gray image as a preprocessing step. By nature, the JPEG image format is in 24-bit color. The JPEG image is converted to gray scale using the rgb2gray() command in Matlab, giving a gray scale, i.e. 8-bit, image.
Step V: Image Binarization
The gray image is converted into a binary image by threshold decomposition, thereby isolating the foreground comprising the yarn from the background. This binary image is further used for feature extraction. Most histogram based thresholding techniques work on the assumption that the foreground area is comparable to the background area. Generally, in a yarn image the mode of the foreground is much smaller than that of the background. Hence, in order to make the foreground mode comparable to that of the background, attenuation of the background mode is performed on the histogram H of the cleaned image. Local and global thresholding approaches may be used to produce a binary image from the gray scale image. The threshold ranges from 1 to 255. The algorithms do not work properly for thresholds of 200 or more, because a higher threshold causes information loss for darker images. So, for darker images the threshold must be low, and for lighter images it may be high.
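Steps IV and V can be illustrated with a small sketch in Python (not the authors' Matlab code). The luminance weights are the ones documented for Matlab's rgb2gray(); the function names, the tiny 2x2 test image and the choice that pixels brighter than the threshold count as foreground are assumptions made for illustration only.

```python
def rgb_to_gray(rgb):
    """Convert rows of (R, G, B) tuples (0-255) to 8-bit gray levels."""
    return [
        [int(round(0.2989 * r + 0.5870 * g + 0.1140 * b)) for (r, g, b) in row]
        for row in rgb
    ]


def binarize(gray, threshold):
    """Global thresholding: 1 where the gray level exceeds `threshold`, else 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# A hypothetical 2x2 colour image: black, white, dark red, near-black.
rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(200, 30, 30), (20, 20, 20)],
]

gray = rgb_to_gray(rgb)
high = binarize(gray, threshold=128)  # the dark red pixel is lost
low = binarize(gray, threshold=50)    # the dark red pixel is kept

print(gray)
print(high)
print(low)
```

With the higher threshold the dark red pixel falls into the background, while the lower threshold keeps it, which matches the observation that darker images need a lower threshold.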
Step VI: Statistical Feature Set Generation
The base algorithm [1] uses only 5 features, whereas in the proposed algorithm a total of 22 features are extracted from the database images and the query image, including Area and Figure Aspect. As the feature set grows, the retrieval efficiency in terms of accuracy increases, but the complexity of the proposed algorithm increases as well.
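As an illustration of Step VI, the sketch below computes two features from a binary image in Python. The paper does not define its features, so the definitions here are assumptions: area is taken as the foreground pixel count, and figure aspect as the width-to-height ratio of the bounding box. The function names and the test shape are hypothetical.

```python
def area(binary):
    """Number of foreground (value 1) pixels."""
    return sum(sum(row) for row in binary)


def figure_aspect(binary):
    """Bounding-box width divided by bounding-box height (assumed definition)."""
    rows = [y for y, row in enumerate(binary) if any(row)]
    cols = [x for row in binary for x, value in enumerate(row) if value]
    if not rows:
        raise ValueError("binary image has no foreground pixels")
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return width / height


# A 4-wide, 2-tall rectangle on a background border.
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

feature_vector = [area(shape), figure_aspect(shape)]
print(feature_vector)
```

A full implementation would append the remaining features to `feature_vector` in the same way.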
Step VII: Feature Normalization
All features are normalized with respect to the mean radius of the pattern. This makes all the statistical features independent of the size of the pattern. The set of described statistical features may be termed figures of merit for classifying an object.
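The effect of normalizing by the mean radius can be sketched as follows, in Python. The paper does not say how the mean radius is computed; here it is assumed to be the mean distance of the foreground pixels from their centroid, and area is divided by its square because area scales with the square of length. All names are hypothetical.

```python
import math


def foreground_pixels(binary):
    """Coordinates (x, y) of all foreground pixels."""
    return [(x, y) for y, row in enumerate(binary) for x, v in enumerate(row) if v]


def mean_radius(binary):
    """Mean distance of foreground pixels from their centroid (assumed definition)."""
    pts = foreground_pixels(binary)
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    return sum(math.hypot(x - cx, y - cy) for x, y in pts) / len(pts)


def normalized_area(binary):
    """Area divided by the squared mean radius, so that size cancels out."""
    return len(foreground_pixels(binary)) / mean_radius(binary) ** 2


def filled_square(side, pad=1):
    """A side x side block of ones surrounded by `pad` background pixels."""
    size = side + 2 * pad
    return [
        [1 if pad <= x < pad + side and pad <= y < pad + side else 0
         for x in range(size)]
        for y in range(size)
    ]


small = filled_square(4)
large = filled_square(8)  # same shape at twice the scale

print(len(foreground_pixels(small)), len(foreground_pixels(large)))
print(normalized_area(small), normalized_area(large))
```

The raw areas differ by a factor of four, while the normalized values agree to within a few percent; the remaining difference comes from pixel discretization.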
Step X: Step II and Step VII are repeated for all database images and the query image.
Step XI: Retrieve the database images with minimum standard deviation falling in the error margin range.
The error margin must be between 0 and 100 and can be changed according to the requirement. If the error margin is increased, the accuracy decreases accordingly. Moreover, as the margin increases, the number of images retrieved also increases. The retrieved images are then classified into relevant categories.
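One possible reading of Step XI is sketched below in Python. The paper does not give the formula, so the score used here is an assumption: the root-mean-square percentage deviation of a candidate's features from the query's features. Images whose score is within the margin are returned, best match first. The function names and feature values are hypothetical, and the query features are assumed to be non-zero.

```python
import math


def deviation_from_query(query, candidate):
    """RMS percentage deviation of `candidate` features about `query` features."""
    pct = [100.0 * (c - q) / q for q, c in zip(query, candidate)]
    return math.sqrt(sum(d * d for d in pct) / len(pct))


def retrieve(query, database, margin):
    """Names of database images whose deviation is within `margin`, best first."""
    scored = sorted(
        (deviation_from_query(query, feats), name)
        for name, feats in database.items()
    )
    return [name for score, name in scored if score <= margin]


query = [2.0, 4.0, 1.0]
database = {
    "img_a": [2.0, 4.0, 1.0],  # identical to the query
    "img_b": [2.2, 4.4, 1.1],  # about 10 percent off in every feature
    "img_c": [3.0, 6.0, 1.5],  # 50 percent off in every feature
    "img_d": [6.0, 1.0, 4.0],  # very different
}

narrow = retrieve(query, database, margin=20)
wide = retrieve(query, database, margin=60)
print(narrow)
print(wide)
```

Widening the margin admits `img_c` as well, which illustrates why a larger margin returns more images at lower accuracy.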

Step XII: SVM Classifier
The SVM classifier is used to classify all the database images and the query image into different categories, i.e. according to their relevance or non-relevance. It constructs a separating hyperplane that maximizes the margin between the images relevant to the query and the images not relevant to the query.

Step XIII: Classify retrieved images into different classes
The SVM is used to classify the images into separate classes. Basically, it is a binary classifier. The features used to differentiate the images, or to retrieve them from the database, are the key factors that widen the margin around the hyperplane. The SVM creates a hyperplane based on the grouping of the retrieved images and the rest of the images. The wider the margin, the better the resolution of the CBIR system.
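The idea of a binary, margin-based classifier can be illustrated with a minimal linear SVM in Python, trained by sub-gradient descent on the regularized hinge loss. This is a generic sketch, not the classifier used in the paper: the kernel, parameters and training data of the original are not stated, and the two-dimensional feature vectors, labels and hyperparameters below are hypothetical.

```python
def train_linear_svm(samples, labels, lam=0.001, lr=0.05, epochs=500):
    """Sub-gradient descent on (lam/2)*||w||^2 + hinge loss; labels are +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # The regularization term shrinks w a little on every step.
            w = [wi * (1.0 - lr * lam) for wi in w]
            # The hinge term is active only for points inside the margin.
            if y * score < 1.0:
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """+1 for relevant, -1 for not relevant."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical normalized feature deviations: small values mean "close to query".
samples = [
    (0.10, 0.20), (0.20, 0.10), (0.15, 0.15),  # relevant
    (0.90, 0.80), (0.80, 0.90), (0.85, 0.95),  # not relevant
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
predictions = [predict(w, b, x) for x in samples]
print(predictions)
print(predict(w, b, (0.12, 0.18)), predict(w, b, (0.85, 0.88)))
```

Points are only updated while they lie inside the margin, so training pushes the two groups apart until each sits on its own side of the hyperplane with room to spare.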

Experimental Results
The proposed algorithm is applied to the sample database images. The base algorithm [1] extracts only five features, compared to twenty-two in the proposed algorithm. The database images are processed at varying margins and thresholds, with the margin ranging from 0 to 100 and the threshold from 1 to 256. The algorithms do not work properly for thresholds of 200 or more, because a higher threshold causes information loss for darker images; so for darker images the threshold must be low, and for lighter images it may be high. As the margin increases, the number of images retrieved also increases. The completion time decreases as the threshold increases for both the base and the proposed algorithm, and the difference in completion time between the two algorithms is very large. The percentage of relevant images is almost the same for both algorithms, i.e. 100%. Comparing the proposed algorithm with the base algorithm at margin 40, 7.4% of cases show the same accuracy, 26.66% show decreased accuracy and 65.92% show increased accuracy. As the margin increases, the accuracy of the proposed algorithm relative to the base algorithm mostly either stays the same or decreases. For margins above 60, the accuracy of the proposed algorithm decreases considerably compared to the base algorithm, and the completion time also rises. Because the proposed algorithm uses more features, its accuracy would be expected to increase, but raising the margin has the opposite effect, and in the majority of cases the accuracy decreases relative to the base algorithm.

Conclusion
A CBIR system using Legendre moments is proposed in this work. The Legendre moment based features are quite distinctive and exhibit fair image retrieval performance when tested on a real-time database of images. The features are normalized with respect to rotation and size, so that an image rotated by some angle or resized yields the same features as the original. The support vector machine classifier further classifies the retrieved images into two categories: the most similar images and the others. In this way the content based retrieval system becomes a two-tier image retrieval system. In the first tier, the images are retrieved using the global features; in the second tier, they are further assigned to their respective category using the support vector machine classifier. It has been observed that the average retrieval efficiency increases as the feature set grows, and that the classification efficiency of the proposed CBIR system increases with the number of training samples. The result tables show that the accuracy of the algorithm increases for margins below 50 and starts to decrease for margins above 50. The main drawback of the proposed algorithm is that the completion time increases as the feature set grows.

Future Scope
The proposed system finds application in areas where images are retrieved frequently for research purposes. Future work may address the time or speed of operation of the algorithm. The presented work is at the model level, and speed of operation has not been addressed; in the presented work, accuracy of the algorithm is of prime importance. However, the effectiveness of the algorithm would increase if the speed of operation improved, since the speed deteriorates as the size of the images or of the database increases.