Text Character Extraction Implementation from Captured Handwritten Image to Text Conversion using Template Matching Technique

Images contain various types of useful information that should be extracted whenever required. Various algorithms and methods have been proposed to extract text from a given image, so that a user can access the text in any image. Text extraction is difficult because text varies in size, style, orientation, and alignment, and because of low image contrast and composite backgrounds. An application that extracts and recognizes such text accurately in real time can be applied to many important tasks, such as document analysis, vehicle license plate extraction, and text-based image indexing, and many such applications have become realities in recent years. To overcome the above problems, we develop an application that converts an image into text using algorithms such as bounding box, HSV model, blob analysis, template matching, and template generation.


INTRODUCTION
Handwritten text in images has been a subject of study and research for many years. Different methods are currently employed to improve the visibility of text in text documents. We use different algorithms to improve the accuracy of the existing system. A method to improve the readability of text documents by manually enhancing text strokes is proposed in [1]. We develop an application that converts an image into text using algorithms such as bounding box, HSV model, blob analysis, template matching, and template generation.
The objectives are to develop an interface to train the application to understand difficult and unreadable handwriting; to collect handwritten characters and symbols as input data and train the application; to take a handwritten character as an image input and extract the character; and to generate the document file from the given image. We assume the input is A-Z or a-z. To simplify the recognition process, certain pairs of similar upper and lower case characters are combined into single classes, such as Cc, Kk, Oo, Ii, Pp, Ss, Uu, Vv, Ww, Xx, Yy, and Zz [2]. The database used for these experiments consists of about 10,000 isolated hand-printed characters. Connected component analysis was used to locate blobs that were about the size of characters. The characters were manually identified and stored in the database [2].
In offline character recognition, the handwritten character is typically scanned from a paper document and made available to the recognition algorithms as a gray scale image [3].

Literature Study
In [1], the text of historical documents is enhanced through a multi-resolution Gaussian; two algorithms, the LU algorithm and a Gaussian filter, are used to improve the readability of the text. The advantage is that text of different scales is detected. The disadvantage is that it works only for images of small resolution (Oliver A. Nina) [1].
Text images are recognized into a text file using a character-based bi-gram language model. The advantage is improved recognition performance. The disadvantage is that the presented approach is verified on text images only (Yanwei Wang, Xiaoqing Ding, Fellow, IEEE, and Changsong Liu) [2].
To translate images of typewritten or handwritten characters into an electronically editable format while preserving font properties, the Optical Character Recognition (OCR) algorithm is used. The advantages are fewer possible human errors and high recognition speed. The disadvantage is that characters in multiple fonts and sizes, and handwritten characters, are not recognized (Faisal Mohammad, Jyoti Anarase, Milan Shingote, and Pratik Ghanwat) [3].
Handwritten Arabic text written by each writer can be seen as having a specific texture, and the texture image is characterized using the run lengths of image gray levels. Two methods are used in this paper: Gray Level Run Length (GLRL) and Gray Level Co-occurrence Matrices. The disadvantage is that only the text area is detected (A. J. Jadhav, Vaibhav Kolhe, and Sagar Peshwe) [5].
An image captured by a camera may contain text in many formats. This text information can be extracted from the image using edge detection, connected component, and texture-based algorithms. Two algorithms, the Prewitt edge detector and the Gaussian edge detector, are compared. The advantage is that character extraction removes the non-text parts of the image. The disadvantage is that it works only for natural scene images (Prof. Amit Choksi, Nihar Desai, Ajay Chauhan, Vishal Revdiwala, and Kaushal Patel) [6].
To extract text lines and words from handwritten documents, a segmentation algorithm and the Viterbi algorithm are presented. The line segmentation algorithm is based on locating the optimal succession of text and gap areas within vertical zones by applying the Viterbi algorithm. Word segmentation is based on a gap metric that exploits the objective function of a soft-margin linear SVM that separates successive connected components. The advantage is that it provides 97 percent accuracy. The disadvantage is that it does not generalize well to the variations encountered in handwritten documents (Vassilis Papavassiliou, Themos Stafylakis, Vassilis Katsouros, and George Carayannis) [7].

Proposed system: Text extraction using template matching
In the proposed system, we use different algorithms to improve the accuracy of text extraction from a text document or text image [16]. The input image can be captured by a laptop camera. After that, the image is matched using template matching. A filter is applied to remove noise and low-frequency content, keeping only the high-frequency components. The image is then cropped using a bounding box and converted to gray scale using the HSV model, where HSV means Hue, Saturation, and Value. The characters are extracted from the image using the blob algorithm, which separates each word into characters; each character is called a blob. Each character is then matched against the trained dataset using the template matching algorithm. The template generator then builds the character sentence, and the document file (.doc file) is generated. Source: http://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
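The step that matches a character against the trained dataset can be illustrated with a minimal sketch. It assumes that each segmented character and each stored template is a binary grid of the same size (1 for ink, 0 for background). The names `match_score`, `best_match`, and `TEMPLATES`, the 3x3 grids, and the pixel-agreement score are illustrative assumptions; they are not the scoring used by the proposed system or by the OpenCV tutorial cited above.

```python
def match_score(glyph, template):
    """Fraction of pixels on which two equally sized binary grids agree."""
    if len(glyph) != len(template) or any(
        len(g_row) != len(t_row) for g_row, t_row in zip(glyph, template)
    ):
        raise ValueError("glyph and template must have the same size")
    total = sum(len(row) for row in glyph)
    agree = sum(
        1
        for g_row, t_row in zip(glyph, template)
        for g, t in zip(g_row, t_row)
        if g == t
    )
    return agree / total


def best_match(glyph, templates):
    """Return (label, score) of the template that agrees most with the glyph."""
    return max(
        ((label, match_score(glyph, tpl)) for label, tpl in templates.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical 3x3 templates, for illustration only.
TEMPLATES = {
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
}

noisy_l = [[1, 0, 0], [1, 0, 0], [1, 1, 0]]  # an "L" with one pixel missing
label, score = best_match(noisy_l, TEMPLATES)  # "L" agrees on 8 of 9 pixels
```

A real system would normally resize each blob to the template size before scoring; that step is omitted here.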

Bounding Box
The "bounding box" of a finite geometric object is the box with minimum area, or minimum volume, that contains the given geometric object. For any collection of linear objects, such as points, segments, polygons, and lines, the bounding box is given by the minimum and maximum coordinate values of the point set S of all the objects' n vertices [15]. The area of a rectangle is the number of square units it takes to completely fill the rectangle, and is given by multiplying the width by the height: area = w × h, where w is the width and h is the height. Source: http://www.mathopenref.com/coordtriangleareabox.html
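As a small worked example of the definition above, the following sketch computes the axis-aligned bounding box of a point set from its minimum and maximum coordinates, and then its area as width times height. The function names and the sample vertices are hypothetical and chosen only for illustration.

```python
def bounding_box(points):
    """Axis-aligned bounding box of a non-empty list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max): width times height."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min) * (y_max - y_min)


vertices = [(2, 1), (5, 4), (3, 7), (1, 3)]  # hypothetical vertex set S
box = bounding_box(vertices)  # (1, 1, 5, 7)
area = box_area(box)          # width 4 times height 6 gives 24
```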

Blob analysis
Blob analysis is a fundamental technique of machine vision based on the analysis of consistent image regions. When words are divided into characters, the performance of the algorithm increases in terms of both accuracy and efficiency. Hence the number of words should be decided based on the scanned image.
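One common way to find consistent image regions is connected-component labeling. The sketch below is a minimal illustration, assuming a binary image stored as a list of rows (1 for ink, 0 for background) and 4-connectivity. The function name `find_blobs` and the sample image are hypothetical, and the source does not state which labeling method the system uses.

```python
from collections import deque


def find_blobs(image):
    """Label 4-connected regions of 1-pixels; return one pixel list per blob."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    blobs = []
    for r in range(height):
        for c in range(width):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Hypothetical binary image containing two separate strokes.
image = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]
blobs = find_blobs(image)  # two blobs, one per stroke
```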

HSV model
The HSV color model, also called HSB (Hue, Saturation, Brightness), defines a color space in terms of three constituent components:
- Hue is the color type (such as red, magenta, blue, cyan, green, or yellow). Hue ranges from 0 to 360 degrees.
- Saturation refers to the intensity of a specific hue. Saturation ranges from 0 to 100%. In this work saturation is represented in the range 0-
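The hue and saturation ranges above can be reproduced with a short sketch that uses Python's standard `colorsys` module, which returns all three components in the range 0 to 1. The helper name `rgb_to_hsv_units` and the choice to report value as a percentage are assumptions made for illustration.

```python
import colorsys


def rgb_to_hsv_units(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation in %, value in %)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s * 100.0, v * 100.0


hue, sat, val = rgb_to_hsv_units(255, 0, 0)  # pure red: hue 0, fully saturated
```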

Template Generator
The template generator is used for automatic generation of templates of numerals of a certain font. After template matching is applied, the template generator forms words from the characters separated during blob analysis.

Template Matching
First, the image is captured with a camera or selected by browsing for an image file. The template matching algorithm then calculates the size of the image: if the image is larger than 15 cm x 15 cm it is rejected; otherwise it is accepted.
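The size check can be sketched as follows. A digital image is measured in pixels, so this sketch assumes a resolution in dots per inch (96 by default) to convert to centimetres, and it assumes that an image is rejected when either side exceeds 15 cm. Both assumptions, and the function names, are illustrative and not taken from the source.

```python
CM_PER_INCH = 2.54
MAX_SIDE_CM = 15.0  # size limit stated in the text


def pixels_to_cm(pixels, dpi):
    """Physical length in centimetres of `pixels` pixels at `dpi` dots per inch."""
    return pixels / dpi * CM_PER_INCH


def accept_image(width_px, height_px, dpi=96):
    """Accept the image only if neither side exceeds 15 cm at the assumed DPI."""
    return (pixels_to_cm(width_px, dpi) <= MAX_SIDE_CM
            and pixels_to_cm(height_px, dpi) <= MAX_SIDE_CM)


small_ok = accept_image(500, 400)  # about 13.2 cm x 10.6 cm at 96 DPI
large_ok = accept_image(800, 400)  # about 21.2 cm wide at 96 DPI
```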

Bounding Box
The output of the Template Matching module, i.e. the image within the 15 cm x 15 cm size limit, is taken as the input for Bounding Box. The bounding box algorithm is used to discard the unwanted portion of the image: it keeps only the text region and discards the remaining part of the image.
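Keeping only the text region can be sketched as a crop to the smallest rectangle that contains every text pixel. The sketch assumes a binary grid in which 1 marks text. The function name `crop_to_text` and the sample page are hypothetical.

```python
def crop_to_text(image, ink=1):
    """Crop a 2D grid to the smallest rectangle containing every `ink` pixel."""
    rows = [r for r, row in enumerate(image) if ink in row]
    if not rows:
        return []  # no text pixels found
    cols = [c for row in image for c, value in enumerate(row) if value == ink]
    top, bottom = min(rows), max(rows)
    left, right = min(cols), max(cols)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


page = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
text_region = crop_to_text(page)  # [[0, 1, 1], [1, 1, 0]]
```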

HSV Model
The HSV model takes its input from Bounding Box. Hue, Saturation, and Value are calculated separately for each pixel, and the HSV model is used to convert the image to gray scale. All these pixels are then combined as the output. The output image displays the text in black on a white background.
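A minimal sketch of producing black text on a white background is given below. It assumes that the Value component alone separates ink from paper, using a fixed threshold of 0.5. The threshold, the function name `binarize`, and the sample pixels are illustrative assumptions rather than the settings of the proposed system.

```python
import colorsys

BLACK, WHITE = 0, 255


def binarize(rgb_image, threshold=0.5):
    """Map each RGB pixel to black (text) or white (background) by its HSV value."""
    output = []
    for row in rgb_image:
        out_row = []
        for r, g, b in row:
            _, _, value = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            # Dark pixels are treated as ink, bright pixels as paper.
            out_row.append(BLACK if value < threshold else WHITE)
        output.append(out_row)
    return output


# Hypothetical 1x3 image: dark blue ink, pale yellow paper, mid-grey smudge.
sample = [[(20, 20, 90), (250, 245, 200), (140, 140, 140)]]
result = binarize(sample)  # [[0, 255, 255]]
```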

Blob Analysis
After the gray scale image is obtained, each word is separated into characters using blob analysis. Each character is called a blob. Each character is matched against the training database.

Template Generator
The template generator takes its input from template matching; the recognized characters are combined into words and the .doc file is generated.
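How recognized characters might be combined into words can be sketched as follows. The sketch assumes that each recognized character carries its left and right pixel positions on one text line and that a horizontal gap wider than a threshold marks a word boundary. The tuple layout, the threshold of 5 pixels, and the function name `assemble_text` are hypothetical, and writing the result to a .doc file is not shown.

```python
def assemble_text(characters, gap_threshold=5):
    """Join recognised characters into words, splitting where the gap is wide.

    `characters` is a list of (label, x_left, x_right) tuples for one text line.
    """
    ordered = sorted(characters, key=lambda item: item[1])
    words, current, previous_right = [], "", None
    for label, x_left, x_right in ordered:
        if previous_right is not None and x_left - previous_right > gap_threshold:
            words.append(current)
            current = ""
        current += label
        previous_right = x_right
    if current:
        words.append(current)
    return " ".join(words)


# Hypothetical recognised characters with their horizontal extents in pixels.
recognised = [
    ("H", 0, 8), ("I", 10, 12),
    ("Y", 30, 38), ("O", 40, 48), ("U", 50, 58),
]
sentence = assemble_text(recognised)  # "HI YOU"
```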

CONCLUSION
Although a large number of algorithms and methods exist for text extraction from images, none of them provides accurate results. Pattern matching can be implemented successfully for English-language text character recognition. The system has image processing modules for converting a text image into a .doc file. The recognition rate in experiments is expected to be good.

Future Scope
As the application is currently restricted to the English language, in future it can be extended to other languages. It can also be made available as an app for Android mobile phones, and it will be trained on graphical symbols as well.

Figure 1: Architectural Diagram

Figure 2: Example of Bounding Box. Source: Nafiz Arica and Fatos T. Yarman-Vural, "An Overview of Character Recognition Focused on Off-Line Handwriting".

Figure 5: Before applying Bounding Box
Figure 6: After applying Bounding Box

Figure 9: Before applying Blob Analysis

Figure 12: Final output using Word doc