Image segmentation method of rail head defects and area measurement of selected segments

The operational safety of railway transport, an important economic and social factor, is largely determined by the technical condition of the rail track and by the measures taken to maintain the quality of its track management system. One element of the system that ensures accident-free operation of the track is the technical diagnosis of rails using a complex of non-destructive testing methods, such as acoustic (ultrasonic), magnetic and combined methods, together with monitoring of the track by measuring the geometry of the rail track and its disturbances. When the wheel interacts with the rail, especially on high-speed and heavily loaded sections, defects and damage inevitably occur in the rails. A rather large share of such defects lies on the rolling surface of the rail head. Once formed, defects develop rapidly, which seriously compromises the safety of train traffic. Therefore, accurate and quick detection of defects on the rolling surface of the rail head is very important. However, such defects are quite difficult to detect by the acoustic (ultrasonic) method because tight contact between the rolling surface of the rail head and the piezoelectric transducer is lost. In this case, it is convenient to detect surface defects of the rail head using video inspection. The article provides a comparative analysis of segmentation methods. It presents a method of image segmentation of the main rail defects based on generalized contour preparation and parallel-hierarchical (PH) transformation, followed by their classification. The parallel-hierarchical transformation method increases the segmentation accuracy of individual areas in the original image compared to similar methods.
The algorithm of pyramidal generalized-contour preparation and the criterion system make it possible, by calculating the threshold for each level of the gray scale, to represent the studied image by the corresponding contour preparations at each segmentation level. Modeling of the recursive generalized-contour preparation and PH transformation method for segmenting images of rail head defects shows that its segmentation accuracy is better than that of the segmentation method based on region growing. A modified method of calculating the image contour area is given, based on coding the lines that form the boundaries of the black and white areas of a two-gradation image.


Introduction
On the railways of Ukraine, accelerated passenger trains run together with freight traffic. Such operating conditions impose stricter requirements on railway track maintenance and call for a more reliable system for diagnosing rails and switch elements.
Traditional methods of non-destructive inspection of rails, such as acoustic (ultrasonic), magnetic and combined methods, either do not detect surface defects of the rail head or are economically inefficient to apply.
Therefore, many scientists and specialists study and develop other methods of non-destructive testing that will allow more reliable and cost-effective detection of defects on the rolling surface of the rail head [1].
The combined use of ultrasonic testing and lasers for checking surface defects in the rail is considered in [2]. The article [3] describes a method that uses four linear lasers and eight video cameras to detect defects on the rail surface. A detection method based on semantic segmentation was developed in [4] using collected data on rail surface cracks and deep learning with a neural network. To identify defects on the rolling surface of the rail head, it is first necessary to select the rails themselves from the obtained image of the railway track superstructure. For this purpose, image segmentation can be applied.
Segmentation is the division of an image into parts based on certain features that characterize these parts or the whole image. Image segmentation is an important stage of image processing, as it allows meaningful image analysis and information extraction [5,6]. Image segmentation can be done in different ways. Depending on the type of image, segmentation by brightness, contour, shape, texture or other features is used [7][8][9][10]. Dividing the image into many sections makes it more meaningful and facilitates its processing [11,12]. Region growing based on these features provides fairly accurate segmentation of simple scenes with a small number of objects and no texture. However, this method does not give good results for more complex scenes. To overcome this shortcoming, the idea of growing segments has been improved with a restriction on merging, based on the Bayesian evaluation of features for each segment.
Threshold, morphological gradient and region growing models have become widespread for image segmentation problems [5,6]. Among the segmentation algorithms, the following can be noted. The WaterShed (watershed) algorithm works for images with a small number of local minima, but with a large number of them the image is divided into too many segments. The MeanShift (mean shift) algorithm groups objects with similar characteristics: pixels with similar characteristics are combined into one segment, and the output is an image with homogeneous areas. The FloodFill algorithm (filling or "flooding" method) can be used to select elements according to color. In this case, it is necessary to choose the initial pixel and set the interval within which the color of neighboring pixels may differ from the initial one. The interval can be asymmetrical. The algorithm combines pixels into one segment (filling them with one color) if they fall into the specified range. The output is a segment filled with a certain color and its area in pixels. The GrabCut algorithm (cutting) is an interactive object selection algorithm developed as a more convenient alternative to the magnetic lasso, where the user had to trace the contour of an object with the mouse to select it. For the algorithm to work, it is enough to enclose the object together with part of the background in a rectangle. However, difficulties may arise during segmentation if colors inside the bounding rectangle occur in large quantities not only on the object but also on the background.
This article considers only the main existing algorithms. As a result of image segmentation, areas are highlighted that combine pixels according to selected features. FloodFill is suitable for filling objects of similar colour. GrabCut copes well with separating a specific object from the background. With the MeanShift implementation in OpenCV, pixels that are close in colour and coordinates are clustered. WaterShed is suitable for images with a simple texture. Therefore, the segmentation algorithm should be chosen according to the specific task [13][14][15][16][17][18][19].
Comparing the above-mentioned methods, it has been found that the most common method in practical implementations is region growing. In this method, the image is represented as a set of structurally connected elements, that is, the spatial relationship of image elements is taken into account, unlike in threshold processing. In contrast to the morphological method, it makes it possible to calculate image characteristics during segmentation, which can then be used in further processing for recognition tasks. However, the main disadvantage of region growing is the problem of choosing starting points, which prevents complete automation of the segmentation process, while threshold methods suffer from the problem of threshold selection. In addition, the method is empirical and does not have a clear theoretical justification.
Analysis of theoretical research on existing methods and mathematical models of the segmentation process, and of their practical application to various images in industry, leads to the conclusion that there is no sufficiently effective universal segmentation method. To date, many segmentation methods and algorithms for their implementation have been developed, but, unfortunately, those that provide the specified accuracy and reliability are quite complex and time-consuming to implement. At the same time, models that are simple to implement and fast do not provide the required accuracy and reliability [15].
Therefore, a pressing problem is the development of theoretically grounded, simple and fast methods that segment halftone images of arbitrary configuration and placement and provide the required accuracy and reliability. One such method is image processing with generalized contour preparation and parallel-hierarchical transformation [18][19], proposed here for processing images of rail head cracks.

Mathematical model of the segmentation method
The paper proposes a method of direct image processing that extracts the rail for further processing.
The original image of the railway track superstructure is converted to halftone (grayscale) to reduce the influence of colors on further processing.
Next, histogram equalization is used to improve image contrast, so that the rails stand out more, by flattening the distribution of pixel intensities.
Let the image be in grayscale with dimensions $W \times H$, and let the number of pixel brightness quantization levels (the number of bins) be $N$. In an equalized image, each level therefore contains $n_0 = \frac{W \times H}{N}$ pixels. Let $p_x(x)$ be the density function of the intensity distribution in the input image, and let the desired density be uniform:
$$p_y(y) = \frac{1}{N - 1}, \quad 0 \le y \le N - 1.$$
Denote the integral distribution laws of the random variables $x$ and $y$ by $F_x(x)$ and $F_y(y)$. According to the condition of probabilistic equivalence, $F_x(x) = F_y(y)$, from which it follows that
$$y = (N - 1)\,F_x(x).$$
After that, it is necessary to evaluate the integral distribution law $F_x(x)$. We find the histogram of the image and normalize it by dividing the value of each bin by the total number of pixels; the bin values can then be considered approximate values of the distribution density function $p_i$ ($i = 0, 1, \dots, 255$). The approximate value of the distribution function can be written as follows:
$$F_x(x) \approx \sum_{i=0}^{x} p_i.$$
We present the formalization of histogram equalization. The input image is in grayscale.
1. Find the histogram $h$ of the image.
2. Build the distribution function $F_x$ according to the histogram.
3. Update the pixels of the image according to the rule $y = \operatorname{round}\big((N - 1)\,F_x(x)\big)$.
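The three equalization steps can be illustrated with a minimal sketch in plain Python. It assumes an image stored as a list of rows of integer gray levels and assumes the common remapping rule "new value = round((levels - 1) x cumulative distribution)"; the function name `equalize_histogram` and the sample data are hypothetical and not taken from the paper's implementation.

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as a list of rows of ints in [0, levels-1]."""
    total = sum(len(row) for row in image)

    # Step 1: histogram of the image.
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1

    # Step 2: cumulative distribution function from the normalized histogram.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / total)

    # Step 3: remap every pixel through the scaled distribution function
    # (assumed rule: round((levels - 1) * cdf[value])).
    return [[round((levels - 1) * cdf[value]) for value in row] for row in image]


# A low-contrast sample: all values lie between 100 and 103.
low_contrast = [[100, 100, 101], [101, 102, 102], [103, 103, 103]]
equalized = equalize_histogram(low_contrast)
```

In this sketch the narrow input range is stretched so that the brightest input level maps to 255, which is the contrast improvement the text describes.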
The Gaussian blurring method is used to reduce noise and remove small details. Gaussian blur is based on the use of a Gaussian function as a kernel for image convolution.
The equation of the Gaussian function in two dimensions has the following form:
$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}},$$
where $x$ is the distance from the origin of coordinates along the horizontal axis, $y$ is the distance from the origin of coordinates along the vertical axis, and $\sigma$ is the standard deviation of the Gaussian distribution.
Applying this formula in two dimensions creates a surface whose contours are concentric circles with a Gaussian distribution from the central point. The values of this distribution are used to construct a convolution matrix that is applied to the original image. The new value of each pixel is set to the weighted average of that pixel's neighborhood. The source pixel receives the largest weight (it has the highest Gaussian value), and neighboring pixels receive lower weights as their distance to the source pixel increases. This results in a blur that preserves boundaries and edges better than other, more uniform blur filters.
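As an illustration of how the convolution matrix can be built from the two-dimensional Gaussian function, the following sketch samples the function on a square grid and normalizes the weights so that they sum to one. The kernel size of 5 and the value of sigma are assumptions chosen for the example, and `gaussian_kernel` is a hypothetical helper name.

```python
import math


def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized size x size Gaussian convolution matrix."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Two-dimensional Gaussian sampled at integer offsets from the centre.
            weight = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(weight / (2.0 * math.pi * sigma * sigma))
        kernel.append(row)
    # Normalize so that blurring does not change the overall brightness.
    total = sum(sum(row) for row in kernel)
    return [[weight / total for weight in row] for row in kernel]


kernel = gaussian_kernel(5, 1.0)
```

The central weight is the largest and the weights fall off symmetrically with distance, which matches the weighting of the source pixel and its neighbors described above.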
Convolutions are used to highlight horizontal and vertical borders in the image. Convolutional filters called "horizontal filters" are used to detect horizontal borders. These filters have a dimension of 5×5, with negative weights on the outer rows, positive weights on the inner rows and zeros on the middle row, similar to the contour preparation method, which creates positive, negative and zero contour preparations. If there are horizontal boundaries in the image (for example, the boundaries between rails), they stand out as bright lines or edges. Convolutional filters called "vertical filters" are used to detect vertical boundaries in an image. These filters are also 5×5 in dimension, with negative weights on the outer columns, positive weights on the inner columns and zeros on the middle column. If there are vertical borders in the image (for example, separate columns between rails), they stand out as bright lines or edges.
The basic idea of a convolution is that a certain filter or kernel slides over the input image (or other kind of data), and at each position the corresponding values are multiplied and summed. The result of this operation forms an output image (or feature map) that displays important features detected in the input data.
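A minimal sketch of such a convolution with 5×5 horizontal and vertical filters is given below. The text specifies only the signs of the weights (negative on the outer rows or columns, positive on the inner ones, zero in the middle), so the unit magnitudes in `ROW_WEIGHTS` are an assumption, as are the helper names.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a list-of-rows image with a square kernel."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    output = []
    for r in range(rows - k + 1):
        out_row = []
        for c in range(cols - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    # The kernel is flipped, which makes this a true convolution.
                    acc += image[r + i][c + j] * kernel[k - 1 - i][k - 1 - j]
            out_row.append(acc)
        output.append(out_row)
    return output


# Assumed weights: only their signs follow the description in the text.
ROW_WEIGHTS = [-1, 1, 0, 1, -1]
HORIZONTAL_FILTER = [[w] * 5 for w in ROW_WEIGHTS]        # weights vary by row
VERTICAL_FILTER = [list(ROW_WEIGHTS) for _ in range(5)]   # weights vary by column

# A 7x7 test image with one bright horizontal stripe in row 3.
stripe = [[10 if r == 3 else 0 for _ in range(7)] for r in range(7)]
horizontal_response = convolve2d(stripe, HORIZONTAL_FILTER)
vertical_response = convolve2d(stripe, VERTICAL_FILTER)
```

With these assumed weights the horizontal filter responds to the stripe, while the vertical filter gives a zero response because its weights sum to zero along each row.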
The Canny operator is used to highlight the contours of the boundaries found using convolutions.
First, the image is smoothed using a Gaussian filter. This helps reduce noise and remove fine details.
Then the smoothed image is filtered using a Sobel kernel in both the horizontal and vertical directions to obtain the first derivative in the horizontal ($G_x$) and vertical ($G_y$) directions. From these two images, we can find the edge gradient and direction for each pixel as follows:
$$G = \sqrt{G_x^2 + G_y^2}, \qquad \theta = \arctan\left(\frac{G_y}{G_x}\right).$$
The direction of the gradient is always perpendicular to the edges. It is rounded to one of four angles representing the vertical, horizontal and two diagonal directions.
The found gradients are checked, and only the local maxima of the gradient remain as possible boundary locations. This helps to keep real boundaries and to suppress the responses adjacent to them.
Threshold filtering is then applied, where high-gradient pixels are retained and low-gradient pixels are discarded. This threshold can be set by the user or selected automatically by the algorithm, as is done when searching for a threshold in contour preparation.
Detected border pixels can be additionally processed by one of the border search algorithms, for example, connected-component labeling or morphological operations, to highlight the contours of objects.
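The gradient and thresholding stages described above can be sketched as follows. This is only a partial illustration, not a full Canny implementation: it computes the Sobel derivatives, the gradient magnitude, the direction rounded to 0, 45, 90 or 135 degrees, and a single fixed threshold, and it omits smoothing, non-maximum suppression and hysteresis. All names and the test image are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient(image):
    """Return (magnitude, direction) maps; border pixels are left at zero."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            magnitude[r][c] = math.hypot(gx, gy)
            # Fold the angle into [0, 180) and round it to the nearest 45 degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            direction[r][c] = int(round(angle / 45.0)) % 4 * 45
    return magnitude, direction


def threshold_edges(magnitude, threshold):
    """Keep pixels whose gradient magnitude reaches the threshold."""
    return [[1 if m >= threshold else 0 for m in row] for row in magnitude]


# A 5x5 image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 100, 100, 100] for _ in range(5)]
magnitude, direction = sobel_gradient(step)
edges = threshold_edges(magnitude, 200)
```

For the vertical step edge in the sample, the gradient points horizontally (direction 0), that is, perpendicular to the edge, as stated in the text.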
Since the rails form straight lines, it makes sense to use the Hough transform, which is intended for detecting straight lines in an image.
Each border point is represented as a point $(x, y)$ in the image.
For each boundary point, all possible parameters (position, angle and/or radius) are generated that define a geometric shape (for example, a straight line or a circle) passing through this point.
An "accumulation matrix" or "accumulation space" is created, in which each possible set of parameters (position, angle, radius, etc.) is represented by a vote count.
The vote count in the accumulation matrix is increased for each border point that matches the given parameters.
Local maxima are then found in the accumulation matrix. These peaks indicate which parameters (position, angle, etc.) best correspond to the detected geometric shape.
After the local maxima in the accumulation matrix are detected, those that correspond to real boundaries in the image can be selected. Usually, the peaks that exceed a certain threshold are selected.
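The voting procedure for straight lines can be sketched as below. The sketch assumes the usual normal parameterization rho = x·cos(theta) + y·sin(theta) with one-degree angle steps, and it selects accumulator cells by a simple vote threshold rather than by a true local-maximum search; `hough_peaks` and its parameters are hypothetical names.

```python
import math


def hough_peaks(edge_points, width, height, theta_steps=180, min_votes=2):
    """Vote in (theta, rho) space and return cells with at least min_votes votes.

    Each result is a tuple (votes, rho, theta_in_degrees), strongest first.
    """
    diag = int(math.ceil(math.hypot(width, height)))
    # accumulator[theta_index][rho + diag]; rho can be negative, hence the offset.
    accumulator = [[0] * (2 * diag + 1) for _ in range(theta_steps)]
    for x, y in edge_points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            accumulator[t][rho + diag] += 1

    peaks = []
    for t in range(theta_steps):
        for r in range(2 * diag + 1):
            votes = accumulator[t][r]
            if votes >= min_votes:
                peaks.append((votes, r - diag, t * 180.0 / theta_steps))
    peaks.sort(reverse=True)
    return peaks


# Ten edge points on the vertical line x = 5.
points = [(5, y) for y in range(10)]
peaks = hough_peaks(points, width=10, height=10, min_votes=10)
```

Because of the coarse quantization, several neighboring angle cells collect the same number of votes for one physical line, which is why nearby detections usually need to be merged afterwards.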
After applying the Hough transform, some lines can be superimposed on each other or located at a small distance from each other. Therefore, it is advisable to filter the lines according to the distance between them. Lines are grouped by distance from each other, thus forming groups of closely spaced lines. After that, the median line is calculated in each group. As a result, we get one line from each group.
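One possible way to group close lines and keep a median line per group is sketched below. It assumes that each line is a `(rho, theta)` pair, that the lines are nearly parallel so that distance can be compared through `rho` alone, and that a gap larger than `max_gap` starts a new group; these choices and the name `merge_close_lines` are assumptions, not the paper's exact procedure.

```python
def merge_close_lines(lines, max_gap=10.0):
    """Group (rho, theta) lines by rho and return the median line of each group."""
    groups = []
    for line in sorted(lines):
        # A line joins the current group if it is close to the group's last line.
        if groups and line[0] - groups[-1][-1][0] <= max_gap:
            groups[-1].append(line)
        else:
            groups.append([line])
    # Median by position in the sorted group (upper middle for even sizes).
    return [group[len(group) // 2] for group in groups]


detected = [(100, 0), (103, 0), (98, 1), (250, 0), (255, 0), (252, 0)]
merged = merge_close_lines(detected)
```

In this example six detected lines collapse into two, one for each cluster of closely spaced lines.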
The method of generalized contour preparation is the simplest method for a compact description of halftone images [19]. It is based on transforming images with multiple intensity gradations into two-gradation images by forming generalized contours. Compared with other methods of converting multi-gradation images into two-gradation ones, it ensures high accuracy in forming the mutual correlation function, because more information is preserved and the normalization and centering conditions for the conversion of correlated images are fulfilled: equality of average intensities, equality of intensity ranges and their equal centering about the mean intensity.
The essence of the method is the detection of intensity differences between readings of the halftone image and the formation of two-level detection signals in the set of contour preparations of the image [19].
We modify this method, introducing for this purpose the pyramidal procedure of generalized contour preparation.
Let us consider the main aspects of the network PH comparison of images based on their preliminary spatially connected preparation. When implementing the proposed method, preliminary boundary processing of the compared images and an informative description of the task with the help of the mask function of the reference image are extremely important [19].
Let the continuous reference image (RI) $f(x, y)$ be represented, as a result of discretization, by a matrix $F(i, j)$ containing $N_1 \times N_2$ readings $f_{i,j}$, where $i = \overline{0, N_1 - 1}$, $j = \overline{0, N_2 - 1}$, $i, j \in D$. The general difference threshold (GDT) $\Delta_0$ is equal to the minimum reading value $f_{i,j}$ that is perceived by the image comparison device as a reading of the object.
The local difference threshold (LDT) $\hat{\Delta}$ is determined by the minimum value of the difference $\Delta f$ between two readings $f_{i_1, j_1}$ and $f_{i_2, j_2}$ that can be distinguished by the comparison device. For the RI, the expression for the LDT is written in terms of this difference of readings. The mask function (mask) $M = \{m_{i,j}\}$ of the reference image $f_e(x, y)$, $i, j \in D_e$, is determined on $D_e$, the reference area of the reference image.
To form binarized preparations, the procedure (10) of forming positive, negative and zero preparations can be used. As a result, we get a three-level binarized representation of the image. The darkest gray gradations of the image are coded with negative preparations, the lightest with positive preparations, and the intermediate gray gradations with zero preparations. The coding procedure itself is simple and consists in representing the converted preparations with two bits of information. To calculate these preparations, the threshold for the preparation operation is selected so as to fulfill condition (10), which distributes contour preparations evenly over the field that is masked according to expression (11), or over the entire image. If the image masking operation is used, the sensitivity of the proposed method to the formation of contour preparations can be increased.
Representing the image by three types of contour preparations gives a fairly rough description of it. This is because a wide range of gray gradations falls into the zone of zero contour preparations, and they are all coded in the same way, that is, as zero preparations. In this case, darker and lighter parts of the image fall into the same coding zone, which leads to the loss of informative parts of the image. To eliminate this effect, a multi-level procedure for the formation of contour preparations is proposed: after the first step of contour preparation, the pixels of the transformed image represented by negative preparations are excluded from the second step. Then, in the second step, the threshold for preparation is selected according to expression (10) for those pixels of the image that have zero and positive preparations in the first step. Next, the generalized contour preparation operation is performed in the same way as in the first step. That is, positive, negative and zero contour preparations are formed in relation to the newly calculated threshold.
At an arbitrary k-th step, zero and positive preparations are used for the preparation operation, with the pixels of negative preparations of step k-1 excluded. Obviously, the operation of pyramidal generalized-contour preparation can instead be performed by excluding the bright areas of the image, which are represented by positive preparations, with the subsequent exclusion at each new step of image areas with their own distribution of positive preparations.
Mathematically, the selection of the threshold for k steps of the pyramidal generalized contour preparation can be represented by the following criterion system:
$$\begin{cases} N_1^{(0)}(t) \times N_1^{(-1)}(t) \times N_1^{(1)}(t) = \max, \\ \dots \\ N_{k-1}^{(0)}(t) \times N_{k-1}^{(-1)}(t) \times N_{k-1}^{(1)}(t) = \max, \\ N_k^{(0)}(t) \times N_k^{(-1)}(t) \times N_k^{(1)}(t) = \max, \end{cases} \qquad (12)$$
where $N_k^{(0)}$, $N_k^{(-1)}$, $N_k^{(1)}$ are the distributions of zero, negative and positive preparations at the k-th step of the pyramidal generalized contour preparation (k = 1, 2, ..., n), n is the number of segmentation levels, and t is the gray scale level. The criterion system (12) makes it possible, by calculating the threshold for each level t of the gray scale, to represent the studied image by the appropriate contour preparations at k levels of segmentation.

Description of the pyramidal generalized-contour preparation algorithm
Using the criterion system (12), the algorithm of pyramidal generalized-contour preparation can be described as follows. At the first level of segmentation (k = 1), by going through all gray levels t, we determine the value of t at which $N_1^{(0)} \times N_1^{(-1)} \times N_1^{(1)} = \max$. At an arbitrary k-th level, again going through all gray levels t, we determine the value of t at which $N_k^{(0)} \times N_k^{(-1)} \times N_k^{(1)} = \max$. Therefore, each segmentation level has its own threshold value, calculated using the criterion system (12). This gives the pyramidal generalized-contour preparation algorithm its adaptability: for each newly formed image, a threshold is calculated that is determined by the distribution of halftones in it over the gray scale. Thus, the operation of pyramidal generalized-contour preparation consists in the sequential formation of areas (segments) of the image with negative (positive) preparations, which correspond to segments with darker (lighter) gradations of gray. This sequential formation of image segments is a multi-level segmentation of image areas with a negative (positive) distribution of contour preparations.
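The level-by-level threshold search can be illustrated with the following sketch. Because the exact preparation rule is not reproduced here, the sketch makes a loudly stated assumption: a pixel receives a negative preparation if its brightness lies below `t - band`, a positive one if it lies above `t + band`, and a zero preparation otherwise, where `band` is a hypothetical dead-band parameter. The threshold at each level maximizes the product of the three counts, and pixels with negative preparations are excluded before the next level.

```python
def prepare(pixels, t, band):
    """Split pixels into (negative, zero, positive) preparations for gray level t.

    Assumed rule: a dead band of half-width `band` around t gives zero preparations.
    """
    negative = [p for p in pixels if p < t - band]
    zero = [p for p in pixels if t - band <= p <= t + band]
    positive = [p for p in pixels if p > t + band]
    return negative, zero, positive


def best_threshold(pixels, band, levels=256):
    """Scan all gray levels and return (t, score) maximizing the count product."""
    best_t, best_score = 0, 0
    for t in range(levels):
        negative, zero, positive = prepare(pixels, t, band)
        score = len(negative) * len(zero) * len(positive)
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def pyramidal_thresholds(pixels, n_levels=3, band=10):
    """Return one threshold per segmentation level, darkest areas removed first."""
    thresholds = []
    active = list(pixels)
    for _ in range(n_levels):
        t, score = best_threshold(active, band)
        if score == 0:
            break  # no gray level yields all three kinds of preparations
        thresholds.append(t)
        negative, zero, positive = prepare(active, t, band)
        active = zero + positive  # exclude negative preparations for the next level
    return thresholds


# Four brightness clusters, four pixels each.
sample = [10] * 4 + [80] * 4 + [150] * 4 + [220] * 4
thresholds = pyramidal_thresholds(sample, n_levels=3, band=10)
```

For the sample data the search stops after two levels, since the remaining pixels contain only two brightness clusters and no threshold can produce all three kinds of preparations.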
It is clear from the description of the automatic segmentation algorithms that their implementation requires operations for determining the brightness of pixels and their differences. In addition, comparison operations ("more than", "less than", "equal to") are needed to compare brightness differences with a fixed threshold value, as well as a masking (shifting) operation when the image is processed by scanning it with a reference mask image. Summation operations are then needed to calculate the number of positive, negative and zero preparations, together with an operation to calculate the maximum of their product. As the listed operations show, the implementation of the proposed methods involves no time-consuming and slow computational operations of the kind typical of, for example, orthogonal transformations, which makes this approach promising for real-time multimedia processing of crack images with their subsequent classification.
Experimental studies of multi-level segmentation show that the threshold is correctly calculated when preparing images. Moreover, using a criterion expression of the type (12) to estimate the distribution of contour preparations, it is possible to describe the contour presentation of images with sufficient accuracy.
At the first level of segmentation, which corresponds to the segmentation based on contour preparation, contour preparations are selected and dark, light and intermediate image areas corresponding to these preparations are formed.
At the second level, multi-level segmentation is implemented and the darkest areas of the image corresponding to negative preparations are removed. With the darkest areas removed, it becomes possible to study lighter areas. The described procedure corresponds to the operation of recursive contour preparation. According to the criterion relation (12), all pixels of the image, except for the removed ones, are prepared.
At the third level of segmentation, all pixels of the image prepared at the second level that correspond to negative preparations are likewise removed from further consideration. If necessary, image areas can be segmented in the same way at further segmentation levels. The required number of levels for segmentation of image parts depends on their complexity and the distribution of pixel luminances on the gray scale. The study found that three levels of segmentation are necessary for crack image processing. For the practical implementation of multi-stage image segmentation, the number of levels is set by the user or can be determined at the training stage.
To increase the segmentation accuracy of individual image areas in the original image, it has been proposed to use the method of parallel-hierarchical (PH) transformation [19]. For this purpose, the image is scanned by a window of a certain size, for example 8×8, and at each level of the PH network a recursive generalized contour preparation of the form (12) is used. In this case, each branch of the PH network implements a high-pass filter that filters out the image elements corresponding to zero preparations. Instead of tail elements of the network, tail images of the appropriate size are formed, each of which depends on the number of convolutions at each step of the algorithm at a certain level and on the way the convolutions are formed. The network effect of filtering is achieved due to the successive increase in the size of the image itself at each level of the PH network, for which the generalized-contour preparation operation is performed.

The method of calculating the area of the image contour
The modified method for calculating the area of the image contour developed in this research is based on the method, proposed by Freeman, of encoding the lines that form the boundaries of the black and white areas of a two-gradation image. This method does not aim to minimize the number of binary digits spent on describing the line. The description obtained as a result of encoding (the chain code) is convenient to use for further processing of the image. An example of a chain code (outer image contour) is shown in Fig. 1,b. The chain code is: 0-0-0-1-7-1-2-1-3-3-4-4-3-5-6-5-5-6-7. The code is read starting from the lower left pixel.
The chain consists of straight segments that connect adjacent elements of the image in vertical, horizontal and diagonal directions. So, for example, a vertically oriented segment that connects four adjacent elements will have the code 222 (Fig. 1,a). To determine the area at each step of the algorithm, when the contour line runs in the vertical direction, the elementary area occupied by one pixel (the area of the square S) is taken into account. When the contour line runs in a diagonal direction, the area occupied by half the square, 0.5S, is taken into account. The area of the outer contour of the image is calculated at the first step of the algorithm, and the first inner contour generated by the outer contour is analyzed at the second step (Fig. 1,b). The rule for calculating elementary areas S for vertical and diagonal directions of the contour line is preserved for all subsequent steps of the algorithm.
Similarly, the area of the second inner contour, which is generated by the first inner contour of the image, is calculated. That is, at each step of the algorithm, the area of the image contour that is generated by its previous contour is calculated. The area $S_1$ of the outer contour of the image shown in Fig. 1,b is calculated starting from the lower left edge of the image contour.
The area $S_2$ of the first inner contour and the area $S_3$ of the second inner contour of the image are calculated in the same way. The total area $S$ of the image contour (Fig. 1,b) is the sum of the areas generated by the outer and two inner contours, that is, $S = S_1 + S_2 + S_3$. For comparison, if the area of the image contour is calculated, for example, by taking into account only the total number of pixels that make up the image contour, then $S = 36$. Moreover, in the general case, the error increases even more with an increase in the number of diagonally oriented segments on the subsequent inner contours of the image. The described algorithm calculates the areas of the first $S_1$ and second $S_2$ contours of the image and determines their percentage ratio.
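For orientation, the sketch below decodes a Freeman chain code and computes the area enclosed by the resulting polygon with the standard shoelace formula. This is a swapped-in, well-known technique and not the modified S / 0.5S rule of the paper, so its result is not expected to coincide with the paper's values. The direction table assumes that code 0 points right and codes increase counter-clockwise with the y axis pointing up, which is consistent with the code 222 for a vertical segment; the function name is hypothetical.

```python
# Freeman 8-direction steps (dx, dy): code 0 points right, codes grow counter-clockwise.
STEPS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code_area(codes):
    """Area enclosed by a closed Freeman chain code (shoelace formula)."""
    x = y = 0
    twice_area = 0
    for code in codes:
        dx, dy = STEPS[code]
        # Cross product of the current vertex with the next vertex.
        twice_area += x * dy - y * dx
        x, y = x + dx, y + dy
    if (x, y) != (0, 0):
        raise ValueError("chain code does not describe a closed contour")
    return abs(twice_area) / 2.0


# The outer-contour chain code quoted in the text.
outer_contour = [0, 0, 0, 1, 7, 1, 2, 1, 3, 3, 4, 4, 3, 5, 6, 5, 5, 6, 7]
area = chain_code_area(outer_contour)
```

The chain code quoted in the text returns to its starting point, so it describes a closed contour, and the polygon through the pixel centres encloses 30.5 pixel units.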

Experimental studies of detection of rail heads
A comparison is made with the work of Kang Zhao [20], who developed a system for selecting segments of rail heads.
His paper also describes a direct image processing method to detect rail heads. The algorithm is based on image thresholding. The thresholding method is applied to the investigated image, resulting in a binary image. The original image and the binary image are then combined. The result is an image with a highlighted rail head.
The disadvantage of such an algorithm is its high dependence on the brightness of other elements in the image. Since the image may contain bright elements that do not belong to the rail, the algorithm may mistakenly keep them as part of the rail head.
In this work, a software implementation of the proposed direct image processing algorithm was created. Color images of rails with a size of 500 by 500 pixels were chosen as test subjects. After applying the Hough transform and filtering the found straight lines, the head of the rail is highlighted in the images.
Next, the segments between the lines are separated, and a perspective transformation is performed for each of them to turn it into a rectangular image. Since the rail heads are brighter than the rest of the rail, the average brightness level is determined for the selected segments, and the segment with the highest brightness is selected. The experiment showed that the processing speed depends on the perspective of the rail in the photo. If the rail is photographed from above, perpendicular to the head, such images are processed faster, since less time is spent on the perspective transformation. In the experiment, 10 different images of rails were used, which are represented in the graph by the numbers 1 to 10; in some of them the rail was photographed from above, and in the others from different angles. In the graph, the images numbered 1, 4, 5, 6 and 10 contain rails photographed from above. All others are images where the rails are photographed at an angle (Fig. 2). It was also determined that this method can work with images of rails in different perspectives, regardless of exactly how the rail is placed in the image.
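The selection of the brightest segment can be sketched as follows. The sketch assumes that each segment has already been rectified into a rectangular grayscale image stored as a list of rows; the perspective transformation itself is not shown, and `brightest_segment` is a hypothetical helper name.

```python
def mean_brightness(segment):
    """Average gray level of a segment given as a list of rows."""
    values = [value for row in segment for value in row]
    return sum(values) / len(values)


def brightest_segment(segments):
    """Return the index of the segment with the highest average brightness."""
    return max(range(len(segments)), key=lambda i: mean_brightness(segments[i]))


# Three rectified candidate segments between neighbouring detected lines.
candidates = [
    [[10, 20], [30, 40]],        # dark background
    [[200, 210], [220, 230]],    # bright strip, taken to be the rail head
    [[90, 90], [90, 90]],        # intermediate brightness
]
head_index = brightest_segment(candidates)
```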

Fig. 2. The graph of the dependence of image processing speed on the rail perspective
The conducted studies showed that the result is not affected by other elements in the image, since the algorithm focuses on clear longitudinal lines and ignores other details, unlike the method proposed by Kang Zhao [20].

Conclusions
The article proposes a method of recursive generalized-contour preparation [18,21] and a PH transformation method [19] for the problem of segmenting images of rail head defects. The method overcomes the shortcomings inherent in segmentation based on region growing, because the threshold calculation is adapted to the image itself and an algorithm for its recursive determination has been developed. The positive effect is achieved through the use of a generalized-contour preparation operation for each branch of the PH network and the formation of a hierarchical structure of a high-frequency filter in the PH network, the frequency characteristics of which decrease with each level of the network. A modified method of calculating the area of the image contour has been proposed, which is based on coding the lines that form the boundaries of the black and white areas of a two-gradation image.

Fig. 1. An example of calculating the image contour area