Early tube leak detection system for steam boiler at KEV power plant

Tube leakage in boilers is a major contributor to trips, which eventually lead to power plant shutdowns. Network training and the development of artificial neural network (ANN) models are essential for fault detection in critically large systems. This research focuses on ANN modelling through training and validation with real data acquired from a sub-critical boiler unit. The ANN was used to develop a compatible model and to evaluate the working properties and behaviour of the boiler. Training and validation were carried out using a feed-forward network with back-propagation (BP). Different combinations of number of neurons, number of hidden layers, training algorithms and training functions were run to find the ANN model with the lowest error. The ANN was trained and validated using real site data acquired from a coal-fired power plant in Malaysia. The results showed that the neural network (NN) with one hidden layer performed better than the one with two hidden layers using the feed-forward back-propagation network. The outcome of this study is the best ANN model, which allows early detection of boiler tube leakages and the forecast of a trip before the real shutdown. This will eventually reduce shutdowns in power plants.


Introduction
A thermal power plant uses steam to drive its turbines. Such plants consist of various essential components, including the boiler, condenser, burner, turbines and coal bunkers. All these systems have to work together to ensure smooth operation of the plant. One of the main components prone to damage and failure is the boiler. The boiler consists of various sections, including the economisers, steam drum, superheaters, fluidising grid and boiler tubes. These tubes carry a large amount of water travelling at high velocity. The water is heated to extremely high temperatures and pressures to ensure efficient and reliable rotation of the turbines and generation of stable electricity. Eventually the boiler tubes wear out and can no longer withstand the high pressures and temperatures. Any debris or foreign object flowing in the tubes at high velocity can easily cause tube damage or failure. Besides that, water-side corrosion, fire-side corrosion and lack of quality control may also cause tube failures. Undetected tube failures may cause further damage to other tubes.
An artificial neural network (ANN) is a computational model that is considered part of the family of statistical learning algorithms. The ANN is inspired by the biological neural networks of the human brain. It is considered one of the main branches of artificial intelligence, alongside genetic algorithms, expert systems and fuzzy logic. As a processor, an ANN has a natural ability to store knowledge and large amounts of data. These data are required to ensure that training is done effectively. Synaptic weights, also known as inter-neuron connection strengths, store knowledge during the learning process. An ANN does not require detailed information on the system. Instead, it is a continuous learning process that learns the relationship between the input parameters and variables. The learning process acquires information from the large spreadsheet of data available. The three sections of a multilayer feed-forward neural network are the input layer, the hidden layers and the output layer. The neurons in an ANN are all connected with suitable synaptic weights. The training process modifies the set of connection weights, thereby storing knowledge. An input is introduced, the desired outputs are determined and the weights are adjusted during the learning process of the network. The incoming connections send the weighted activations of other nodes to a particular node. The activation function then calculates the output of each node from its weighted input. Combinations of the linear, log-sigmoid and tan-sigmoid transfer functions produce varied output values.
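To make the layer structure concrete, the following minimal Python sketch passes one input pattern through a network with a single hidden layer. It is an illustration only and not the MATLAB model used in this study: the function names `layer_forward` and `network_forward`, the choice of a tan-sigmoid hidden layer with a linear output layer, and the layout of the weight lists are all assumptions made for the example.

```python
import math


def layer_forward(inputs, weights, biases, activation):
    """Compute one layer: weighted sum of the inputs plus bias, then activation.

    weights holds one list of connection weights per neuron in the layer.
    """
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        net = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(net))
    return outputs


def network_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> one hidden layer (tan-sigmoid) -> output layer (linear)."""
    hidden = layer_forward(inputs, hidden_w, hidden_b, math.tanh)
    return layer_forward(hidden, output_w, output_b, lambda n: n)
```

Training would then adjust `hidden_w`, `hidden_b`, `output_w` and `output_b` until the error between the network output and the desired output reaches the goal value.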
A node is activated as the inputs enter the hidden layers. As the inputs move forward, they pass through activation functions. The output is produced when the network reads the input pattern. Unsuitable connection weights cause an error between the produced and desired outputs. The connection weights have to be modified to decrease this error. The input patterns are run through the network again until the error reaches the goal value.
In a study on an artificial intelligent system for steam boiler diagnosis based on the superheater, Firas [1] carried out training using prepared data. As a result, the FDDNN was able to detect existing faults quickly. The output is considered faulty if its value is close to 1. In another study, on multidimensional minimisation training algorithms for steam boiler drum level trip using an artificial intelligent monitoring system, Firas [2] used two different interpretation algorithms: Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton and Levenberg-Marquardt. Real data were obtained from a coal-fired thermal power plant in Perak. Firas and Vinesh [3] implemented the feed-forward back-propagation (BP) network as a supervised system, in which both the input and the output are provided. With this algorithm, the weights that maximise performance are identified. Jankowska [4], in an approach to early boiler tube leakage detection with artificial neural networks, used two main structures to build the ANN model of flue gas: linear nets (LIN, without hidden layers) and the feed-forward multilayer perceptron (MLP). Shahul [5], in a study on an automatic detection and analysis system for boiler tube leakage, used the BPNN algorithm. The methods involved include data recording and preprocessing, and feature extraction. In that case study, tube leakages from holes of different diameters and at different distances were successfully detected.
In a study on the detection and location of sound emission caused by tube leakage, Jenshan et al. [6] took a 660 MW boiler as a model. The localisation of different positions of the leakage source in the furnace of the boiler body was simulated. Yang Zhao [7], in a study on a pattern recognition-based chiller fault detection method using support vector data description (SVDD), proposed a method based on a novel one-class classification algorithm. The objective is to find a minimum-volume hypersphere in a high-dimensional feature space. Both the SVDD-based and PCA-based methods were used and compared. With pattern recognition, it is easier to apply the same pattern to different cases. The algorithms and theories that back up pattern recognition are quite reliable, with a large amount of documented information and tools available. Jayanta [8], in research on the use of ANNs in pattern recognition, summarises and compares some of the well-known methods used in the various stages of a pattern recognition system using ANN, and identifies research topics and applications in the field. In that study, automatic recognition, description, classification and grouping of patterns were important factors taken into account. The study shows that the application of pattern recognition has both pros and cons. In each pattern recognition case, applying ANN produced better results than not applying it. Zafei [9] proposes time-domain features as a suitable alternative to frequency features. The ANNs are trained with a subset of experimental data for known machine conditions. After sufficient training, the efficiency of the proposed method is evaluated using the remaining data.
The main objective of this research is to design an ANN model for the detection and diagnosis of faults in a boiler unit. This would eventually minimise forced outages and trips in power plants caused by tube failures and damage. In addition, this study provides an understanding of the behaviour of boiler operation variables and the common causes of tube leakages. This research focuses on the early detection of boiler trips due to boiler tube leakage by implementing ANN modelling through training and validation. The ANN model is designed to forecast a trip before the real shutdown. This gives plant operators time to take corrective action to avoid the real shutdown.

Data preparation
Acquiring real data is essential in order to proceed with training and simulation using the neural network. The real data were acquired from a working unit of a coal-fired power plant. This power plant consists of 2 oil- and gas-fired units and 4 coal-, gas- and oil-fired units with a maximum power generation capacity of 2400 MW. The unit selected was the Unit 4 boiler, which is part of a subcritical unit. Based on the trip reports, the most suitable boiler-related trip was chosen. There had been a shutdown due to boiler tube leakage from 30th July 2013 to 6th August 2013. Data were collected from one day before the trip to one day after the trip at a time interval of 1 minute. The large amount of data was processed and shortlisted to 26 important variables. They were shortlisted based on the plant operators' experience in identifying the essential variables that contributed to boiler trips of the particular unit. The sequence of data processing is shown in Figure 1. The real data collected from the power plant unit are very large and contain thousands of values. The spreadsheet contained missing data and invalid values, which cannot be input into the ANN directly. These invalid and missing data were therefore visually screened and identified. The missing data were then treated: the invalid and missing values identified were replaced with values interpolated from the available values. The interpolation was done using the following formula.
$$V_1 \text{ at } t_1 = t + (t_3 - t)/\text{time intervals} \qquad (1)$$
To ensure that the data can be fed to the neural network, the values, which are in the hundreds, are normalised to values between 0 and 1 using the normalisation equation.
In this study, the data are divided into 70% for training and 30% for validation. Only 70% of the real data is used to train the ANN model. The ANN model is trained and validated using different combinations of training algorithms, activation functions, number of hidden layers and number of neurons. After training, the training algorithm with the minimal root mean square error (RMSE) is selected for the next step, validation.
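The data preparation steps described above can be sketched in Python as follows. This is a hedged illustration rather than the procedure actually coded in the study: missing values are assumed to be marked as `None`, the gap filling is plain linear interpolation between the nearest valid neighbours, the normalisation is assumed to be min-max scaling (the paper's own normalisation equation is not reproduced in the text), and the 70/30 split is assumed to be chronological. All function names are hypothetical.

```python
def fill_missing(values):
    """Replace None entries by linear interpolation between valid neighbours."""
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1                      # last valid index before the gap
            end = i
            while end < n and filled[end] is None:
                end += 1                       # first valid index after the gap
            if start < 0 or end == n:
                raise ValueError("cannot interpolate a gap at the edge of the series")
            step = (filled[end] - filled[start]) / (end - start)
            for j in range(i, end):
                filled[j] = filled[start] + step * (j - start)
            i = end
        else:
            i += 1
    return filled


def min_max_normalise(values):
    """Scale values linearly into the range 0 to 1 (assumed normalisation)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]           # constant series carries no spread
    return [(v - lo) / (hi - lo) for v in values]


def split_70_30(rows):
    """Return the first 70% of rows for training and the rest for validation."""
    cut = len(rows) * 7 // 10
    return rows[:cut], rows[cut:]
```

Each of the 26 variables would be passed through `fill_missing` and `min_max_normalise` separately before the rows are split.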

ANN topologies
The feed-forward NN structure is selected based on the number of hidden layers in the network, namely one hidden layer or two hidden layers. Each hidden layer is given 1 to 10 neurons, and these combinations of hidden layers and number of neurons result in different outputs.
Only up to 2 hidden layers are used because a greater number of hidden layers may produce constant RMSE values. For each hidden layer, different numbers of neurons are applied. Each hidden layer can hold up to 10 neurons, as more than 10 neurons caused an RMSE value of more than 0.5, which is the set goal RMSE.

Training algorithms
The feed-forward back-propagation training approach is considered the most effective because it can produce non-linear solutions. The training algorithms used are Resilient Backpropagation, Levenberg-Marquardt, Scaled Conjugate Gradient and BFGS quasi-Newton. These are among the faster and more reliable algorithms available for NNs in MATLAB. Resilient Backpropagation has a moderate memory requirement, is faster than the standard steepest descent algorithm and is able to eliminate the effects of the small slope at the extreme ends of the sigmoid transfer function. The Levenberg-Marquardt algorithm provides a solution for minimising a function in least-squares curve fitting and nonlinear programming. Scaled Conjugate Gradient converges faster than steepest descent. BFGS quasi-Newton, on the other hand, requires more computation in each iteration and more storage than the conjugate gradient methods.

Activation functions
In this study, three activation functions were used to define the outputs from the given inputs: the linear transfer function, the log-sigmoid transfer function and the tan-sigmoid transfer function. The linear transfer function returns the value passed to it as the neuron's output. Training allows a linear neuron to learn a function and find a linear approximation to a nonlinear function. However, a linear network is not able to perform a nonlinear process. The log-sigmoid function, the most common transfer function in back-propagation networks, is differentiable. With linear output neurons, the network outputs can take on any value. The activation functions carry out calculations on the inputs and generate the output and the error. Different inputs produce different outputs, and the network is modified accordingly.
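For reference, the three transfer functions can be written in a few lines of Python. The names `purelin`, `logsig` and `tansig` follow the MATLAB names used later in this paper; the Python implementations themselves are an illustrative sketch and are not the code used in the study.

```python
import math


def purelin(n):
    """Linear transfer function: returns the net input unchanged."""
    return n


def logsig(n):
    """Log-sigmoid transfer function: squashes the net input into (0, 1)."""
    if n >= 0:
        return 1.0 / (1.0 + math.exp(-n))
    e = math.exp(n)                 # avoids overflow for large negative inputs
    return e / (1.0 + e)


def tansig(n):
    """Tan-sigmoid transfer function: squashes the net input into (-1, 1)."""
    return math.tanh(n)


# Lookup table so that a layer can be configured by name.
TRANSFER_FUNCTIONS = {"purelin": purelin, "logsig": logsig, "tansig": tansig}
```

Because `purelin` is unbounded while `logsig` and `tansig` saturate, the choice of function in the output layer determines the range of values the network can produce.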
Simulation tests were run on the network to calculate the outputs from the original inputs. The RMSE equation gives the RMSE values, which identify the weights that minimise the error. MATLAB is used to retrieve the output. The combination with the minimal root mean square error (RMSE) was selected for the validation process.
The RMSE is computed as
$$\mathrm{RMSE} = \sqrt{\frac{1}{Q}\sum_{k=1}^{Q}\bigl(t(k) - a(k)\bigr)^{2}}$$
where k is the iteration index, Q is the total number of iterations (epochs), t is the target output, and a is the actual output.
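Using the symbols defined above (Q, t and a), the RMSE can be computed with the short Python function below. It is a minimal sketch of the standard root mean square error and is not taken from the MATLAB code used in the study; the function name `rmse` is an assumption.

```python
import math


def rmse(targets, outputs):
    """Root mean square error between target outputs t and actual outputs a."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("targets and outputs must be non-empty and of equal length")
    q = len(targets)                                   # Q in the text
    squared_error = sum((t - a) ** 2 for t, a in zip(targets, outputs))
    return math.sqrt(squared_error / q)
```

A value of 0 means the network reproduces the targets exactly, so the combination with the smallest value is the one carried forward to validation.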

Results and discussion
The data, shortlisted to 26 variables (Table 1), underwent data preparation, which includes visual cleaning, missing data treatment and normalisation. This is important to allow proper input of this large data set into MATLAB for training and validation. The fault introduced for each variable was identified by comparing the fault-introduced graphs of the variables. All the variables were evaluated and grouped as 'influenced' (I) or 'most influenced' (MI). This grouping is based on how close the tripping time is to the real shutdown.

Analysis of the ANN training results
As can be seen in Figure 2, Variable 24 (Primary Superheater Metal Temperature 1) was identified as the most important variable. Its fault is introduced at 1617 minutes, the closest of all the variables to the real shutdown. The real shutdown occurs 10 minutes later, at 1627 minutes. This variable is therefore the most important of the 26 variables, as the boiler tripped almost immediately after its fault was detected. Each combination of hidden layers, activation functions, number of neurons and training algorithm produced a different RMSE. After several training runs, the combination that produced the smallest RMSE was taken as the best combination for the ANN model. The results are plotted in Figure 3 to ease comparison of the RMSE values.
For 2 hidden layers there are 27 combinations of the activation functions purelin, logsig and tansig. The lowest RMSE of each training algorithm is chosen and compared in the graphs below. After visual comparison and analysis of all 27 combinations, the best combinations with the lowest RMSE are tabulated to allow data analysis. A comparison of the training functions for 1 hidden layer and 2 hidden layers is shown in Figure 3.
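The count of 27 is consistent with choosing one of the three activation functions for each of three layers (two hidden layers plus the output layer), since 3 x 3 x 3 = 27. That reading is an assumption, as the paper does not spell out the layer assignment. Under it, the combinations and the per-algorithm selection of the lowest RMSE could be sketched in Python as follows; `activation_combinations` and `best_per_algorithm` are hypothetical helper names.

```python
from itertools import product

ACTIVATIONS = ("purelin", "logsig", "tansig")


def activation_combinations(n_hidden_layers):
    """All assignments of one activation per hidden layer plus the output layer."""
    return list(product(ACTIVATIONS, repeat=n_hidden_layers + 1))


def best_per_algorithm(results):
    """Keep the lowest-RMSE record for each training algorithm.

    results is an iterable of (algorithm, combination, rmse) tuples.
    Returns a dict mapping algorithm -> (combination, rmse).
    """
    best = {}
    for algorithm, combination, rmse in results:
        if algorithm not in best or rmse < best[algorithm][1]:
            best[algorithm] = (combination, rmse)
    return best
```

Under the same assumption, a network with one hidden layer has 3 x 3 = 9 activation combinations.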

Analysis of the ANN validation results
The model with the lowest RMSE value is considered the best model combination, so that model is used for the validation process. A different set of code is used to produce the final forecast graph. This is essential to show that the model allows trip forecasting and is able to detect a trip before the real shutdown. The results are presented in Figure 4. The chosen ANN model, consisting of 1 hidden layer, is used for the validation process and to produce the forecast graph. The first graph consists of the actual RMSE values, whereas the second displays the predicted RMSE values. In Figure 4, both the actual RMSE and the predicted RMSE are plotted on the same graph. This is essential in order to identify the timing of the actual and predicted RMSE values.
The actual RMSE graph shows the real trip that occurred, based on the trip recorded by the plant itself. The predicted RMSE graph reflects the forecast model. As seen in Figure 4, the trip would continue to occur without the ANN system, eventually affecting the plant's normal operation. As shown in Figure 4, a sudden increase in value is observed at the 1617th minute. Comparing the actual RMSE (blue line) and predicted RMSE (green line) values in the forecast graph, the predicted trip is detected at the 1617th minute. This leaves a gap of 10 minutes before the actual trip at the 1627th minute.
The data are classified as a trip when they reach the trip value, which is "1". The 10-minute gap between the actual and predicted trips shows that the real trip could be avoided. This time gap allows the plant operator to prevent the real trip by taking preventive action within the available time. The ability to forecast a trip helps to ensure the continuous operation of the plant. Being a continuous-learning system, the ANN system will be able to detect trips that occur in the future.
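The trip classification and the resulting lead time can be illustrated with a small Python sketch. It assumes one value per minute with the list index taken as the minute number, and it treats a trip as the first value that reaches the trip value of 1. The function names are hypothetical, and the code is not the validation code used in the study.

```python
def first_trip_minute(series, trip_value=1.0):
    """Return the first minute at which the series reaches the trip value."""
    for minute, value in enumerate(series):
        if value >= trip_value:
            return minute
    return None                      # no trip found in the series


def lead_time(predicted_series, actual_series, trip_value=1.0):
    """Minutes between the predicted trip and the actual trip."""
    predicted = first_trip_minute(predicted_series, trip_value)
    actual = first_trip_minute(actual_series, trip_value)
    if predicted is None or actual is None:
        return None
    return actual - predicted
```

With a predicted trip at minute 1617 and an actual trip at minute 1627, `lead_time` returns 10, which is the window available to the operator.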

Conclusion
This study focused on constructing the best ANN model combination, which was obtained through ANN simulation training and validation. The combination allows effective forecasting. The training algorithm, activation functions, number of hidden layers and number of neurons are some of the ANN topologies discussed in this study. Each time the number of neurons is set for each hidden layer and the code is run, the results vary.
The number of neurons is limited to a certain range to avoid repeatedly occurring RMSE values. A higher number of neurons results in a lower RMSE.
Through this study, the behaviour of boiler operation variables and the common causes of tube leakages were understood. The variables were identified, and Variable 24 (Primary Superheater Metal Temperature 1) is considered the main contributor to the shutdown. Apart from that, the current study uses offline data received from the plant. Future studies may focus on developing an online data input system that would allow data to be transferred directly from the plant's control room to the ANN system. The methodology established in this study can be developed further using other artificial intelligence systems to identify and detect boiler tube leakage in power plants.

Figure 1. The sequence of data processing.

Figure 3. Comparison of training functions of 1 hidden layer and 2 hidden layers.

Figure 4. Graph of actual RMSE vs predicted RMSE. The RMSE function is used to calculate the error difference between the actual output and the predicted output.

Table 1. Fault introduced for each variable.