Suitability of ANN and GP for Predicting Soak Pit Tank Efficiency under Limited Data Conditions

Under the Industry 4.0 scenario, smart sensors can be integrated with control systems to monitor wastewater treatment systems. Such control systems need a sound modelling or forecasting tool for real-time monitoring. This work reports a pilot study that models the treatment efficiency of soak pit tanks for grey water using Genetic Programming (GP) and Artificial Neural Network (ANN). Only the inlet total suspended solids (TSS) concentration is considered for modelling. The Root Mean Square of Errors (RMSE) for the GP and ANN models was found to be 1.8 and 12.5, respectively, for Tank 1. The results indicate that GP is a more promising tool than ANN, particularly when modelling under limited data conditions. The difference in performance between the two methods seems to depend on the type of learning mode adopted in each case.


Introduction
Grey water can be defined as any domestic wastewater other than sewage (black water). It is an inevitable and major point source of pollution from the residential and commercial areas of a community. Grey water differs from black water in having a lower organic loading, fewer pathogenic microbes and faster decomposition due to its reduced nutrient/carbon content [1]. Although grey and black water differ significantly in their characteristics, the current practice of mixing them increases the cost and space required for treatment. Grey water treatment is important because it decreases the dependence on fresh water usage by 50-60% and reduces the burden on the sewage treatment plant (STP) by lowering the space and energy required for treatment [2].
One of the best ways to manage grey water is either to re-use it or to recharge the groundwater with treated grey water. The problem with re-use is that grey water can pose serious health risks (i) if it is used to water fruit or vegetable plants, or (ii) if the hard bleach or chemicals it contains are not treated to standard levels. Of the several techniques available for grey water treatment, the soak pit has attracted considerable attention because it is quick and cheap to build, requires very little space (an important consideration in an urban region) and quickly recharges groundwater [3]. Although eventual clogging is inevitable in soak pits, optimization, proper design and effective pre-treatment of grey water can prolong the life of a soak pit, making it an effective alternative for grey water treatment. The present study assesses the performance of two soak pits, which differ in the quantity of brick ballast and gravel used, in the removal of total suspended solids (TSS).
Solids removal in a soak pit is a function of many complex, interrelated parameters such as the concentration of particulates (suspended and dissolved), organic and inorganic chemicals, pH and the amount of surfactants, and it is relatively difficult to study and interrelate all of these parameters experimentally. Modelling of treatment systems is one way in which researchers and engineers develop a better and deeper understanding of the process. Of late, data-driven modelling tools are increasingly being used to develop models and to mine knowledge. Among these, Artificial Neural Networks (ANN) and Genetic Programming (GP) are the most commonly used, especially to model complex non-linear processes without considering all details of the process. Under the Industry 4.0 scenario, where smart sensors can be integrated with control systems for effective monitoring of treatment efficiency, a sound forecasting model is needed to support the control system. While ANN has been applied to many aspects of wastewater treatment, GP is a relatively new technique in this field. This study investigates the suitability of ANN and GP for modelling the TSS removal efficiency of soak pits under limited data conditions, knowing only the initial solids concentration in grey water as a function of time. The performance of the modelling tools under limited data conditions is intended to give confidence in choosing the most appropriate data-driven modelling technique for large-scale wastewater treatment applications with more complex, interrelated input parameters.

Experimental setup and tools used

Soak pit tank
A soak pit is essentially a hole designed to allow wastewater to infiltrate into the ground. Soak pits are used for the discharge of domestic and industrial wastewater. Certain design requirements must be met: the soak pit should be between 1.5 m and 4 m deep but never less than 2 m above the groundwater level (GWL), and it should be located at a safe distance (at least 30 m) from any drinking water source. The size of a soak pit depends on two things: the infiltration rate of the soil and the quantity of wastewater discharged into it. In this study, two lab-scale soak pits of 1 m depth were constructed (Tanks T1 and T2), each handling 20 L of wastewater (grey water only) per day. The two tanks differ in the quantity of brick ballast and gravel used. The performance of both tanks was quantified in terms of the removal of solids (TS, TDS and TSS) and pH. The methods set by APHA, 2005 [4] were followed for measuring TSS. The filter media compositions for the two tanks are shown in Table 1.

Genetic Programming
Genetic Programming (GP) is very similar to the Genetic Algorithm (GA), an evolutionary algorithm based on the Darwinian theories of natural selection and survival of the fittest. The characteristic difference is that while GA operates on bit strings, GP operates on parse trees to approximate the equation (in symbolic form) that best describes how the output relates to the input variables. First, an initial population of randomly generated equations (programs) is created by randomly combining input variables and functions. The functions can include arithmetic operators (plus, minus, multiply, divide), mathematical functions (sine, cosine, exponential, logarithmic), logical/comparison functions (OR/AND), etc. The choice of appropriate functions requires at least some understanding of the process being modelled. The initial population is passed through an evolutionary process, and the quality of the evolved programs is checked by means of a defined 'fitness' measure. The best programs according to the fitness measure are selected from the population, and better programs are evolved by exchanging parts of their information through 'crossover' and 'mutation', as in GA, which closely resembles the natural reproduction process [5]. The crossover and mutation rates need to be chosen carefully. A high mutation rate introduces high variability into the programs and might hinder the convergence of the evolutionary process. The user must also decide on a number of GP parameters, such as population size and number of generations, before applying the algorithm to the data. The programs that fit the data less well according to the fitness measure are discarded. This evolution process is repeated over successive generations and is driven towards finding symbolic expressions that describe the data, which can be scientifically interpreted to derive knowledge about the process being modelled [6]. GP is implemented in this study using the Discipulus software.
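The evolutionary loop described above can be illustrated with a minimal Python sketch. This is not the Discipulus implementation used in the study. The function set, the input names `x` and `t`, the truncation selection with elitism, the node cap `MAX_NODES`, all parameter values and the synthetic target `x + t` are assumptions made purely for illustration.

```python
import math
import operator
import random

def protected_div(a, b):
    """Division that returns 1.0 when the denominator is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

FUNCTIONS = {"add": operator.add, "sub": operator.sub,
             "mul": operator.mul, "div": protected_div}
TERMINALS = ["x", "t"]  # hypothetical input names
MAX_NODES = 60          # assumed cap that keeps trees from growing without bound

def random_tree(rng, depth=3):
    """Grow a random parse tree represented as nested tuples."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return rng.choice(TERMINALS)
        return round(rng.uniform(-1.0, 1.0), 2)
    name = rng.choice(list(FUNCTIONS))
    return (name, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def evaluate(tree, env):
    """Evaluate a tree for the variable values given in env."""
    if isinstance(tree, tuple):
        name, left, right = tree
        return FUNCTIONS[name](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree

def fitness(tree, data):
    """RMSE over (env, target) pairs; lower is better."""
    total = 0.0
    for env, target in data:
        diff = evaluate(tree, env) - target
        total += diff * diff
    value = math.sqrt(total / len(data))
    return value if math.isfinite(value) else float("inf")

def subtrees(tree, path=()):
    """Yield (path, subtree) for every node of the tree."""
    yield path, tree
    if isinstance(tree, tuple):
        for i in (1, 2):
            yield from subtrees(tree[i], path + (i,))

def replace(tree, path, new):
    """Return a copy of tree with the node at path replaced by new."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace(tree[path[0]], path[1:], new)
    return tuple(parts)

def crossover(rng, a, b):
    """Replace a random node of a with a random subtree of b."""
    path, _ = rng.choice(list(subtrees(a)))
    _, donor = rng.choice(list(subtrees(b)))
    return replace(a, path, donor)

def mutate(rng, tree):
    """Replace a random node with a freshly grown random subtree."""
    path, _ = rng.choice(list(subtrees(tree)))
    return replace(tree, path, random_tree(rng, depth=2))

def evolve(data, pop_size=60, generations=30, p_mut=0.2, seed=0):
    rng = random.Random(seed)
    population = [random_tree(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda tr: fitness(tr, data))
        history.append(fitness(ranked[0], data))
        parents = ranked[: pop_size // 2]  # less fit programs are discarded
        children = [ranked[0]]             # elitism: keep the current best
        while len(children) < pop_size:
            mother, father = rng.choice(parents), rng.choice(parents)
            child = crossover(rng, mother, father)
            if rng.random() < p_mut:
                child = mutate(rng, child)
            if sum(1 for _ in subtrees(child)) > MAX_NODES:
                child = mother             # reject oversized offspring
            children.append(child)
        population = children
    best = min(population, key=lambda tr: fitness(tr, data))
    return best, history

# Synthetic demonstration data (target = x + t), not measurements from the study.
data = [({"x": x, "t": t}, x + t) for x in (0.1, 0.5, 0.9) for t in (0.0, 0.5, 1.0)]
best, history = evolve(data)
print(best, history[-1])
```

Because the best program is carried over unchanged into each new generation, the best RMSE recorded in `history` can never get worse from one generation to the next. Raising `p_mut` increases the variability among offspring, which is the trade-off against convergence noted above.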

Artificial neural network
The architecture of an ANN is motivated by the structure of the human brain and its nerve cells. The technique identifies the statistical pattern present in a time series and applies it to unseen data to make predictions. An ANN is a network of many simple elements called neurons, each with a small amount of local memory. The neurons are linked by connections that carry numeric data encoded by various means. Each neuron operates only when it receives data through its connections. The network is shaped by the learning algorithm, which extracts the regularities present in the data by finding a suitable set of synapses (connection weights) while observing the examples. In this sense, ANNs solve problems by self-learning.
The feed-forward architecture is used in this study. It consists of one input layer, one output layer and one or more hidden layers. Information passes from the input layer to the output layer through the hidden layer(s). Each layer is made up of several neurons, and the layers are interconnected by a set of weights. Each neuron operates on its input and transforms it to produce an analog output. More details about ANN are given in [7-10].
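As a minimal sketch of the feed-forward pass described above, the following Python function propagates an input vector through one hidden layer to a single output. The tanh activation, the layer sizes and every weight value are hypothetical; the text does not specify the network configuration used in the study.

```python
import math

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: inputs -> tanh hidden layer -> linear output."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical weights for a network with 2 inputs, 3 hidden neurons and 1 output.
w_hidden = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [0.7, -0.5, 0.2]
b_out = 0.05

# The two inputs stand for a scaled inlet TSS and a scaled time (assumed scaling).
y = forward([0.6, 0.4], w_hidden, b_hidden, w_out, b_out)
print(y)
```

Training would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` so that the output matches the observed removal efficiencies; only the forward pass is shown here.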

Performance Measure
The Root Mean Square of Errors (RMSE), given in Eq. (1), is used as the performance measure for both GP and ANN.
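Eq. (1) is not reproduced in this text. Assuming it is the standard definition, the square root of the mean of the squared differences between actual and predicted values, the measure can be sketched in Python as follows. The function name `rmse` is chosen here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

A lower RMSE means that the predicted removal efficiencies lie closer to the measured ones, with larger individual errors penalised more heavily than smaller ones.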

Results and Discussion
The chemical characteristics monitored in the grey water were pH, total solids (TS), total suspended solids (TSS) and total dissolved solids (TDS), measured according to the methods set by APHA, 2005 [4]. The readings were taken for 3 weeks and are shown in Table 2. As seen from the table, the inlet characteristics vary substantially as the week progresses and start decreasing towards the weekend. Table 3 shows the training and validation data sets used for GP and ANN modelling. A total of 15 data sets were chosen for this study, of which 10 were used for training, 3 for validation and 2 for application. Table 4 and Table 5 compare the actual removal efficiencies with the output values predicted by GP and ANN for soak pit tank 1 and tank 2, respectively. Figure 1 and Figure 2 compare the actual removal efficiencies graphically with the ANN and GP predictions for soak pit tank 1 and tank 2, respectively.
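The 10/3/2 partition can be sketched as below. The text does not say how the 15 data sets were assigned to each group, so the sequential split is an assumption, and the record numbers are placeholders rather than measured values.

```python
def split_records(records, n_train=10, n_valid=3, n_apply=2):
    """Split records, in their given order, into train/validation/application."""
    if len(records) != n_train + n_valid + n_apply:
        raise ValueError("record count does not match the requested split")
    train = records[:n_train]
    valid = records[n_train:n_train + n_valid]
    held_out = records[n_train + n_valid:]
    return train, valid, held_out

# Placeholder record numbers 1..15 standing in for the 15 data sets.
records = list(range(1, 16))
train, valid, held_out = split_records(records)
print(len(train), len(valid), len(held_out))
```

With so few records, each validation or application point carries considerable weight in any error measure, which is part of what makes this a limited-data setting.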
As seen from Tables 4 and 5 and Figures 1 and 2, GP performs better than ANN for soak pit tank 1 and as well as ANN for soak pit tank 2. An attempt is also made to [...], where TSS%out is the TSS removal in percentage, TSSin is the inlet TSS and t is the time.
The GP-evolved models are given by equations (3) and (4) for soak pit tank 1 and tank 2, respectively.
It should be noted that these models are only approximate: they describe the overall behaviour of the system and are not exact equations. The model predictions and the actual removal efficiencies for both soak pit tanks are shown in Fig. 1 and Fig. 2. It can be seen from Fig. 1 that at a time of 96 h, with an inlet TSS of 921 mg/L for tank 1, GP and ANN predict removal efficiencies of 31.8% and 38.8%, while the experimental (actual) efficiency was 30.4%. For tank 2, with an actual removal efficiency of 43.5%, GP and ANN predicted 45.1% and 46.5%. GP is therefore able to model the soak pit tanks under limited data conditions more accurately than ANN. A closer look at equations (3) and (4) reveals that for soak pit tank 1 the removal is directly proportional to the sum of TSS and t, while for soak pit tank 2 it is proportional to the product of the two terms. This indicates that the removal process proceeds faster in soak pit tank 2 than in soak pit tank 1.
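The comparison at 96 h can be made explicit by computing the absolute prediction errors from the values quoted above. The short Python sketch below uses only those quoted figures; the dictionary layout and function name are chosen here for illustration.

```python
# Removal efficiencies (%) quoted in the text for the 96 h data point.
reported = {
    "tank 1": {"actual": 30.4, "GP": 31.8, "ANN": 38.8},
    "tank 2": {"actual": 43.5, "GP": 45.1, "ANN": 46.5},
}

def absolute_errors(row):
    """Absolute error of each model against the actual value, in percentage points."""
    return {model: round(abs(row[model] - row["actual"]), 1) for model in ("GP", "ANN")}

errors = {tank: absolute_errors(row) for tank, row in reported.items()}
for tank, err in errors.items():
    print(tank, err)
# tank 1 {'GP': 1.4, 'ANN': 8.4}
# tank 2 {'GP': 1.6, 'ANN': 3.0}
```

At this data point the GP error is 1.4 percentage points against 8.4 for ANN in tank 1, and 1.6 against 3.0 in tank 2, so the gap between the two methods is much wider for tank 1.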
While a back-propagation ANN works by redistributing the error to update its weights, GP works by evolving functions to model the process. If the data set has no clear regularities but some functional relationship exists between the input and output vectors, ANN might not succeed with limited data, whereas GP can map the input to the output relatively easily if the functions used for evolving the models are chosen correctly. The simple model obtained from GP also indicates that TSS removal through filtration in the tanks is a rather simple process.

Conclusions
The following conclusions can be drawn from this study:
(a) GP can be an effective modelling and forecasting tool in wastewater treatment when compared to ANN for modelling under limited data conditions.
(b) The modelling/forecasting tool can be integrated with the control system for real-time monitoring of the effective functioning of the treatment system.

Figure 1. Inlet concentration and soak pit tank 1 actual removal efficiency along with predicted model removal efficiency.

Figure 2. Inlet concentration and soak pit tank 2 actual removal efficiency along with predicted model removal efficiency.

Table 2. Inlet and outlet characteristics of grey water from soak pit tanks 1 and 2.

Table 3. Training and validation data sets used for modelling.

Table 4. Comparison of actual removal efficiency with ANN and GP predicted removal efficiency for validation data from soak pit tank 1.

https://doi.org/10.1051/matecconf/201820303001 (ICCOEE 2018)

Table 5. Comparison of actual removal efficiency with ANN and GP predicted removal efficiency for validation data from soak pit tank 2.