Neural networks and principal component analysis approaches to predict pile capacity in sand

The determination of pile bearing capacity from in-situ tests has advanced considerably owing to significant developments in testing technology. The work presented in this paper combines two approaches, artificial neural networks (ANNs) and principal component analysis (PCA), to develop a neural network model that provides a more accurate prediction of axial load bearing capacity from SPT data. A back-propagation multi-layer perceptron with Bayesian regularization (BR) was used in this model. The model was built from about 260 data records, obtained from the published literature, of experimental programs on large-displacement driven piles. PCA is used to compress these data and to remove the correlation between them, which improves the generalization performance of the model.


Introduction
Piling has been used for many years as a common foundation solution for different types of civil structures. A large number of design approaches have therefore been proposed to predict the ultimate capacity of piles. These approaches range from simple empirical formulations to more sophisticated finite element (FE) analyses, with new methods introduced every few years [1].
Artificial neural networks (ANNs) have recently been used to predict the ultimate capacity of driven piles based on in situ tests [2][3][4]. However, most of the ANN models available in the literature used a limited number of data sets and lacked accurate measurements of soil properties from the SPT results.
In this study, two techniques, ANN and PCA, are combined to develop a new model able to predict the ultimate capacity of large-displacement driven piles embedded in sandy soils based on SPT data.
The predictions of the developed model compare favorably with the results of field tests and with those of established SPT-based approaches.

SPT N values and failure zone extension
Since pile capacity depends on soil compressibility, and the SPT is one of the most commonly used tests in practice for indicating the in situ compressibility of soils, the SPT blow counts per 300 mm ($N_{SPT}$) are used in this study. In most cases, the value of N varies over a relatively wide range because of the heterogeneity of the soil layers. To obtain proper unit shaft and base resistances (the ultimate or total load is the sum of these two components), it is very important to account for the variation of soil resistance by using an average value of N. Since the unit shaft and base resistances are related to this average value, it should be a pertinent representative. Simple numerical calculations on the studied cases suggest that the geometric average gives a more accurate and relevant representative of the N values [5]. For example, 26.8 and 15 are the arithmetic and geometric averages, respectively, of 2, 3, 15, 14, 15, 17, 15, 17, 70, and 100. This example shows that the geometric average is closer to the predominant values of the series, whereas the arithmetic average is strongly affected by the values 70 and 100. It should be noted that the SPT values used for the geometric average should be taken at a constant spacing.
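The averaging example above can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the geometric average is computed as the exponential of the mean logarithm, which assumes all N values are strictly positive.

```python
import math


def arithmetic_mean(values):
    """Plain average of the blow counts."""
    return sum(values) / len(values)


def geometric_mean(values):
    """Geometric average, computed as exp(mean(log(N)))."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# N values quoted in the text, assumed to be taken at a constant spacing.
n_values = [2, 3, 15, 14, 15, 17, 15, 17, 70, 100]

n_arith = arithmetic_mean(n_values)
n_geom = geometric_mean(n_values)

print(f"arithmetic average = {n_arith:.1f}")
print(f"geometric average  = {n_geom:.1f}")
```

The two results reproduce the values of 26.8 and about 15 quoted in the text, and show how the two large readings (70 and 100) pull the arithmetic average away from the predominant values.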
To obtain the unit shaft resistance of piles from the SPT results, the geometric average of N along the embedded length, $N_{Shaft}$, is taken. To account more accurately for the variability of soil properties at the pile base when deriving the unit base resistance from the SPT results, a failure zone should be specified around the pile base. This zone is taken in accordance with the proposal adopted by the UWA (University of Western Australia) method [6].

CMSS-2017
Table 1 gives the failure zone dimensions, denoted Zone A and Zone B, with corresponding blow counts $N_A$ and $N_B$. To better represent the tip failure zone, we also suggest taking the N value at the pile base, $N_{Tip}$.
Table 1. Proposal used for influence zones for end bearing [6]

Influence zone    UWA method
Zone A            8B
Zone B            (0.7-4)B

Fig. 1. Influence zone for averaging blow counts near the pile base.
The N value normalized to the standard energy ratio of 60% and corrected for a number of effects is often expressed in the following form [7]:

$$N_{60} = N_{SPT} \, C_E \, C_B \, C_S \, C_R$$

where $C_E$, $C_B$, $C_S$, and $C_R$ are the correction factors for hammer energy efficiency, borehole diameter, sampling method, and length of drill rod, respectively. Because details of the hammer energy efficiency, borehole diameter, sampling method, and length of drill rod were not available for several of the studied cases, these factors are taken as unity for all data (Kramer, 1996).
The blow count is then corrected for the effect of overburden pressure, generally expressed in the following form:

$$(N_1)_{60} = C_N \, N_{60}$$

where $C_N$ is the correction factor for effective overburden pressure.
Several expressions for $C_N$ are available in the literature. The expression initially proposed by [8] for sand, and later suggested in ASTM D 6066-96, is used in this study:

$$C_N = \sqrt{\frac{P_a}{\sigma'_v}}$$

where $\sigma'_v$ is the vertical effective pressure at the SPT test point and $P_a$ is a reference pressure equal to 100 kPa. This correction is not used for clays.
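As an illustration of the two correction steps, the sketch below chains the energy-related correction and the overburden correction. It rests on stated assumptions: the correction factors are named `c_e`, `c_b`, `c_s`, and `c_r` (conventional names, not taken from the source), they default to unity as in this study, and the overburden factor is assumed to take the square-root form sqrt(Pa / sigma_v') with Pa = 100 kPa. The function names are hypothetical.

```python
import math

PA_KPA = 100.0  # reference pressure Pa stated in the text (kPa)


def n60(n_field, c_e=1.0, c_b=1.0, c_s=1.0, c_r=1.0):
    """Blow count normalized to a 60% energy ratio.

    The four factors (hammer energy, borehole diameter, sampling method,
    rod length) default to 1.0, as adopted for all data in this study.
    """
    return n_field * c_e * c_b * c_s * c_r


def overburden_factor(sigma_v_eff_kpa, pa_kpa=PA_KPA):
    """Assumed square-root form of C_N: sqrt(Pa / sigma_v')."""
    if sigma_v_eff_kpa <= 0:
        raise ValueError("effective vertical stress must be positive")
    return math.sqrt(pa_kpa / sigma_v_eff_kpa)


def n1_60(n_field, sigma_v_eff_kpa, **factors):
    """Blow count corrected for both energy and overburden pressure."""
    return overburden_factor(sigma_v_eff_kpa) * n60(n_field, **factors)


# Hypothetical reading: N = 20 at an effective vertical stress of 150 kPa.
print(round(n1_60(20, 150.0), 1))
```

With all energy-related factors at unity, the correction reduces to scaling the field blow count by C_N, so readings taken at an effective stress above 100 kPa are reduced and shallower readings are increased.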

Neural networks
A neural network is a system composed of a set of interconnected neurons. A particular arrangement of the connections between these neurons produces a neural network model suited to certain tasks. The back-propagation multilayer perceptron (BPMLP) is the most widely used neural network model; it consists of three adjacent layers: input, hidden, and output [9]. To obtain the desired outputs, the weights, which represent the connection strengths between neurons, and the biases are adjusted using a number of training inputs and the corresponding target values. The network error, that is, the difference between the calculated and expected target patterns, is then back-propagated from the output layer to the input layer to update the network weights and biases. This error arises during the learning process and can be expressed in terms of the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{P} \sum_{j=1}^{P} \left( t_j - O_j \right)^2$$

where $t_j$ is the target value of the $j$-th pattern, $O_j$ is the output value of the $j$-th pattern, and $P$ is the number of patterns.
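The error measure described above can be written as a short function. This is a minimal sketch that assumes the plain average of squared differences between targets and outputs over the P patterns (no additional factor of 1/2); the function name and the sample values are hypothetical.

```python
def mean_square_error(targets, outputs):
    """Average squared difference between target and network output."""
    if len(targets) != len(outputs) or not targets:
        raise ValueError("need two non-empty sequences of equal length")
    p = len(targets)  # number of patterns
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / p


# Hypothetical targets and network outputs for three patterns.
mse = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse)
```

During training, this quantity is what back-propagation drives down by adjusting the weights and biases.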

Principal component analysis (PCA)
PCA is a descriptive technique for studying the dependencies between variables in order to obtain a compact description or representation of these variables. It has also been applied successfully to reduce the dimensionality of ANN inputs in a variety of engineering applications [10][11][12][13]. Mathematically, PCA is an orthogonal projection technique that projects multidimensional observations from a space of dimension m (where m is the number of observed variables) onto a subspace of lower dimension L (L < m) by maximizing the variance of the projections. The estimation of the PCA parameters reduces to the calculation of the eigenvalues and eigenvectors of the covariance matrix Σ of the data. From its spectral decomposition, this matrix can be written as follows:

$$\Sigma = \sum_{i=1}^{m} \lambda_i \, p_i \, p_i^{T}$$

where $p_i$ is the $i$-th eigenvector of Σ and $\lambda_i$ is the corresponding eigenvalue.
Determining the number L of eigenvectors corresponding to the dominant eigenvalues is very important. Many rules have been proposed in the literature to determine the number of components to retain [14]. In this study, the cumulative percentage of total variance method was used. The percentage of variance explained by the first L components is given by:

$$\frac{\sum_{i=1}^{L} \lambda_i}{\sum_{i=1}^{m} \lambda_i} \times 100\,\%$$

where $\lambda_i$ is the $i$-th largest eigenvalue of Σ.
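The cumulative-percentage rule can be sketched as follows. The eigenvalues and the 90% target in the example are purely hypothetical and are not taken from the study; the function names are likewise illustrative.

```python
def cumulative_variance_percent(eigenvalues):
    """Cumulative share of total variance, in %, for L = 1, 2, ..., m."""
    ordered = sorted(eigenvalues, reverse=True)  # dominant eigenvalues first
    total = sum(ordered)
    percentages, running = [], 0.0
    for lam in ordered:
        running += lam
        percentages.append(100.0 * running / total)
    return percentages


def components_to_retain(eigenvalues, target_percent):
    """Smallest L whose cumulative variance reaches the target."""
    percentages = cumulative_variance_percent(eigenvalues)
    for count, pct in enumerate(percentages, start=1):
        if pct >= target_percent:
            return count
    return len(percentages)


# Hypothetical eigenvalues of a covariance matrix with m = 5 variables.
eigs = [4.0, 2.0, 1.0, 0.5, 0.5]
print(cumulative_variance_percent(eigs))
print(components_to_retain(eigs, 90.0))
```

Sorting the eigenvalues first means the rule always accumulates variance from the most dominant component downwards, regardless of the order in which an eigen-solver returns them.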

Database construction
In this study, one ANN model that deals with large-displacement piles is developed. The data used to calibrate and validate the ANN model were obtained from the literature and include 260 static load tests on large-displacement driven piles reported by different authors [15]. The tests were performed at different non-cohesive sites. The ultimate pile capacity ($Q_t$) is defined in this study as the load corresponding to plunging failure for the cases with a well-defined failure. For the cases where the failure load is not clearly defined, the failure load must be determined from the pile load test results through a unique criterion. According to [16], a small-diameter pile is considered to have failed if it experiences a settlement equal to 10% of its nominal dimension. On the other hand, the ultimate capacity of a large-diameter pile is assessed, following the recommendations of [17][18], when the pile settlement (S) equals:

where Q is the test load; L is the pile length; $A_p$ is the cross-sectional area of the pile; and $E_p$ is the modulus of elasticity of the pile material.

Model inputs and outputs

Methodology of implementation of PCA and ANN
This section describes the steps taken to implement the PCA and ANN approaches. The PCA data processing was implemented in two phases. The first phase, called Pre-PCA, preprocesses the training data matrix and eliminates the correlations within it. The second, called Post-PCA, transforms the testing and validation data matrices according to the principal components. The implementation and simulation were performed using the neural network toolbox functions of MATLAB 7.5 [19] (The MathWorks, 2007).

Pre-PCA phase
The input data (Matrix C) were first normalized so that they had zero mean and unit variance. Then, the PCA parameters (eigenvalues and eigenvectors) were estimated to calculate the principal components (PCs) using the normalized data (Matrix N) and the mean and variance values. The uncorrelated components (matrix Ntrans) were ranked according to their variances. They were then passed to the ANN, together with their corresponding target output values, for network training based on a selected PC variance (PCV) value. The nine (9) parameters of the raw input matrix can thus be replaced by the first seven (7) principal components for the chosen PCV value. Many ANNs were trained using different PCV values to determine the optimal percentage of the total variance in the database. The best model was obtained with a PCV of 2% and seven principal components.

Post-PCA phase
During each ANN training process, the generalization performance on the testing and validation data sets was evaluated. Each vector of validation or test data was processed with the Post-PCA transformation before it was used by the ANN to estimate or predict the output. The trained network used these reduced and uncorrelated data, with the optimal weights obtained from the training process, to predict the total pile capacity (ultimate load).
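The two phases can be sketched in Python with NumPy, as an illustration only and not the MATLAB implementation used in the study. Several assumptions are made: the 2% setting is interpreted as the minimum share of total variance a component must explain to be kept, the data are synthetic (nine inputs, two of which are nearly linear combinations of the others), and the function names `fit_pre_pca` and `apply_post_pca` are hypothetical.

```python
import numpy as np


def fit_pre_pca(train, min_fraction=0.02):
    """Pre-PCA: estimate normalization and PCA parameters from training data.

    train has shape (samples, variables). min_fraction is the assumed
    meaning of the 2% setting: components explaining a smaller share of
    the total variance are discarded.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0, ddof=1)
    normalized = (train - mean) / std           # zero mean, unit variance
    cov = np.cov(normalized, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # returned in ascending order
    order = np.argsort(eigvals)[::-1]           # rank by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals / eigvals.sum() >= min_fraction
    return {"mean": mean, "std": std, "components": eigvecs[:, keep]}


def apply_post_pca(data, params):
    """Post-PCA: transform data with the parameters fitted on training data."""
    normalized = (data - params["mean"]) / params["std"]
    return normalized @ params["components"]


# Synthetic stand-in for the database: 9 inputs, 2 of them nearly redundant.
rng = np.random.default_rng(0)
base = rng.normal(size=(260, 7))
extra = np.column_stack([base[:, 0] + base[:, 1], base[:, 2] + base[:, 3]])
extra = extra + 0.01 * rng.normal(size=(260, 2))
inputs = np.hstack([base, extra])
train, test = inputs[:200], inputs[200:]

params = fit_pre_pca(train)
train_pcs = apply_post_pca(train, params)   # uncorrelated inputs for training
test_pcs = apply_post_pca(test, params)     # same transform, unseen data
print(train_pcs.shape, test_pcs.shape)
```

The key design point is that the mean, variance, and eigenvectors are estimated once from the training data and then reused unchanged on the testing and validation data, so that no information from the held-out sets leaks into the preprocessing.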

Training, testing and network selection
The developed model was trained, tested, and validated on its respective data sets using the Bayesian regularization algorithm [20]. The resulting coefficients of correlation were 0.96, 0.94, and 0.95 for training, testing, and validation, respectively.

Comparison of ANN model with available SPT-based methods
To examine the accuracy of the large-displacement pile model against available methods, the model is compared with four SPT-based methods currently used in practice. The performance of the ANN model against these SPT-based methods is given in Table 2. It can be concluded that the developed model gives better performance values than the other methods.
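Performance measures of this kind can be computed as in the sketch below. It assumes that the mean and standard deviation are taken over the ratio of predicted to measured capacity, and that the correlation is the Pearson coefficient between the two series; both are assumptions, and the load values are hypothetical.

```python
import math
import statistics


def performance(measured, predicted):
    """Return (mean, standard deviation) of Qp/Qm and Pearson's r."""
    ratios = [qp / qm for qp, qm in zip(predicted, measured)]
    mu = statistics.mean(ratios)
    sigma = statistics.stdev(ratios)          # sample standard deviation
    mean_m = statistics.mean(measured)
    mean_p = statistics.mean(predicted)
    s_mp = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    s_mm = sum((m - mean_m) ** 2 for m in measured)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    r = s_mp / math.sqrt(s_mm * s_pp)
    return mu, sigma, r


# Hypothetical measured and predicted capacities (kN).
q_measured = [1000.0, 1500.0, 2000.0, 2500.0]
q_predicted = [1100.0, 1400.0, 2100.0, 2600.0]

mu, sigma, r = performance(q_measured, q_predicted)
print(f"mean = {mu:.2f}, SD = {sigma:.2f}, r = {r:.3f}")
```

A mean ratio close to 1 with a small standard deviation indicates predictions that are, on average, unbiased and consistent, which is the basis on which competing methods are usually ranked.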

Conclusions
In this paper, a comprehensive set of in-situ pile load test results collected from the literature has been used to develop an ANN model for predicting the capacity of large-displacement piles. To improve the predictive ability of the developed ANN model, the principal component analysis (PCA) approach was applied. The performance of the ANN model was examined against the SPT-based pile capacity prediction methods most commonly used in practice. The results indicate that the developed model is capable of accurately predicting the ultimate capacity of large-displacement piles, with high performance parameters (R² = 0.96, Mean = 1.05, SD = 0.22).

Table 2. Performance of ANN model against available SPT-based methods
Note: $Q_{fit}$: pile capacity from the best fit of predicted versus measured pile capacity; $Q_p$: predicted pile capacity; $Q_m$: measured pile capacity; µ: mean; σ: standard deviation.