Uncertainty analysis of constant amplitude fatigue test data employing the six parameters random fatigue limit model

Estimating and reducing uncertainty in fatigue test data analysis is a relevant task when assessing the reliability of a structural connection with respect to fatigue. Several statistical models have been proposed in the literature to represent the stress range vs. endurance trend of fatigue test data under constant amplitude loading, together with the scatter in the finite and infinite life regions. In order to estimate the safety level of the connection, the uncertainty related to the amount of available information also needs to be estimated, using the methods provided by statistical theory. Bayesian analysis is employed to reduce the uncertainty due to the often small amount of test data by introducing prior information on the parameters of the statistical model. In this work, the inference of fatigue test data of cover-plated steel beams is presented. The uncertainty is estimated by making use of Bayesian and frequentist methods. The 5% quantile of the fatigue life is estimated taking into account the uncertainty related to the sample size, both for a dataset containing few samples and for one containing more data. The S-N curves resulting from the application of the employed methods are compared, and the effect of the reduction of uncertainty in the infinite life region is quantified.


Introduction
Performing constant amplitude (CA) fatigue tests on structural details, plain material or assemblies is a necessary step to either perform or validate the design of these components against fatigue failure. When undamaged specimens are tested, each fatigue test results in one of two types of data: a failure or a right-censored observation (runout), i.e. a test terminated without failure. According to common practice [1], the fatigue test data related to the finite life region and those related to the fatigue limit, the stress range at which no failure is assumed to occur, should be analysed separately, employing two different statistical techniques: linear regression and the staircase method. Recently, several models have been proposed to infer CA fatigue test data irrespective of the type of data. In this case, the inferential process results in the estimation of the parameters related to the location and scatter in the finite life region and of the fatigue limit. The estimation of the uncertainty underlying the analysis of fatigue test data is of primary importance to assess the safety of structures subjected to fatigue loads by means of probabilistic methods. According to the categorization made in [2], the term aleatory uncertainty refers to the inherent variability of a stochastic quantity, and the term epistemic uncertainty to the lack of knowledge [3]. In practice, the aleatory uncertainty is quantified through the evaluation of the scatter exhibited by the experimental data. The epistemic uncertainty, instead, is quantified by evaluating the degree of belief attributed to the estimators of the model parameters, which increases with the number of experimental observations. Estimating both the aleatory and the epistemic uncertainty allows the safety margin required to reach a predetermined safety level to be estimated correctly. The present paper is concerned with the estimation and reduction of the aleatory and epistemic uncertainties of constant amplitude fatigue test data of the cover-plated steel beam detail, summarized in Table 1. A "large sample" dataset containing all the test data reported in Table 1, and a "small sample" dataset containing a random subset of the data, see Table 2, are inferred using the linear regression model (LRM), for comparison with common practice and to explore the agreement among the employed methods in estimating the epistemic uncertainty. Then, a regression model for fatigue, the Six-Parameter Random Fatigue Limit model [11], is employed to infer the "large sample" data, to estimate and reduce the epistemic uncertainty, and to evaluate their effect on the predicted fatigue life distribution.

Models and Methods
In order to model the trend of the fatigue test data, the Basquin relation [6] is often employed in its logarithmic form:

$$\log_{10} N = \beta_0 + \beta_1 \log_{10} \Delta S \qquad (1)$$

where $N$ is the fatigue life, $\Delta S$ is the stress range, and $\beta_0$ and $\beta_1$ are parameters that control the location and the slope of the curve, respectively. The aleatory uncertainty is often quantified [1] by assuming that the logarithm of the fatigue life at a given stress range is normally distributed and that the data are homoscedastic, i.e. that the scatter is independent of the stress range. The epistemic uncertainty, on the other hand, is considered by determining the sampling distribution of the model parameters using ancillary functions. This approach is commonly employed to estimate prediction intervals when the Basquin relation and a normal log-fatigue life distribution are used to infer the fatigue test data; this model is referred to as the LRM. However, it is well established that the fatigue life shows an increasing variability at lower stress ranges, i.e. approaching the fatigue limit, and that the most suitable distribution function to model the fatigue life is actually unknown [7]. In order to better model the trend of the test data, their variability and the fatigue limit, several models have been formulated in the last decades. Pascual and Meeker [8] proposed the Random Fatigue Limit model to mimic the curvature at low stress ranges and to better describe the variability of the fatigue life at stress ranges approaching the fatigue limit, $\Delta S_0$, which is treated as a random variable. To model the aleatory uncertainty, they formulated the distribution of the fatigue life, $f_W$, based on the distribution of the fatigue life given the fatigue limit, $f_{W|V}$, and the distribution of the fatigue limit, $f_V$. Moreover, by using the maximum likelihood method (MLM), runouts and failure data can be analysed together. D'Angelo et al. [9,10] proposed the Bilinear Random Fatigue Limit model, which builds on the Random Fatigue Limit model, in order to obtain linear S-N curves, similar to those employed in standards, and to define a strategy to consider the epistemic uncertainty without assuming a predetermined confidence level. Moreover, their model implies a very weak correlation between the estimators of the location of the fatigue limit and of the fatigue life. In [11] the Six-Parameter Random Fatigue Limit model (6PRFLM) was introduced to explicitly model the curvature from the finite life region towards the fatigue limit while retaining the advantages of the two previous models: the increasing variability of the log-fatigue life at lower stress ranges, the description of the logarithm of the fatigue limit as a random variable, $f_V$, with location parameter $\mu_V$ and scale parameter $\sigma_V$, and the low correlation between the model parameters related to the location of $f_{W|V}$ and $f_V$. The equation proposed in [11] contains the model parameter $p$, which controls the curvature from the finite life region to the fatigue limit.
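The construction of $f_W$ from $f_{W|V}$ and $f_V$ described above is a one-dimensional integral over the fatigue limit, $f_W(w) = \int f_{W|V}(w|v)\,f_V(v)\,dv$. The following is a minimal numerical sketch of that idea, assuming the original Random Fatigue Limit form in natural logarithms, with $W = \ln N$ having mean $\beta_0 + \beta_1 \ln(\Delta S - e^{V})$ for $V < \ln \Delta S$, and normal distributions for both $f_{W|V}$ and $f_V$ (Pascual and Meeker also considered the smallest extreme value distribution). It is not the 6PRFLM, whose equation is given in [11]; the function name `rfl_density` and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for both f_{W|V} and f_V (assumption)


def rfl_density(w, stress, beta0, beta1, sigma, mu_v, sigma_v, n_grid=2000):
    """Marginal density f_W(w) of the log-life W = ln N at a given stress range.

    Hypothetical sketch of the Random Fatigue Limit construction:
        f_W(w) = integral over v < ln(stress) of f_{W|V}(w | v) * f_V(v) dv.
    Integrated over w, the result approximately equals P(V < ln stress), the
    probability that the specimen fails at all; the rest corresponds to runouts.
    """
    x = math.log(stress)
    lo = mu_v - 8.0 * sigma_v            # truncate f_V at +/- 8 standard deviations
    hi = min(x, mu_v + 8.0 * sigma_v)    # failure only if the fatigue limit is below the stress
    if hi <= lo:
        return 0.0                       # stress far below the fatigue limit: no failures
    h = (hi - lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        v = lo + (k + 0.5) * h           # midpoint rule, so exp(v) < stress always holds
        mu_w = beta0 + beta1 * math.log(stress - math.exp(v))
        f_w_given_v = _STD_NORMAL.pdf((w - mu_w) / sigma) / sigma
        f_v = _STD_NORMAL.pdf((v - mu_v) / sigma_v) / sigma_v
        total += f_w_given_v * f_v * h
    return total
```

For example, `rfl_density(16.0, 200.0, 30.0, -3.0, 0.3, math.log(80.0), 0.05)` evaluates the density of $\ln N = 16$ at $\Delta S = 200$ for these hypothetical parameters; summing it over a fine grid of $w$ gives approximately 1, because at that stress range practically every specimen is expected to fail, whereas at stress ranges near the fatigue limit the missing mass represents runouts.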
For the cover-plated steel beam datasets inferred in [11], the equation employed in the 6PRFLM proved to mimic the trend of the test data better than the Random Fatigue Limit model and the Bilinear Random Fatigue Limit model. The MLM was employed to evaluate the estimator of the model parameters, i.e. the maximum likelihood estimator (MLE), $\hat{\boldsymbol{\theta}}_{\mathrm{MLE}}$. Pascual and Meeker alternatively employed the lognormal distribution and the smallest extreme value distribution to describe the variability of the fatigue limit and of the fatigue life; the assumption made about the shape of these two random variables affects the description of the uncertainty, and more details can be found in [5,7,8]. However, less attention has been paid in the literature to the estimation of the epistemic uncertainty. In [9][10] the authors proposed a Monte Carlo based approach to evaluate confidence bounds on quantiles by employing a nested sampling of the epistemic and the aleatory uncertainty. As described later, the epistemic uncertainty was assumed to follow a normal distribution and was estimated using the likelihood function.

The linear regression theory
The linear regression model is based on the assumption that the residuals of the log-fatigue life are normally distributed. The estimators are evaluated by minimizing the sum of the squared residuals, with the logarithm of the stress range as the independent variable. Since the relation is linear, the estimators of the slope and the intercept are linear combinations of the observations, and their sampling distributions can be constructed in closed form. Because the residuals are normal, $\varepsilon \sim N(0,\sigma)$, for a dataset of $n$ tests the Student's t distribution is employed as ancillary function and the following relations apply:

$$\frac{\hat{\beta}_0 - \beta_0}{\hat{\sigma}\sqrt{\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}}} \sim t_{n-2} \qquad (3)$$

$$\frac{\hat{\beta}_1 - \beta_1}{\hat{\sigma}/\sqrt{S_{xx}}} \sim t_{n-2} \qquad (4)$$

where $n$ is the number of tests, $t_{n-2}$ is the Student's t distribution with $n-2$ degrees of freedom, $x_i = \log_{10}(\Delta S_i)$, $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$ is the corrected sum of squares of the $\log_{10}(\Delta S_i)$, $\hat{\sigma}^2 = \sum_{i=1}^{n}\hat{\varepsilon}_i^2/(n-2)$ is the residual variance, the hat denotes the estimator of the parameters of Equation (1) and the overbar denotes the mean value. Equations (3) and (4) give the sampling distributions of the model parameters. In the same way, the sampling distribution of the standard deviation, $\sigma$, is given by:

$$\frac{(n-2)\,\hat{\sigma}^2}{\sigma^2} \sim \chi^2_{n-2} \qquad (5)$$

where $\chi^2_{n-2}$ is the chi-square distribution (ancillary function) with $n-2$ degrees of freedom. By combining these formulations, confidence bounds can be determined on any quantile of the model.
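Equations (3) to (5) translate directly into code. The following is a minimal sketch using NumPy and SciPy (`scipy.stats.t.ppf`, `scipy.stats.chi2.ppf`); the function names and the dictionary layout are hypothetical, the data are assumed to be failures only, and the one-sided prediction bound is the textbook form for a single new observation, which may differ in detail from the bound prescribed in standards.

```python
import numpy as np
from scipy import stats


def fit_lsm(log_s, log_n):
    """Least-squares fit of Equation (1): log10(N) = b0 + b1 * log10(dS)."""
    x = np.asarray(log_s, dtype=float)
    y = np.asarray(log_n, dtype=float)
    n = x.size
    x_bar = x.mean()
    s_xx = np.sum((x - x_bar) ** 2)                       # corrected sum of squares
    b1 = np.sum((x - x_bar) * (y - y.mean())) / s_xx
    b0 = y.mean() - b1 * x_bar
    sse = np.sum((y - b0 - b1 * x) ** 2)
    sigma = np.sqrt(sse / (n - 2))                        # n - 2 degrees of freedom
    return {"b0": b0, "b1": b1, "sigma": sigma, "s_xx": s_xx, "x_bar": x_bar, "n": n}


def confidence_intervals(fit, level=0.95):
    """Two-sided intervals obtained from the pivots of Equations (3), (4) and (5)."""
    n, s, s_xx, x_bar = fit["n"], fit["sigma"], fit["s_xx"], fit["x_bar"]
    t = stats.t.ppf(0.5 + level / 2, n - 2)
    se_b0 = s * np.sqrt(1.0 / n + x_bar ** 2 / s_xx)      # Equation (3)
    se_b1 = s / np.sqrt(s_xx)                             # Equation (4)
    q_hi = stats.chi2.ppf(0.5 + level / 2, n - 2)         # Equation (5)
    q_lo = stats.chi2.ppf(0.5 - level / 2, n - 2)
    return {
        "b0": (fit["b0"] - t * se_b0, fit["b0"] + t * se_b0),
        "b1": (fit["b1"] - t * se_b1, fit["b1"] + t * se_b1),
        "sigma": (s * np.sqrt((n - 2) / q_hi), s * np.sqrt((n - 2) / q_lo)),
    }


def lower_prediction_bound(fit, log_s0, p=0.05):
    """One-sided lower prediction bound of log10(N) for a new test at log10(dS) = log_s0."""
    n, s = fit["n"], fit["sigma"]
    t = stats.t.ppf(1.0 - p, n - 2)
    mean = fit["b0"] + fit["b1"] * log_s0
    spread = s * np.sqrt(1.0 + 1.0 / n + (log_s0 - fit["x_bar"]) ** 2 / fit["s_xx"])
    return mean - t * spread
```

Note that the intervals on $\beta_0$ and $\beta_1$ are computed from their marginal pivots only; the correlation between the two estimators does not enter them.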

The Maximum Likelihood Method
When the distribution of the fatigue life is not normal, the Student's t distribution is no longer suitable to describe the epistemic uncertainty, and the distribution of the estimators of the model parameters has to be obtained in another way. The likelihood function is a function of the model parameters $\boldsymbol{\theta}$ proportional to the probability of the observed data. The maximum likelihood method (MLM) estimates the model parameters by maximizing the likelihood function. Moreover, the MLM can be employed to evaluate the uncertainty affecting the estimation of the model parameters. For this purpose, the Wald statistic can be employed. It assumes that the asymptotic (large-sample) distribution of the estimators of the model parameters is multivariate normal. For each model parameter $\theta_i$, the following relation applies:

$$\frac{\hat{\theta}_{i,\mathrm{MLE}} - \theta_i}{\sqrt{\left[\mathbf{I}^{-1}\!\left(\hat{\boldsymbol{\theta}}_{\mathrm{MLE}}\right)\right]_{ii}}} \sim N(0,1) \qquad (6)$$

where $\mathbf{I}(\hat{\boldsymbol{\theta}}_{\mathrm{MLE}})$ is the observed Fisher information matrix and $N(0,1)$ is the standard normal distribution. The multivariate normal distribution has its mean equal to the MLE, and its covariance matrix is the inverse of the observed Fisher information matrix, which is the matrix of the second-order derivatives of the negative log-likelihood function evaluated at the MLE. More details can be found in [7,8], where the epistemic uncertainty was estimated by employing the Wald statistic. The main drawback of this approach is that the resulting distribution might not be meaningful in some cases [12], especially when a parameter is close to the boundary of its domain. This was found for the 6PRFLM fitting the cover plate data [11]. More details about these methods can be found in textbooks dedicated to the MLM, such as [12], among others.
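As an illustration of Equation (6), the sketch below fits the LRM by the MLM with runouts treated as right-censored observations (failures contribute the normal density, runouts the normal survival function) and derives Wald intervals from a finite-difference Hessian of the negative log-likelihood. It assumes SciPy's `optimize.minimize` and `stats.norm`; the parameterization with $\ln\sigma$ (to keep $\sigma$ positive), the function names and the step size `h` are assumptions, not the procedure used in [7,8,11].

```python
import numpy as np
from scipy import optimize, stats


def neg_log_likelihood(theta, log_s, log_n, runout):
    """Negative log-likelihood of the LRM with right-censored runouts.

    theta = (b0, b1, ln_sigma); runout is a boolean array (True = no failure)."""
    b0, b1, ln_sigma = theta
    sigma = np.exp(ln_sigma)
    z = (log_n - (b0 + b1 * log_s)) / sigma
    ll_fail = stats.norm.logpdf(z[~runout]) - ln_sigma   # density of observed failures
    ll_run = stats.norm.logsf(z[runout])                # P(life exceeds the runout life)
    return -(ll_fail.sum() + ll_run.sum())


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    k = x.size
    hess = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            e_i = np.zeros(k)
            e_j = np.zeros(k)
            e_i[i] = h
            e_j[j] = h
            hess[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                          - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4.0 * h * h)
    return hess


def fit_mle_wald(log_s, log_n, runout, level=0.95):
    """MLE of (b0, b1, ln_sigma) and Wald confidence intervals from Equation (6)."""
    log_s = np.asarray(log_s, dtype=float)
    log_n = np.asarray(log_n, dtype=float)
    runout = np.asarray(runout, dtype=bool)

    def nll(th):
        return neg_log_likelihood(th, log_s, log_n, runout)

    slope, intercept = np.polyfit(log_s, log_n, 1)        # least-squares starting point
    start = np.array([intercept, slope, np.log(np.std(log_n))])
    theta_hat = optimize.minimize(nll, start, method="BFGS").x
    info = numerical_hessian(nll, theta_hat)              # observed Fisher information
    cov = np.linalg.inv(info)                             # asymptotic covariance of the MLE
    z = stats.norm.ppf(0.5 + level / 2)
    se = np.sqrt(np.diag(cov))
    return theta_hat, cov, np.column_stack([theta_hat - z * se, theta_hat + z * se])
```

Unlike the marginal intervals of Equations (3) and (4), the returned covariance matrix `cov` also contains the correlation between the estimators of $\beta_0$ and $\beta_1$.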

The Bootstrap Method
Another way to estimate the epistemic uncertainty is the Bootstrap [14]. It is a resampling method able to provide non-parametric statistical inferences, without any assumption about the distribution describing the quantity under consideration. To estimate a statistic and its uncertainty, the method targets the sampling distribution, i.e. the probability distribution that shows how the statistic of interest would vary if the sample were collected several times from the population. The bootstrap distribution, an approximation of the sampling distribution, is obtained by drawing with replacement a number of bootstrap samples from the original dataset and estimating the statistic of interest for each of them. Each bootstrap sample must have the same size as the original dataset. The resulting set of estimates of the statistic of interest forms the bootstrap distribution, which can be interpreted as a frequentist estimate of the epistemic uncertainty. The main drawback of the bootstrap is its computational effort compared with the other methods, which grows with the size of the original dataset. Indeed, since the resampling is performed with replacement, the number of distinct bootstrap samples of a dataset of size n is (2n − 1)!/[n!(n − 1)!], about 6.3 × 10^13 for n = 25, which makes an exhaustive enumeration impractical if the original dataset contains more than 25-30 data points. For this reason, the Monte Carlo method is employed to randomly draw bootstrap samples in order to approximate the bootstrap distribution.
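The resampling scheme and the count of distinct bootstrap samples can be sketched in a few lines of standard-library Python. This is a minimal illustration, assuming that pairs $(\log_{10}\Delta S_i, \log_{10} N_i)$ of failure data are resampled and that the least-squares slope is the statistic of interest; the function names are hypothetical, and datasets with runouts would require a censored estimator, such as the MLM, in place of least squares.

```python
import math
import random


def n_distinct_bootstrap_samples(n):
    """(2n - 1)! / [n! (n - 1)!] = C(2n - 1, n) distinct resamples of size n."""
    return math.comb(2 * n - 1, n)


def ls_slope(x, y):
    """Least-squares slope, or None if all x are equal (slope undefined)."""
    if len(set(x)) < 2:
        return None
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / s_xx


def bootstrap_slope(x, y, n_boot=2000, seed=0):
    """Monte Carlo approximation of the bootstrap distribution of the slope."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # draw n pairs with replacement
        b1 = ls_slope([x[i] for i in idx], [y[i] for i in idx])
        if b1 is not None:                          # skip degenerate resamples
            estimates.append(b1)
    return estimates


def percentile_interval(estimates, level=0.95):
    """Simple percentile interval read from the bootstrap distribution."""
    ordered = sorted(estimates)
    lo = int(math.floor((0.5 - level / 2) * (len(ordered) - 1)))
    hi = int(math.ceil((0.5 + level / 2) * (len(ordered) - 1)))
    return ordered[lo], ordered[hi]
```

For instance, `n_distinct_bootstrap_samples(25)` returns 63205303218876, which is why exhaustive enumeration is replaced by Monte Carlo resampling; the number of draws `n_boot` then controls the accuracy of the approximation rather than the dataset size.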

Posterior estimation using Bayesian Analysis. The Random-Walk Metropolis-Hastings algorithm.
Bayesian statistics explicitly treats the model parameters as random variables described by probability density functions. For this reason, the output of an inference performed using Bayesian statistics already includes the information about the uncertainty underlying the model parameters.
Bayesian statistics relies on Bayes' theorem, which provides a method for updating the probabilities of unobserved events given that another related event has occurred. Bayes' theorem gives the conditional probability of observing the event A given that the event B has occurred:

$$P(A\,|\,B) = \frac{P(B\,|\,A)\,P(A)}{P(B)} \qquad (7)$$

where P(A), the probability of observing A, is the prior information about the event A; P(B) is the probability of observing B; and P(B|A) is the conditional probability of observing B given that A has occurred. P(A|B) in Equation (7) is usually referred to as the "posterior". In the inference of fatigue test data, A corresponds to the values of the model parameters and B to the observed test data. Bayesian inference has recently been applied to fatigue test data in [8,11]. It is referred to as non-informative (NBI) when no prior information is employed, whereas informative Bayesian inference (IBI) makes use of prior information on the model parameters. The random walk Metropolis-Hastings algorithm [12,13] has been employed to perform the analysis. It is a Markov Chain Monte Carlo (MCMC) method whose output is a Markov Chain from which the posterior distribution can be estimated. The algorithm is based on the following steps:
1) Start from an arbitrary initial set of values for the model parameters;
2) Using a symmetric candidate distribution (e.g. normal), whose mean is the current state of the Markov Chain and whose standard deviation has to be tuned, sample a new candidate set of model parameters;
3) Based on the criterion described in [16][17][18], accept or reject the candidate value using the likelihood function and the prior knowledge. In case of acceptance, the candidate value becomes the state of the Markov Chain at the current step; otherwise, the previous state is kept;
4) Repeat steps 2 and 3 until the Markov Chain converges to its limiting distribution.
To reach convergence, the candidate distribution should be neither too narrow nor too wide. The limiting distribution obtained through this procedure is a non-parametric estimate of the posterior. More details about the random walk Metropolis-Hastings algorithm and the tuning process can be found in [12][13][14][15][16], among others. As a result of this procedure, the posterior distribution is approximated, and its mean value identifies the Bayesian mean estimator of the parameter.
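Steps 1) to 4) can be sketched for the LRM as follows. This is a minimal, hypothetical implementation in standard-library Python, assuming a flat (improper) prior on $\beta_0$, $\beta_1$ and $\ln\sigma$ (a non-informative setting), failure data only, an independent normal candidate distribution for each parameter, and the usual Metropolis acceptance rule for symmetric candidates. It is not the tuned sampler used in this work, and a burn-in segment still has to be discarded before computing the posterior mean.

```python
import math
import random


def make_log_posterior(log_s, log_n):
    """Log-posterior of the LRM (failures only) under a flat prior on (b0, b1, ln_sigma)."""
    def log_post(theta):
        b0, b1, ln_sigma = theta
        sigma = math.exp(ln_sigma)
        ll = 0.0
        for xi, yi in zip(log_s, log_n):
            z = (yi - b0 - b1 * xi) / sigma
            ll += -0.5 * z * z - ln_sigma - 0.5 * math.log(2.0 * math.pi)
        return ll                        # flat prior: posterior proportional to likelihood
    return log_post


def random_walk_metropolis(log_post, theta0, step, n_iter, seed=0):
    """Random walk Metropolis-Hastings with a symmetric normal candidate distribution."""
    rng = random.Random(seed)
    theta = list(theta0)                 # step 1: arbitrary initial state
    lp = log_post(theta)
    chain, n_accepted = [], 0
    for _ in range(n_iter):
        # step 2: candidate centred on the current state, standard deviations = step
        cand = [t + rng.gauss(0.0, s) for t, s in zip(theta, step)]
        lp_cand = log_post(cand)
        # step 3: accept with probability min(1, posterior ratio); u lies in (0, 1]
        if math.log(1.0 - rng.random()) < lp_cand - lp:
            theta, lp = cand, lp_cand
            n_accepted += 1
        chain.append(list(theta))        # the previous state is repeated on rejection
    return chain, n_accepted / n_iter


def posterior_mean(chain, burn_in):
    """Bayesian mean estimator after discarding the burn-in part of the chain."""
    kept = chain[burn_in:]
    return [sum(state[i] for state in kept) / len(kept) for i in range(len(kept[0]))]
```

The acceptance rate returned by `random_walk_metropolis` is a common diagnostic for tuning `step`: values close to 0 indicate a too wide candidate distribution and values close to 1 a too narrow one. Centring $\log_{10}\Delta S$ before fitting reduces the strong correlation between $\beta_0$ and $\beta_1$, which otherwise forces small steps along a narrow ridge of the posterior.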

Results and Discussion
In order to show the degree of agreement among the employed methods, the two fatigue datasets are inferred by employing the LRM, which is based on the Basquin relation, Equation (1), assuming that the base-10 logarithm of the fatigue life follows a normal distribution. The analysis involves the estimation of the parameters and of both the aleatory and the epistemic uncertainty according to Table 3. The effect of the epistemic uncertainty on the quantiles of the fatigue life is estimated using the approach proposed in [6,7]. Then, the 6PRFLM is employed to infer the "large sample" data. The uncertainty is estimated and reduced following the scheme in Table 3. The different methods are discussed and the effect of the reduction of the uncertainty is quantified.

The Linear Regression Model
The methods summarized in Table 3 can be employed to infer the data and to estimate the model parameters $\beta_0$, $\beta_1$, $\sigma$ and the associated uncertainty. For the fits performed using the Bootstrap and the Bayesian inference, one million samples are generated.

The large sample dataset
For the failure data of the large sample dataset, the estimators of the LRM parameters and their coefficients of variation are reported in Table 4. The value of the log-likelihood function at the MLE is 102.18; however, this value refers to the failure data only. When the whole dataset is inferred using the LRM and the MLM, thus including the contribution of runouts, the value of the log-likelihood function drops to 3.43. The estimators obtained using the different methods are practically equal: it is known that the Bayesian posterior mean estimator tends to the MLE as the sample size increases, and the small differences observed are attributed to the numerical procedures employed to determine the MLM, Bootstrap and NBI estimators. The normality of the distributions describing the epistemic uncertainty was verified in each case using the Kolmogorov-Smirnov test, which led in every case to p-values close to 1. In this case, the least squares method (LSM), the MLM and the NBI are equivalent and would lead to close estimates of each quantile of the fatigue life when the epistemic uncertainty is not considered. In order to consider both the epistemic and the aleatory uncertainty, a nested sampling was performed in a Monte Carlo framework as described in [9][10], and the 5% quantile of the linear regression model was estimated considering the effect of the epistemic uncertainty. Figure 1 shows that, although the estimators of the model parameters and the coefficients of variation of their marginal distributions are comparable for each model parameter, the 5% quantile obtained with the least squares method predicts a lower fatigue life (-21.93% at Δσ = 100 MPa) than the other methods, which differ from each other by at most 3% in absolute value. This large difference is mainly attributed to the fact that the linear regression theory does not take into account the correlation between the model parameters. Instead, the correlation is estimated when the Wald statistic and the Bootstrap method are employed: in the former case the correlation coefficient is estimated at the mode of the likelihood function through the Fisher information matrix, and in the latter case it is evaluated through the resampling procedure.
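A two-loop Monte Carlo scheme of the kind described above can be sketched as follows. It is a hypothetical illustration, not necessarily identical to the procedure of [9,10]: the outer loop draws the LRM parameters $(\beta_0, \beta_1, \ln\sigma)$ from a multivariate normal distribution describing the epistemic uncertainty (for instance, the mean and covariance returned by a Wald analysis), the inner loop draws log-lives from the normal distribution around Equation (1) describing the aleatory uncertainty, and the 5% quantile of the pooled samples is compared with the plug-in quantile that ignores the epistemic uncertainty. It uses NumPy's `Generator.multivariate_normal`; the function name and its arguments are assumptions.

```python
import numpy as np
from statistics import NormalDist


def nested_quantile(theta_mean, theta_cov, log_s0, p=0.05,
                    n_outer=2000, n_inner=500, seed=0):
    """p-quantile of log10(N) at log10(dS) = log_s0, with and without epistemic uncertainty.

    theta = (b0, b1, ln_sigma); theta_cov may include the correlation between b0 and b1."""
    rng = np.random.default_rng(seed)
    # outer loop: epistemic uncertainty on the model parameters
    thetas = rng.multivariate_normal(theta_mean, theta_cov, size=n_outer)
    mu = thetas[:, 0] + thetas[:, 1] * log_s0
    sigma = np.exp(thetas[:, 2])
    # inner loop: aleatory scatter of the log-life for each parameter set
    log_n = mu[:, None] + sigma[:, None] * rng.standard_normal((n_outer, n_inner))
    nested = np.quantile(log_n, p)       # quantile of all pooled samples
    # plug-in quantile at the point estimate (epistemic uncertainty ignored)
    b0, b1, ln_sigma = theta_mean
    plug_in = b0 + b1 * log_s0 + np.exp(ln_sigma) * NormalDist().inv_cdf(p)
    return nested, plug_in
```

Setting the off-diagonal terms of `theta_cov` to zero, which corresponds to using the marginal distributions of the parameters separately, changes the spread of the predicted log-life and therefore the nested quantile; this is the effect of the parameter correlation discussed above.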

The small sample dataset
For the "small sample" analysis, the estimators of the model parameters are evaluated using only the LSM and the MLM, because the NBI based on the Random Walk Metropolis-Hastings algorithm produced a Markov Chain that hardly converged to its limiting distribution. In other words, the amount of test data is so small that the resulting Markov Chain was very sensitive to the value of the tuning parameter, i.e. the standard deviation of the candidate distribution, and it was not possible to obtain a converging Markov Chain by a trial-and-error procedure. As shown in Table 5, the estimators of the parameters β0 and β1 are in good agreement. Because of the small amount of test data, the LS estimator of the parameter σ is 1.1 times higher than that obtained by the MLM; this is consistent with the LS estimator dividing the residual sum of squares by n − 2, whereas the MLE divides it by n, a difference that becomes relevant for small n. Moreover, as can be seen in Figure 2, the estimation of the epistemic uncertainty also led to different results. In general, the coefficient of variation is higher than for the "large sample" dataset in every case, meaning that the epistemic uncertainty increased due to the smaller amount of data. When the Wald statistic is employed, the epistemic uncertainty is normally distributed by construction, whereas the distribution resulting from the Bootstrap is of a different type. For example, the bootstrap distribution shown in Figure 2 is bimodal: its cumulative distribution function shows two inflection points, the first at a cumulative probability of around 0.2 and the second at around 0.6. The estimated quantiles of the fatigue life are shown in Figure 3. As for the "large sample" dataset, the results are obtained considering the epistemic uncertainty. The LSM predicts a 5% quantile of the fatigue life that is 0.11 times the life predicted by the lower prediction bound as defined in the linear regression theory. The 5% quantiles of the fatigue life estimated from the Bootstrap and the Wald statistic evaluations of the uncertainty are closer to each other and predict a fatigue life respectively 1.18 and 1.19 times higher than the 5% lower prediction bound as defined in the linear regression theory.
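The ratio between the two estimators of σ can be checked directly. For a normal linear model fitted to failures only, the MLE of $\beta_0$ and $\beta_1$ coincides with the least-squares estimate, so both estimators of σ share the same residual sum of squares and differ only in the divisor, giving a ratio of $\sqrt{n/(n-2)}$, about 1.12 for n = 10. The sketch below is hypothetical and ignores runouts, which would make the MLE of σ differ further.

```python
import math


def sigma_estimators(log_s, log_n):
    """Least-squares (divisor n - 2) and maximum likelihood (divisor n) estimates of sigma."""
    n = len(log_s)
    x_bar = sum(log_s) / n
    y_bar = sum(log_n) / n
    s_xx = sum((x - x_bar) ** 2 for x in log_s)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(log_s, log_n)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(log_s, log_n))
    return math.sqrt(sse / (n - 2)), math.sqrt(sse / n)
```

Whatever the data, the ratio of the two returned values depends only on n, which shows why the discrepancy is noticeable for a small sample and vanishes as the number of tests grows.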

The Six-Parameters Random Fatigue Limit Model
The 6PRFLM has been employed to infer the large sample data according to the scheme reported in Table

Figure 1. Fatigue test data belonging to the large sample dataset and 5% quantile of the fatigue life obtained for the linear regression model considering the epistemic uncertainty.

Figure 2. Epistemic uncertainty related to the parameter  of the LRM fitting the small sample data.


Figure 3. Fatigue test data belonging to the small sample dataset and 5% quantile of the fatigue life obtained for the linear regression model considering the epistemic uncertainty. For comparison, the 5% lower prediction bound is also plotted.

Table 1.
Description of the dataset related to cover-plated steel beams with welded and unwelded ends, from [4], determining the "large sample" data.

Table 4.
Results of the LRM fitting the "large sample" data. The estimators of the model parameters and the COV are reported for each method employed.

Table 5.
Results of the LRM fitting the "small sample" data. The estimators of the model parameters and the COV are reported for each method employed.