Factors Influencing Ground Settlements on Different Geomorphology Units Based on Principal Component Analysis

Taking Nanjing Metro Line 4 as a case study, principal component analysis (PCA) was used to study the influence of shield tunnel construction on ground settlement in different geomorphological units. Correlation analysis and the weighted least squares (WLS) method were applied to select variables and to obtain their relationship with settlement. After dimensionality reduction, 5-7 principal components were sufficient to represent the original 19-21 variables. For the floodplain of the Yangtze River, the variables most strongly linearly correlated with settlement were tunnel depth, the distance between the tunnel roof and the bottom of the soft soil layer, the thickness of soft soil, the compression modulus of the soil that the tunnel passes through, and the cutter head speed. For the ancient channel of the Qinhuai River, they were Poisson's ratio, porosity, moisture content, unit weight, cohesion, internal friction angle, the compression modulus of the soil that the tunnel passes through, advancing speed and earth chamber pressure. For the terrace of the Yangtze River, they were the cohesion, porosity, moisture content, Poisson's ratio, compression modulus and unit weight of the soil. For the col landform unit, the relevant variables were different again. The residuals of the regression equations are small, so the results can serve as a reference for practical engineering.


Introduction
Many factors affect ground settlement during shield construction. Principal component analysis (PCA), a statistical method proposed by Pearson and Hotelling [1], can handle such a large number of factors. It converts the original variables into a few mutually uncorrelated principal components, each pointing mainly to a group of original variables, while retaining most of the information [2][3][4][5]. Wu and Li [6] applied PCA to 12 selected indexes and found that the depth ratio, the disturbance coefficient and the width-to-depth ratio are the main factors controlling mining subsidence in the Yushenfu coal mine. Young et al. [7] used PCA to analyze rock spectra and then established a rock quality evaluation system. Giuliani and Alessandro [8] used PCA to bridge general statistical techniques and problem-specific quantitative models, establishing a statistical-mechanics framework for modeling biological systems that, from a systems perspective, overcomes the narrow perspective of reductionism. Although PCA is widely used in many fields, few studies have applied it to ground settlement, and the patterns of ground settlement differ among geomorphic units. In this study, a large dataset was preprocessed, and R was used for data mining and data visualization.
Using field data, PCA was applied to a large and complex dataset, and the factors linearly related to the cumulative ground settlement were screened out. These factors were then used in regression analysis to study how ground settlement develops under different geological conditions and to establish relationships between the factors and settlement, with the aim of providing a reference for practical engineering.

Principal component analysis of multivariate data
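Each variable is first standardized with Equation (1):

$$x_{ij}^{*} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, 2, \dots, n; \; j = 1, 2, \dots, p \qquad (1)$$

where $x_{ij}^{*}$ is the standardized value; $x_{1j}, x_{2j}, \dots, x_{nj}$ are the $n$ sample observations of the $j$-th variable; and $\bar{x}_j$ and $s_j$ are their mean and standard deviation.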
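Equation (1) can be sketched in Python as follows. This is an illustrative sketch, not the authors' R code: the helper name `standardize` and the depth values are hypothetical, and the sketch assumes the sample standard deviation (denominator n - 1), which is what R's `sd()` and `scale()` use.

```python
import statistics


def standardize(column):
    """Return the standardized values x* = (x - mean) / s of one variable (Equation (1))."""
    mean = statistics.fmean(column)
    s = statistics.stdev(column)  # sample standard deviation, denominator n - 1
    return [(x - mean) / s for x in column]


# Hypothetical tunnel depths (m) at three monitoring sections
print(standardize([10.0, 12.0, 14.0]))  # [-1.0, 0.0, 1.0]
```

Applying the function to every variable gives the standardized data matrix. Each standardized variable has mean 0 and standard deviation 1, so variables measured in different units become directly comparable.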
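The task of PCA is to obtain the coefficients $a_{kj}$ ($k = 1, 2, \dots, m$; $j = 1, 2, \dots, p$; $m \le p$) of the composite variables. Using the standardized variables $x_j^{*}$, the composite variables can be expressed as

$$F_k = \sum_{j=1}^{p} a_{kj} x_j^{*}, \quad k = 1, 2, \dots, m$$

The coefficient vectors $(a_{k1}, \dots, a_{kp})$ are the unit eigenvectors of the correlation matrix $R$ of the standardized data, and the variance of $F_k$ equals the corresponding eigenvalue $\lambda_k$. The total variance of the composite variables equals that of the standardized variables, $\sum_{k=1}^{p} \lambda_k = \sum_{j=1}^{p} \mathrm{Var}(x_j^{*}) = p$. $F_1$ is called the first principal component because it has the largest variance and therefore absorbs the most information from the original variables; $F_2$ is the second, and so on. The variance contribution rate of $F_k$ is

$$\frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j} \times 100\%$$

If the cumulative contribution rate of the first $m$ principal components exceeds 90%, the remaining components can be omitted.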
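The procedure above can be sketched with NumPy as follows. It is a minimal illustration rather than the authors' R implementation; the function name `pca_on_correlation` and the example data are hypothetical, and the default threshold of 0.90 is taken from the 90% rule in the text. The eigenvector signs returned by `numpy.linalg.eigh` are arbitrary, so individual coefficients may be flipped relative to other software.

```python
import numpy as np


def pca_on_correlation(data, threshold=0.90):
    """PCA on the correlation matrix; rows of `data` are samples, columns are variables."""
    x = np.asarray(data, dtype=float)
    # Standardize each column as in Equation (1) (sample standard deviation, ddof=1)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    r = np.corrcoef(z, rowvar=False)            # correlation coefficient matrix R
    eigvals, eigvecs = np.linalg.eigh(r)        # R is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]           # sort components by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()      # variance contribution rate of each F_k
    cumulative = np.cumsum(contribution)
    # Smallest m whose cumulative contribution rate reaches the threshold
    m = min(int(np.searchsorted(cumulative, threshold)) + 1, len(eigvals))
    scores = z @ eigvecs[:, :m]                 # F_k = sum_j a_kj * x_j*
    return eigvals, contribution, cumulative, m, scores


# Hypothetical data: 5 samples of 3 variables; the 2nd variable is twice the 1st
data = np.column_stack([[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [2, -1, -2, -1, 2]])
eigvals, contribution, cumulative, m, scores = pca_on_correlation(data)
print(m)  # 2
```

Because the first two variables carry identical information, two components already explain all of the variance, and the eigenvalues sum to p = 3, as the total-variance identity requires.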

Geomorphic units along the subway
The geomorphic units along the subway line are shown in Table 1.

Data preprocessing
Based on the literature [9][10][11][12][13], the following variables were selected: tunnel depth, groundwater level, moisture content, porosity, compression modulus, internal friction angle, cohesion, Poisson's ratio, unit weight, advancing speed of the shield machine, cutter head speed and earth chamber pressure. The soil-related variables were calculated as weighted averages over the soil layers, separately for the layers above the tunnel floor and for the soil that the tunnel passes through. For the floodplain of the Yangtze River, which has a thick layer of soft soil, two additional variables were included: "thickness of soft soil" and "distance between the roof of the tunnel and the bottom of the soft soil layer".
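The text does not state how the layer weights were defined. A common choice, assumed here, is to weight each layer by the thickness it contributes to the depth interval of interest: from the ground surface to the tunnel floor, or from the tunnel roof to the tunnel floor. The function name `weighted_soil_parameter` and the borehole values below are hypothetical.

```python
def weighted_soil_parameter(layers, top, bottom):
    """Thickness-weighted average of one soil parameter over the depth interval [top, bottom].

    layers: list of (layer_top, layer_bottom, value) tuples, depths in metres below ground.
    Weighting by thickness is an assumption; the paper only says the layers were weighted.
    """
    weighted = 0.0
    total = 0.0
    for layer_top, layer_bottom, value in layers:
        # Thickness of the part of this layer that lies inside [top, bottom]
        overlap = min(layer_bottom, bottom) - max(layer_top, top)
        if overlap > 0:
            weighted += overlap * value
            total += overlap
    if total == 0:
        raise ValueError("no soil layer overlaps the interval")
    return weighted / total


# Hypothetical borehole: (top, bottom, unit weight in kN/m^3), depths in m
layers = [(0.0, 3.0, 18.5), (3.0, 10.0, 17.8), (10.0, 20.0, 19.6)]
crown, invert = 8.0, 14.0  # hypothetical tunnel roof and floor depths
above_floor = weighted_soil_parameter(layers, 0.0, invert)
passed_through = weighted_soil_parameter(layers, crown, invert)
print(round(above_floor, 2), round(passed_through, 2))  # 18.46 19.0
```

Running the same function for each soil parameter and each interval produces the paired variables (above the tunnel floor and passed through) used in the PCA.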

Floodplain of Yangtze River
The principal components were obtained by solving the characteristic equation of the correlation coefficient matrix for its eigenvalues and eigenvectors, and the contribution rates and cumulative contribution rates were then calculated. As shown in Figure 1, the cumulative contribution rate of the first 7 principal components is 92.27%, meaning that they cover 92.3% of the total information. The factor loadings are shown in Table 2, where X1, X2, …, X21 are the original variables. Figure 2 shows the contribution of each variable to the principal components: the horizontal axis lists the first 7 principal components and the vertical axis lists the variables. The darker a cell (closer to black), the greater the variable's contribution to the corresponding principal component. For example, groundwater level makes the largest contribution to the 5th principal component, more than 60%. The variables contributing most to the 2nd principal component are the unit weight, cohesion and internal friction angle of the soil that the tunnel passes through.
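Factor loadings and the variable contributions plotted as a heat map can both be derived from the eigendecomposition of the correlation matrix. The sketch below assumes the definitions commonly used by PCA software: the loading of variable j on component k is a_kj·sqrt(λ_k), and the contribution of variable j to component k is its squared loading divided by λ_k, which equals a_kj² × 100%. Whether Figure 2 uses exactly this definition is an assumption, and the function name is hypothetical.

```python
import numpy as np


def loadings_and_contributions(corr):
    """Factor loadings and percentage contributions of each variable (rows) to each PC (columns)."""
    r = np.asarray(corr, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(r)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading of variable j on component k: a_kj * sqrt(lambda_k); clip tiny negative eigenvalues
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    # Contribution (%) of variable j to component k: loading^2 / lambda_k = a_kj^2 * 100
    contributions = 100.0 * eigvecs**2
    return loadings, contributions


# Hypothetical correlation matrix of two variables
loadings, contributions = loadings_and_contributions([[1.0, 0.8], [0.8, 1.0]])
print(np.round(contributions, 1))  # every entry is 50.0
```

Each column of `contributions` sums to 100%, so the shading for a component shows how its variance is shared among the variables, while each row of squared loadings sums to 1 across all components.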
The 1st principal component mainly points to the thickness of soft soil; the Poisson's ratio, porosity and moisture content of the soil that the tunnel passes through; and the earth chamber pressure. The 2nd principal component mainly points to unit weight; Poisson's ratio; and the internal friction angle and cohesion of the soil that the tunnel passes through. The 3rd principal component mainly points to internal friction angle, porosity, moisture content and compression modulus. The 4th principal component mainly points to tunnel depth, the distance between the tunnel roof and the bottom of the soft soil, the thickness of soft soil, the compression modulus of the soil that the tunnel passes through, and the cutter head speed. The 5th principal component mainly points to groundwater level. The 6th principal component mainly points to tunnel depth, groundwater level, cutter head speed and the compression modulus of the soil that the tunnel passes through. The 7th principal component mainly points to cohesion and unit weight.

Ancient channel of Qinhuai River
As shown in Figure 3, the first 6 principal components contain 92.3% of the total information of the 19 original variables. These 6 principal components, which contribute most to the composite index, were retained; their factor loadings are shown in Table 3.

Terrace of Yangtze River
As shown in Figure 5, the first 7 principal components contain 92.2% of the total information of the 16 original variables. These 7 principal components were retained, and the results are shown in Table 4.

Col landform
As shown in Figure 7, for the col landform sub-geomorphic unit, the cumulative contribution rate of the first 5 principal components is 91.5%, meaning that they contain 91.5% of the total information. The factor loadings of these 5 principal components are given in Table 5.

Influence factors of ground settlement
The correlation between the principal components and ground settlement was analyzed in R. The strength of the linear correlation was measured by the Pearson correlation coefficient, and its statistical significance was tested. The variables with the strongest linear correlation with ground settlement in each geomorphic unit were then screened out.
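The arithmetic behind this step can be sketched in Python as follows. The authors worked in R, where `cor.test(x, y, method = "pearson")` reports the same r and t statistic together with a p-value; the function names and data below are hypothetical. Significance is judged by comparing |t| with the critical value of the t distribution with n - 2 degrees of freedom at the chosen level.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0; it follows a t distribution with n - 2 degrees of freedom."""
    if abs(r) >= 1.0:
        raise ValueError("t is undefined for |r| = 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Hypothetical scores of one principal component and measured settlements (mm)
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
settlement = [2.0, 4.0, 5.0, 4.0, 5.0]
r = pearson_r(scores, settlement)
t = t_statistic(r, len(scores))
print(round(r, 3), round(t, 3))  # 0.775 2.121
```

Repeating the calculation for every principal component of a geomorphic unit gives a table of r and significance like those reported for each unit.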

Floodplain of the Yangtze River
The correlation analysis shows that the 4th principal component is linearly related to settlement. Considering the factor loadings of each principal component, the most important factors are given in Table 7.
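The screening step can be sketched as follows: keep the principal components found significant in the correlation analysis, then keep every variable whose absolute loading on one of them is large. The text gives no numeric cut-off, so the 0.5 threshold is an assumption, and the function name, variable names and loadings are hypothetical.

```python
def screen_factors(loadings, names, significant_pcs, threshold=0.5):
    """Return variables whose |loading| on at least one significant PC reaches `threshold`.

    loadings[j][k] is the loading of variable j on principal component k (0-based index).
    The default threshold of 0.5 is an assumed, illustrative value.
    """
    selected = []
    for j, name in enumerate(names):
        if any(abs(loadings[j][k]) >= threshold for k in significant_pcs):
            selected.append(name)
    return selected


# Hypothetical loadings of three variables on two principal components
names = ["tunnel depth", "groundwater level", "cutter head speed"]
loadings = [[0.21, 0.78], [0.65, 0.10], [-0.12, -0.55]]
print(screen_factors(loadings, names, significant_pcs=[1]))
# ['tunnel depth', 'cutter head speed']
```

Using the absolute value keeps variables with strong negative loadings, which influence the component as much as positive ones.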

Ancient channel of Qinhuai River
As shown in Table 8, in terms of statistical significance, both the 1st and the 2nd principal components are significantly related to settlement. In terms of the correlation coefficient, the 1st principal component has a strong linear correlation with settlement, whereas the linear correlation of the 2nd is weaker. The factors linearly correlated with settlement in the ancient channel of the Qinhuai River are given in Table 9.

Table 9. Impact factors.
F1**: cohesion; internal friction angle; porosity; moisture content; Poisson's ratio; compression modulus; advancing speed; earth chamber pressure; porosity, moisture content, unit weight and Poisson's ratio of the soil that the tunnel passes through
F2**: unit weight; compression modulus of the soil that the tunnel passes through; cohesion of the soil that the tunnel passes through

Terrace of Yangtze River
As shown in Table 10, the 2nd and 5th principal components are closely related to settlement, and the 1st principal component is significantly related to settlement. Judging by the correlation coefficients, the 2nd principal component has a strong linear correlation with settlement, whereas that of the 5th is relatively weak. The factors most strongly correlated with settlement in the terrace of the Yangtze River are given in Table 11.

Col landform
As shown in Table 12, in terms of statistical significance, the 1st, 3rd, 4th and 5th principal components are closely related to settlement. In terms of the correlation coefficients, the 2nd and 4th principal components are linearly related to settlement, but only weakly. The extremely significant factors are given in Table 13.

Prediction of ground settlement caused by shield tunneling
Because the scatter of the measured data is uneven, weighted least squares (WLS) was used for the regression analysis to eliminate heteroscedasticity. No regression analysis was performed for the ancient channel of the Qinhuai River because too few data were available.
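The paper does not state how the WLS weights were chosen. The sketch below assumes one common two-step (feasible) WLS scheme: fit ordinary least squares, model the logarithm of the squared residuals as a linear function of the predictors, and weight each observation by the inverse of the resulting variance estimate. The function names and data are hypothetical, and this NumPy code only illustrates the idea; in R, `lm()` accepts such weights through its `weights` argument.

```python
import numpy as np


def wls_fit(X, y, weights):
    """Minimize sum_i w_i * (y_i - b0 - x_i . b)^2 and return [b0, b1, ...]."""
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Scaling each row by sqrt(w_i) turns WLS into an ordinary least-squares problem
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    return beta


def feasible_wls(X, y, eps=1e-8):
    """Two-step WLS with the variance modelled as exp(linear function of the predictors)."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    ones = np.ones(len(y))
    design = np.column_stack([ones, X])
    # Step 1: ordinary least squares (all weights equal to 1)
    residuals = y - design @ wls_fit(X, y, ones)
    # Model log(residual^2) linearly in the predictors; eps avoids log(0)
    log_var = design @ wls_fit(X, np.log(residuals**2 + eps), ones)
    # Step 2: weight each observation by the inverse of its estimated variance
    return wls_fit(X, y, np.exp(-log_var))


# Hypothetical data: the last point is an outlier given zero weight
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]
print(np.round(wls_fit(x, y, [1.0, 1.0, 1.0, 0.0]), 6))  # [1. 2.]
```

Modelling the log variance keeps every estimated variance positive, so all weights stay finite; observations in high-scatter regions receive less influence on the fitted coefficients.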

Floodplain of Yangtze River
For the floodplain of the Yangtze River, the 4th principal component is linearly related to settlement. Five variables of the 4th principal component were selected to build the regression equation: X1 (tunnel depth), X2 (distance from the tunnel roof to the bottom of the soft soil), X3 (cohesion), X4 (compression modulus) and X5 (cutter head speed). The result is given by Equation (2). As shown in Figure 9, except for a few abnormal points, the residuals of most points are evenly distributed around zero, and they are approximately normally distributed, which is consistent with the assumptions of the regression model.
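The residual check described for Figure 9 can be sketched as below. The text does not define "abnormal points", so flagging residuals whose magnitude exceeds twice the residual standard deviation is an assumption; the function name and settlement values are hypothetical.

```python
import statistics


def residual_summary(observed, predicted, limit=2.0):
    """Return the mean residual, residuals scaled by their standard deviation, and outlier indices.

    A point is flagged when its scaled residual exceeds `limit` in absolute value (assumed rule).
    """
    residuals = [o - p for o, p in zip(observed, predicted)]
    s = statistics.stdev(residuals)
    scaled = [e / s for e in residuals]
    outliers = [i for i, z in enumerate(scaled) if abs(z) > limit]
    return statistics.fmean(residuals), scaled, outliers


# Hypothetical measured and predicted settlements (mm); the last point is badly predicted
observed = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 24.0]
predicted = [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
mean_residual, scaled, outliers = residual_summary(observed, predicted)
print(outliers)  # [9]
```

Plotting `scaled` against the point index reproduces the kind of residual plot described above: a well-behaved regression shows points scattered evenly around zero with only a few flagged values.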

Terrace of Yangtze River
For the terrace of the Yangtze River, the 2nd principal component has a strong linear correlation with settlement, whereas the 5th has only a weak one. Six variables were used in the regression, and the settlement can be estimated by Equation (3), where X1 is tunnel depth; X2 is advancing speed; X3 is cutter head speed; X4 is groundwater level; X5 is the Poisson's ratio of the soil that the tunnel passes through; and X6 is earth chamber pressure.
As shown in Figure 10, the regression for the first terrace of the Yangtze River fits well.

Col landform
Based on the regression coefficients, settlement in the col landform (depression) sub-geomorphic unit can be estimated by Equation (4), where X1 is the compression modulus of the soil that the tunnel passes through; X2 is the cohesion of the soil that the tunnel passes through; X3 is advancing speed; X4 is internal friction angle; X5 is cutter head speed; and X6 is earth chamber pressure.
As shown in Figure 11, because fewer measured data are available for this unit, the regression fits less well than for the other units. However, most residuals are evenly distributed around the zero line, so the regression still has practical reference value.

Conclusion
The factors influencing ground settlement caused by shield tunnel construction were investigated using principal component analysis. PCA substantially reduced the dimensionality of the numerous variables. The factors that most strongly influence ground settlement differ among geomorphic units. The factors screened out by correlation analysis agree with engineering experience, and the residuals of the regression equations built from the selected factors are acceptable.