An Analysis of Domestic Tourism Consumption Based on R Software

R is a software environment for data processing, statistical computing and graphics. Its syntax is superficially similar to C, but semantically it is a functional programming language, and it is widely used in statistical analysis. This paper uses R to analyze China's domestic tourism consumption data for 1999-2015 and examines the main factors affecting the level of domestic tourism consumption: residents' disposable income, GDP, per capita consumption of tourists, the number of tourists, railway mileage and the number of travel agencies. It then establishes and solves multiple linear regression models, taking domestic tourism consumption as the dependent variable and GDP, per capita consumption of tourists, the number of tourists and railway mileage as independent variables. The results show that there is a significant positive correlation between domestic tourism consumption and GDP, per capita consumption of tourists, the number of tourists and the number of travel


Introduction
Liu Zhenzhong's "Analysis of Domestic Tourism Consumption Based on Multiple Linear Regression Model", published in the Journal of Chongqing University of Technology (Natural Sciences) in June 2016, used data on China's domestic tourism consumption from 2003 to 2012 and a multiple linear regression model, with domestic tourism consumption as the dependent variable and per capita disposable income, per capita consumption of tourists and the number of travel agencies as independent variables, to analyze the factors affecting China's domestic tourism consumption. That paper concludes that domestic tourism consumption is positively correlated with residents' disposable income and per capita consumption of tourists, and negatively correlated with the number of travel agencies [1]. However, its regression results are inconsistent with these conclusions: residents' disposable income and the number of travel agencies are not significant in the regression equation. Moreover, the article performs only a simple regression on the variables, without considering problems such as multicollinearity and the time-series properties of the variables. To verify those results, this paper extends the sample period, reviews the relevant literature from previous scholars, and adds further independent variables while retaining the original author's variables, and then performs a regression analysis.

Research review
In recent years, as living standards have continued to improve, people are no longer satisfied with basic consumption and pay more attention to consumption at the spiritual level, which has led to an increase in domestic tourism consumption. Analyzing the major factors affecting tourism consumption is therefore of great significance for the development of tourism. It can provide a reference for tourism management departments in formulating tourism development plans, and it can also be used to make short- and medium-term predictions of domestic tourism consumption and to promote the healthy development of the tourism industry [2][3][4][5][6]. In studies of domestic tourism consumption, Li Yunpeng (2005) found that the domestic tourism consumption of China's urban residents was mainly affected by current income and prices [7]. Sun Gennian and Xue Jia (2009) pointed out that per capita disposable income was the basic factor affecting and restricting domestic tourism consumption, and that the view that income has a decisive effect on urban residents' domestic tourism consumption has been extensively tested empirically [8]. In addition, Wu Chunlai and Gu Huimin (2003) argued that changes in the structure of residents' income distribution had transformed the tourism market from a mass tourism market into a multi-faceted, heterogeneous structure [3]; Teng found that urban residents' tourism consumption differed in type of travel and per capita tourism spending, and that this difference was related to per capita income [9]. These studies show that residents' per capita income plays an important role in their tourism consumption, so this paper selects it as one of the explanatory variables. At the same time, with the continuous improvement of living standards, people are pursuing more spiritual and cultural consumption.
Because transportation has become more convenient, more people are beginning to travel abroad to experience different scenery and customs. Weng Gangmin et al. (2007) found that urban residents' travel rate was most closely related to living-standard factors, and that per capita consumption of tourists was an important factor affecting income from tourism consumption [10]. Wang Yu and Qin Yuanhao (2011) examined the relationship between per capita tourism expenditure and GDP, the discretionary income of urban residents, the average labor compensation of urban residents, total RMB deposits, railway length and free time [11]. They entered these factors into a model and applied stepwise regression, concluding that discretionary income, railway length and free time had a significant impact on tourism consumption. In domestic research, most scholars have used empirical methods, selecting the major factors for quantitative analysis from among people's savings, per capita income, per capita disposable time, railway length, the number of travel agencies and national policies [12].
Based on the above literature review, the main factors affecting tourism consumption and a suitable model can be identified. This paper establishes and solves a multiple linear regression model with domestic tourism consumption (Y) as the dependent variable and per capita disposable income of residents (X1), per capita consumption of tourists (X2), the number of travel agencies (X3), the number of domestic tourists (X4), gross domestic product (GDP) (X5) and total railway kilometers (X6) as independent variables.

Data sources
In line with the principles of data availability, authority and statistical consistency, all data in this paper are taken from the "China Statistical Yearbook" for 1999 to 2015. Based on the review of previous literature, this paper selects per capita disposable income of residents (X1), per capita consumption of tourists (X2), the number of travel agencies (X3), the number of domestic tourists (X4), gross domestic product (GDP) (X5) and total railway kilometers (X6) as the independent variables in the multiple linear regression model.

Research model and theoretical hypothesis
In selecting the variables, the relationships among them were sorted out. To keep the model simple and clear, a multiple linear regression model is used. Combining economic theory with the operating characteristics of tourism itself, the following equation is constructed:

Y = β0 + β1*X1 + β2*X2 + β3*X3 + β4*X4 + β5*X5 + β6*X6 + μ

where Y is the dependent variable, X1~X6 are the independent variables, β0 is the intercept, β1~β6 are the regression coefficients, and μ is the random disturbance term.
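As an illustration of how the coefficients of such a model can be estimated, the following is a minimal sketch of ordinary least squares via the normal equations, written in plain Python. It is not the R code used in this paper: the function names (`solve_linear_system`, `fit_ols`) and the toy data are hypothetical, and only two regressors are used to keep the example short.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Return [b0, b1, ..., bk] minimising the sum of squared residuals."""
    design = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(design[0])
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical toy data (not from the Yearbook): y = 1 + 2*x1 + 3*x2 exactly.
toy_x = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
toy_y = [1.0 + 2.0 * a + 3.0 * b for a, b in toy_x]
coefficients = fit_ols(toy_x, toy_y)
print([round(c, 6) for c in coefficients])
```

Because the toy response is an exact linear function of the regressors, the fitted coefficients recover the intercept and slopes used to generate it. With real data the residuals are non-zero and the significance of each coefficient has to be assessed separately.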

Stationarity analysis
By the usual decision rule, if the p-value of the ADF test is greater than 0.05 (the significance level), the series fails the test: the unit root hypothesis cannot be rejected and the series is non-stationary; otherwise it is stationary. For the original data, the p-values are greater than 0.05, so the time series are non-stationary. After first-order differencing, only X3 and X4 are stationary. Because the other four variables remain non-stationary, second-order differencing is applied to all of the data; the resulting p-values are all less than 0.05, so the second-order differenced series are stationary.
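The differencing step described above can be sketched as follows. This is an illustrative helper only (the name `difference` and the toy series are assumptions, not part of the paper); the unit root test itself is left to a statistics package.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: d_t = x_t - x_(t-1)."""
    result = list(series)
    for _ in range(order):
        # Each pass shortens the series by one observation.
        result = [b - a for a, b in zip(result, result[1:])]
    return result


# Hypothetical series with a quadratic trend (not the paper's data).
toy_series = [t * t for t in range(8)]
first_diff = difference(toy_series, order=1)   # still trending upward
second_diff = difference(toy_series, order=2)  # constant, trend removed
print(first_diff)
print(second_diff)
```

Each round of differencing costs one observation, which matters for a short annual sample such as 1999 to 2015: after second-order differencing only 15 of the 17 observations remain.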

Co-integration test
None of the six variables is stationary in levels, but all of them are stationary after second-order differencing, that is, they are integrated of the same order. A cointegration test is therefore used to determine whether the regression equation formed by these variables is stable. If it is, there is a stable long-run relationship between the variables and the regression is meaningful. The first step is to estimate the regression equation and obtain its residuals. The second step is to perform a unit root test on the residual series. The test gave p = 0.001452 < 0.05, so the residual series is stationary and the variables are cointegrated.
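The two steps can be written out as follows. This is the standard residual-based (Engle-Granger type) formulation given for illustration; the lag length p and the exact test variant used in the paper are not stated in the text and are assumptions here.

```latex
% Step 1: estimate the long-run regression by OLS and keep the residuals
\[
Y_t = \beta_0 + \sum_{j} \beta_j X_{jt} + e_t ,
\qquad
\hat{e}_t = Y_t - \hat{\beta}_0 - \sum_{j} \hat{\beta}_j X_{jt}
\]

% Step 2: ADF-type unit root regression on the residual series
\[
\Delta \hat{e}_t = \rho\, \hat{e}_{t-1} + \sum_{i=1}^{p} \phi_i\, \Delta \hat{e}_{t-i} + v_t ,
\qquad
H_0 : \rho = 0 \quad \text{(unit root in the residuals, no cointegration)}
\]
```

Rejecting the null hypothesis in the second step means the residuals are stationary, which is the sense in which the regression equation is called stable.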

Multicollinearity analysis
In general, because of the limitations of economic data, the explanatory variables in the design matrix are usually correlated with one another. By the usual rule for correlation coefficients, an absolute value greater than 0.8 indicates a strong linear correlation between two variables, an absolute value below 0.3 indicates low correlation, and values in between indicate moderate correlation. The calculated correlation coefficients between the factors in the equation all exceed 0.9, showing that they are highly correlated. To safeguard the accuracy and goodness of fit of the model, variables are therefore eliminated by stepwise regression.

In the resulting regression, the p-values of X3 (0.063317) and X5 (0.090355) are significant at the 0.1 level but not at the 0.05 level. The closer the DW value is to 2, the less likely the equation is to suffer from autocorrelation; here the DW value is 1.5971, so autocorrelation may be present. The other statistics indicate that the regression fits well. Because the number of independent variables is still relatively large, this paper continues to eliminate variables, which gives the equation Y ~ X2 + X4 + X5 + X6.

For this equation, the t value and p-value of every variable are significant, and R2 is close to 1, indicating a good fit, with a satisfactory F value. Checked against the tabulated bounds, the DW value lies in the interval around 2, so this paper concludes that there is no autocorrelation in the equation and eliminates no further independent variables. The final equation is:

Y = -25930 + 31.33*X2 + 0.1072*X4 - 0.0446*X5 + 1749*X6
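The two diagnostics used in this section, the correlation thresholds and the Durbin-Watson (DW) statistic, can be sketched in a few lines of plain Python. The function names and toy inputs below are hypothetical and serve only to show the arithmetic; they do not reproduce the paper's R output.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_level(r):
    """Classify |r| using the 0.8 and 0.3 thresholds quoted in the text."""
    size = abs(r)
    if size > 0.8:
        return "high"
    if size < 0.3:
        return "low"
    return "moderate"


def durbin_watson(residuals):
    """DW = sum((e_t - e_(t-1))^2) / sum(e_t^2); values near 2 suggest no
    first-order autocorrelation."""
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical toy inputs (not the paper's data).
level = correlation_level(pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
dw_alternating = durbin_watson([1.0, -1.0, 1.0, -1.0])  # negative autocorrelation
dw_persistent = durbin_watson([1.0, 1.0, 1.0, 1.0])     # positive autocorrelation
print(level, dw_alternating, dw_persistent)
```

The DW statistic ranges from 0 to 4: values well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation, which is why a value such as 1.5971 has to be checked against the tabulated bounds rather than judged by eye.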

Heteroscedasticity test
To ensure that the estimators of the regression parameters have good statistical properties, the classical linear regression model assumes that the random error terms all have the same variance (homoscedasticity). If this assumption does not hold, that is, if the random error terms have different variances, the linear regression model is said to exhibit heteroscedasticity. The White test gives a p-value of 0.5416 > 0.05, so the null hypothesis of homoscedasticity cannot be rejected and there is no evidence of heteroscedasticity in the model.
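For reference, the general form of the White test is sketched below (the `\text` command assumes the amsmath package). Whether the test reported here included the cross-product terms is not stated in the text, so the full form is shown as an assumption.

```latex
% Auxiliary regression of the squared OLS residuals on the regressors,
% their squares and their cross products
\[
\hat{e}_t^{2} = \alpha_0 + \sum_{j} \alpha_j X_{jt}
              + \sum_{j \le k} \gamma_{jk}\, X_{jt} X_{kt} + v_t
\]

% Test statistic: sample size times the R^2 of the auxiliary regression,
% asymptotically chi-squared with q degrees of freedom, where q is the
% number of regressors in the auxiliary regression excluding the constant
\[
n R^{2}_{\text{aux}} \sim \chi^{2}_{q}
\quad \text{under } H_0 : \text{homoscedasticity}
\]
```

A large p-value means the squared residuals are not systematically explained by the regressors, which is the basis for the conclusion drawn above.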

Serial correlation test
If the random error terms of different observations are correlated with one another, the error terms are said to exhibit autocorrelation, or serial correlation, which is common in time series. The results show strong multicollinearity between X2 and X6 and severe multicollinearity between X4 and X5. When the corresponding variables are removed because of this multicollinearity, there is no multicollinearity between X2 and X3, but X3 is not significant (p = 0.565, t = -0.589), so that equation has little explanatory value. In the equation without X3, by contrast, the variables are significant and the regression fits better; the only remaining problem is multicollinearity among the variables. This paper therefore stops eliminating variables and uses principal component analysis to further optimize the regression equation.

Principal component analysis
Through an orthogonal transformation, a set of possibly correlated variables is converted into a set of linearly uncorrelated variables, which are called principal components.
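To illustrate the idea, the sketch below extracts the first principal component of a correlation matrix by power iteration in plain Python. It is an assumption-laden illustration rather than the paper's RStudio procedure: the function names and toy columns are hypothetical, and power iteration is used here only because it needs no external libraries.

```python
import math


def standardize(columns):
    """Scale each column to mean 0 and sample standard deviation 1."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled


def correlation_matrix(columns):
    """Correlation matrix of the given variables (one list per variable)."""
    z = standardize(columns)
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_principal_component(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    k = len(matrix)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v'Mv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, v


# Hypothetical, strongly correlated toy variables (not the paper's data).
toy_columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.1, 3.9, 6.2, 8.1, 9.8, 12.1],
    [0.9, 2.2, 2.9, 4.1, 5.2, 5.8],
]
corr = correlation_matrix(toy_columns)
eigenvalue, loadings = first_principal_component(corr)
# The trace of a correlation matrix equals the number of variables,
# so this ratio is the share of total variance explained.
variance_share = eigenvalue / len(toy_columns)
print(round(variance_share, 4), [round(x, 4) for x in loadings])
```

When the variables are as strongly correlated as the regressors in this paper, a single component carries nearly all of the variance, which is consistent with retaining only one principal component.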
According to the results calculated in RStudio, the final number of principal components is 1, and the coefficients of each principal component are 0.

Conclusions
(1) With the aid of the R statistical software, a regression equation is established with the per capita consumption of tourists, the number of domestic tourists, gross domestic product (GDP) and total railway kilometres as independent variables. The results show that China's domestic tourism consumption is positively related to the per capita consumption of tourists, the number of domestic tourists, GDP and total railway kilometres. It has no significant relationship with the number of travel agencies, which contradicts the conclusion of Liu Zhenzhong's article.
(2) From the established model and its results, it is not hard to see that GDP is highly correlated with residents' disposable income. Although this paper excludes residents' disposable income as a variable, GDP partly reflects its size. In recent years, with the rapid growth of China's economy, both residents' disposable income and GDP have been increasing, which has led to improvements in transportation, communication and accommodation. Travel has become more convenient, and tourism is no longer the exclusive preserve of the rich.
(3) There is no significant correlation between the number of travel agencies and domestic tourism consumption. With more convenient transportation, the rapid development of online travel agency (OTA) and user-generated content (UGC) platforms, and negative news about travel agencies, more and more tourists prefer independent travel to package tours. Tourists who join package tours mainly go abroad, so the number of travel agencies has no significant impact on domestic tourism consumption. Given the current situation, travel agencies need to break away from the traditional models of low-price tours, zero-fee tours and shopping tours, and seek cooperation between online OTAs and offline entities in order to find a path for development.
(4) According to previous surveys, about 30% of tourists' spending goes to transportation, which illustrates the importance of transportation and its impact on domestic tourism consumption. To develop domestic tourism, the transportation industry should be developed first, especially by increasing railway and road mileage. Many scenic spots are located in poor areas, where the lack of transportation facilities and other infrastructure has restricted the development of tourism and the economy, so speeding up railway construction in these underdeveloped areas is of great importance. At the same time, investment in highways and other transportation facilities should be increased to form a more unified national system, which is conducive to the development of domestic tourism.
(5) Relevant policies and institutions need to be improved in order to raise the overall service level of the tourism industry. With the improvement of living standards, domestic tourism consumption has been increasing steadily and tourism has become more and more important. However, overcharging by tour guides and the very high prices of scenic spots have given tourists a negative attitude towards some destinations. Tourism, as a green industry, can not only promote economic growth but also create jobs, and it offers resource-poor areas a way to develop and prosper. The country therefore needs to strengthen supervision and improve the relevant laws and regulations to promote the sustainable and healthy development of the tourism industry.