Data-driven integrated evaluation from the perspective of AdaBoost and its application to WeChat public account ranking

Traditional comprehensive evaluation is difficult to model when the data are large, have many parameters and a complex structure, and it cannot adapt as the data are updated. To address this, this paper draws on the adaptive boosting (AdaBoost) perspective in statistical learning to develop a data-driven integrated evaluation model that updates both the sample weights and the weights of the weak evaluation models as the data change. Three weak evaluation models were selected: the data-driven Topsis method, the principal component analysis method and the factor analysis method. Taking the ranking of WeChat public accounts as an example, the results show that the accuracy of the integrated evaluation model is 88.57%, which is 17.14, 31.43 and 28.57 percentage points higher than that of the data-driven Topsis method, the principal component analysis method and the factor analysis method, respectively.


Preface
To use data effectively in the service of production and daily life, scientific evaluation methods are needed to help people make correct decisions among a large number of choices. Traditional comprehensive evaluation has two shortcomings when dealing with big data that have many parameters and a complicated structure: first, such data are difficult to model; second, as the data are updated, the accuracy of the model declines. Improving evaluation methods so that they make effective use of the big data environment is therefore a timely need.
Scholars at home and abroad have conducted many studies on integrated evaluation, for example: the grey statistical method and entropy technology combined with the fuzzy method (2007, Pang Qinghua) [1], the data envelopment analysis method (2009, Wu DD) [2], the analytic hierarchy process combined with the approaching ideal point method (Topsis) (2010, Yao Shengbao) [3], and fuzzy comprehensive evaluation (2013, Cai Zexiang, Liu Ye) [4]. Although these studies used integration, they did not integrate from a data-driven perspective, and the resulting methods are easily limited by data type and quantity. To remedy this deficiency, this paper uses the Boosting method to construct an integrated evaluation model that combines the Topsis method, commonly used in multi-attribute decision-making evaluation, with the principal component analysis method and the factor analysis method, both of which are suited to multiple indicators.
With the popularity of big data today, data-driven thinking has been applied to statistical evaluation in various industries. In the industrial field, the genetic algorithm (GA) and the factor analysis method were used to optimize the evaluation indicators, and a data-driven construction safety evaluation model was then built in combination with the Bayesian network (BN), together with a detailed quantitative evaluation procedure for the model (2017, Bai Xiaoping and Pu Tao) [5]. In the business field, a data-driven model and a neural network method were used to establish a comprehensive carrying capacity evaluation model for the Tianjin coastal area (2012, Liu Aizhen, Si Qi and Li Mingchang) [6], and a data-driven Topsis method was proposed (2018, Zhao Juan, Gong Yicheng) [7]. In the medical field, a data-driven method was used to evaluate and improve the strict blood glucose control scheme in the intensive care unit (2016, Kuiper BI With Number SI) [8].
At present, few evaluations in the social field use data-driven ideas. This article therefore takes as its research object 70 WeChat public accounts observed from March 5, 2018 to March 11, 2018. Seven important indicators were selected to measure the market influence of the major WeChat public accounts: the number of posts (articles), the total reads (thousands), the headline reads (thousands), the average reads (thousands), the highest reads (thousands), the number of likes (thousands) and the heat index. All data in this article were collected with the Octopus collector.

Integrated evaluation model from AdaBoost perspective
The integrated evaluation model based on the AdaBoost idea trains three comprehensive evaluation methods, the data-driven Topsis method, the principal component analysis method and the factor analysis method, as weak evaluation models. It then calculates the weight of each according to its error rate, and finally obtains a strong evaluation model composed of the three weak evaluation models.
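The weight formula itself is not reproduced in this section. As a minimal sketch, assuming the classic AdaBoost learner weight alpha = 0.5 * ln((1 - e) / e) and taking as inputs the error rates reported later in this paper (20/70, 30/70 and 28/70), the weights of the three weak evaluation models could be computed as follows. The function name `adaboost_alpha` and the final normalization are illustrative assumptions and may differ from the authors' procedure.

```python
import math

# Error rates reported in this paper for the three weak evaluation models
# (70 accounts in total).
error_rates = {
    "topsis": 20 / 70,  # 50 of 70 accounts ranked correctly
    "pca": 30 / 70,     # 40 of 70 accounts ranked correctly
    "factor": 28 / 70,  # 42 of 70 accounts ranked correctly
}


def adaboost_alpha(err):
    """Classic AdaBoost learner weight (assumed here): 0.5 * ln((1 - err) / err)."""
    if not 0.0 < err < 1.0:
        raise ValueError("error rate must lie strictly between 0 and 1")
    return 0.5 * math.log((1.0 - err) / err)


# A lower error rate yields a larger weight in the integrated model.
alphas = {name: adaboost_alpha(e) for name, e in error_rates.items()}

# Optional normalization so that the weights sum to 1 (an assumption).
total = sum(alphas.values())
normalized = {name: a / total for name, a in alphas.items()}

for name in alphas:
    print(f"{name}: alpha={alphas[name]:.4f}, normalized={normalized[name]:.4f}")
```

Under this assumed formula the data-driven Topsis method receives the largest weight and the principal component analysis method the smallest, mirroring the order of their error rates.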
The experiment was performed in the following six steps:
Step 1. Prepare the target dataset.
Step 2. Standardize the data.
Step 3. Take the three evaluation methods, the data-driven Topsis method, the principal component analysis method and the factor analysis method, as three weak learners.
Step 4. Train the weak learners and calculate the weight of each according to its error rate.
Step 5. Obtain the final learner, the linear integrated evaluation model G(x).
Step 6. Test and compare the final integrated evaluation model. First, compare it with each single comprehensive evaluation method to verify the validity of the model; then compare it with the traditional combination method to verify the superiority of the combination method.
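The standardization, reweighting and linear integration steps can be sketched as follows. This is a minimal illustration under stated assumptions: z-score standardization, the textbook AdaBoost sample-weight update, and a linear combination G(x) = sum of alpha_m * G_m(x). The function names `standardize`, `update_sample_weights` and `integrate`, and the toy numbers, are hypothetical and not taken from the paper.

```python
import numpy as np


def standardize(X):
    """Column-wise z-score standardization (assumed form of Step 2)."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0, ddof=0)
    std[std == 0] = 1.0  # constant columns map to zero instead of dividing by 0
    return (X - X.mean(axis=0)) / std


def update_sample_weights(w, correct, alpha):
    """Textbook AdaBoost update: misjudged samples gain weight, the rest lose it."""
    sign = np.where(correct, -1.0, 1.0)
    w = np.asarray(w, dtype=float) * np.exp(alpha * sign)
    return w / w.sum()  # renormalize so the weights sum to 1


def integrate(scores, alphas):
    """Linear integrated score G(x) = sum_m alpha_m * G_m(x) (assumed form of Step 5)."""
    scores = np.asarray(scores, dtype=float)  # shape (n_models, n_samples)
    alphas = np.asarray(alphas, dtype=float)  # shape (n_models,)
    return alphas @ scores


# Toy data: 3 samples, 2 indicators.
Z = standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Four samples with equal initial weights; the third one was misjudged.
w_new = update_sample_weights(np.full(4, 0.25),
                              np.array([True, True, False, True]),
                              alpha=0.5)

# Two hypothetical weak models scoring two samples, with equal model weights.
G = integrate([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5])
print(Z, w_new, G, sep="\n")
```

After the update, the misjudged sample carries more weight than the others, so the next weak evaluation model is pushed to rank it correctly.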

Evaluation of WeChat public account by data-driven Topsis method
The specific ranking results are shown in Table 2, which gives the scores and rankings produced by the data-driven Topsis method. Of the 70 accounts, 50 were ranked correctly, giving an error rate of 28.57%.
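For orientation, the classic Topsis procedure can be sketched as below. This is not the data-driven variant used in the paper, whose weighting scheme is not described in this section; the sketch assumes equal indicator weights, vector normalization, and that every indicator is benefit-type (larger is better). The function name `topsis` and the toy matrix are illustrative.

```python
import numpy as np


def topsis(X, weights=None):
    """Classic Topsis closeness scores for benefit-type indicators.

    X has shape (n_alternatives, n_indicators). Returns values in [0, 1];
    larger means closer to the ideal solution.
    """
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    if weights is None:
        weights = np.full(m, 1.0 / m)  # equal weights (an assumption)
    weights = np.asarray(weights, dtype=float)

    # Vector-normalize each column, then apply the indicator weights.
    norms = np.sqrt((X ** 2).sum(axis=0))
    norms[norms == 0] = 1.0
    V = X / norms * weights

    ideal = V.max(axis=0)  # positive ideal solution
    anti = V.min(axis=0)   # negative ideal solution

    d_pos = np.sqrt(((V - ideal) ** 2).sum(axis=1))
    d_neg = np.sqrt(((V - anti) ** 2).sum(axis=1))

    denom = d_pos + d_neg
    return np.divide(d_neg, denom, out=np.zeros_like(d_neg), where=denom > 0)


# Toy example: three accounts, two indicators.
closeness = topsis([[10.0, 9.0], [5.0, 4.0], [1.0, 1.0]])
ranking = np.argsort(-closeness)  # indices from best to worst
print(closeness, ranking)
```

In the toy example the first account is best on both indicators and therefore coincides with the ideal solution, while the last coincides with the negative ideal solution.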

Evaluation of WeChat public account by principal component analysis
The specific results are shown in Table 3. In the comprehensive evaluation by principal component analysis, carried out with SPSS software, 40 of the 70 accounts were ranked correctly, giving an error rate of 42.86%.
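The paper performs this step in SPSS. As a rough NumPy sketch of one common way to obtain a principal component composite score, the code below standardizes the indicators, eigen-decomposes their correlation matrix, keeps enough components to reach a cumulative variance threshold, and weights the component scores by their explained variance ratios. The 85% threshold, the sign convention and the function name `pca_composite_score` are assumptions, not settings reported by the authors.

```python
import numpy as np


def pca_composite_score(X, var_threshold=0.85):
    """Composite score from the leading principal components of the correlation matrix."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score standardization

    R = np.corrcoef(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    ratio = eigvals / eigvals.sum()           # explained variance ratios
    k = int(np.searchsorted(np.cumsum(ratio), var_threshold)) + 1
    k = min(k, len(eigvals))

    # Eigenvector signs are arbitrary; orient each kept component so that its
    # loadings sum to a positive value (a convention assumed here).
    loadings = eigvecs[:, :k]
    signs = np.sign(loadings.sum(axis=0))
    signs[signs == 0] = 1.0

    components = Z @ (loadings * signs)       # component scores per sample
    return components @ ratio[:k]             # variance-weighted composite


# Toy data: five accounts, three positively correlated indicators.
X_demo = np.array([
    [1.0, 2.0, 1.5],
    [2.0, 3.0, 2.5],
    [3.0, 5.0, 4.0],
    [4.0, 6.0, 5.5],
    [5.0, 9.0, 7.0],
])
score = pca_composite_score(X_demo)
print(np.argsort(-score))  # indices from best to worst
```

Because eigenvector signs are not unique, a sign convention of this kind is needed before composite scores from different runs or tools can be compared.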

Evaluation of WeChat public account by factor analysis
The specific results are shown in Table 4. In the comprehensive evaluation by factor analysis, carried out with SPSS software, 42 of the 70 accounts were ranked correctly, giving an error rate of 40%.

Comparative analysis and conclusion of integrated evaluation
This paper compares the ranking results of the data-driven integrated evaluation model from the perspective of AdaBoost with those of the data-driven Topsis method, the principal component analysis method and the factor analysis method. The specific data are shown in Table 5. The integrated evaluation method has the highest accuracy, 88.57%, a large improvement over each of the three weak evaluation methods. The comparison shows the feasibility of the integrated method and the effectiveness of the model improvement.
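The reported figures can be checked with a few lines of arithmetic. The sketch below recomputes each weak model's accuracy and error rate from the counts of correctly ranked accounts given in this paper, and the gain of the integrated model in percentage points; the variable names are illustrative.

```python
n_accounts = 70

# Correctly ranked accounts reported for each weak evaluation model.
correct_counts = {"topsis": 50, "pca": 40, "factor": 42}

# Accuracy reported for the integrated evaluation model.
integrated_accuracy = 0.8857

accuracy = {name: c / n_accounts for name, c in correct_counts.items()}

# Error rate of each weak model, in percent.
error_pct = {name: round((1 - a) * 100, 2) for name, a in accuracy.items()}

# Gain of the integrated model over each weak model, in percentage points.
gain_pp = {name: round((integrated_accuracy - a) * 100, 2)
           for name, a in accuracy.items()}

print(error_pct)
print(gain_pp)
```

The gains are differences between accuracies, so they are best read as percentage points rather than relative improvements.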

Conclusion
This article builds an integrated evaluation model from the perspective of AdaBoost, based on the data-driven concept and the idea of boosting-style integration. The data-driven Topsis method, the principal component analysis method and the factor analysis method are given weights in the integrated model according to the error rates of their respective evaluation results. The integrated evaluation model is then applied to the influence evaluation of WeChat public accounts. The results show that the model accuracy is 88.57%, which is 17.14, 31.43 and 28.57 percentage points higher than that of the data-driven Topsis method, the principal component analysis method and the factor analysis method, respectively, indicating the feasibility and effectiveness of the integrated evaluation method.
However, this article still has two shortcomings. First, only three weak evaluation methods were selected for integration; integrating more types of comprehensive evaluation methods to evaluate data with complex structure is expected to yield a more comprehensive and accurate integrated evaluation model. Second, the integrated evaluation model has so far been applied in only one field; applying it in different fields is expected to enhance its application value.