Technical Standard Opportunity Identification for China's Manufacturing Industry

Based on the Standard Literature Library of CNKI, this paper identifies opportunities for developing technical standards. The initial keywords of technical standards are determined with an LDA topic model, their similar words are confirmed with Word2Vec, and by combining the LDA model with Word2Vec the representative keywords of manufacturing technical standards are determined. Using these representative keywords, the core technical standards of the manufacturing industry are identified with degree centrality and the “M-core” method. The dimensionality of the frequency square matrix is reduced with the PCA module of Python and a scatter plot is drawn; technology gaps are identified in combination with the core technical standards, and future development opportunities are then confirmed with a patent technology-effect matrix map.


Introduction
Identifying future technical standard opportunities on the basis of existing standards, and scientifically selecting a scheme for building technical standards, are the premise and foundation for cultivating and developing technical standards in China's manufacturing industry. In early studies, patent documents were the main data source for research on technological innovation and change management. With the rapid development of science and technology and the rapid progress of China's manufacturing industry, research on technical standards has gained greater technical value and industrial guidance value. However, research based on technical standard literature is still limited. First, the accuracy of technology opportunity recognition based on high-frequency words or subject words extracted with bibliometric methods needs to be improved. This paper therefore combines a topic model with a keyword similarity model in its bibliometric analysis so that the extracted topic words are representative. Second, current research uses network co-occurrence maps to determine technical hot spots and patent maps to uncover technical blank spots. This paper combines centrality with the "M-core" method to remove weak connections and isolated nodes.

Theoretical foundations
2.1 The importance of technical standards
Against the background of the in-depth promotion of "standardization", technical standards have replaced prices, technologies and patents as the core of enterprise competition and the key to the direction of technological development in leading industries (Zou Siming, 2017) [1]. Peng Hailing et al. (2019) also pointed out that once a technological breakthrough is achieved, put into application and integrated into a standard, the technology spreads rapidly and becomes the mainstream technology of the industry. Such technology standardization helps enterprises gain a "voice" in the industry and guide the direction of industrial technology development [2]. Because it guides future technological directions and serves as the bridge between technological innovation achievements and practical applications, technology standardization is key to enterprise innovation and development. In addition, technical standardization can improve product quality [3,4].
For the manufacturing industry, technical standards help adjust the industry's internal economic structure, standardize behavior within the industry, and promote the optimization and upgrading of the industrial structure (Lu Meng, 2013) [5]. Tao Zhongyuan et al. (2016) argue that, by constraining technological diversity and accelerating technology diffusion and transfer, technical standards promote technological innovation, advanced technology spillover and product research and development, and enable upgrading along the value chain [6]. The construction of technical standards is therefore of great strategic significance to the manufacturing industry.

Technical standard opportunity identification
Sungjoo Lee et al. (2010) proposed a keyword-based patent map creation method that identifies potential technological innovation opportunities by looking for large blank areas where patent density on the map is sparse [7]. The teams of Mitkova L. (2015) and Marra A. (2016) further developed visual patent map construction methods based on semantic structure analysis, text analysis and principal component analysis, and interpreted blank areas through the characteristics of the patents near them [8,9]. However, research that uses standard literature, accounts for similar words when extracting subject words, identifies standard blank areas, interprets those areas through the surrounding core standards, and then identifies future development opportunities is still lacking.
Network co-occurrence diagram. Network co-occurrence analysis is widely used to study future technology development (Huang Lucheng et al., 2014) [10] and to forecast technology opportunities and evolution trends (Jeong, 2015) [11]. Wang Cuibo et al. (2020) used CiteSpace to map a social network of keywords and study technological developments [12]. However, current research on effectively identifying core technical standards is inadequate, and how to combine the technology-efficacy map with identified blank areas to determine future technology development directions more accurately also needs further study.

Keyword extraction
(1) Generate text data. Based on the "Standard Data Base", and using a retrieval strategy with manufacturing industry as the subject word, obsolete standard documents were manually removed and 207 technical standard documents that best match the characteristics of the manufacturing industry were selected; the standards were then merged into one complete document. The Jieba word segmenter in Python was used for word segmentation. Standards contain many words that appear frequently but carry little meaning, such as "or" and "yes"; these are called stop words. A curated stop-word table was loaded in Python to clean the data, yielding the text data for analysis.
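The cleaning step can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: the stop words shown are placeholders rather than the paper's stop-word table, and `load_stop_words`, `clean_tokens` and `preprocess` are hypothetical helper names. Jieba's `lcut` performs the segmentation; it is imported lazily so the filtering logic can also run with `str.split` as a stand-in segmenter where Jieba is not installed.

```python
from typing import Callable, Iterable, List, Set


def load_stop_words(lines: Iterable[str]) -> Set[str]:
    """Read a stop-word table given as one word per line, ignoring blank lines."""
    return {line.strip() for line in lines if line.strip()}


def jieba_segment(text: str) -> List[str]:
    """Segment Chinese text with Jieba (third-party; `pip install jieba`)."""
    import jieba  # lazy import so the rest of the sketch runs without Jieba

    return jieba.lcut(text)


def clean_tokens(tokens: Iterable[str], stop_words: Set[str]) -> List[str]:
    """Remove stop words and whitespace-only tokens, keeping the original order."""
    return [t.strip() for t in tokens if t.strip() and t.strip() not in stop_words]


def preprocess(
    documents: Iterable[str],
    stop_words: Set[str],
    segment: Callable[[str], List[str]] = jieba_segment,
) -> List[List[str]]:
    """Segment each standard document and drop its stop words."""
    return [clean_tokens(segment(doc), stop_words) for doc in documents]


if __name__ == "__main__":
    stops = load_stop_words(["或", "是", ""])  # placeholder stop words ("or", "yes")
    # str.split stands in for Jieba so this demo runs without the library
    print(preprocess(["设备 或 材料 是 质量"], stops, segment=str.split))
    # -> [['设备', '材料', '质量']]
```

Passing the segmenter as a parameter keeps the stop-word logic independent of Jieba, so a different segmenter or a custom Jieba dictionary can be swapped in without touching the filter.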
(2) Initial keyword extraction based on the LDA model. Using the LDA model of the Sklearn package in Python, a topic model was built on the cleaned document to extract topic words and form the initial keyword set, as shown in Table 1.

Table 1. Initial keywords
Topic 1: equipment, material science, adhesives, tests, system, process, chemical raw materials, radio frequency, non-metallic minerals, technology
Topic 2: hygienic protection, clean, hygiene, technology, energy consumption, chemical raw materials, quality, reliability, systems, methods
Topic 3: information, automation, foundation, system, specifications, interfaces, data processing, control system, connect, hygiene
Topic 4: experiment, processing, rail transit, control, specification, connect, technical specification, technology, materials, automobiles
Topic 5: product, technology, specifications, systems, process, data, machining, quality, reliability, information

(3) Determination of similar words based on Word2Vec. When extracting topic words with the LDA model, the number of topics K must be set manually. If K is too large, the topics are not representative; if K is too small, some words are repeated across multiple topics. To mitigate an inappropriate choice of K, after extracting the initial keywords with LDA, this paper selects under each topic a keyword that does not repeat in other topics as the candidate word. Word2Vec is then used to vectorize words and extract words semantically similar to each candidate, which determines the similar words of the initial keywords, as shown in Table 2.

Analysis of manufacturing advantage standards based on representative key words
(1) Co-occurrence network centrality analysis. Based on the frequencies of the 20 representative keywords in Table 3 across the 207 manufacturing-related technical standards, the standards in which every keyword frequency is 0 (8 items) were deleted, forming a 189×20 standard frequency matrix. The matrix was imported into SPSS to calculate the Euclidean distances between standards, forming a 189×189 standard square matrix. This square matrix was imported into UCINET, and the network co-occurrence diagram was drawn with the centrality function of the NETDRAW module and visualized, as shown in Figure 2. UCINET analysis gives a centrality index of 12.4 for the manufacturing standard network. Figure 2 shows that degree centrality analysis alone cannot effectively identify the core standards.
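The matrix steps above can be reproduced in Python as a rough stand-in for SPSS and UCINET. This is a sketch under stated assumptions: the paper does not say how distances become network ties, so the hypothetical `ties_from_distance` links two standards when their Euclidean distance is at most a chosen `threshold`; degree centrality and Freeman's degree centralization are then computed on that binary graph. UCINET reports centralization as a percentage, so a value of 12.4 would correspond to 0.124 here only if the same definition were used.

```python
from typing import List, Sequence, Tuple

import numpy as np


def drop_empty_standards(freq: np.ndarray,
                         ids: Sequence[str]) -> Tuple[np.ndarray, List[str]]:
    """Delete standards (rows) in which every representative keyword has frequency 0."""
    keep = freq.sum(axis=1) > 0
    return freq[keep], [i for i, k in zip(ids, keep) if k]


def euclidean_square_matrix(freq: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between standards (n x n, zero diagonal)."""
    diff = freq[:, None, :] - freq[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def ties_from_distance(dist: np.ndarray, threshold: float) -> np.ndarray:
    """Hypothetical rule: tie two standards whose distance is <= threshold."""
    adj = (dist <= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-ties
    return adj


def degree_centrality(adj: np.ndarray) -> np.ndarray:
    """Normalized degree centrality: number of ties divided by (n - 1)."""
    return adj.sum(axis=1) / (len(adj) - 1)


def freeman_centralization(adj: np.ndarray) -> float:
    """Freeman degree centralization of an undirected binary graph (0 to 1)."""
    deg = adj.sum(axis=1)
    n = len(adj)
    return float((deg.max() - deg).sum() / ((n - 1) * (n - 2)))


if __name__ == "__main__":
    # Toy 4-standard x 2-keyword frequency matrix; the second row is all zero
    freq = np.array([[1, 0], [0, 0], [1, 1], [3, 0]])
    kept, ids = drop_empty_standards(freq, ["S1", "S2", "S3", "S4"])
    adj = ties_from_distance(euclidean_square_matrix(kept), threshold=2.0)
    print(ids, degree_centrality(adj), freeman_centralization(adj))
```

In the toy example S2 is dropped, S1 ends up tied to both remaining standards, and the graph is a star, so its centralization is 1.0. A real 189-standard network is far less centralized, which is consistent with the observation that degree centrality alone does not single out the core standards.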
(2) Identification of core manufacturing standards. Figure 2 contains many nodes, and no node shows an obvious core advantage. The "M-core" method is therefore added to eliminate ties and nodes with low correlation, and the nodes with the highest connection strength that remain are taken as the core standards, as shown in Figure 3; typical core standards are listed in Table 4. The core standards in the manufacturing field mainly concern the front end of the manufacturing process: characteristic standards for the materials and auxiliary materials that manufacturing enterprises require; energy consumption and quality inspection standards for each link of production and processing; and data exchange standards between enterprises, which help enterprises exchange data efficiently and reliably, improve business efficiency and reduce operating costs, as shown in Figure 4. The standard shown in Figure 4 clarifies the content of loop inspection and provides technical methods for it, including document inspection, visual inspection and functional inspection (testing all components with standard techniques), and specifies technical means for correcting errors when they are found.
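The pruning idea behind the "M-core" step can be sketched as follows. The paper describes the procedure only as removing low-correlation ties and nodes, so this is our hypothetical reading: keep only ties whose strength is at least `m` (in the spirit of UCINET's m-slices), then repeatedly remove standards with fewer than `k` remaining ties, as in a k-core. Because a Euclidean distance is small for similar standards, the sketch first converts distances into a strength with `1 / (1 + d)`; that conversion is also an assumption.

```python
from typing import List

import numpy as np


def distance_to_strength(dist: np.ndarray) -> np.ndarray:
    """Map distances to tie strengths in (0, 1]; closer standards tie more strongly."""
    return 1.0 / (1.0 + np.asarray(dist, dtype=float))


def m_core(strength: np.ndarray, m: float, k: int = 1) -> List[int]:
    """Indices of standards surviving m-thresholding and k-core pruning."""
    adj = np.asarray(strength, dtype=float) >= m  # keep only ties of strength >= m
    np.fill_diagonal(adj, False)  # ignore self-ties
    alive = np.ones(len(adj), dtype=bool)
    while True:
        # count each node's ties to nodes that are still in the core
        degree = (adj & alive[None, :]).sum(axis=1)
        weak = alive & (degree < k)
        if not weak.any():
            break
        alive &= ~weak  # drop weakly connected or isolated standards
    return np.flatnonzero(alive).tolist()


if __name__ == "__main__":
    s = np.array([
        [1.0, 0.9, 0.8, 0.0],
        [0.9, 1.0, 0.7, 0.0],
        [0.8, 0.7, 1.0, 0.2],
        [0.0, 0.0, 0.2, 1.0],
    ])
    print(m_core(s, m=0.5, k=2))  # -> [0, 1, 2]
```

Raising `m` keeps fewer, stronger ties; raising `k` demands that every surviving standard stay well connected to the others. Both values would need tuning against the real 189×189 matrix.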
Table 4. Typical core standards in manufacturing
Standard 158: Industrial automation systems; product data display and data exchange. To improve the operability of data in automation systems, the standard specifies adaptive testing techniques for automation systems and techniques for data display, exchange and storage.
Standard 167: Energy consumption per unit production. Sets energy consumption quota standards for comparable unit output of products and standardizes enterprises' energy consumption detection and energy consumption management techniques.
Standard 192: Material auxiliary material standard. Specifies the design and production techniques of the materials and auxiliary materials required in the manufacturing process, and the application and inspection techniques for those materials.

Manufacturing Standard Opportunity Identification and Trend Analysis
(1) Blank areas determined from the standard scatter plot. This paper uses the Python PCA module to reduce the dimensionality of the 189×189 frequency square matrix and draws a scatter plot of the 189 standards, as shown in Figure 5. Extending outward from the locations of the core standards (red) toward areas where standards are sparse, and connecting the surrounding standards, yields four polygonal areas bounded by yellow lines; these are the technical standard blank areas.
(2) Technical standard opportunity identification based on the technology-effect bubble diagram. The technology-efficacy diagram is used to determine more detailed and specific technical standard opportunities. First, from the 20 representative keywords in Table 3, the words belonging to the technical process dimension and the functional effect dimension are determined, as shown in Table 5, which includes the keywords hygienic protection, data processing, product, energy consumption, quality, automation, clean and reliability. Second, with the technical process dimension on the horizontal axis and the functional effect dimension on the vertical axis, the technology-efficacy diagram is drawn in Python, as shown in Figure 6; the bubble size represents the frequency with which the horizontal-axis and vertical-axis representative keywords occur together in the 189 standards.
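Both plots can be prepared with a short Python sketch. It assumes scikit-learn's `PCA` as the "PCA module" and leaves drawing to any plotting library (for example matplotlib's `scatter`, with marker sizes taken from the bubble counts). We assume a bubble's size is the number of standards in which both the process keyword and the effect keyword occur; the paper states only that it reflects their frequency in the 189 standards. `pca_coordinates`, `bubble_sizes` and the keyword split are illustrative.

```python
from typing import Sequence

import numpy as np
from sklearn.decomposition import PCA


def pca_coordinates(square: np.ndarray, seed: int = 0) -> np.ndarray:
    """Project each standard (row of the square matrix) onto two principal components."""
    return PCA(n_components=2, random_state=seed).fit_transform(square)


def bubble_sizes(freq: np.ndarray, keywords: Sequence[str],
                 process_terms: Sequence[str],
                 effect_terms: Sequence[str]) -> np.ndarray:
    """Count standards containing each (effect, process) keyword pair.

    Rows follow effect_terms (vertical axis) and columns follow process_terms
    (horizontal axis), matching the layout of a technology-effect chart.
    """
    col = {w: i for i, w in enumerate(keywords)}
    present = np.asarray(freq) > 0  # does the keyword occur in the standard at all?
    sizes = np.zeros((len(effect_terms), len(process_terms)), dtype=int)
    for r, effect in enumerate(effect_terms):
        for c, process in enumerate(process_terms):
            sizes[r, c] = int(np.sum(present[:, col[effect]] & present[:, col[process]]))
    return sizes


if __name__ == "__main__":
    freq = np.array([[1, 0, 2], [0, 1, 1], [1, 1, 0]])  # 3 toy standards
    keywords = ["automation", "data processing", "quality"]
    print(bubble_sizes(freq, keywords, ["automation", "data processing"], ["quality"]))
```

Cells with small or zero counts mark process-effect combinations that current standards rarely cover, which is where the expanded or combined standards in the next step would be sought.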
Finally, based on the technology-efficacy diagram, the existing advantage standards around the blank areas are expanded, optimized and combined to determine development opportunities for future manufacturing technical standards, as shown in Table 6.

Table 6. Identification of blank areas based on core standards
The region includes cleaner production, energy consumption, automated integrated systems, and environment and energy efficiency standards in different manufacturing industries.
This area includes data generation standards, data transfer standards, data processing standards, integration model standards for control systems and information systems, application implementation specifications for integrated systems, and dynamic server evaluation specifications.
To improve the operability of data in automation systems, the standard specifies adaptive testing techniques for automation systems and techniques for data display, exchange and storage.

Conclusions
Based on standard literature, this paper designs a systematic method for identifying technical standard opportunities in the manufacturing industry. First, the LDA model is used to determine the initial keywords of manufacturing technical standards, and Word2Vec is used to find words similar to these initial keywords; combining the two yields five technical standard themes (manufacturing materials and equipment, environmental protection and safety, information and utilization, operation process, and technical quality) and 20 representative keywords, which ensures the comprehensiveness and objectivity of the text mining. Second, based on the 20 representative keywords, centrality and the "M-core" method are combined to remove weak connections and isolated nodes and to identify the core technical standards of the manufacturing industry. The Python PCA module is then used to reduce the dimensionality of the frequency square matrix and draw the scatter plot, and, in combination with the core technical standards, the standard blank areas are determined: ecological environment, data transmission and integration, automation and interconnection, and the inspection and adoption of new materials. Finally, the technology-efficacy chart is used to identify future technical standard opportunities for the manufacturing industry, such as green production standards based on materials and processes, systematic digital management standards for the processing and inspection stages, cross-enterprise resource connection standards, front-end automation standards, and technical standards for the development and application monitoring of testing and measuring instruments.