Research on stock similarity and community division based on user attention sequence

We conduct research from the perspective of user groups and analyze the differences in users' attention and posting order in different time periods to vectorize stocks and build relationships from the generated vectors. This provides a new perspective for complex network construction and community division in the network public opinion space. The experimental results show that our model yields a community division consistent with reality.


Introduction
In the real world, things are interconnected, and these connections follow certain regularities; complex networks offer a simple and effective way to describe them. Many researchers have therefore used complex networks to describe such connections in the stock market and have done a great deal of research. Current research mainly studies these relationships through stock returns, which cannot describe the interaction between stocks well. In addition, the group behavior of stocks is also a focus of investors. In different periods of the stock market, the performance of group behavior can help investors better grasp the dynamics of the market. Once the stock network has been established, community detection can better summarize and analyze this group behavior.
Complex networks help to understand the interactions between the various elements of a complex system and how these interactions affect the overall properties of the system [1,2,3,4], which has given rise to the study of complexity. In recent years, a large number of studies have examined the statistical and topological properties of networks. This research has found that most complex networks have properties that differ from those of random networks, such as the "small world" effect [5,6], scale-free distribution [7], community structure [8,9,10,11], and the rich club [12,13,14].
In this paper, we carry out experimental verification and analysis on the stock network and find that the similarity between stocks is approximately normally distributed in different periods.

Related work
The first application of complex networks in financial markets was the minimal spanning tree stock network model constructed by R.N. Mantegna [15]. Later, in 2004, Newman used the correlation coefficient to establish a stock network, obtained the shortest paths between stocks, and analyzed the correlation between non-adjacent stocks in the network. J.P. Onnela [16] built a stock network by gradually adding edges in order of correlation strength and studied the stocks of the New York Stock Exchange. A. Vizgunov [17] selected data from the Russian stock market from 2007 to 2011, calculated the correlation between stock prices, constructed the stock network, and found a strong connection between a large number of stocks and the maximum clique in the Russian stock market. A. Namaki [18] studied the stocks of the Tehran Stock Exchange in Iran and found some distinctive topological properties of the networks under different thresholds, namely scale-free and power-law distributions. In 2010, C.K. Tse used stock closing prices to compute correlation coefficients, used these coefficients as weights to analyze the relationships between stocks, and studied the community structure [19]. In China, the use of complex networks to model the stock market started later, but research has remained active in recent years. For example, Huang [20] took the stocks of China's SSE 180 Index and the Shenzhen Stock Exchange 100 Index as research objects, used the maximum filter and minimum spanning tree (MST) methods to model the stock network, and analyzed the influence between stocks from a network perspective. Yang [21] took the daily return data of 284 stocks from 13 industries among the constituent stocks of the Shanghai and Shenzhen 300 Index as experimental subjects, used stocks as nodes, calculated the correlation coefficients between stocks to construct a network model of daily stock returns, and examined the small-world and scale-free
characteristics of the network and further analyzed the statistical characteristics of the stock network structure. Chen [22] used random matrix theory and a correlation matrix decomposition method to analyze the network structure characteristics and collective behavior of stocks in the Shanghai and Shenzhen stock markets. Wang [23] put forward a hierarchical node clustering algorithm based on an influence calculation model, introduced definitions of a node's activity and influence, and completed the community division of the stock network.

Stock vectorized representation model
A stock $stock_j$ in the attention sequence $S_q$ of the q-th user $u_q$ can be inferred from the context in which it appears, with probability $P(stock_j \mid context(stock_j))$, where $context(stock_j)$ denotes the context stocks within a window of size m around $stock_j$. The attention sequence representation problem can then be turned into a maximum likelihood problem, which can be further transformed into formula (2), where m is the size of the context window. The larger the value of m, the more data enter the model and the more accurate the model becomes, but the training time also increases greatly.
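As a hedged illustration of the maximum likelihood formulation, the objective can be written in the style of a CBOW model. This is a sketch under assumptions, not a reproduction of the paper's formula: the symbols $N$ (number of users) and $n_q$ (length of the attention sequence $S_q$) are introduced here for illustration only.

```latex
\mathcal{L} = \sum_{q=1}^{N} \sum_{j=1}^{n_q}
  \log P\bigl(stock_j \mid context(stock_j)\bigr),
\qquad
context(stock_j) = \{\, stock_{j-m}, \dots, stock_{j-1},\; stock_{j+1}, \dots, stock_{j+m} \,\}
```

Under this reading, increasing m adds more context stocks to every term of the sum, which is consistent with the trade-off between accuracy and training time noted above.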

Experiment
For the period from 2014/10/01 to 2015/05/31, 200 stocks from different sectors of the SSE Composite Index that were not suspended were selected; they are listed in table 1. We crawled the user groups' attention data, including posting and reply sequences, from the network public opinion space from 2014.7 to 2015.7, and obtained the attention vectors of the 200 stocks that were not suspended in the above time period.
We use formula (3) to calculate the similarity of each stock in different time periods and obtain time series of similarity. We draw contour heat maps of the experimental results, shown in figure 2. The darker the red, the higher the similarity; the darker the blue, the lower the similarity. Each row in the chart represents a stock, and each column represents a day. In panels (a), (b) and (c), the warm-colored area increases; that is, from the stationary period through the start-up period to the high-tide period, the similarity of stocks gradually increases. During the collapse in panel (d), the relationships between stocks are relatively more consistent.
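The similarity time series described above can be sketched as follows. This is a minimal illustration that assumes the similarity between two attention vectors is their cosine similarity; the function names, stock labels, and vector values are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_series(vectors_by_period, stock_a, stock_b):
    """Similarity of two stocks in each period, in the order the periods are given."""
    return [
        cosine_similarity(vectors[stock_a], vectors[stock_b])
        for vectors in vectors_by_period
    ]


# Hypothetical 3-dimensional attention vectors for two periods (illustrative only).
periods = [
    {"stock_a": [1.0, 0.0, 0.0], "stock_b": [0.0, 1.0, 0.0]},
    {"stock_a": [1.0, 1.0, 0.0], "stock_b": [1.0, 1.0, 0.0]},
]
series = similarity_series(periods, "stock_a", "stock_b")
```

Computing such a series for every stock and every day would give the rows and columns of a heat map like the one described for figure 2.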
Using formula (3), we obtain the statistical analysis of the similarity results shown in figure 3. The similarity of stocks in different time periods basically follows a normal distribution. Similarly, we define a parameter as the proportion of the similarity between stocks to the sum of the similarities of the attention vectors of all 200 stocks. From figure 3, we can see that the top 10% of nodes with the highest similarity form our complex network, and we choose the threshold accordingly. Then we use the AP algorithm to cluster the stocks by similarity. Figure 5 shows the result.

Table 2. Community center and the sector in which it is located.
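The thresholding step can be illustrated with a short sketch. It assumes that "top 10%" refers to the most similar stock pairs and that each retained pair becomes an edge; the function name `build_threshold_network`, the parameter `keep_fraction`, and the toy similarity values are hypothetical.

```python
from itertools import combinations


def build_threshold_network(similarity, keep_fraction=0.10):
    """Return the most similar stock pairs as edges.

    `similarity` maps (stock_i, stock_j) pairs to a similarity score.
    The top `keep_fraction` of pairs is kept, with at least one edge.
    """
    ranked = sorted(similarity.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return [pair for pair, _ in ranked[:n_keep]]


# Toy example: 5 stocks give 10 pairs with distinct, made-up similarity scores.
stocks = ["s0", "s1", "s2", "s3", "s4"]
similarity = {pair: 0.1 * i for i, pair in enumerate(combinations(stocks, 2))}

edges = build_threshold_network(similarity, keep_fraction=0.10)
```

If the AP algorithm is read as affinity propagation, which is an assumption here, the full similarity matrix could then be passed to an existing implementation such as scikit-learn's `AffinityPropagation` with `affinity="precomputed"`, whose exemplars would play the role of community centers.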
Table 2 shows the statistical results of the community division. Taking the period from 2015.6.16 to 2015.7.28 as an example, there are 132 stocks in the community centered on 600016, and 152 stocks have their strongest relationship with bank shares, accounting for 75% of the stocks. We find that during the collapse most stocks are tied to the banking sector, which means that there is a strong relationship between bank stocks and other stocks. This matches the actual situation: during the collapse in 2015.6, most stocks were plummeting amid outflows, and large-cap stocks such as bank and real estate shares repeatedly led the entire stock market. In the other time periods, the distribution of nodes in (a) is more even, while the distributions in (b) and (c) are more concentrated. This indicates that the relationships between stocks are more dispersed in the stationary period and more concentrated during the start-up and soaring periods, which is in line with the actual situation.

MATEC Web of Conferences 189, 10022 (2018), MEAMT 2018, https://doi.org/10.1051/matecconf/201818910022

Conclusion
We construct a complex network of the stock market from the perspective of group attention activity on the network, using an embedding-based model. The experimental results show that the resulting community division is in line with how people divide stocks in the real world. Network public opinion information is not limited to the East Money Post Bar considered in this article; future research may also need to consider the information contained in Weibo and WeChat. It may also need to include a wide range of other transaction-related data and account for a number of realistic trading rules.

Fig. 1. Model structure, showing the network model and the characteristic representation of the i-th stock.

Fig. 3. Network public opinion space stock attention vector similarity distribution.

The threshold is used to construct our network, and the results of the complex network in the network public opinion space are shown in figure 4.

Fig. 4. Network public opinion space complex network construction results in different periods: (a) 2014.7.2-2014.8.12, (b) 2014.11.20-2015.1.7, (c) 2015.3.11-2015.4.27, (d) 2015.6.16-2015.7.28.

Fig. 5. Community classification results: (a) 2014.7.2-2014.8.12, (b) 2014.11.20-2015.1.7, (c) 2015.3.11-2015.4.27, (d) 2015.6.16-2015.7.28.

The stocks in each user's attention sequence over a period of time are related to the stocks in their context, and this relation determines the probability of the j-th stock. The attention sequences constitute a group of attention sequences S, and the stocks in S constitute a set of non-overlapping stocks $V_{STOCK} = \{stock_1, stock_2, \dots, stock_k\}$, where k is the number of non-overlapping stocks.