Overlapping Community Detection Algorithm Based on the Law of Universal Gravitation

Community mining has been the focus of much recent research on complex networks. In this paper, the authors propose a community mining algorithm based on the Universal Gravitation principle (UGPCA). Following the Law of Universal Gravitation, the algorithm adds mass to the information carried by each node and uses it to measure the weight of nodes in the network. Each node generates a gravitational field and exerts a force on the other nodes it can reach. Finally, the authors test UGPCA on real-world networks. Experimental results show that UGPCA can mine communities and can also find overlapping nodes.

fields, a community discovery algorithm based on topological potential is proposed in [6]. Shi et al. [7] proposed a method based on the potential function of the data field and improved the robustness of the Label Propagation Algorithm. Cheng et al. [8] studied the relationship between diffusion dynamics on networks and community structure, and proposed a method that detects community structure using network conductivity. From the viewpoint of Brownian motion, Zhou et al. [9] formulated a measure of similarity between nodes and proposed a community detection algorithm. Based on seed expansion and Newtonian gravitation, Liu et al. [10] proposed an overlapping community detection algorithm.
The existing algorithms based on network dynamics detect communities well, but they are not suitable for all kinds of networks. In real-world networks, nodes at hop 2 usually have some influence on the current node. Building on the Law of Universal Gravitation, we take the influence of nodes at different distances into consideration and propose the community detection algorithm UGPCA. We compare UGPCA with the LMF algorithm and verify through experiments that the algorithm is reasonable.

MODEL
Most networks can be described with mathematical symbols. An undirected network can be represented by $G=(V,E)$, where $V$ is the nonempty finite set of nodes in network $G$, with $|V|=n$, and $E \subseteq V \times V$ is the set of edges, with $|E|=m$. In this part, we introduce the Law of Universal Gravitation into the network and build the model for community detection.

Law of Universal Gravitation
The Law of Universal Gravitation is an important principle in classical physics:

$$F = G\,\frac{m_1 m_2}{r^2}$$

where $F$ is the gravitational force between two particles, $G$ is the gravitational constant, $m_1$ and $m_2$ are their masses and $r$ is the distance between them. That is, gravitation acts between any two particles; its magnitude is proportional to the product of their masses and inversely proportional to the square of the distance between them.
In real-world networks, the properties of nodes carry many meanings, such as the influence of nodes in social networks, the forwarding capability of nodes in communication systems, and the traffic flow through nodes in transportation networks. Inspired by the Law of Universal Gravitation, we regard the nodes of the network as particles and add mass to the properties of each node. Thus, each node exerts a force on the other nodes it can reach.

Node degree
Identifying the key spreaders in complex networks, those with the greatest impact on information dissemination, is important for understanding and controlling spreading on networks. Node degree is a representative local property of networks [11,12].
In a graph, the degree of a node is the number of edges connected to it. Regions that contain nodes with large degree usually have dense edges, so node degree reflects the importance of a node in the network. Node degree also has low time complexity to compute. As a result, we use node degree to describe the mass of a node. The degree of a node can be obtained from the adjacency matrix of the network as the sum of the corresponding row.
We cluster the nodes of the network by the gravitation among them. Because the mass of a node is expressed by its degree, the force between node $u$ and an adjacent node follows from the Law of Universal Gravitation with the two node degrees as masses.
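As a minimal sketch of the degree computation described above, the snippet below sums each row of a 0/1 adjacency matrix. The function name `node_degrees` and the toy network are illustrative assumptions, not part of the original paper.

```python
def node_degrees(adjacency):
    """Return the degree of every node as the row sum of a 0/1 adjacency matrix."""
    return [sum(row) for row in adjacency]


# Hypothetical toy network: a triangle 0-1-2 plus node 3 attached to node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

degrees = node_degrees(A)
print(degrees)  # [2, 2, 3, 1]
```

Under the model in this section, these degrees would play the role of node masses.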

Impact factor P
For computer networks, the distance between two nodes cannot be described by Euclidean distance; it is described by hop count instead. For node $u$, the hop count to each of its adjacent nodes is 1. The magnitude of gravitation depends on the distance between nodes, so we introduce the impact factor $P$ to measure the influence of the force among nodes. The total gravitation between node $u$ and its adjacent nodes is the sum of these forces over all adjacent nodes.
In real-world networks, nodes at hop 2 usually have some influence on the current node, whereas nodes at hop 3 rarely do. We therefore include the influence of hop-2 nodes in the algorithm, with $P_1$ the impact factor of nodes at hop 1 and $P_2$ the impact factor of nodes at hop 2. The total gravitation on the current node is given by formula (5), where $V_u$ is the set of nodes adjacent to node $u$ and $V'_u$ is the set of nodes at hop 2 from node $u$.
We apply the model to the community detection algorithm for complex networks. Based on the description above, every node with degree greater than 0 exerts gravitation. Formula (5) is used to calculate the force acting on the current node that is generated by the surrounding nodes. By analyzing the interactions among nodes, we obtain the community structure, and the algorithm converges.
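The following sketch illustrates one plausible reading of the hop-weighted gravitation described above. It is not the paper's formula (5), whose exact form is not reproduced in this text. The assumptions are: the gravitational constant is taken as 1, node degree is the mass, hop count is the distance, the force decays with the square of the hop count, and the contributions of hop-1 and hop-2 nodes are weighted by $P_1$ and $P_2$. The names `hop_sets` and `total_gravitation` are hypothetical.

```python
def hop_sets(adj, u):
    """Return (V_u, V'_u): the nodes at hop 1 and at exactly hop 2 from u."""
    hop1 = set(adj[u])
    hop2 = set()
    for v in hop1:
        hop2.update(adj[v])
    hop2 -= hop1 | {u}  # keep only nodes not already at hop 0 or hop 1
    return hop1, hop2


def total_gravitation(adj, u, p1=0.9, p2=0.1):
    """Assumed form: P1 * sum(k_u*k_v / 1^2) + P2 * sum(k_u*k_w / 2^2)."""
    degree = {node: len(neigh) for node, neigh in adj.items()}
    hop1, hop2 = hop_sets(adj, u)
    force_hop1 = sum(degree[u] * degree[v] / 1 ** 2 for v in hop1)
    force_hop2 = sum(degree[u] * degree[w] / 2 ** 2 for w in hop2)
    return p1 * force_hop1 + p2 * force_hop2


# Hypothetical toy network: the path 0-1-2-3 as an adjacency dict of sets.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(hop_sets(path, 0))           # ({1}, {2})
print(total_gravitation(path, 0))  # hop-1 term 0.9 * 2, hop-2 term 0.1 * 0.5
```

The default values `p1=0.9` and `p2=0.1` mirror the parameter setting reported in the experiments; any other pair can be passed in.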

ALGORITHM
We propose an improved algorithm, UGPCA, based on the Law of Universal Gravitation. The algorithm considers the influence of nodes at hop 1 and hop 2 and sets separate impact factors $P_1$ and $P_2$ for them. In the implementation, we introduce the sets $V$, $C_1$ and $V_r$: $V$ is the set of all nodes in the network, $C_1$ is the set of nodes that have joined a community, and $V_r$ is the set of remaining nodes.
Input: the network dataset $G=(V,E)$, where $V=\{v_1, \ldots, v_n\}$; the impact factors $P_1$ and $P_2$ of nodes at hop 1 and hop 2; the coefficient $M$ of overlapping nodes.
Output: the number $t$ of communities and the communities $C_1, \ldots, C_t$.
Process of algorithm:
1. Calculate the degree of each node in $V$ and select the node $u$ with the largest degree as the core node. If several nodes share the largest degree, compare the sums of the degrees of their adjacent nodes.
2. Put the core node and its adjacent nodes into a community as the initial community. Add the nodes that joined the initial community to set $C_1$, and add the other nodes to set $V_r$. If no nodes meet the conditions to join the initial community, the community partition is complete. If $V_r \neq \emptyset$, repeat the process above until $V_r = \emptyset$.
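Steps 1 and 2 above can be sketched as follows for the first round only. This is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the expansion and overlap tests that use $P_1$, $P_2$ and $M$ are omitted because their conditions are not given here, and any tie that survives both criteria is broken by iteration order.

```python
def select_core_node(adj):
    """Step 1: largest degree wins; ties are broken by the sum of neighbor degrees."""
    degree = {node: len(neigh) for node, neigh in adj.items()}
    return max(adj, key=lambda u: (degree[u], sum(degree[v] for v in adj[u])))


def initial_community(adj):
    """Step 2 (first round): core node plus its adjacent nodes form C_1; the rest is V_r."""
    core = select_core_node(adj)
    c1 = {core} | set(adj[core])
    v_r = set(adj) - c1
    return core, c1, v_r


# Hypothetical toy network: triangles 0-1-2 and 3-4-5, edge 2-3, node 6 attached to 4.
toy = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5, 6},
    5: {3, 4},
    6: {4},
}

core, c1, v_r = initial_community(toy)
print(core, sorted(c1), sorted(v_r))  # 3 [2, 3, 4, 5] [0, 1, 6]
```

Nodes 2, 3 and 4 all have degree 3 in this example, so the tie-break on neighbor degree sums (7, 8 and 6 respectively) selects node 3.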

EXPERIMENTS AND RESULTS
We ran the algorithm on real-world networks and compared UGPCA with the LMF algorithm using modularity; the results indicate that UGPCA is feasible and effective. The experiments use the American College Football network, the Zachary Karate Club network and the Dolphin network, and test separately the effect of the chosen impact factors, the quality of the division, and so on.

Selection and discussion of parameters in algorithm
We discuss the impact factors $P_1$ and $P_2$ in this experiment. First, we select different parameter values. Then, we compare the resulting divisions of the American College Football network and obtain the impact factors with the best division.
In formula (5), we set $P_1 + P_2 = 1$. Figure 2 shows the accuracy curve of node detection. The accuracy is highest when $P_2 = 0.1$ and decreases rapidly as the impact factor of hop-2 nodes increases. Thus, we conclude that nodes at hop 2 have a definite but weak influence on the network partition.
Figure 2. The division accuracy with different impact factors.

Partition result in Zachary dataset
The Zachary karate club network is one of the most commonly used datasets in community detection research. The network contains 34 nodes and 78 edges. Its structure is shown in figure 3(a), in which the light and dark nodes represent the two communities. Node 1 and node 34 are the core nodes of the two communities; node 3 is the overlapping node. We use UGPCA to divide the Zachary karate club network and select the parameter =0.09

Formula of force among nodes
Newman et al. proposed the modularity method to evaluate the quality of community detection. The idea behind modularity is that the number of edges inside a community should be higher than expected in a random network. To account for the heterogeneity of the node distribution, Newman et al. [13] proposed an improved modularity, defined in formula (7).
We compare UGPCA with the LMF algorithm using the modularity in formula (7). The datasets used in the experiment are the American College Football network, the Zachary Karate Club network and the Dolphin network. We set the parameters $P_1 = 0.9$ and $P_2 = 0.1$ in UGPCA. Table 1 shows the comparison between UGPCA and LMF by modularity.
From Table 1, we conclude that on the Zachary and Football networks the Q value of UGPCA is higher than that of LMF, which shows that the community structure found by UGPCA is more pronounced. On the Dolphin network, however, UGPCA is weaker than LMF.
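To make the modularity comparison concrete, the sketch below computes the standard Newman-Girvan modularity $Q = \sum_c \left[ \frac{l_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]$ for a disjoint partition, where $l_c$ is the number of edges inside community $c$, $d_c$ is the total degree of its nodes and $m$ is the number of edges. This is an assumption for illustration: the improved definition in formula (7) is not reproduced in this text and may differ, in particular for overlapping nodes. The function name and toy network are hypothetical.

```python
def modularity(adj, communities):
    """Standard Newman-Girvan modularity of a disjoint partition (not formula (7))."""
    m = sum(len(neigh) for neigh in adj.values()) / 2  # number of undirected edges
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in adj[u] if v in community) / 2
        total_degree = sum(len(adj[u]) for u in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q


# Hypothetical toy network: two triangles joined by the single edge 2-3.
two_triangles = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

q_split = modularity(two_triangles, [{0, 1, 2}, {3, 4, 5}])
q_whole = modularity(two_triangles, [{0, 1, 2, 3, 4, 5}])
print(q_split > q_whole)  # True
```

Splitting the toy network into its two triangles gives $Q = 5/14$, while placing all nodes in one community gives $Q = 0$, which matches the intuition that a higher Q indicates a more pronounced community structure.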

SUMMARY
Community detection is an important topic in complex network research. At present, most algorithms based on network dynamics models consider only the influence of adjacent nodes. In this paper, based on the Law of Universal Gravitation, a new algorithm was proposed for detecting overlapping communities. The algorithm calculates the mass of nodes and considers the influence of nodes at different hop counts. For overlapping nodes, the algorithm provides a criterion to identify them. Experimental results on real-world networks show that the algorithm can detect communities effectively and accurately.