Semi-empirical Theory of Distributions of Users of Social Online Networks by the Number of Contacts

Based on an empirical analysis of contacts in the social network VKontakte, we establish quantitative patterns that describe the distribution of social network users by the number of contacts (the number of friends). The parameter entering the differential equation whose solution gives the proposed distribution has a meaning similar to that of the Dunbar number and approximately coincides with it in magnitude. We show that the parameters characterizing this distribution depend on the size of the communication space in different cities. However, the basic parameter, which characterizes the fractal dimension of the communication space, remains constant for all cities studied.


Introduction
The study of the communication space formed by the rapid development of telecommunication technologies is of considerable interest in several respects.
For example, a significant share of campaigns promoting goods on the market is now conducted in social online networks [1][2][3][4][5], and such networks are already used not as an auxiliary campaigning tool but as the main one [6].
Social online networks linking a large number of people are an ideal platform for mass information influence, including in the context of information wars [7][8][9].
At the same time, modern telecommunication technologies make online education possible, which offers a number of advantages (access to university courses from around the world, saved travel time, convenience for people with special needs) [10][11][12]. Massive open online courses (MOOCs) are beginning to gain popularity [13][14][15].
The systematic use of social online networks for these purposes requires an adequate quantitative theoretical description of them.
In this paper we analyze the distribution of users of social online networks by the number of friends (mutual contacts) and propose a model describing this distribution.

Experimental
The data used in this work were collected directly from the friend counts of users of the social online network VKontakte.
A random sample of users residing in the following cities was formed: Almaty, Moscow, Kiev, Pavlodar, and Novosibirsk, and the number of friends of each user was counted. The dependences obtained from these data are shown in Figs. 1-5 (points). The abscissa is the number n at the centre of each interval into which the axis is divided; the partition was chosen to be non-uniform. The ordinate is the number of users whose friend count falls within a given interval, divided by the width of that interval. This construction approximately corresponds to the density of the distribution of users by the number of friends.
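The density construction described above can be sketched as follows. The interval edges in the usage example are hypothetical (the paper does not list its partition), the function name `density_by_bins` is ours, and each interval is assumed to be half-open, [a, b).

```python
from bisect import bisect_right


def density_by_bins(friend_counts, edges):
    """Hypothetical helper: density of users over non-uniform intervals.

    For each interval [edges[k], edges[k+1]) it returns the interval centre
    and (number of users whose friend count falls in it) / (interval width).
    Friend counts outside [edges[0], edges[-1]) are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for n in friend_counts:
        k = bisect_right(edges, n) - 1  # index of the interval containing n
        if 0 <= k < len(counts):
            counts[k] += 1
    centres = [(a + b) / 2 for a, b in zip(edges[:-1], edges[1:])]
    widths = [b - a for a, b in zip(edges[:-1], edges[1:])]
    densities = [c / w for c, w in zip(counts, widths)]
    return centres, densities


# Illustrative (made-up) friend counts and a non-uniform partition.
centres, densities = density_by_bins([3, 7, 12, 40, 45, 150, 160, 900],
                                     [0, 10, 50, 200, 1000])
```

For these made-up inputs the interval centres are 5, 30, 125 and 600, and the densities are 0.2, 0.075, about 0.0133 and 0.00125 users per unit of friend count. Dividing by the width is what makes wide and narrow intervals comparable on one plot.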
The curves obtained for the various cities where Runet is actively used have a similar shape.

Theoretical model
In the general case, the distribution of users by the number of friends can be found from the system of equations (1), where n_j is the number of users who have j friends, w_{j,i} is the frequency with which a new connection forms between persons having j and i friends, and the system also includes the lifetime of a network user (the actual time during which the user uses the resource).
Assuming that the functions under consideration vary slowly as j changes, we can pass to a continuous form of equation (1). Replacing the discrete numbers j and i by continuous variables x and y, we obtain equation (2). Expanding in a Taylor series up to quadratic terms gives equation (3), from which, for the stationary case, we obtain expression (4), with the coefficients defined in (5). It is essential that the functions entering the multiplier of f(x) in expression (4) are determined by the entire profile of the distribution under consideration.
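For reference, the step from discrete indices to continuous variables relies on the standard second-order expansion for a unit shift. The sketch below writes n_j as f(x) and w_{j,i} as w(x, y); this continuous notation is ours, and the lines are the generic expansion, not a reconstruction of the authors' equations (2) and (3).

```latex
f(x \pm 1) \approx f(x) \pm \frac{\partial f}{\partial x}
              + \frac{1}{2}\,\frac{\partial^{2} f}{\partial x^{2}},
\qquad
w(x \pm 1, y) \approx w(x, y) \pm \frac{\partial w}{\partial x}
              + \frac{1}{2}\,\frac{\partial^{2} w}{\partial x^{2}}
```

Truncating at the quadratic terms is justified only when f and w change little over a unit step in x, which is exactly the slow-variation assumption stated above.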
Expression (4) allows the establishment of an equilibrium distribution of users by the number of friends to be interpreted in terms of a flux directed toward larger numbers of friends. In particular, this makes it possible to use heuristic considerations when finding the function K(x).
These considerations are as follows. The literature currently uses the Dunbar number [16][17][18][19], which is the maximum number of stable social contacts (links) that one person can maintain. This number was originally obtained in studies of how the size and development of the neocortex influence primate group size [16][17], but it is now increasingly used to describe the communication space [20][21].
It can be assumed that the Dunbar number characterizes the potential that drives the flux in the positive direction along the x axis in this model. If a user has fewer friends than the Dunbar number, the derivative in formula (4) can be expected to be positive, and negative otherwise.
The simplest expression satisfying this requirement has the form (6), in which a power-law factor takes into account the fractal nature of the communication space.
This character of the communication space follows from simple considerations. A connection between two persons forms more easily if they have a common friend, i.e. if some chain of acquaintances links them. Including such chains leads to the power-law dependence in formula (6).
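The sign requirement can be illustrated with a minimal sketch. Because equation (6) is not reproduced here, the form used below, d(ln f)/dx = gamma * (N - x) * x^(-alpha), is an assumption chosen only because it meets the stated conditions: N stands for the Dunbar-number analogue, and x^(-alpha) is a hypothetical power-law factor standing in for the fractal term. The names `log_derivative`, `gamma` and `alpha` are illustrative.

```python
def log_derivative(x, n_dunbar, gamma, alpha):
    """Hypothetical right-hand side d(ln f)/dx = gamma * (N - x) * x**(-alpha).

    ASSUMPTION: this is an illustrative form, not the authors' equation (6).
    The factor (N - x) makes the slope positive below the Dunbar-like number N
    and negative above it; x**(-alpha) is the assumed power-law factor.
    """
    if x <= 0:
        raise ValueError("x (number of friends) must be positive")
    return gamma * (n_dunbar - x) * x ** (-alpha)


# Below, at, and above an illustrative N = 150 (gamma and alpha are made up).
below = log_derivative(50, 150, 0.01, 2 / 3)
at_n = log_derivative(150, 150, 0.01, 2 / 3)
above = log_derivative(300, 150, 0.01, 2 / 3)
```

With these made-up parameters `below` is positive, `at_n` is zero and `above` is negative, so the flux pushes users toward N from both sides, which is the behaviour the text requires of formula (4).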
Equation (6) is of first order and is easily integrated, giving (7), where C is the integration constant.
The solution can then be written as (8), where the normalization factor A has been introduced into the notation.
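Under the same hypothetical form of equation (6) (not necessarily the authors' expression), the integration is elementary: ln f = ln A + gamma * (N x^(1-alpha)/(1-alpha) - x^(2-alpha)/(2-alpha)) for alpha other than 1 and 2. The sketch below evaluates this closed form and cross-checks it against a classical fourth-order Runge-Kutta integration of the same ODE. All names and parameter values are illustrative.

```python
import math


def profile(x, amplitude, n_dunbar, gamma, alpha):
    """Closed-form solution of the HYPOTHETICAL equation
    f'(x) = gamma * (N - x) * x**(-alpha) * f(x), valid for alpha not in {1, 2}.

    ln f = ln A + gamma * (N * x**(1-alpha)/(1-alpha) - x**(2-alpha)/(2-alpha))
    """
    u = x ** (1 - alpha) / (1 - alpha)
    v = x ** (2 - alpha) / (2 - alpha)
    return amplitude * math.exp(gamma * (n_dunbar * u - v))


def rk4_profile(x0, x1, f0, n_dunbar, gamma, alpha, steps=20000):
    """Integrate the same ODE from x0 to x1 with classical RK4 (cross-check)."""
    def rhs(x, f):
        return gamma * (n_dunbar - x) * x ** (-alpha) * f

    h = (x1 - x0) / steps
    x, f = x0, f0
    for _ in range(steps):
        k1 = rhs(x, f)
        k2 = rhs(x + h / 2, f + h * k1 / 2)
        k3 = rhs(x + h / 2, f + h * k2 / 2)
        k4 = rhs(x + h, f + h * k3)
        f += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return f
```

Usage: with A = 1, N = 150, gamma = 0.01 and alpha = 2/3, `rk4_profile(1.0, 400.0, profile(1.0, 1, 150, 0.01, 2/3), 150, 0.01, 2/3)` agrees with `profile(400.0, 1, 150, 0.01, 2/3)` to within numerical error. Under this assumed form the distribution peaks exactly at x = N, which is why N can be read as a Dunbar-number analogue.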

Comparison with experimental data
A comparison of the experimental results with calculations based on the proposed model is presented in Figs. 1-5. The theoretical curves (solid lines) in these figures were obtained from the model described above, and the agreement between the experimental and theoretical data is good.
The control parameters defining each theoretical curve were determined from the experimental data by the method of least squares. The resulting numerical values, used to construct the theoretical curves in Figs. 1-5, are summarized in Table 1.
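The paper states only that the parameters were found by least squares. One possible sketch, assuming the hypothetical profile ln f = ln A + gamma * (N x^(1-alpha)/(1-alpha) - x^(2-alpha)/(2-alpha)) (not necessarily the authors' equation (8)): for fixed alpha, ln f is linear in (ln A, gamma*N, gamma), so ordinary linear least squares applies, and alpha is then chosen by a grid search. Fitting in log space, which weights relative rather than absolute deviations, is our choice; intervals with zero density must be dropped before taking logarithms. All function names are hypothetical.

```python
import math


def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    sol = [0.0] * 3
    for r in (2, 1, 0):
        s = m[r][3] - sum(m[r][c] * sol[c] for c in range(r + 1, 3))
        sol[r] = s / m[r][r]
    return sol


def fit_for_alpha(xs, fs, alpha):
    """Least squares for ln f = c0 + c1*u(x) + c2*(-v(x)) at fixed alpha in (0, 1).

    Here c0 = ln A, c1 = gamma*N, c2 = gamma. Returns (sse, A, N, gamma).
    All fs must be positive (drop empty intervals first).
    """
    rows = [(1.0, x ** (1 - alpha) / (1 - alpha), -x ** (2 - alpha) / (2 - alpha))
            for x in xs]
    ys = [math.log(f) for f in fs]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    c0, c1, c2 = _solve3(ata, aty)  # normal equations A^T A c = A^T y
    sse = sum((c0 + c1 * r[1] + c2 * r[2] - y) ** 2 for r, y in zip(rows, ys))
    return sse, math.exp(c0), c1 / c2, c2


def fit_profile(xs, fs, alphas):
    """Grid search over alpha; keep the alpha with the smallest squared error in ln f.

    Returns (alpha, A, N, gamma).
    """
    best = None
    for alpha in alphas:
        sse, amp, n_d, gamma = fit_for_alpha(xs, fs, alpha)
        if best is None or sse < best[0]:
            best = (sse, alpha, amp, n_d, gamma)
    return best[1:]
```

On noise-free synthetic data generated from the same assumed profile, this procedure recovers the generating alpha, N and gamma. For real histograms, a nonlinear least-squares routine on f itself would be a reasonable alternative if absolute deviations matter more.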
The analogues of the Dunbar number obtained from these dependences fall within the range of 100-230 reported in the literature [16][17][18][19]. The analogue of the Dunbar number for Moscow is much higher than for Almaty and Novosibirsk, as could be expected for a city with a larger communication space.
The most significant result is that the fractal dimension of the communication space is the same for all three cities and equals 2/3 to high accuracy. This value may reflect fundamental features of how the communication space is formed.

Conclusions
Thus, heuristic considerations and literature data on the nature of the Dunbar number make it possible to propose a simple semi-empirical model that adequately describes the experimental distributions of users of social online networks by the number of friends. The model is based on the assumption that the communication space is fractal. This assumption appears justified, since the fractal dimension obtained for the various cities where Runet is actively used remains constant to high accuracy. The next step of the study is to extend the sample to more cities and to test the model on another segment of the Internet (outside Runet).