An outline of the method for predicting IT vulnerabilities

The majority of currently known quantitative models for vulnerability analysis do not support a comprehensive vulnerability prediction process for a selected piece of software. This article presents an outline of a method for predicting software vulnerabilities. The presented solution is based on probabilistic properties that make it possible to reflect the external and internal factors affecting software and determining its vulnerabilities. A possible direction of further development is also described, indicating how the method could be extended with elements representing preventive measures, which may make it possible to limit or eliminate potential software vulnerabilities.


Introduction
Recent advances in information technology and the spread of Internet services have resulted in new threats emerging in cyberspace. In many cases, their natural consequence is the execution of cyberattacks, which are most often the outcome of the deliberate exploitation of IT vulnerabilities. Across many dictionary definitions, an IT vulnerability can be generally understood as a weak point in the security procedures applied to systems, hardware or software, or in system administration and usage, that may damage a system's operation or be used by cyber-aggressors to gain unauthorized access to information or to disrupt the system [1,2]. An IT vulnerability is most often defined in relation to a cyber threat, commonly described as any potential mechanism, event, or random or intentional action that may lead to vulnerability exploitation and, consequently, to a potential loss [1,3]. In recent years, much work on providing software security has taken the presence of software vulnerabilities into account [4][5][6][7][8]. In most cases, these works have focused on a qualitative approach, aiming to improve vulnerability detection techniques and to prevent vulnerabilities from occurring in software. In contrast, there are fewer examples in the literature describing the use of quantitative methods for vulnerability analysis of the kind applied effectively and reliably in the study of software reliability (e.g. Software Reliability Growth Models, SRGM) [9]. It can therefore be assumed that a similar approach can be applied to measuring software security.
The article presents an outline of a method for software vulnerability prediction. The solution is based on probabilistic properties, which give it great versatility. These properties can reflect software security, understood for the purposes of this work as resistance to vulnerabilities, including the external and internal factors affecting the software. A direction for the further development of the method is also proposed, which would allow elements reflecting the use of preventive measures to be incorporated and, as a consequence, the number of vulnerabilities in the software to be reduced.

State of the art
A quantitative vulnerability analysis process involves using methods developed to provide useful insight for assessing software security. One approach to measuring vulnerabilities quantitatively focuses on the study of a single vulnerability. It can be implemented using mechanisms that provide standardized vulnerability scores, obtained from calculations performed on individual vulnerability characteristics and ultimately indicating vulnerability severity [4]. The following standards are examples of measuring single vulnerabilities depending on their type:
• CVSS (Common Vulnerability Scoring System): a standard for measuring the impact of software flaw vulnerabilities and a format for communicating vulnerability characteristics [4];
• CWSS (Common Weakness Scoring System): a mechanism for assessing the severity of vulnerabilities resulting from software weaknesses; conceptually similar to CVSS, but differing in construction and in the type of vulnerability information processed [10];
• CCSS (Common Configuration Scoring System): a mechanism for measuring the severity of vulnerabilities resulting from inappropriate system configuration [5];
• CMSS (Common Misuse Scoring System): a mechanism for measuring software feature misuse vulnerabilities [6].
Other ways of measuring software vulnerabilities focus on particular sets of vulnerabilities [9]. These include analyses covering either whole software products or their particular modules. For this purpose, Vulnerability Discovery Models (VDMs) have been developed. The general purpose of a VDM is to indicate trends in vulnerability detection. Most VDM models are functions of time and were created on the basis of current and historical data obtained during the software operation process. With this approach, it is possible to predict the number of potential vulnerabilities that may be detected in the future. Such a quantitative approach to measuring the vulnerabilities of a given system can be used to determine software security metrics such as Vulnerability Density or Vulnerability Discovery Rate [9]. As a result, it can be a reasonably good instrument for software producers and users to understand security trends, allowing for efficient security management. A brief description of selected VDM models is presented in subsections 2.1-2.3.
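As an illustration of such metrics, the following minimal sketch computes Vulnerability Density and Vulnerability Discovery Rate under their usual definitions (vulnerabilities per thousand source lines of code, and discoveries per unit of observation time); the numbers used are invented for the example.

```python
# Hypothetical illustration of the two security metrics named above.
# The definitions (per-KSLOC density, per-day discovery rate) follow the
# common convention and are assumptions, not taken from the article.

def vulnerability_density(num_vulnerabilities: int, sloc: int) -> float:
    """Known vulnerabilities per thousand source lines of code (KSLOC)."""
    return num_vulnerabilities / (sloc / 1000)

def vulnerability_discovery_rate(num_discovered: int, period_days: float) -> float:
    """Vulnerabilities discovered per day over an observation period."""
    return num_discovered / period_days

# Example: 24 known vulnerabilities in a 300 000-line product,
# 6 of them discovered during a 90-day observation window.
print(vulnerability_density(24, 300_000))        # 0.08 per KSLOC
print(vulnerability_discovery_rate(6, 90.0))
```

Tracking both metrics over successive releases is one simple way to expose the security trends the VDM models below aim to predict.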

Alhazmi-Malaiya Logistic (AML) model
The Alhazmi-Malaiya Logistic (AML) model is an S-shaped, time-based logistic VDM [9]. The whole process of software vulnerability discovery in the AML model is divided into three distinct phases, as shown in the accompanying figure: the learning phase, the linear phase and the saturation phase. In the first phase, when the examined software starts being used, the number of software users, including security testers and potential attackers, grows gradually, so only a relatively small increase in the number of detected vulnerabilities can be observed. After some time (passing through Transition point 1), the number of users and testers increases significantly, which results in a linear increase in detected vulnerabilities; the software is then in the so-called linear phase. Later, when a newer version of the examined software is released (passing through Transition point 2), the software enters the saturation phase, during which the vulnerability detection rate decreases.
The AML model is based on the assumption that the rate of change of the cumulative number N(t) of all vulnerabilities detected up to time t depends on two factors: the first decreases as the number of still undiscovered vulnerabilities in the system decreases, while the second increases with the time elapsed since the release of the system. Under these assumptions, the Vulnerability Discovery Rate is given by the following differential equation:

dN(t)/dt = a N(t) (b − N(t)),  (1)

where N(t) is the cumulative number of vulnerabilities up to testing time t, and a and b are constants determined empirically from the recorded data. Solving differential equation (1) yields the formula for the cumulative number of vulnerabilities up to time t:

N(t) = b / (b c e^(−abt) + 1),  (2)

where c is a constant introduced while solving differential equation (1).
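The AML curve can be evaluated numerically. A minimal sketch, assuming the standard AML closed form N(t) = b / (b·c·e^(−abt) + 1), follows; the parameter values are invented for illustration, since in practice a, b and c are fitted to recorded vulnerability data.

```python
import math

# Sketch of the AML cumulative-vulnerability curve.
# Parameters a, b, c below are hypothetical, not fitted values.

def aml_cumulative(t: float, a: float, b: float, c: float) -> float:
    """Cumulative number of vulnerabilities discovered up to time t (AML model)."""
    return b / (b * c * math.exp(-a * b * t) + 1.0)

a, b, c = 0.002, 120.0, 0.05   # hypothetical parameters
for t in (0, 12, 24, 48):      # e.g. months since release
    print(f"t={t:>3}: N(t) = {aml_cumulative(t, a, b, c):.1f}")
```

Plotting these values reproduces the S-shape described above: a slow learning phase, a near-linear middle, and saturation as N(t) approaches the total b.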

VDM for a software system using stochastic differential equation
Another VDM, the model by A.K. Shrivastava, R. Sharma and P.K. Kapur, can be considered an improvement of the AML model. The authors introduced additional parameters and assumed that the vulnerability discovery process in the system can be modelled as a stochastic process with a continuous state space, based on the following assumptions [11]:
1. As the research progresses, the number of remaining vulnerabilities in the system shows a downward trend.
2. A system failure may occur during operation, since the system has vulnerabilities.
3. The vulnerability elimination process is ideal: no new vulnerabilities arise during it as a side effect.
In this approach, the authors extended differential equation (1), which describes the Vulnerability Discovery Rate, through randomisation [12] into a stochastic differential equation:

dN(t)/dt = [a + σ γ(t)] N(t) (b − N(t)),  (3)

where γ(t) is a standardized Gaussian white noise, σ is a positive constant representing the magnitude of the irregular fluctuations, and the process W(t) = ∫ γ(t) dt is called a Wiener or Brownian motion process [11]. Applying the Itô formula to (3) and performing further transformations allowed the authors to obtain a closed-form expression for the cumulative number of software vulnerabilities up to testing time t, in which the deterministic exponent of (2) is augmented by a stochastic term containing the Wiener process W(t).
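To illustrate how a Wiener-process disturbance changes a discovery curve, the following sketch simulates a logistic vulnerability-discovery equation whose rate is perturbed by Gaussian white noise, integrated with the Euler-Maruyama scheme. The parameter values and the exact placement of the noise term are assumptions made for illustration, not the authors' calibration.

```python
import math
import random

# Euler-Maruyama simulation of a noise-perturbed logistic discovery equation:
#   dN = a*N*(b - N)*dt + sigma*N*(b - N)*dW
# All parameter values are hypothetical; sigma = 0 recovers the
# deterministic logistic curve.

def simulate_sde(a=0.002, b=120.0, n0=5.0, sigma=0.0005,
                 t_end=48.0, steps=4800, seed=42):
    random.seed(seed)
    dt = t_end / steps
    n = n0
    for _ in range(steps):
        dw = random.gauss(0.0, math.sqrt(dt))      # Wiener increment ~ N(0, dt)
        n += a * n * (b - n) * dt + sigma * n * (b - n) * dw
        n = min(max(n, 0.0), b)                    # keep the path inside [0, b]
    return n

print(simulate_sde())           # one noisy trajectory endpoint
print(simulate_sde(sigma=0.0))  # deterministic limit for comparison
```

Running the simulation many times with different seeds gives the spread of plausible discovery counts that the stochastic model captures and the deterministic AML curve cannot.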

Rescorla Exponential Model
The next example of a VDM, the Rescorla Exponential Model [13], is an adaptation of the Software Reliability Growth Model by Goel and Okumoto [14]. The vulnerability discovery rate can be described as follows:

dN(t)/dt = λ (B − N(t)),  (5)

where B is the total number of vulnerabilities which can be discovered in the examined software and λ is a constant rate representing the Vulnerability Discovery Rate. By integrating equation (5), it is possible to determine the formula for the cumulative number of vulnerabilities in the software up to testing time t:

N(t) = B (1 − e^(−λt)).  (6)
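A minimal sketch of this model, assuming the usual exponential form N(t) = B(1 − e^(−λt)); the values of B and λ below are hypothetical rather than fitted.

```python
import math

# Sketch of the Rescorla exponential model: the Goel-Okumoto mean-value
# function applied to vulnerability discovery. B and lam are invented values.

def rescorla_cumulative(t: float, B: float, lam: float) -> float:
    """Cumulative number of vulnerabilities discovered up to time t."""
    return B * (1.0 - math.exp(-lam * t))

B, lam = 80.0, 0.03
print(rescorla_cumulative(0.0, B, lam))    # nothing discovered at release
print(rescorla_cumulative(24.0, B, lam))   # partway toward the total B
```

Unlike the S-shaped AML curve, this model has its highest discovery rate immediately at release and decays from there, which fits software that attracts intense scrutiny as soon as it ships.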

A dynamic approach to measuring vulnerabilities
The idea of software security level assessment is presented in Fig. 3, where: R(t) is the software security state (no vulnerabilities discovered); Q(t) is the vulnerable state (vulnerabilities discovered in the examined software); and λ(t) is the intensity of transition from state R(t) to Q(t). Using these symbols, software security can be defined as the probability that the software remains in the secure state up to time t:

R(t) = P(T > t),  (7)

where T is the random time of the first vulnerability discovery; consequently R(t) + Q(t) = 1. According to the literature [15,17,18,19], the intensity of vulnerability occurrence in the software is the probability density of vulnerability occurrence at time t on condition that no vulnerabilities were discovered in the software by time t. It is described by the following formula:

λ(t) = f(t) / R(t),  (8)

where f(t) is the density of vulnerability occurrence in the software at time t. Using the definition of conditional probability and performing transformations, the function f(t) can be written as follows:

f(t) = dQ(t)/dt = −dR(t)/dt.  (9)

Further transformations lead to the following system of differential equations [16,17,18,19]:

dR(t)/dt = −λ(t) R(t),   dQ(t)/dt = λ(t) R(t).  (10)

Assuming the initial conditions R(0) = 1 and Q(0) = 0, one obtains the following solution to the system of differential equations (10):

R(t) = exp(−∫ λ(s) ds),   Q(t) = 1 − exp(−∫ λ(s) ds),  (11)

with the integrals taken over (0, t). For the purposes of this work, it can be assumed that the probability of a cyberattack resulting from the exploitation of vulnerabilities in a certain time interval is directly proportional to that time interval and to the number of all working software instances within which the vulnerability can be exploited. Based on this assumption, it can be defined as follows:

P(t, t + Δt) = λ N(t) Δt,  (12)

where P(t, t + Δt) is the probability of vulnerability occurrence in the software in a time interval of length Δt; N(t) is the number of software instances in operation in which the vulnerability can be exploited; λ is the factor of proportionality (intensity of vulnerability occurrence); and Δt is the time interval of software operation (cumulative time interval of software operation). Let Pₙ(t) be the probability that n vulnerabilities were discovered during the time interval (0, t).
Using the postulates of a Poisson process, the system of equations can be written as follows:

P₀(t + Δt) = P₀(t)(1 − λΔt),
Pₙ(t + Δt) = Pₙ₋₁(t) λΔt + Pₙ(t)(1 − λΔt),  n ≥ 1.  (13)

Using the definition of the derivative, the solution of the system of equations (13) takes the following form:

dP₀(t)/dt = −λ P₀(t),   dPₙ(t)/dt = λ Pₙ₋₁(t) − λ Pₙ(t).  (14)

The probability that n vulnerabilities may be discovered in the time interval (0, t) is therefore described by the Poisson distribution:

Pₙ(t) = ((λt)ⁿ / n!) e^(−λt),  (15)

whereby, when multiple software instances are observed, the expression λt is replaced by the product of λ and the cumulative operation time of all instances. Thus, the probability of vulnerability discovery in a single software instance is given by:

q = 1 − e^(−λt),  (16)

where q is the probability of vulnerability discovery in a single software instance and t is the operation time of the single software instance in the interval (0, t). Further transformations allow the estimator of the unknown intensity of discovered vulnerabilities, the parameter λ, to be determined. For this purpose, the maximum likelihood method was used, which gives the result:

λ̂ = n / Σᵢ tᵢ,  (17)

where tᵢ is the operation time of the i-th tested software instance. The estimator λ̂ can thus be described as the average number of mutually independent vulnerabilities per unit time of operation of the tested software instances.
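The quantities of this section can be sketched numerically as follows, assuming a constant intensity λ; the vulnerability counts and per-instance operation times below are invented for illustration.

```python
import math

# Sketch of the dynamic-model quantities: two-state security probabilities,
# Poisson discovery probabilities, and the maximum-likelihood intensity
# estimate. All data values are illustrative assumptions.

def security_state(t: float, lam: float) -> tuple[float, float]:
    """R(t) = exp(-lam*t): no vulnerability discovered by t; Q(t) = 1 - R(t)."""
    r = math.exp(-lam * t)
    return r, 1.0 - r

def poisson_pmf(n: int, lam: float, t: float) -> float:
    """P_n(t): probability of exactly n discoveries in (0, t)."""
    return (lam * t) ** n / math.factorial(n) * math.exp(-lam * t)

def discovery_probability(lam: float, t: float) -> float:
    """q: probability of at least one discovery in a single instance."""
    return 1.0 - math.exp(-lam * t)

def mle_intensity(n_vulns: int, instance_times: list[float]) -> float:
    """Maximum-likelihood estimate: discoveries per unit of total operation time."""
    return n_vulns / sum(instance_times)

# 6 vulnerabilities observed across instances running 100, 150 and 50 time units.
lam_hat = mle_intensity(6, [100.0, 150.0, 50.0])
print(lam_hat)                           # estimated intensity
print(security_state(50.0, lam_hat))     # (R, Q) after 50 time units
print(poisson_pmf(0, lam_hat, 10.0))     # chance of no discovery in (0, 10)
print(discovery_probability(lam_hat, 10.0))
```

Note that `poisson_pmf(0, lam, t)` and the first component of `security_state(t, lam)` coincide, as the derivation requires: the secure state is exactly the event of zero discoveries.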

Vulnerability assessment
To assess security by considering the impact of potential vulnerabilities in the tested software, it should be verified whether the difference between the intensities of vulnerability discovery obtained for two successive time intervals has changed significantly. For this purpose, a statistical hypothesis test can be used. Assume that the software was in operation during two successive time intervals, (t₀, t₁) and (t₁, t₂). In the first step, the intensities of vulnerability discovery in the tested software for these intervals are calculated:

λ̂₁ = n₁ / (t₁ − t₀),   λ̂₂ = n₂ / (t₂ − t₁),  (18)

where n₁ and n₂ are the numbers of vulnerabilities discovered in the respective intervals. It is then possible to formulate the following hypotheses:
- H₀: the difference between the intensities indicates that vulnerabilities are being discovered too frequently;
- H₁: the difference between the intensities indicates a significant improvement in security.
A test statistic S, a function of the numbers of vulnerabilities discovered in the two intervals, is then evaluated (19). Conclusions about too frequent vulnerability discovery in the examined software (hypothesis H₀) or a significant improvement of its security (hypothesis H₁) can be validated if the obtained value of the statistic S lies within the interval (a, b), where the parameters a and b are determined on the basis of vulnerability data recorded during the process of software operation. Otherwise, the security of the examined software has not changed significantly.
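The exact form of the statistic S depends on the recorded data. As a stand-in, the sketch below compares two discovery intensities with a standard normal-approximation test for two Poisson rates; this specific statistic is an assumption for illustration, not the one defined in the article.

```python
import math

# Hedged sketch: compare the vulnerability-discovery intensities of two
# successive operation intervals. The z-statistic used here is a common
# two-sample Poisson rate test, standing in for the article's statistic S.

def interval_intensity(n_vulns: int, duration: float) -> float:
    """Discovered vulnerabilities per unit time in one interval."""
    return n_vulns / duration

def rate_test_statistic(n1: int, t1: float, n2: int, t2: float) -> float:
    """Approximate z-statistic for testing lambda1 == lambda2."""
    lam1, lam2 = n1 / t1, n2 / t2
    se = math.sqrt(n1 / t1 ** 2 + n2 / t2 ** 2)   # std. error of lam1 - lam2
    return (lam1 - lam2) / se

# Invented data: 12 vulnerabilities in the first 100 time units, 4 in the next 100.
s = rate_test_statistic(12, 100.0, 4, 100.0)
print(s)   # a large positive value suggests the discovery intensity has dropped
```

The resulting value would then be compared against the acceptance bounds (a, b) derived from the operational vulnerability data.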

Future development -a proactive approach
One way of improving the process of predicting vulnerability discovery in software, as described in sections 3 and 4, is the implementation of a proactive approach that allows preventive measures to be used in this process.
When the process of vulnerability discovery in the software is random, it can be assumed that the adjusted intensity of vulnerability discovery is directly proportional to the product of the base vulnerability discovery intensity and the statistic S, and inversely proportional to the number of already discovered vulnerabilities, the determination of which accounts for the learning rate, the applied preventive measures, etc. Thus, the adjusted intensity of vulnerability discovery can be given by the following formula:

λₚ = λ S / m,  (20)

where S is the value of the statistic (19) and m is the number of discovered vulnerabilities. Furthermore, the probability of an increase in the number of vulnerabilities in the examined software in a specified time interval can be described analogously to (12):

P(t, t + Δt) = λₚ N(t) Δt.  (21)

These relations allow vulnerability discovery to be predicted, which in turn may enable the software security level to be assessed.
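An illustrative sketch of this proactive adjustment follows. The functional form used here is an assumption made for the example: the adjusted intensity grows with the base intensity and the test statistic, and shrinks with the number of already discovered (and addressed) vulnerabilities.

```python
# Hedged sketch of a proactive intensity adjustment. The formula
# lambda_p = lambda * S / m is a hypothesis for illustration, not the
# article's calibrated expression; all data values are invented.

def adjusted_intensity(base_lam: float, s_stat: float, m_discovered: int) -> float:
    """Hypothetical adjusted intensity, lowered as more vulnerabilities
    are discovered and mitigated."""
    return base_lam * s_stat / max(m_discovered, 1)

def p_increase(adj_lam: float, n_instances: int, dt: float) -> float:
    """Probability of an increase in vulnerabilities over an interval dt,
    proportional to the number of running instances."""
    return adj_lam * n_instances * dt

lam_p = adjusted_intensity(0.02, 1.5, 10)
print(lam_p)                        # adjusted intensity
print(p_increase(lam_p, 20, 1.0))   # chance of an increase in one time unit
```

Under this assumed form, each discovered and mitigated vulnerability lowers the predicted intensity, which is exactly the preventive effect the proposed extension is meant to capture.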

Conclusions
The ability to prevent the occurrence of vulnerabilities in software, or to minimize the level of their negative impact, can have a significant influence on the broadly understood security of cyberspace. The use of quantitative models of vulnerability discovery can provide useful knowledge for assessing the security of information systems. For example, it may allow a more accurate determination of the time necessary to obtain a required level of security, as well as an indication of the personnel, hardware and software resources to be allocated to achieve it. The method proposed in this article aims to improve the ability to predict IT vulnerabilities in the examined software. As a consequence, it can be a good tool for software end-users and vendors, enabling them to forecast software security trends and to plan a broadly understood security management process.