Semi-Markov model of processing requests to a cloud storage system

This paper presents a semi-Markov model of a modular cloud storage system, an important part of computer-aided manufacturing that affects the functioning of the whole process. The residence times of the system in its states and the transition probabilities are determined, and the stationary distribution of the embedded Markov chain is found. The residence times of the system in states with allowance for repeated hits are obtained from the theorem on the distribution functions of residence times with allowance for repeated hits. Using the trajectory method, the distribution function of the time for complete processing of a read request by the system is determined. The expected complete processing time of a read request obtained in this study is compared with the value given by a known formula from the literature for the expected residence time in a subset of system states.


Introduction
Computer-aided manufacturing today is a complex, multi-purpose hardware and software system that addresses a wide range of problems using advanced information technology and advanced software and hardware tools, in order to carry out operational planning and management of the enterprise as a whole effectively. At the same time, trends in the development of such systems are largely driven by the transfer of the software of geographically distributed industrial companies into virtual space by means of Internet communications. The concept of the virtual (extended) enterprise makes it possible to provide technological support for remote specialists and individual units of mechanical assembly production at the design, production, marketing and other stages of the product life cycle.
One of the most promising information technologies in this direction is cloud computing. According to [1], cloud computing provides a model for convenient on-demand network access to a shared pool of configurable computing resources (networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or minimal interaction with the cloud service provider [2]. The introduction of cloud computing technology has several significant advantages [3][4][5][6]: savings for the enterprise, which no longer needs to invest substantial sums in equipment and expensive software (ERP, PDM, CAD/CAM/CAE systems); economy of computing resources; mobility of workplaces; scalability, which increases the efficiency of computing resources through virtualization technologies; and savings on the maintenance of an IT department, whose functions move to outsourced cloud providers.
One common use of the cloud in industrial automation is hosting part or all of a SCADA (Supervisory Control and Data Acquisition) system. SCADA applications use cloud technologies in two main ways: 1) the SCADA system is installed locally, connected directly to the control network, and also sends information to the cloud, where it can be stored and distributed; 2) the SCADA system is fully deployed in the cloud and connected remotely to the control network. The data received from these systems are kept in networked cloud storage for later use by MES (Manufacturing Execution Systems) and other industrial automation systems. The volumes of information transmitted, processed and stored are very large and can be classed as Big Data. For this reason, the problem of determining the performance of these systems becomes particularly relevant. Mathematical models make it possible not only to determine performance but also to indicate ways of increasing it.
The primary characteristic of data processing and storage systems is the time spent waiting for a response to a request, i.e., the response time. Since the response time is a random variable that depends on a number of factors that differ for each particular implementation of the system, determining it is not an easy task. In what follows, the authors focus on one data storage technology, block storage [7], as the most widely used in industrial automation systems.
The purpose of this paper is to determine the distribution function (DF) of the response time of a block cloud storage system to a data read request. Consider a block system for cloud storage of shared data. It works as follows. The modular storage system provides computer systems with access to block-level storage volumes. In this environment, the file system is created on the computer systems themselves, and data are accessed over the network at the block level. A block storage system consists of one or more controllers and a storage subsystem. In turn, the storage controller consists of three key components: the client interface (external ports and controllers), the cache memory and the server part. The structure of an intelligent block storage system is shown in Fig. 1; a detailed description is given in [7].
An input/output request is transferred from the computer system via an external port and processed using the cache memory of the storage system. A read request can be served directly from the cache if the data requested by the user are found there. In modern storage systems, the client interface, cache memory and server part are usually integrated on a single board (called the storage processor or storage controller).
The client interface provides data exchange between the storage system and the computer system. It consists of two components: external ports and external controllers. Typically, the client interface has redundant controllers for high availability, and each controller contains several ports that allow multiple computer systems to be connected to the intelligent data storage system.
Fig. 2 shows a read operation using the cache memory when the requested data are (a) present and (b) absent in the cache.

Fig. 2. Read operations using the cache
When the computer system sends a read request, the storage controller reads the tag RAM to determine whether the required data are available in the cache. If the requested data are found in the cache, this is called a read cache hit (read hit), and the data are sent directly to the computer system without any operations on the internal storage. This gives the computer system a short response time (on the order of milliseconds). If the requested data are not found in the cache, this is called a cache miss. In this case, the data must be read from the storage system: the server part accesses the appropriate storage device and retrieves the requested data, which are placed in the cache memory and then sent to the computer system through the client interface. Cache misses increase the response time of input/output operations.
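The read path described above can be sketched as a read-through cache. The sketch is illustrative only: the class name `BlockStorageController`, the dictionaries standing in for the cache memory and the internal storage, and the unbounded cache (no eviction policy) are assumptions, not details of any particular storage product.

```python
class BlockStorageController:
    """Hypothetical model of a storage controller's read path with a read cache."""

    def __init__(self, backend):
        self.backend = backend   # stands in for the internal storage (block address -> data)
        self.cache = {}          # stands in for the cache memory; unbounded, no eviction
        self.hits = 0
        self.misses = 0

    def read(self, address):
        # Check whether the block is present in the cache (the role played by the tag RAM).
        if address in self.cache:
            self.hits += 1                 # read hit: no access to internal storage
            return self.cache[address]
        self.misses += 1                   # cache miss: the server part reads the storage
        data = self.backend[address]
        self.cache[address] = data         # place the block in the cache, then return it
        return data

    def hit_ratio(self):
        """Read hits divided by all read requests (0.0 if nothing was read yet)."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


ctrl = BlockStorageController({0: b"a", 1: b"b", 2: b"c"})
for addr in [0, 1, 0, 2, 0, 1]:
    ctrl.read(addr)
print(ctrl.hits, ctrl.misses, ctrl.hit_ratio())  # 3 3 0.5
```

In this request sequence the first access to each block is a miss and every later access is a hit, because nothing is ever evicted.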
Read performance is measured by the read hit ratio (hit rate), usually expressed as a percentage: the ratio of the number of read hits to the total number of read requests. The higher the read hit ratio, the better the read performance.
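The hit ratio enters the mean read response time directly: with hit ratio h and mean service times t_hit for a read hit and t_miss for a cache miss, the mean is h·t_hit + (1 - h)·t_miss. The function name and the latencies used below are hypothetical and serve only to show the direction of the effect.

```python
def mean_read_response_time(hit_ratio, t_hit, t_miss):
    """Mean read response time h*t_hit + (1 - h)*t_miss for a read hit ratio h in [0, 1].

    t_hit and t_miss are hypothetical mean service times of a read hit and a
    cache miss (the latter including the access to internal storage).
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * t_hit + (1.0 - hit_ratio) * t_miss


# Hypothetical latencies: 1 ms for a read hit, 10 ms for a cache miss.
for h in (0.4, 0.6, 0.8):
    print(f"hit ratio {h:.0%}: mean response time {mean_read_response_time(h, 1.0, 10.0):.1f} ms")
```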
It should be noted that the response time also depends on the reliability of data transmission networks.
To model a block-type storage system, we propose using the trajectory method given in [8]. This method yields an exact solution of the Markov renewal equations for a system with a discrete phase space of states.

Formulation of the problem
We pose the problem as follows. The distribution functions (DF) of the operation times of the system, including those of the cache memory, are known, as is the probability p of finding the required data in the cache memory.
When building the model, it is assumed that if a failure occurs in the user's data transmission channel, the read request is restarted.
In this study, we need to determine the DF of the time for complete processing of a read request by this system.

Building a model
The functioning of the system is described by a semi-Markov process (SMP) ξ(t) with a discrete-continuous phase space of states [9,10,11,12,13]. We introduce the set E of semi-Markov states of the system; the state codes have the following meaning:
S0: the request has been executed (instantaneous state);
S1: a request from the user has been received (or, after a failure of the query in the data transmission channel, re-execution of the request begins);
S2: the data are being searched for in the cache memory modules;
S3: the data were not found in the cache memory and a request is made to the data storage system (SDS) (or, after a failure in the data channel during a query to the SDS, the request to the SDS is restarted).
The state graph of the system is shown in Fig. 3.
Here ∧ denotes the minimum of random variables.
The DFs of the residence times of the system in the states are then determined from these times. The transition probabilities of the embedded Markov chain (EMC) are given by (1).
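Because the residence times are built from minima (∧) of competing random times, two standard identities are useful. For independent α and β, the DF of α ∧ β is 1 - (1 - F_α(t))(1 - F_β(t)), and the probability that α ends first, the kind of quantity that enters an EMC transition probability, is ∫_0^∞ (1 - F_β(t)) dF_α(t). The sketch checks both numerically; independence, the exponential distributions and the rates (service rate 2, channel-failure rate 0.5) are assumptions chosen only because closed forms exist for comparison.

```python
import math


def min_cdf(cdf_a, cdf_b):
    """DF of alpha ∧ beta for independent alpha, beta: 1 - (1 - F_a)(1 - F_b)."""
    return lambda t: 1.0 - (1.0 - cdf_a(t)) * (1.0 - cdf_b(t))


def prob_a_first(pdf_a, cdf_b, t_max, n=100_000):
    """P(alpha < beta) = integral of f_a(t) (1 - F_b(t)) dt, trapezoid rule on [0, t_max]."""
    h = t_max / n
    total = 0.5 * (pdf_a(0.0) * (1.0 - cdf_b(0.0)) + pdf_a(t_max) * (1.0 - cdf_b(t_max)))
    for k in range(1, n):
        t = k * h
        total += pdf_a(t) * (1.0 - cdf_b(t))
    return total * h


# Hypothetical example: service time ~ Exp(rate 2), channel failure time ~ Exp(rate 0.5).
lam, mu = 2.0, 0.5
F_service = lambda t: 1.0 - math.exp(-lam * t)
f_service = lambda t: lam * math.exp(-lam * t)
F_failure = lambda t: 1.0 - math.exp(-mu * t)

F_stay = min_cdf(F_service, F_failure)           # DF of theta = service ∧ failure
p_success = prob_a_first(f_service, F_failure, t_max=40.0)
print(F_stay(1.0), 1.0 - math.exp(-(lam + mu)))  # both equal 1 - e^(-2.5)
print(p_success, lam / (lam + mu))               # both close to 0.8
```

For exponential times the minimum is again exponential with the summed rate, which is why the first comparison is exact.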
Using system (1), we write the system of equations for determining the stationary distribution ρ(x) of the embedded Markov chain [14,15,16]. We represent the set of all states as the union of two subsets, one of which describes the process of servicing the read request.
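For a finite EMC with transition probabilities p_ij, the stationary distribution satisfies ρ_j = Σ_i ρ_i p_ij together with Σ_i ρ_i = 1. The sketch solves this system for a hypothetical version of the four-state chain: p = 0.4 is taken from the numerical example in the text, while the channel-failure probability q = 0.1, the transition structure (S2 → S1 on failure, S3 → S3 on restart, S0 → S1 to start the next request) and the notation p_ij are assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (a is n x n)."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-14:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def stationary_distribution(P):
    """rho with rho_j = sum_i rho_i P[i][j] and sum_i rho_i = 1."""
    n = len(P)
    # Rows 0..n-2: balance equations sum_i (P[i][j] - delta_ij) rho_i = 0.
    a = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n - 1)]
    a.append([1.0] * n)                  # last row: normalisation
    b = [0.0] * (n - 1) + [1.0]
    return solve_linear(a, b)


p, q = 0.4, 0.1  # p from the text; q (channel failure probability) is hypothetical
P = [
    [0.0, 1.0, 0.0, 0.0],                      # S0 -> S1 (next request; assumed)
    [0.0, 0.0, 1.0, 0.0],                      # S1 -> S2
    [(1 - q) * p, q, 0.0, (1 - q) * (1 - p)],  # S2 -> S0 hit, S1 restart, S3 miss
    [1 - q, 0.0, 0.0, q],                      # S3 -> S0 done, S3 restart
]
rho = stationary_distribution(P)
print([round(r, 6) for r in rho])  # [0.257143, 0.285714, 0.285714, 0.171429]
```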
To determine the DF of the complete read request processing time, we use the trajectory method [8] together with the theorem on the DF of the time the system spends in states with allowance for re-entering [8].
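One standard way to express a residence time with re-entries, given here as an illustration rather than as the exact statement of the theorem in [8], assumes that after each stay the state is re-entered with probability q, independently of the durations of the stays, and that successive stays are independent with DF F(t) and Laplace-Stieltjes transform φ(s). The total residence time θ̃ is then a geometric sum of stays:

```latex
\tilde F(t) = \sum_{k=1}^{\infty} (1-q)\, q^{k-1} F^{*k}(t), \qquad
\tilde\varphi(s) = \frac{(1-q)\,\varphi(s)}{1 - q\,\varphi(s)}, \qquad
\mathbb{E}\,\tilde\theta = \frac{\mathbb{E}\,\theta}{1-q},
```

where F^{*k} is the k-fold convolution of F. The mean follows by differentiating the transform at s = 0. Under these assumptions, a restart of the query to the SDS in state S3 after a channel failure would correspond to q being the failure probability.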
We define the trajectories of the system's exit into the subset { }. The DF of the time the system spends on each trajectory is obtained by sequential convolution of the DFs of the residence times in the states along that trajectory, * denoting the convolution operation.
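The trajectory method can be sketched numerically: enumerate the paths from S1 to S0, convolve the residence-time distributions along each path, and mix the results with the path probabilities. The sketch assumes that re-entries are already folded into each state's residence-time DF, so the graph is acyclic; the exponential residence times, their means, the time step and all function names are hypothetical, and p = 0.4 is taken from the text.

```python
import math
from itertools import accumulate


def discretize(cdf, h, n):
    """Lattice pmf: mass of [(k - 1/2)h, (k + 1/2)h) is assigned to time k*h, k = 0..n-1."""
    pmf = [cdf(0.5 * h)]
    for k in range(1, n):
        pmf.append(cdf((k + 0.5) * h) - cdf((k - 0.5) * h))
    return pmf


def convolve(a, b):
    """Convolution of two lattice pmfs, truncated to len(a) points."""
    n = len(a)
    out = [0.0] * n
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j in range(n - i):
            out[i + j] += ai * b[j]
    return out


def trajectories(transitions, state, target, prob=1.0, path=()):
    """Yield (probability, states visited) for every path from state to target (acyclic graph)."""
    path = path + (state,)
    if state == target:
        yield prob, path[:-1]  # the target S0 is instantaneous, so it adds no time
        return
    for nxt, p_next in transitions[state]:
        yield from trajectories(transitions, nxt, target, prob * p_next, path)


def completion_pmf(transitions, stay_pmf, start, target):
    """Mixture over trajectories of the convolved residence-time pmfs."""
    n = len(next(iter(stay_pmf.values())))
    total = [0.0] * n
    for prob, states in trajectories(transitions, start, target):
        acc = [1.0] + [0.0] * (n - 1)  # point mass at t = 0
        for s in states:
            acc = convolve(acc, stay_pmf[s])  # sequential convolution along the path
        total = [t + prob * a for t, a in zip(total, acc)]
    return total


p, h, n = 0.4, 0.01, 1000
transitions = {"S1": [("S2", 1.0)], "S2": [("S0", p), ("S3", 1.0 - p)], "S3": [("S0", 1.0)]}
means = {"S1": 0.1, "S2": 0.2, "S3": 0.5}  # hypothetical mean residence times, seconds
stay_pmf = {s: discretize(lambda t, m=m: 1.0 - math.exp(-t / m), h, n) for s, m in means.items()}

pmf = completion_pmf(transitions, stay_pmf, "S1", "S0")
dist = list(accumulate(pmf))  # DF of the complete processing time at t = k*h
mean = sum(k * h * w for k, w in enumerate(pmf))
print(round(mean, 3))  # close to 0.1 + 0.2 + (1 - p) * 0.5 = 0.6
```

A lattice discretization keeps the convolutions exact sums; a finer step h reduces the discretization error at the cost of more work.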
The probability of finding the data in the cache memory of the system is also taken as p = 0.4.
Let us compare the mathematical expectation of the DF (4) obtained by us with the mathematical expectation determined by the expression from [14]. The mathematical expectation of the DF obtained by us is 1.7037037 s.
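One formula of this kind that is widely used for semi-Markov processes, with the state set split as E = E+ ∪ E-, expresses the mean sojourn time in E+ as Σ_{x∈E+} ρ(x) m(x) / Σ_{x∈E+} ρ(x) P(x, E-), where m(x) is the mean single-stay time in state x and P(x, E-) is the probability of a transition from x into E-. The hypothetical sketch below, which is not the authors' computation, checks on the illustrative four-state chain (p = 0.4 from the text; failure probability q = 0.1 and the mean stay times assumed) that this ratio agrees with the mean time from S1 to S0 obtained by first-step analysis. The agreement holds here because every entry into E+ = {S1, S2, S3} is made through S1. The numbers do not reproduce the 1.7037037 s value, since the paper's input DFs are not used.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def stationary_distribution(P):
    """Stationary distribution of the EMC: balance equations plus normalisation."""
    n = len(P)
    a = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n - 1)]
    a.append([1.0] * n)
    return solve_linear(a, [0.0] * (n - 1) + [1.0])


def mean_time_first_step(P, m, start, exit_states):
    """Mean time from start until the first entry into exit_states.

    First-step analysis: T_i = m_i + sum_{j in E+} P[i][j] T_j for every i in E+;
    averaging over all trajectories of the chain gives the same mean.
    """
    inner = [i for i in range(len(P)) if i not in exit_states]
    a = [[(1.0 if i == j else 0.0) - P[i][j] for j in inner] for i in inner]
    T = solve_linear(a, [m[i] for i in inner])
    return T[inner.index(start)]


def mean_time_ratio(P, m, exit_states):
    """sum_{E+} rho(x) m(x) / sum_{E+} rho(x) P(x, E-), rho being the EMC stationary distribution."""
    rho = stationary_distribution(P)
    inner = [i for i in range(len(P)) if i not in exit_states]
    num = sum(rho[i] * m[i] for i in inner)
    den = sum(rho[i] * sum(P[i][j] for j in exit_states) for i in inner)
    return num / den


p, q = 0.4, 0.1  # p from the text; q is hypothetical
P = [
    [0.0, 1.0, 0.0, 0.0],                      # S0 -> S1 (next request; assumed)
    [0.0, 0.0, 1.0, 0.0],                      # S1 -> S2
    [(1 - q) * p, q, 0.0, (1 - q) * (1 - p)],  # S2 -> S0 hit, S1 restart, S3 miss
    [1 - q, 0.0, 0.0, q],                      # S3 -> S0 done, S3 restart
]
m = [0.0, 0.1, 0.2, 0.5]  # hypothetical mean single-stay times in S0..S3, seconds
t_first_step = mean_time_first_step(P, m, start=1, exit_states={0})
t_ratio = mean_time_ratio(P, m, exit_states={0})
print(t_first_step, t_ratio)  # both equal 2/3 up to rounding
```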

Conclusions
A comparison of the mathematical expectation of the time for complete processing of a read request by the data storage system, determined from the DF obtained by us and from the formula given in [16], showed that the results coincide exactly.
The results make it possible to determine the response time of the cloud storage system to streams of read requests, which in turn makes it possible to evaluate the impact of information system performance on the adaptability of the process and, if necessary, to decide to increase the speed of the system. The resulting distribution function allows this model to be used in constructing models of more complex, hierarchically subordinate information and production systems.

Fig. 3. System graph.
The times of stay θ0, θ1, θ2, θ3 of the system in states S0, S1, S2 and S3 are equal to:
The residence time DFs of the system in states 21 and 20, with allowance for repeated hits, are given by formulas (1):

Fig. 4. Form of the DF of the complete read request processing time.