Leveraging Renewable Energies in Distributed Private Clouds

The rapid rise of virtualization technologies and the related hardware abstraction in recent years has established the foundation for new cloud-based infrastructures and new scalable and elastic services. This paradigm has already found its way into modern data centers and their infrastructures. A positive side effect of these technologies is that workloads can be executed transparently in a location-independent and hardware-independent manner. For instance, by achieving a higher utilization of the underlying hardware through the consolidation of virtual resources, or by moving virtual resources to sites with lower energy prices or more available renewable energy, data centers can counteract the economic and ecological downsides resulting from their steadily increasing energy demand. This paper introduces a vector-based algorithm for the placement of virtual machines in distributed private cloud environments. After outlining the basic operation of our approach, we provide a formal definition as well as an outlook on further research.


Introduction
Cloud infrastructures and the underlying virtualization technologies form the foundation of modern data centers. These paradigms also offer potential for reducing the energy consumption of data centers, which represents most of their ongoing operational costs. In this paper, we seize an opportunity to increase the energy efficiency of data center operation by introducing a vector-based algorithm to support virtual machine placement decisions. After a brief introduction of related work in Section 2, we outline the basic operation followed by the formal definition of our algorithm. Further, we evaluate our approach and discuss the impact of migration costs in Section 4. Finally, we give an outlook on future work in Section 5.

Related work
The placement of virtual resources in modern cloud-based environments is a subject of current research. A project called CAESARA [1] introduced an algorithm for the energy-efficient placement of virtual machines that estimates a server's energy consumption based on the characteristics of the running virtual machines. The cost of virtual machine migration operations and their energy consumption, itemized by the different types of data center equipment, is described in [2]. In [3], a utility is described that distributes virtual machines while considering the migration cost; the same work outlines a basic analysis of migration cost and the impact of live migration on the running application. A distributed algorithm for placing virtual machines in large cloud environments is outlined in [4]. Its basic approach is that each server knows the CPU load of the other physical servers, tries to comply with an upper and a lower threshold for the CPU load, and initiates the migration of virtual machines when these thresholds are violated. The underlying mathematical challenges, such as the set-partitioning [5], [6] and bin-packing [7]-[9] problems, are also still subject of current research. There also exist vector-based approaches for VM placement, but these mostly focus on the intra data center placement of VMs on physical machines (PMs). In [10], a vector-based methodology to model VM resources and to place VMs on PMs is introduced. Another vector-based constraint programming approach is described in [11]. A routing-centric placement algorithm is introduced in [12], which describes a combined optimization approach for data center traffic and VM placement. Furthermore, [13] focuses on the communication demand for VM placement. In contrast to the listed publications, our approach also incorporates the use of renewable energies and their fluctuating characteristics concerning availability and pricing.

Inter-DC energy-aware placement of virtual machines
This section outlines the basic idea and functionality of our algorithm. The iterative approach of the algorithm keeps its complexity relatively low compared to bin-packing or set-partitioning algorithms, and each single iteration leads to a better overall topology. The algorithm is run continuously, though in practice a delay or pause between individual runs might be reasonable, especially if a predefined threshold of migrations across multiple runs has not been reached. This limits the resources and management traffic generated by the execution of the algorithm. The algorithm consecutively considers the optimal placement for each virtual machine with respect to its network flows, the corresponding relationships among them and the connections to external clients.
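The pause between runs mentioned above could be controlled along the following lines. This is only a minimal sketch under stated assumptions: the class name `RunPacer`, the doubling back-off policy, the window of three runs and all default values are hypothetical and not taken from the paper.

```python
from collections import deque


class RunPacer:
    """Hypothetical pacing policy: back off while recent runs cause few migrations."""

    def __init__(self, threshold: int = 5, window: int = 3,
                 min_pause: float = 60.0, max_pause: float = 3600.0) -> None:
        self.threshold = threshold          # assumed migration threshold across runs
        self.recent = deque(maxlen=window)  # migration counts of the last runs
        self.min_pause = min_pause          # pause in seconds after an active phase
        self.max_pause = max_pause          # upper bound for the pause in seconds
        self.pause = min_pause

    def record_run(self, migrations: int) -> float:
        """Store the migration count of a finished run and return the next pause."""
        self.recent.append(migrations)
        if sum(self.recent) < self.threshold:
            # Few migrations across the window: the topology is stable, wait longer.
            self.pause = min(self.pause * 2, self.max_pause)
        else:
            # Threshold reached: the topology is changing, run again soon.
            self.pause = self.min_pause
        return self.pause
```

A caller would invoke `record_run` after every pass of the algorithm and sleep for the returned number of seconds before starting the next pass.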

Energy-efficient placement considering renewable energies
The algorithm used in this paper is a vector-based approach to optimize scheduling and placement decisions in private clouds. In this context, the dimensions of the vector space specify the characteristics of virtual resources as entities in distributed data centers. To illustrate the basic operation of the algorithm, we use an example with three dimensions here. The dimensions $x$ and $y$ depict the geographical location of the entity and the dimension $z$ the availability of renewable energy sources for this data center site or, more generally speaking, location. Figure 1 shows the data center $w_d$, representing a site with available wind energy, and a second data center representing a site with available photovoltaic energy. The arrows indicate the modification of the positional vectors of these two data centers over the time span from summer to autumn. Figure 2 illustrates the computation of a destination vector. There, $d_1$, $d_2$, $d_3$ and $d_4$ represent data centers and $c_1$, $c_2$ and $c_3$ clients with a uniformly distributed volume of communication. The destination vector in this example is computed for a virtual machine currently executed in data center $d_4$. In this case, the data center $d_2$ is chosen as the destination of the migration, since this data center location has the shortest distance to the destination vector $\vec{z}$. The algorithm can be adapted to consider the network location instead of a geographical location, e.g., by including weights regarding the latency or other quality of service metrics between the data center and the clients.
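The example can be made concrete with a short sketch. The formula for the destination vector is not reproduced in this section, so the code below assumes one plausible reading: a centroid of the client positions weighted by their communication volume. All coordinates, the renewable-availability value assigned to the clients, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Tuple[float, ...]


def destination_vector(clients: Sequence[Tuple[Vector, float]]) -> Vector:
    """Assumed rule: centroid of client positions weighted by communication volume."""
    total = sum(weight for _, weight in clients)
    if total <= 0:
        raise ValueError("total communication volume must be positive")
    dims = len(clients[0][0])
    return tuple(
        sum(pos[k] * weight for pos, weight in clients) / total
        for k in range(dims)
    )


def nearest_data_center(sites: Dict[str, Vector], target: Vector) -> str:
    """Return the name of the site with the shortest Euclidean distance to target."""
    return min(sites, key=lambda name: math.dist(sites[name], target))


# Hypothetical positions (x, y, availability of renewable energy).
sites = {
    "d1": (0.0, 5.0, 0.2),
    "d2": (1.0, 1.0, 0.9),
    "d3": (5.0, 5.0, 0.5),
    "d4": (4.0, 0.0, 0.1),
}
# Three clients with uniform communication volume; the third coordinate is an
# assumed target value for the renewable-energy dimension.
clients = [((0.0, 0.0, 1.0), 1.0), ((2.0, 0.0, 1.0), 1.0), ((1.0, 3.0, 1.0), 1.0)]

z = destination_vector(clients)
best = nearest_data_center(sites, z)
```

With these made-up values the destination vector is (1, 1, 1) and `d2` is the closest site, mirroring the outcome described for Figure 2. Replacing the geographical coordinates with latency-derived ones would correspond to the network-location variant mentioned in the text.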

Definitions
Let $V$ be a finite-dimensional vector space over $\mathbb{R}$ with dimension $v$, and let $P$ be the set of properties, with $\beta_P : P \to V$ the function that maps properties of $P$ to the vector space $V$. Furthermore, let $m$ be a virtual machine defined by the tuple $m = (i_m, P_m)$ with $i_m$ a unique identifier of the virtual machine and $P_m \subset P$ the set of the associated properties. Moreover, let $D = \{(i_d, c_d \in \mathbb{N}^+, M_d)\}$ be the set of available data centers with $i_d$ the unique identifier of the data center, $M_d$ the set of virtual machines currently executed at this data center and $c_d$ the capacity of this data center, so that $|M_d| \le c_d$ holds. The complete set $M$ of all virtual machines is defined as follows:

$$M = \bigcup_{(i_d, c_d, M_d) \in D} M_d$$

Each virtual machine $m \in M$ is restricted to be executed at only one data center at any moment in time. This means that the sets $M_d$ of two different data centers are disjoint. Each data center $x_d \in D$ is assigned a vector $\vec{v}_x \in V$, so that $\vec{v}_x$ is the positional vector of the data center $x_d$. One more function $\beta_M : M \to V$ is used to map a virtual machine to the positional vector of the data center in which it is executed; it is defined by $\beta_M(m) = \vec{v}_x$ for the data center $x_d = (i_{d_x}, c_{d_x}, M_{d_x}) \in D$ with $m \in M_{d_x}$. Also, we define $N$ as the set of network flows by $N = \{(s \in P, d \in P, n \in \mathbb{N}^+)\}$, so that $s$ is a property of the communication source, $d$ a property of the communication destination and $n$ the metric over a defined observation time span. Furthermore, the Euclidean distance on $V$ is used as the distance function $\delta : V \times V \to \mathbb{R}$ as follows:

$$\delta(\vec{a}, \vec{b}) = \sqrt{\sum_{k=1}^{v} (a_k - b_k)^2} \qquad (1)$$

Moreover, we define the following helper functions:
• The function $\sigma : M \to D$ maps each virtual machine to the data center it is currently executed in and is defined by: $\sigma(m) = (i_d, c_d, M_d) \in D$ with $m \in M_d$.
• The function $\theta : M \to 2^N$ maps each virtual machine to the set of network flows with matching properties: $\theta(m) = \{(s, d, n) \in N \mid s \in P_m \lor d \in P_m\}$.
• The function $\phi : M \to \mathbb{N}^+$ defines for a virtual machine the sum of the metrics of the associated network flows. It is defined by: $\phi(m) = \sum_{(s, d, n) \in \theta(m)} n$.
• The function $\beta_P : P \to V$ maps a property to a vector. For a property $p \in P_m$ of a virtual machine $m$, the positional vector of the associated data center is given by: $\beta_P(p) = \beta_M(m)$.
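To make the definitions above more tangible, the following sketch mirrors them as Python data structures. It is an illustration only: the class and field names are hypothetical, properties are modeled as plain strings, and the functions `delta`, `sigma`, `theta` and `phi` are named after the distance function and the helper functions of this section.

```python
import math
from dataclasses import dataclass, field
from typing import FrozenSet, List, Sequence, Set, Tuple

Vector = Tuple[float, ...]
Flow = Tuple[str, str, int]  # (source property, destination property, metric)


@dataclass(frozen=True)
class VM:
    ident: str             # unique identifier of the virtual machine
    props: FrozenSet[str]  # associated properties, e.g. addresses (assumed encoding)


@dataclass
class DataCenter:
    ident: str
    capacity: int          # maximum number of virtual machines
    position: Vector       # positional vector of the site
    vms: Set[VM] = field(default_factory=set)


def delta(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sigma(m: VM, data_centers: Sequence[DataCenter]) -> DataCenter:
    """Data center in which the virtual machine m is currently executed."""
    for dc in data_centers:
        if m in dc.vms:
            return dc
    raise LookupError(f"{m.ident} is not placed in any data center")


def theta(m: VM, flows: Sequence[Flow]) -> List[Flow]:
    """Network flows whose source or destination matches a property of m."""
    return [f for f in flows if f[0] in m.props or f[1] in m.props]


def phi(m: VM, flows: Sequence[Flow]) -> int:
    """Sum of the metrics of all network flows associated with m."""
    return sum(metric for _, _, metric in theta(m, flows))
```

Modeling a virtual machine as a frozen dataclass keeps it hashable, so it can be stored in the per-site sets; the capacity constraint is left to the caller in this sketch.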

Continuous placement algorithm sequence
For the correct operation of the algorithm, at least one data center of the set of data centers $D$ must have spare capacity available. This means:

$$\exists\, (i_d, c_d, M_d) \in D : |M_d| < c_d$$

The sequence of the algorithm is as follows:
1) We define $T = M$.
2) While $T \neq \emptyset$:
a) We choose an arbitrary $t \in T$ with $t = (i_t, P_t)$ and define $T = T \setminus \{t\}$. Furthermore, we set $t_d = \sigma(t)$ with $t_d = (i_{d_t}, c_{d_t}, M_{d_t})$ and define the set $D_t$ of data centers that are considered as destinations for $t$. For the case $D_t = \{t_d\}$ we can examine the next virtual machine and start over with step 2 of the algorithm.
b) Now we can compute the destination vector $\vec{z}$ for $t$.
c) We choose the destination data center $z_d = (i_{d_z}, c_{d_z}, M_{d_z}) \in D_t$ with the shortest distance to the destination vector $\vec{z}$, so that $\delta(\vec{v}_z, \vec{z}) \le \delta(\vec{v}_x, \vec{z})$ for all $x_d \in D_t$. For the case $z_d = t_d$ the virtual machine is already executed in the optimal data center, so we can examine the next virtual machine and start over with step 2 of the algorithm.
d) For the case $|M_{d_z}| = c_{d_z}$, for the given destination data center $z_d = (i_{d_z}, c_{d_z}, M_{d_z}) \in D$, we choose the virtual machine with the lowest communication amount and move this machine to the nearest data center with available capacity:
i) We determine a virtual machine $x \in M_{d_z}$ with $x = (i_{m_x}, P_{m_x})$ for which $\phi(x) \le \phi(y)$ holds for all $y \in M_{d_z}$.
ii) We determine the nearest data center $s_d = (i_{d_s}, c_{d_s}, M_{d_s}) \in D$ with available capacity, i.e., with $|M_{d_s}| < c_{d_s}$.
iii) We define $M_{d_z} = M_{d_z} \setminus \{x\}$ and $M_{d_s} = M_{d_s} \cup \{x\}$.
e) Now, $|M_{d_z}| < c_{d_z}$, so we move the virtual machine $t$ to the destination data center $z_d$. Finally, we alter the sets $M_{d_t} = M_{d_t} \setminus \{t\}$ and $M_{d_z} = M_{d_z} \cup \{t\}$.
3) The algorithm has now examined each virtual machine of $M$. After a new representative set of network flows is collected, we can start over with step 1 and a new iteration.

As outlined above, the algorithm starts over after all virtual machines have been processed. Due to the online, iterative nature of this algorithm, new network flows, historical data or even changes in the availability of renewable energy sources are taken into account. This means that the approach optimizes the overall topology continuously over time. The temporal resolution of the available data and its rate of change have to be considered when running the algorithm in a real private cloud environment, in order to limit the amount of resources necessary to run the algorithm. Also, oscillations of virtual machines between data centers (e.g., due to similar clients or properties) should be prevented, e.g., using a hysteresis based on former placements of the virtual machine.
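The sequence above can be sketched as a single pass over all virtual machines. This is a simplified illustration under several assumptions that are not taken from the paper: every site is treated as a candidate destination, the destination vector is supplied by the caller through the hypothetical `target` callback, the communication amount through the hypothetical `load` callback, and an evicted virtual machine moves to the site with spare capacity that is closest to the full destination site. No hysteresis is included.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Vector = Tuple[float, ...]


@dataclass
class Site:
    name: str
    capacity: int
    position: Vector
    vms: Set[str] = field(default_factory=set)


def run_iteration(sites: List[Site],
                  target: Callable[[str], Vector],
                  load: Callable[[str], int]) -> int:
    """Examine every virtual machine once and return the number of migrations."""
    migrations = 0
    pending = [vm for site in sites for vm in site.vms]       # step 1: T = M
    while pending:                                            # step 2
        t = pending.pop()                                     # a) pick a VM
        src = next(site for site in sites if t in site.vms)   #    current site
        z = target(t)                                         # b) destination vector
        dst = min(sites, key=lambda s: math.dist(s.position, z))  # c) nearest site
        if dst is src:
            continue                                          # already placed best
        if len(dst.vms) >= dst.capacity:                      # d) destination full
            x = min(dst.vms, key=load)                        # i) lowest communication
            spare = [s for s in sites
                     if s is not dst and len(s.vms) < s.capacity]
            if not spare:
                continue                                      # precondition violated
            s_d = min(spare,
                      key=lambda s: math.dist(s.position, dst.position))  # ii)
            dst.vms.remove(x)                                 # iii) move x away
            s_d.vms.add(x)
            migrations += 1
        src.vms.remove(t)                                     # e) move t
        dst.vms.add(t)
        migrations += 1
    return migrations
```

Returning the number of migrations lets a caller decide how long to wait before the next pass, and a hysteresis against oscillations could be added by skipping virtual machines that were moved recently.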

Evaluation of optimal VM placement and migration costs
The algorithm to optimize the placement of virtual machines described in the previous section is based on the minimization of operational characteristics of the VMs. In the outline of the algorithm described above, the energy costs of the data centers hosting the VMs and the distance between the data centers and the clients are minimized. However, regarding the energy efficiency of the approach, the additional costs of applying the algorithm have to be taken into account. While the energy consumption of the algorithm itself can be limited, e.g., by decreasing its resolution using longer intervals, the migrations resulting from the execution of the algorithm lead to additional energy costs. These migration costs can be divided into direct and indirect costs. Direct migration costs arise from the effort to carry out the migration, i.e., from using compute, storage and network resources across multiple data centers and the links between them. Indirect costs are formed by the consequences of the migration and its impact on the operational characteristics of the virtual machine. For example, such indirect costs can result from a higher latency for some clients after the migration. The algorithm presented above does not consider possible service level requirements regarding the latency of all clients, in favor of its primary goal to benefit from the lowest energy price. Such constraints could be integrated into the optimization by using them as additional metrics of the virtual machines. This can be described as the problem to minimize

$$C = C_{op}(M) + C_{alg}(M) + C_{mig}(M)$$

where $C$ are the overall energy costs to be optimized, $C_{op}(M)$ the operational costs of the data centers based on the properties and constraints of the virtual machines defined in $P_m$, $C_{alg}(M)$ the costs to run the algorithm, and $C_{mig}(M)$ the direct ($mig_d$) and indirect ($mig_i$) costs of the resulting migration and placement decisions of the algorithm, with $C_{mig}(M) = C_{mig_d}(M) + C_{mig_i}(M)$. The minimization of $C_{op}$ is the subject of several projects and related work regarding the energy efficiency and power management of data center infrastructures. As stated above, $C_{alg}$ can be limited by increasing the interval between iterations of the algorithm and by its implementation. $C_{mig_i}$ is primarily minimized by the defined constraints. $C_{mig_d}$ is influenced by the migration technique used to transfer the virtual machines' resources between data centers. Based on related work in this area, in [2] we presented results from a simulation to leverage fluctuating renewable energies in northern, southern and central Germany. Also, we implemented an extension for OpenStack that uses migrations to enhance the energy efficiency in distributed private clouds. Besides leveraging renewable energies, the testbed also consolidated the virtual machines across different data center sites to lower $C_{op}$. By integrating the algorithm described in this paper into this testbed and the previously presented simulation studies, the placement of new virtual machines can also be included in the optimization. This also includes the possibility to spawn multiple instances of a service provided by a virtual machine, e.g., to address the constraints regarding the latency between the service and the clients, hence increasing $C_{op}$ in favor of a reduced $C_{mig_i}$ and a resulting lower $C$. This can solve situations in which the algorithm would migrate a virtual machine to a new site that is too far away from some connected clients and hence would violate predefined service level constraints (as, e.g., implemented by content delivery network and cloud providers).
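The cost components discussed above (operational costs, costs of running the algorithm, and direct and indirect migration costs) can be combined in a small sketch that checks whether a planned set of migrations lowers the overall cost. The field names, the common cost unit and the decision rule are assumptions made for illustration; the paper does not prescribe them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostBreakdown:
    """All values in one common unit, e.g. energy cost per observation interval."""
    operational: float         # running the data centers for the given placement
    algorithm: float           # executing the placement algorithm itself
    migration_direct: float    # compute, storage and network effort of migrations
    migration_indirect: float  # consequences of migrations, e.g. higher latency

    @property
    def migration(self) -> float:
        """Sum of direct and indirect migration costs."""
        return self.migration_direct + self.migration_indirect

    @property
    def total(self) -> float:
        """Overall cost to be minimized."""
        return self.operational + self.algorithm + self.migration


def migration_pays_off(before: CostBreakdown, after: CostBreakdown) -> bool:
    """Assumed rule: migrate only if the overall cost decreases."""
    return after.total < before.total
```

For example, a placement that lowers the operational cost from 100 to 80 units while causing 5 units of direct and 3 units of indirect migration cost would still pay off, since the total drops from 101 to 89 with an algorithm cost of 1 unit in both cases.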

Conclusions and future work
The algorithm described in this paper can be used to evaluate the use of renewable energies to enhance the overall energy efficiency across multiple data centers. Based on our previous research, we are evaluating the use of the algorithm in simulations as well as its integration into an extension for the OpenStack Nova scheduler that we developed to leverage renewable energy sources and power management in distributed private cloud infrastructures. The scheduling extension can also consider the costs of placing or migrating virtual resources across multiple public or community clouds. Such hybrid cloud environments can benefit from the energy efficiency and low energy prices in distant, supposedly public, clouds to lower the energy cost for workloads that are safe to be transferred to a third-party cloud provider. To also benefit from short-term fluctuations in the availability of renewable energy sources, e.g., on a daily basis, new live-migration and placement techniques for virtual resources have to be developed. We are currently working on container-based migration and service placement. This lightweight virtualization solution facilitates the transfer of the current state of the virtual resource and the underlying storage. Initial tests have shown that the amount of data that needs to be transferred is far lower than for full-sized virtual machines. However, the network virtualization, especially the data plane performance, and the effort to checkpoint and restore containers under load are still a challenge compared to existing and well-tested virtual machine live-migration techniques.

Figure 1. Example of the 3-dimensional vector space V.