Implementation of micro application storage with high reliability based on Oracle 12c

This paper proposes a highly reliable data storage structure built on the characteristics of Oracle 12c. Targeted at a micro application platform, the structure addresses in software the active-active or multi-active problem usually handled by hardware storage devices: when one or more data sources are corrupted, the business can continue to use the remaining data sources for data manipulation, so that the business as a whole is not interrupted. The paper introduces each module of the system structure in detail, briefly describes, compares, and analyzes the algorithm for distributing data manipulations and the algorithm for comparing data sources, and explains in detail how the algorithms are used in practice.


Introduction
Micro application development platforms based on cloud computing are of great significance to large enterprises. First, such a platform can greatly increase the speed of enterprise software development and reduce its cost. Using technology similar to PaaS, a micro application platform can provide unified security protection and deployment for all micro applications. Second, it lowers the threshold of software development. Micro applications are developed mainly by dragging components in a graphical interface, so staff with little programming background can complete most of the development work. In short, a micro application platform allows applications for day-to-day work to be developed and put into use quickly, which greatly improves how well information systems support that work.
In this context, highly reliable storage for the micro application back end is an essential part of a micro application development platform. First, it determines data security, which is crucial for the enterprise. Second, it determines the reliability and continuity of the enterprise's micro application business.
Drawing on the new characteristics of Oracle 12c, this paper designs a low-cost, highly reliable storage scheme for the micro application development platform. The scheme connects one micro application to multiple databases at the same time, so that the business is not interrupted and no data is lost when one of the databases, or the storage behind it, is damaged. The paper also considers both reliability and efficiency: it briefly describes, compares, and analyzes the algorithm for distributing data manipulations and the algorithm for comparing data sources, and draws conclusions.

Implementation of storage mechanism with high reliability based on micro application
At present, storage reliability relies mainly on the reliability of Oracle itself and of the storage hardware. For a micro application system, however, the number of applications is large and the overall environment is more complex, so the cost of using RAC and multi-active storage hardware is more than the enterprise can afford. Following the principles of high performance and low cost, this paper therefore targets the characteristics of micro applications and builds a highly reliable storage mechanism that achieves multi-active data in software.

Multi-source data entity
The system structure implements the data sources of a micro application using characteristics of Oracle 12c, including the multitenant architecture and multiple pluggable databases. Each micro application can correspond to multiple data sources and assigns a priority ranking to them; no two data sources may have the same priority. The data source with the highest priority is operated on first in data manipulation and has the highest credibility. When it is damaged, or found to have lost data by accident, its priority can be revoked and it can be set to the disabled state. In this way the business always has a reliable, continuously operating data source, unless all data sources are damaged simultaneously.
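The priority rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `DataSource`, `SourceRegistry`, `disable`, and `primary` are hypothetical, lower numbers are assumed to mean higher priority, and in a real deployment each entry would correspond to an Oracle 12c pluggable database.

```python
from dataclasses import dataclass, field


@dataclass
class DataSource:
    name: str          # hypothetical label, e.g. one pluggable database
    priority: int      # assumption: lower number = higher priority
    enabled: bool = True


@dataclass
class SourceRegistry:
    sources: list = field(default_factory=list)

    def add(self, source: DataSource) -> None:
        # No two data sources of one micro application may share a priority.
        if any(s.priority == source.priority for s in self.sources):
            raise ValueError(f"priority {source.priority} already in use")
        self.sources.append(source)

    def disable(self, name: str) -> None:
        # A damaged source drops out of the ranking by being disabled.
        for s in self.sources:
            if s.name == name:
                s.enabled = False

    def active(self) -> list:
        # Enabled sources, highest priority first.
        return sorted((s for s in self.sources if s.enabled),
                      key=lambda s: s.priority)

    def primary(self) -> DataSource:
        ranked = self.active()
        if not ranked:
            raise RuntimeError("all data sources are disabled")
        return ranked[0]


registry = SourceRegistry()
registry.add(DataSource("pdb_a", priority=1))
registry.add(DataSource("pdb_b", priority=2))
registry.add(DataSource("pdb_c", priority=3))
registry.disable("pdb_a")            # simulate damage to the top source
print(registry.primary().name)       # prints "pdb_b"
```

Disabling rather than deleting a source keeps its record available for later repair, which matches the disabled state described above.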

Data source interface of micro application
Depending on how the micro application is implemented, this part can be realized either inside the micro application or in the storage system. It consists of the interface that the micro application invokes. In actual use, the micro application does not need to understand how the storage system works underneath. It only needs to send data to its data source interface according to the relevant Oracle protocol; the actual storage process is carried out by the storage structure. The implementation of this interface should limit the number of micro application connections in order to keep the interface routine, and the storage system as a whole, reliable.
The interface receives a data source operation request from the source program and passes it to the synchronization module of data manipulation, which then dispatches the operation to the multiple data sources.
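As a rough sketch of this working process, the interface below accepts a request, enforces a connection limit, and hands the request to a synchronization callable. All names (`DataSourceInterface`, `submit`, `fake_sync_module`) and the choice to reject rather than queue excess connections are assumptions made for illustration; the paper does not specify them.

```python
import threading


class DataSourceInterface:
    """Hypothetical front door between a micro application and the storage system."""

    def __init__(self, sync_module, max_connections: int = 2):
        self._sync_module = sync_module      # callable that fans the request out
        self._slots = threading.BoundedSemaphore(max_connections)

    def submit(self, request: str):
        # Assumption: reject the call instead of queueing it when the limit is reached.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("connection limit reached")
        try:
            return self._sync_module(request)
        finally:
            self._slots.release()


def fake_sync_module(request: str) -> dict:
    # Stand-in for the synchronization module: pretend three sources ran the request.
    return {name: "ok" for name in ("pdb_a", "pdb_b", "pdb_c")}


interface = DataSourceInterface(fake_sync_module, max_connections=2)
result = interface.submit("INSERT INTO orders VALUES (1, 'pen')")
print(result)
```

The micro application only ever calls `submit`; how many data sources sit behind it stays hidden, as the text requires.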

Synchronization module of data manipulation
When a data manipulation is carried out, this module operates on multiple data sources simultaneously and ensures that the manipulation is applied correctly to each of them. It can also determine whether a data source is damaged and raise the corresponding warning, so that the back end can carry out maintenance. When an exception occurs, this module handles it across the data sources and reports it to the availability confirmation module of multi-source data, which determines whether any data source is damaged.
Data synchronization can run in two modes. The first is the high-efficiency mode: as soon as the two data source interfaces with the highest priority have completed the operation, success is reported without waiting for the other data sources. This mode requires an operation cache for each data source so that every operation is eventually applied to all data sources. It gives a fast response, but if too many operations arrive in a short time and exceed the capacity of the cache, the data in all databases other than the two with the highest priority becomes unreliable. The second is the high-reliability mode: the result of the operation is reported only after all data source interfaces have returned the same feedback.
The high-efficiency mode suits cases with few operations, long individual operations, and large differences in operation time between data sources. In these cases it gives the fastest response and excellent execution efficiency. The high-reliability mode suits most cases; its drawback is that overall execution efficiency is relatively low when operation times differ greatly between data sources.
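The difference between the two modes can be illustrated with a small calculation. The sketch below assumes that all data sources start an operation at the same moment and that per-source operation times are known; the function names and the sample times are hypothetical and are not measurements from the paper.

```python
def response_time(durations_by_priority, mode):
    """Time until the caller gets feedback.

    durations_by_priority: per-source operation times, highest priority first.
    mode: "efficient" waits for the two highest-priority sources only;
          "reliable" waits for every source.
    """
    if mode == "efficient":
        return max(durations_by_priority[:2])
    if mode == "reliable":
        return max(durations_by_priority)
    raise ValueError(f"unknown mode: {mode}")


def pending_sources(durations_by_priority, mode):
    # Indexes of sources that must still apply the operation from their
    # cache after the caller has already been answered.
    answered_at = response_time(durations_by_priority, mode)
    return [i for i, d in enumerate(durations_by_priority) if d > answered_at]


times = [0.02, 0.03, 0.40]                   # illustrative seconds
print(response_time(times, "efficient"))     # 0.03
print(response_time(times, "reliable"))      # 0.4
print(pending_sources(times, "efficient"))   # [2]
```

With one slow data source, the high-efficiency mode answers more than ten times sooner in this example, at the price of leaving that source to catch up from its operation cache.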

Creation and change module of multi-source data
This module creates the data sources at initialization and creates further data sources when they are added later. It keeps the data dictionaries of all data sources identical. To do so, it records the common creation statements of all data sources, and every modification of a data dictionary must also be recorded here, so that a newly added data source has exactly the same structure as the existing ones and all data sources remain fully consistent. The module also carries out data dictionary operations across the data sources and ensures that they succeed on every database.
In short, this module is used mainly to add data sources and to synchronize their data dictionaries, so that the databases remain identical.
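The record-and-replay idea behind this module can be sketched as follows. The example is only an illustration: it uses Python's built-in `sqlite3` in-memory databases as stand-ins for Oracle pluggable databases, and the class and method names (`SchemaManager`, `add_source`, `apply_ddl`) are hypothetical.

```python
import sqlite3


class SchemaManager:
    """Hypothetical creation and change module; sqlite3 stands in for Oracle."""

    def __init__(self):
        self.ddl_log = []      # every dictionary change, in order
        self.sources = {}      # name -> connection

    def add_source(self, name: str) -> None:
        conn = sqlite3.connect(":memory:")
        # Replay the recorded statements so the new source matches the others.
        for statement in self.ddl_log:
            conn.execute(statement)
        self.sources[name] = conn

    def apply_ddl(self, statement: str) -> None:
        # Apply to every existing source, then record for sources added later.
        for conn in self.sources.values():
            conn.execute(statement)
        self.ddl_log.append(statement)

    def tables(self, name: str) -> list:
        rows = self.sources[name].execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [r[0] for r in rows]


manager = SchemaManager()
manager.add_source("pdb_a")
manager.add_source("pdb_b")
manager.apply_ddl("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
manager.apply_ddl("ALTER TABLE orders ADD COLUMN qty INTEGER")
manager.add_source("pdb_c")          # added later, built from the log
print(manager.tables("pdb_c"))       # prints "['orders']"
```

Because `pdb_c` is built purely from the recorded statements, its structure matches the older sources without copying any of their dictionaries directly.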

Multi-source data consistency and availability confirmation module
This module regularly checks whether the data in the data sources is identical and records the points in time at which the data sources are identical, which helps when recovering the data as a whole.
Two methods of comparing the data are described here. The first method records the time of every insert or modify operation on each data source and then checks, at a fixed interval, whether these operations have been carried out on all data sources. If every operation has the same result (success or failure) on all of them, the data is synchronized and the system as a whole is normal. This point in time is recorded.
The second method is sampling comparison. When the data volume is large, the similarity of two data sources can be estimated by sampling: from the history of data manipulations, from user-specified important or frequently changed tables, or at random. When the similarity exceeds a threshold, the two data sources are regarded as identical.
The advantage of the first method is that it guarantees the data sources are identical; it suits cases with little data. The second method suits cases with more data. Once a threshold is set and a satisfactory similarity such as 99.999% is reached, the two data sources can be taken to be identical. The sampling method draws its samples from the operation records in the data logging area. For more details, see chapter three.
During confirmation, this module also determines whether each data source is still available. If a data source with lower priority differs from a data source with higher priority, the lower-priority data source is judged invalid. The operation and maintenance staff then handle it by creating a new data source and carrying out copy and replication.
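The comparison methods and the validity rule described above can be sketched as follows. This is an illustration only: the log format (an operation id mapped to a statement and a success flag), the function names, and the fixed random seed are assumptions, while the 99.999% threshold is the example value given in the text.

```python
import random


def full_match(log_a, log_b):
    # Method 1: every recorded operation and its outcome must agree.
    return log_a == log_b


def sampled_similarity(log_a, log_b, sample_size, seed=0):
    # Method 2: compare a random sample of the operation ids found in either log.
    op_ids = sorted(set(log_a) | set(log_b))
    if not op_ids:
        return 1.0
    rng = random.Random(seed)
    sample = rng.sample(op_ids, min(sample_size, len(op_ids)))
    same = sum(1 for op in sample if log_a.get(op) == log_b.get(op))
    return same / len(sample)


def low_priority_valid(high_log, low_log, threshold=0.99999, sample_size=1000):
    # The higher-priority source is trusted; a mismatch invalidates the other one.
    return sampled_similarity(high_log, low_log, sample_size) >= threshold


# Logs map an operation id to (statement, succeeded); the contents are made up.
high = {i: (f"UPDATE t SET v={i} WHERE id={i}", True) for i in range(500)}
low = dict(high)
print(full_match(high, low), low_priority_valid(high, low))   # True True
low[42] = (high[42][0], False)     # one operation failed on the low-priority source
print(full_match(high, low), low_priority_valid(high, low))   # False False
```

In this small example the sample size exceeds the log size, so the sample covers every operation; with a genuinely partial sample, a single mismatch could go unnoticed, which is why the second method only establishes that the data sources are probably identical.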

Data logging area
The data logging area works together with the creation and change module and the synchronization module of multi-source data. It records every data write and whether the operation succeeded. It is an important, foundational module of the system structure. The synchronization module writes to it, and the availability confirmation module of multi-source data reads from it.
The reliability of the data sources and the completeness of data synchronization depend heavily on this module. It should record at least the time of each operation on each data source, the content of the operation, and whether the operation succeeded. It also records the points in time at which data synchronization was confirmed, which are used for recovery and for synchronizing the data sources.
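A minimal sketch of such a log is given below, assuming an in-memory list rather than a persistent store. The record fields follow the minimum listed above (time, content, success); the names `LogEntry`, `DataLog`, `mark_synchronized`, and `since_last_sync` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    source: str
    timestamp: datetime
    operation: str
    succeeded: bool


class DataLog:
    def __init__(self):
        self.entries = []
        self.sync_points = []   # times at which the sources were confirmed identical

    def record(self, source, operation, succeeded, timestamp=None):
        stamp = timestamp or datetime.now(timezone.utc)
        self.entries.append(LogEntry(source, stamp, operation, succeeded))

    def mark_synchronized(self, timestamp):
        self.sync_points.append(timestamp)

    def since_last_sync(self, source):
        # Operations a recovery would need to re-examine for this source.
        last = self.sync_points[-1] if self.sync_points else None
        return [e for e in self.entries
                if e.source == source and (last is None or e.timestamp > last)]


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
log = DataLog()
log.record("pdb_a", "INSERT INTO t VALUES (1)", True, t0)
log.mark_synchronized(t0)
log.record("pdb_a", "INSERT INTO t VALUES (2)", False, t0.replace(minute=5))
print([e.operation for e in log.since_last_sync("pdb_a")])
```

Keeping the synchronization points next to the entries means a recovery only has to look at operations recorded after the last confirmed point.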

Data copy and replication module
When a data source has to be added during operation, for example because an existing data source is damaged, this module is responsible for loading data into the new data source. It also confirms that each copy process has completed. During copying, the operation history is extracted from the data log of the data source with the highest priority.
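One way to picture the copy process is as a replay of the logged history onto the new data source. The sketch below assumes that the history is a list of (statement, succeeded) pairs and uses `sqlite3` as a stand-in for an Oracle pluggable database; the function name `replicate` and the sample statements are hypothetical.

```python
import sqlite3


def replicate(history, target):
    """Replay successful operations, oldest first, onto a new data source.

    history: (statement, succeeded) pairs taken from the log of the
    highest-priority source; target: connection to the empty new source.
    """
    applied = 0
    for statement, succeeded in history:
        if succeeded:                 # failed operations changed nothing
            target.execute(statement)
            applied += 1
    target.commit()
    return applied


history = [
    ("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)", True),
    ("INSERT INTO orders VALUES (1, 'pen')", True),
    ("INSERT INTO orders VALUES (1, 'duplicate')", False),  # failed at the source
    ("UPDATE orders SET item = 'pencil' WHERE id = 1", True),
]
new_source = sqlite3.connect(":memory:")
applied = replicate(history, new_source)
rows = new_source.execute("SELECT id, item FROM orders").fetchall()
print(applied, rows)                  # prints "3 [(1, 'pencil')]"
```

Skipping operations that failed on the highest-priority source keeps the new data source in the same state as the source it was copied from.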

Performance analysis of data distribution
First, we analyze how many data sources can be kept identical for a micro application under the same storage conditions.
From the above formulas, the system structure reaches its maximum execution efficiency only when the difference between the most time-consuming execution process and each of the other execution processes is at its minimum.
If data manipulation is synchronized in the high-efficiency mode, the operation returns once the two data sources with the highest priority have finished, and the remaining data sources are not waited for. Therefore, if the remaining data sources take longer than these two, the time difference accumulates. We define r1, r2, ..., rn as the data manipulation time of each data source, r as the time of the most time-consuming data manipulation, and p as the total idle time of the system structure. In the high-efficiency mode, the most time-consuming data source needs a = r - max(r1, r2) - p to catch up with the system as a whole. If this value keeps increasing and never decreases, the most time-consuming data source can never catch up with the other data sources.
It follows that if the time gap between successive commands is smaller than the difference between the operation time of a data source and the operation times of the data sources with the highest and second-highest priority, that data source will never be synchronized with the two highest-priority data sources.
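The accumulation of lag can be made concrete with a short simulation of the quantity a = r - max(r1, r2) - p. The sketch assumes that every command has the same operation times and that the idle time p is the same between every pair of commands; both are simplifications, and the numbers are illustrative only.

```python
def backlog_after(n_ops, r, r1, r2, p):
    """Lag of the slowest source after n_ops commands in high-efficiency mode.

    r: time the slowest source needs per command; r1, r2: times of the two
    highest-priority sources; p: idle time between commands (assumed constant).
    """
    a = r - max(r1, r2) - p      # net lag added by each command
    lag = 0.0
    for _ in range(n_ops):
        lag = max(0.0, lag + a)  # lag cannot drop below zero
    return lag


print(backlog_after(1000, r=0.5, r1=0.25, r2=0.25, p=0.125))   # 125.0, lag grows
print(backlog_after(1000, r=0.5, r1=0.25, r2=0.25, p=0.5))     # 0.0, source keeps up
```

When the idle time between commands is shorter than r - max(r1, r2), each command adds to the backlog and the slowest data source never catches up; with enough idle time the backlog stays at zero.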

Performance analysis on the same data sources
The first method is the overall comparison method: for all data manipulations except queries, compare the data items involved and the record of whether each operation succeeded. If these agree for two data sources, the two data sources contain the same data.
The second method is the sampling comparison method: draw a sample from all data manipulations except queries, then compare the data items involved in the sampled manipulations and the record of whether each operation succeeded. If these agree for two data sources, the two data sources probably contain the same data.
Let u be the time needed to compare one data item, i the number of data items in the data design, o the average number of data items per data manipulation, t the remaining time needed to process and compare each data manipulation, and z the proportion of the whole data that is sampled in the sampling comparison method.
For the first method, the time cost is t1 = i*u*o + t. Since u and t can usually be regarded as constants that do not grow with i, the time cost is linear in i, with a per-item cost of order O(o*u). As a result, if o is large, or if its value is related to u (for example, when each manipulation adds a data comparison statement), the time complexity is high and the comparison requires more resources.
The time cost of the second method, the sampling comparison method, is t2 = z*i*u*o + t. By classical probability theory, when the data size is large and the sampled comparisons all agree, the probability that the data sources are identical can be determined from the value of z.
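The trade-off between the two methods can be illustrated numerically. The sketch below rests on two labeled assumptions that are not taken from the paper: sampling a proportion z is assumed to scale only the comparison term of the full cost, and the confidence gained from sampling is modeled as the chance that a random sample drawn with replacement contains at least one mismatch when a given fraction of operations differ. All input values are illustrative.

```python
def full_cost(i, u, o, t):
    # Cost of comparing everything: t1 = i*u*o + t.
    return i * u * o + t


def sampled_cost(i, u, o, t, z):
    # Assumption: sampling a fraction z scales only the comparison term.
    return z * i * u * o + t


def detection_probability(sample_size, mismatch_fraction):
    # Chance that a random sample (drawn with replacement) contains at
    # least one mismatching operation, if that fraction of operations differ.
    return 1.0 - (1.0 - mismatch_fraction) ** sample_size


i, u, o, t = 1_000_000, 1e-6, 4, 0.5     # illustrative values only
print(full_cost(i, u, o, t))
print(sampled_cost(i, u, o, t, z=0.01))
print(detection_probability(10_000, 0.001))
```

In this example a 1% sample removes most of the comparison cost while still being very likely to expose a difference that affects one operation in a thousand; rarer differences need a larger z to be detected with the same confidence.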

Summary
This paper presents a data storage structure based on the characteristics of Oracle 12c. Its main advantage is that, on a micro application platform, it solves in software the active-active or multi-active problem of hardware storage devices: when one or more data sources are corrupted, the business can still use the remaining data sources for data manipulation, so that the business as a whole is not interrupted. The paper briefly introduces each module of the system structure, describes, compares, and analyzes in detail the algorithm for distributing data manipulations and the algorithm for comparing data sources, and explains the conditions under which each algorithm is suited to actual use. It uses low-cost software to address the multi-active problem that otherwise requires high-cost hardware, with the aim of providing a high-performance storage experience on low-cost hardware.