Data compression algorithms for sensor networks with periodic transmission schemes

Abstract. The operating state of the switch cabinet is significant for the reliability of the whole power system, and collecting and monitoring its data through a wireless sensor network is an effective way to prevent accidents. This paper proposes a data compression method based on a periodic transmission model, designed for the limited energy and memory resources of sensor networks in the complex environment of the switch cabinet. The proposed method is presented rigorously and intuitively through theoretical derivation and an algorithm flow chart. Finally, numerical simulations are carried out and compared with the original data. Comparisons of compression ratio and error show that the improved algorithm performs well on periodic sensing data with interference and preserves the change trend of the data by guaranteeing its timing sequence.

space of sensor nodes by data compression algorithms. Compared with traditional data compression algorithms, the wireless sensor network data compression algorithm proposed in this paper is small in size and low in complexity. Data compression algorithms based on the periodic transmission model have obvious advantages over traditional algorithms, such as more compression modes and a higher compression ratio, so they have received extensive attention in recent years [5][6][7][8].
However, some data compression algorithms based on the periodic transmission model cannot guarantee the time sequence of the data [9], which causes many sensor data features to be lost, such as the evolution trend of the sensor data over time. In view of this problem, this paper introduces the Pearson correlation coefficient and proposes a data compression algorithm that guarantees the time sequence, especially for smooth and periodic data. Furthermore, the concept of the outlier is introduced to replace incorrect or unnecessary data caused by interference, so as to achieve a better compression effect. Finally, the processed data are compiled into a dictionary to reduce the number of data bits, which completes the data compression.

Periodic transmission model
In sensor networks, continuously collecting and sending data to the next node is inefficient, since it causes a large amount of sensor energy loss and wastes communication resources. Therefore, a data transmission model based on periodic collection is adopted for sensor nodes, which lets a node collect and clean up a series of data before sending it to the next node. Because the periodic transmission model generates a large amount of temporal data redundancy, data compression is an effective way to save sensor energy.
The specific procedure of the periodic transmission model is as follows. The sensor node collects data in the current period; each time a reading is collected, the reading count is increased by 1, and when all readings have been obtained, the reading vector is constructed in chronological order as

R_i = [r_1, r_2, ..., r_n],

where n is the total number of readings for the current period, n = 2^k, k ∈ ℕ.
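The collection step above can be sketched as follows; the function name and the `read_sensor` callable are illustrative assumptions, not part of the paper.

```python
def collect_period(read_sensor, k):
    """Collect one period of readings in chronological order.

    The total number of readings is n = 2**k; the reading count is
    incremented by 1 each time a reading is obtained. `read_sensor` is a
    hypothetical callable that returns one sensor reading.
    """
    n = 2 ** k              # total number of readings for the current period
    readings = []
    count = 0
    while count < n:
        readings.append(read_sensor())
        count += 1          # the number of readings is increased by 1
    return readings         # the reading vector R_i = [r_1, ..., r_n]
```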

Pearson correlation coefficients
The Pearson correlation coefficient measures the degree of linear correlation between two data sets, and its range is [−1, 1]. The higher the Pearson correlation coefficient, the higher the degree of correlation. A Pearson coefficient of 1 means that the two data sets are completely positively correlated, that is, Y = aX + b, where X is the vector form of the first data set in order, Y is the vector form of the second data set in order, a is an arbitrary positive number, and b is an arbitrary constant. A Pearson coefficient of −1 means the two data sets are completely negatively correlated, while a coefficient of 0 means they are linearly uncorrelated.
The Pearson correlation coefficient of two vectors X and Y is

ρ(X, Y) = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / ( √(Σ_{i=1}^{n} (x_i − x̄)²) · √(Σ_{i=1}^{n} (y_i − ȳ)²) ),

where x_i ∈ X, y_i ∈ Y, n is the number of elements of X (or Y), and x̄, ȳ are the element averages.
In the case of complete positive correlation between two data sets, if their element averages are also equal, the relation X = Y can be obtained, which means the two data sets are completely equal.
The absolute value of the difference of the element averages is expressed as

d(X, Y) = | x̄ − ȳ |.
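As a minimal sketch, the two quantities above can be computed as follows (function names are illustrative; a constant sequence is treated as uncorrelated to avoid division by zero):

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient rho(X, Y) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0          # constant sequence: treat as uncorrelated
    return cov / (sx * sy)

def mean_gap(x, y):
    """Absolute difference of the element averages, d(X, Y) = |mean X - mean Y|."""
    return abs(sum(x) / len(x) - sum(y) / len(y))
```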

Data compression algorithm based on Pearson correlation coefficients
The key idea of the algorithm is to divide the vector R_i into two sub-vectors R_i1 = [r_1, r_2, ..., r_{n/2}] and R_i2 = [r_{n/2+1}, r_{n/2+2}, ..., r_n] with the same number of elements, and then to decide whether to compress R_i by comparing ρ(R_i1, R_i2) with the threshold t_c and d(R_i1, R_i2) with the threshold t_m. More details are given as follows. When n ≠ 2: if ρ(R_i1, R_i2) ≥ t_c and d(R_i1, R_i2) ≤ t_m, the two sub-vectors are considered highly correlated and numerically similar.
Here t_c ∈ [−1, 1] is the high-correlation threshold and t_m ≥ 0 is the threshold for closeness of the averages. The stricter the thresholds (higher t_c, lower t_m), the more accurate the compressed data are, while relaxing them yields a higher compression ratio.
In this case, construct the vector R_i3 whose elements are the averages of the corresponding elements of the two sub-vectors, R_i3 = [(r_1 + r_{n/2+1})/2, ..., (r_{n/2} + r_n)/2], then update R_i as R_i = [R_i3, R_i3] and update the values at the corresponding positions in the reading vector. When ρ(R_i1, R_i2) < t_c or d(R_i1, R_i2) > t_m, R_i1 is considered not highly positively correlated with R_i2, or the averages of the vector elements are not close, and the original values are kept.
In particular, for the case of n = 2, the Pearson correlation coefficient is not applicable, so the absolute value of the difference between the two elements of the original vector is compared directly with t_m, that is, the magnitude relationship between |r_1 − r_2| and t_m.
To be specific, when |r_1 − r_2| ≤ t_m, the two elements of R_i are considered similar, so both are replaced by their average (r_1 + r_2)/2; likewise, the values at the corresponding positions in the reading vector are updated. For the case |r_1 − r_2| > t_m, the original values are maintained.
Based on the above steps, construct the vector queue to be executed: add the not-yet-executed vectors R_i1, R_i2 and R_i3, delete the executed original vector R_i, and take vectors from the queue in order of addition to execute the above algorithm.
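A simplified, self-contained sketch of the whole procedure is given below. The queue holds index spans into the reading vector so the timing sequence is preserved; the span bookkeeping, the function names, and the choice to stop recursing once a span is merged are assumptions of this sketch, not the paper's exact formulation.

```python
import math
from collections import deque

def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def _mean_gap(x, y):
    return abs(sum(x) / len(x) - sum(y) / len(y))

def compress(readings, t_c, t_m):
    """Compress a reading vector in place while keeping its timing sequence.

    Each queue entry is an index span (start, length); a span is merged when
    its halves are highly correlated (rho >= t_c) and numerically close
    (mean gap <= t_m), otherwise the two halves are enqueued separately.
    """
    data = list(readings)
    queue = deque([(0, len(data))])
    while queue:
        start, length = queue.popleft()
        if length < 2:
            continue
        h = length // 2
        R1 = data[start:start + h]
        R2 = data[start + h:start + length]
        if length == 2:
            # Pearson is not applicable: compare the element gap with t_m
            if abs(R1[0] - R2[0]) <= t_m:
                avg = (R1[0] + R2[0]) / 2
                data[start:start + 2] = [avg, avg]
        elif _pearson(R1, R2) >= t_c and _mean_gap(R1, R2) <= t_m:
            # merge: R_i3 is the element-wise average, R_i = [R_i3, R_i3]
            R3 = [(a + b) / 2 for a, b in zip(R1, R2)]
            data[start:start + length] = R3 + R3
        else:
            queue.append((start, h))        # execute R_i1 next
            queue.append((start + h, h))    # then R_i2
    return data
```

For example, a nearly periodic vector such as [1.0, 2.0, 1.5, 2.5] collapses to the repeated averaged half [1.25, 2.25, 1.25, 2.25], while an anti-correlated vector is left untouched.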

Improved data compression algorithm with outlier replacement
On the basis of the above algorithm, the concept of the outlier is introduced. An outlier is a datum (or several data) that differs greatly from the other data in the set, so it needs to be eliminated or replaced. Since removing an outlier directly would make the numbers of vector elements mismatch in the proposed algorithm, the outlier is replaced instead. It is also important to determine whether a value really is an outlier; therefore some data are first chosen as candidate outliers, then tested and judged, and finally the confirmed outliers are replaced. The details are as follows.
In particular, for the case of n = 4, if ρ(R_i1, R_i2) < t_c or d(R_i1, R_i2) > t_m, and the absolute value of the difference between the corresponding elements of R_i1 and R_i2 is greater than t_m, the reading element r_i in the sub-vectors R_i1 and R_i2 is regarded as a candidate outlier, and the candidate vectors R_i4 and R_i5 are constructed from R_i1 and R_i2 respectively. Then calculate ρ(R_i4, R_i′) and ρ(R_i5, R_i′), the Pearson correlation coefficients between R_i4 and the comparison vector R_i′ and between R_i5 and R_i′. Specifically, R_i′ is obtained as follows: take the position of the first element of R_i in the reading vector set and compute its remainder modulo 8. If the remainder is 1, take the 4 elements after the tail element of R_i to form R_i′; if the remainder is 5, take the 4 elements before the first element of R_i to form R_i′.
Next, compare ρ(R_i4, R_i′) and ρ(R_i5, R_i′) with t_c, and d(R_i4, R_i′) and d(R_i5, R_i′) with t_m. When ρ(R_i4, R_i′) ≥ t_c and d(R_i4, R_i′) ≤ t_m, R_i4 and R_i′ are considered highly correlated, their element averages are close, and the reading element r_i is an outlier. Similarly, when ρ(R_i5, R_i′) ≥ t_c and d(R_i5, R_i′) ≤ t_m, R_i5 and R_i′ are highly correlated, their element averages are close, and the reading element is an outlier. Note that when both R_i4 and R_i5 meet the above conditions, the value satisfied by R_i4 is taken.
Following the above analysis, calculate the corresponding vector R_i3, then update [R_i, R_i′] = [R_i3, R_i3] and the corresponding values of R_i and R_i′ in the reading vector. When ρ(R_i4, R_i′) < t_c or d(R_i4, R_i′) > t_m, and ρ(R_i5, R_i′) < t_c or d(R_i5, R_i′) > t_m, the reading element is not considered an outlier and the original values are maintained.
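The selection of the comparison vector R_i′ by the mod-8 rule can be sketched as follows; the function name is an assumption, and Python's 0-based slicing is mapped to the paper's 1-based positions in the comments.

```python
def neighbor_vector(S, pos):
    """Return the 4-element comparison vector R' for a length-4 vector whose
    first element sits at 1-based position `pos` in the reading vector set S.

    pos % 8 == 1: take the 4 elements after the tail element of R_i;
    pos % 8 == 5: take the 4 elements before the first element of R_i.
    """
    r = pos % 8
    if r == 1:
        return S[pos + 3 : pos + 7]   # 1-based positions pos+4 .. pos+7
    if r == 5:
        return S[pos - 5 : pos - 1]   # 1-based positions pos-4 .. pos-1
    return None                       # rule not defined for other remainders
```

With a 16-element reading set, the vector starting at position 1 is compared against positions 5–8, and the vector starting at position 5 against positions 1–4, so each length-4 span is always compared with its immediate neighbour.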

Compilation dictionary algorithms
The above algorithm exposes a large amount of temporal data redundancy but does not yet perform substantive data compression, so a dictionary compilation algorithm is used as the final compression step.
When the vector queue to be executed is empty, data processing ends. Count the numbers of identical and distinct elements in the reading vector, sort the identical elements by count from largest to smallest, assign binary indices according to the number of distinct elements, and compile the dictionary; each element reading in the reading vector is then replaced by its binary index. Finally, transmit the resulting dictionary and reading vector to the next sensor node, enter the next period, and repeat all the above steps. The overall algorithm structure diagram is shown in figure 1.
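A minimal sketch of the dictionary step is given below. The fixed-width binary indexing and the function name are assumptions; a real implementation might instead use variable-length codes so that the most frequent readings get the shortest indices.

```python
from collections import Counter

def compile_dictionary(readings):
    """Map each distinct reading to a binary index, most frequent first,
    and return (dictionary, encoded reading vector).
    """
    counts = Counter(readings)
    ordered = [v for v, _ in counts.most_common()]   # sort by count, descending
    bits = max(1, (len(ordered) - 1).bit_length())   # index width in bits
    index = {v: format(i, "0{}b".format(bits)) for i, v in enumerate(ordered)}
    encoded = [index[v] for v in readings]
    return index, encoded
```

Because the compression step replaces many readings with shared averages, the number of distinct values (and hence the index width) shrinks, which is where the actual bit saving comes from.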

Simulations
A section of simulated data with disturbances is randomly generated, the undisturbed data set is retained, and the data compression algorithm is used to generate the compressed data set. Let k = 8, so n = 256. Ten periods of trigonometric-function simulation data are sampled and randomly superposed with normally distributed perturbations N(1, 0.04). The thresholds are set as t_c = min{(k_c − 2) × 0.35 − 1, 1} and t_m = 0.6, where n_c = 2^{k_c} is the number of elements of the currently executed vector.
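The simulation setup can be sketched as follows. The sine waveform, the additive noise model, and the function names are illustrative assumptions; N(1, 0.04) is read as mean 1 and variance 0.04 (standard deviation 0.2), and the threshold rule is the one stated above.

```python
import math
import random

def simulate_readings(k=8, periods=10, seed=0):
    """Generate periodic trigonometric simulation data with disturbances.

    n = 2**k samples per period; each sample is perturbed by a draw from
    N(1, 0.04), i.e. mean 1 and standard deviation 0.2. Returns the clean
    (undisturbed) and disturbed data sets.
    """
    random.seed(seed)
    n = 2 ** k
    clean, disturbed = [], []
    for _ in range(periods):
        for i in range(n):
            v = math.sin(2 * math.pi * i / n)
            clean.append(v)
            disturbed.append(v + random.gauss(1.0, 0.2))
    return clean, disturbed

def threshold_t_c(k_c):
    """Correlation threshold for a current vector of n_c = 2**k_c elements,
    capped at 1 since a Pearson coefficient cannot exceed 1."""
    return min((k_c - 2) * 0.35 - 1, 1)
```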
The simulation results show that the compressed data are 35.90% of the original data; the average absolute difference between the compressed data and the undisturbed data is 0.1374, between the compressed data and the collected data 0.1352, and between the collected data and the undisturbed data 0.1580. The average absolute error introduced by compression is thus close to that introduced by the disturbance itself, which indicates that the algorithm achieves a good compression ratio while maintaining a low distortion rate. Hence, the proposed algorithm is effective for the wireless sensor network (WSN) of the switch cabinet and can greatly reduce the energy consumption and storage space demanded of the limited sensor resources.

Conclusion
The data compression algorithm proposed in this paper achieves a good effect in processing the periodic sensing data of the switch cabinet with disturbances: it can compress the network data to 20%–40% of the original data while guaranteeing a low distortion rate. The larger the total number of readings per reading period, the higher the compression ratio. Since the proposed method guarantees the timing sequence, the sensor data of the switch cabinet also preserve their evolution trend over time. Meanwhile, the two thresholds t_c and t_m can be adjusted according to the specific situation, so that the compression method can flexibly achieve different effects.