A Fast CU Size Decision Algorithm for 3D-HEVC

The emerging 3D-HEVC standard achieves the highest coding efficiency, but at very high computational complexity. To speed up the encoding of the dependent texture views, we propose a fast CU depth range selection algorithm that jointly exploits inter-view and temporal-spatial correlations. First, adaptive correlation weights are used to predict the coding unit (CU) depth range and to skip depth levels that are rarely used in the independent view, the previous frame, and neighboring CUs. In addition, a new early termination algorithm is proposed to further reduce coding time. Experimental results demonstrate that the proposed method saves about 56% of coding time on average compared to HTM while maintaining similar video quality.


Introduction
With the rapid development of multimedia technology, video is used more and more widely compared to text, voice, and images. However, traditional 2D video can no longer meet people's needs, and 3D video is becoming increasingly popular. Video data is well known to be large, which places high demands on transmission, storage, and playback. Furthermore, 3D video uses more than one camera to capture the same scene, which greatly increases the amount of information.
To improve the coding efficiency of high-definition video, the Joint Collaborative Team on Video Coding (JCT-VC) developed the High Efficiency Video Coding (HEVC) standard [1]. Subsequently, the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) developed the HEVC-based 3D video coding standard (3D-HEVC) for compressing the multi-view video plus depth (MVD) format [2]. In 3D-HEVC, as in HEVC, the mode decision process in HTM evaluates all possible CU sizes, prediction modes, and coding tools (disparity-compensated prediction (DCP), inter-view motion prediction, and backward view synthesis prediction (BVSP)) to find the combination with the lowest rate-distortion (RD) cost using a Lagrange multiplier. This leads to a high computational workload and hinders the wide application of 3D-HEVC.
To reduce the complexity of video coding, much work has been done on fast algorithms for H.264/AVC. Liu et al. [3] propose a block partition algorithm based on image features to reduce coding complexity. A low-complexity mode prediction based on spatial-temporal correlation is proposed in [4]. The algorithm in [5] uses SKIP mode to terminate the mode decision early. Fast algorithms for HEVC inter prediction mainly target the coding unit (CU) selection process, which has high complexity. The method in [6] terminates CU splitting by setting a threshold based on the RD cost of the CUs that have already been coded: if the RD cost of the current CU is less than this threshold, the CU stops splitting. A fast CU size decision method based on coding tree pruning is proposed in [7]. Shen et al. [8] propose three early termination methods based on motion homogeneity checking, RD cost checking, and SKIP mode checking to skip motion estimation on unnecessarily small CU sizes.
For 3D-HEVC, the maximum depth of the related coding block in the dependent view is first used to terminate CU splitting early in [10]. A fast mode decision algorithm based on variable-size CUs and disparity estimation is proposed in [11] to reduce the computational complexity of 3D-HEVC. The texture quad-tree initialization (QTI) and depth quad-tree limitation (QTL) coding tools, together with their associated predictive coding (PC) algorithm, which exploit the correlation between the quad-tree of the texture and that of its associated depth, are proposed in [12] and were adopted in the 3D-HEVC working draft. Zhang et al. [13] speed up the encoding of dependent texture views based on inter-view correlation, using an early merge mode decision algorithm and an early CU splitting termination algorithm. The CU depth correlation between the independent view and the dependent views is also studied in [14] to accelerate the CU splitting process of dependent views.
The aforementioned methods for 3D video coding use either inter-view correlation or temporal-spatial correlation to predict the depth of the current coding block, but do not consider these correlations jointly. In this paper, we therefore exploit both the inter-view and the spatial-temporal correlations and propose a fast CU size decision algorithm that consists of a depth range selection algorithm based on adaptive weights and an early termination algorithm. Experimental results show that the proposed algorithm saves an average of 56% of encoding time compared to the original HTM algorithm with negligible loss in compression efficiency.
The rest of the paper is organized as follows. Section 2 describes the proposed algorithm in detail. Experimental results are shown in Section 3. Conclusions are given in Section 4.

Proposed fast coding algorithm

CU depth splitting analysis for texture views
3D-HEVC inherits the quad-tree structure of HEVC, in which the encoder must traverse depth levels 0 to 3; this greatly increases the coding complexity of 3D-HEVC [15]. Zhang et al. [13] performed a statistical analysis of the probability of each of the four splitting depth levels in the dependent texture views for each test sequence. The results show that 76.2%, 1%, 3.9% and 1.4% of treeblocks choose depth levels "0", "1", "2" and "3", respectively. This demonstrates that small depth levels are selected most of the time, so selecting a suitable CU depth range is an effective way to save coding time.
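To give a rough sense of why restricting the depth range helps, the following sketch counts how many CUs an exhaustive quad-tree search visits in one treeblock. It assumes a 64x64 treeblock with depth levels 0 to 3 (64x64 down to 8x8); the function names are illustrative and do not come from HTM. The count is only a proxy for workload, since the cost of evaluating a CU depends on its size and on the modes tested.

```python
# Count the CUs visited inside one treeblock by a quad-tree search.
# Assumption: 64x64 treeblock, depth levels 0..3 (64x64 down to 8x8).
# Function names are illustrative and are not taken from HTM.

def cus_per_depth(max_depth: int = 3) -> list:
    """Number of CUs at each depth level: every split yields four sub-CUs."""
    return [4 ** d for d in range(max_depth + 1)]


def cus_in_range(d_min: int, d_max: int) -> int:
    """CUs evaluated when only depth levels d_min..d_max are searched."""
    return sum(4 ** d for d in range(d_min, d_max + 1))


full = cus_in_range(0, 3)     # exhaustive search: 1 + 4 + 16 + 64
limited = cus_in_range(0, 1)  # search restricted to depth levels 0 and 1
print(cus_per_depth(), full, limited)  # [1, 4, 16, 64] 85 5
```

Under these assumptions, a treeblock whose depth range is limited to [0, 1] visits 5 CUs instead of 85.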
In HTM15.0, we measured the encoding time of texture video and depth video; the result is shown in Figure 1. Every test sequence contains three views, and every view includes a texture video and a depth video. Figure 1(a) shows that texture pictures account for 75% of the encoding time and depth maps for 25%. Figure 1(b) shows the encoding time ratio of the three texture views: View0 accounts for about 17%, while View1 and View2 each take up about 40%, which demonstrates that the two dependent texture views (View1 and View2) occupy most of the encoding time in 3D-HEVC. Accordingly, the fast algorithm proposed in this paper targets only the two dependent texture views.

The depth range selection algorithm
Many previous fast CU decision algorithms use spatial-temporal correlations to predict the depth range of the current coding block, but most of them adopt fixed weights that cannot adapt to the degree of correlation, which may lead to imprecise prediction. Based on the characteristics of 3D video, we propose a depth range selection algorithm based on adaptive weights using split complexity (SC), which is defined as follows:
SC_Col is expressed as the split complexity of the inter-view correlated CU. To avoid obtaining the same SC when depth levels are different, unlike the method used in [15], which directly averages the depth values, we adopt different definitions according to the maximum depth level. Experimental results in [15] demonstrate that the spatial-temporal correlation is connected with the temporal average depth error. The larger the value is, the stronger the temporal correlation is, and the weaker the spatial correlation is. We evaluate the degree of correlation using the temporal split complexity error (TSCE) and the inter-view spatial split complexity error (ISCE), which are defined as follows:
The SC of the current CU is predicted using the spatially neighboring CUs (Left and Up in Figure 2), the co-located CU (t in Figure 2), and the inter-view correlated CU (Col in Figure 2), where W_col, W_l, W_u, and W_t are calculated by jointly utilizing the temporal-spatial and inter-view correlations as follows:
In addition, to take full advantage of the inter-view correlation, we define a Correlated Split Complexity (CSC) in (6). If CSC < Th, the depth range (DR) is [0, 1]; otherwise do as follows:
where Th, th1, th2, and th3 are thresholds, set to 3.0, 1.0, 2.0, and 5.0 according to extensive experiments.
To verify the effectiveness of the proposed adjusted depth range, four sequences with different activities are tested in Table 1 and Table 2. Table 1 shows that when Th = 3.5, the probability of selecting depth level "0" or "1" is about 98%. When th1, th2, th3, and th4 are set to 3.0, 1.0, 2.0, and 5.5, Table 2 shows that the average prediction accuracy is between 92% and 97%.
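The CSC test described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the SC inputs are hypothetical values, the argument names simply mirror the terms of equation (6), and because the refinement that uses th1 to th3 is not reproduced here, the sketch falls back to the full depth range whenever CSC is not below Th.

```python
# Sketch of the CSC check from equation (6) and the first depth-range rule.
# Assumptions: SC inputs are hypothetical numbers; the branch for CSC >= Th
# is NOT the paper's rule, it simply falls back to the full range [0, 3].

FULL_RANGE = (0, 3)  # depth levels searched by the unmodified encoder


def correlated_split_complexity(sc_col, sc_l, sc_u, sc_r, sc_d):
    """Equation (6): the largest split complexity among the correlated CUs."""
    return max(sc_col, sc_l, sc_u, sc_r, sc_d)


def depth_range(csc, th=3.0):
    """Return (min_depth, max_depth) for the current CU."""
    if csc < th:
        return (0, 1)  # rule stated in the text: small CSC -> DR = [0, 1]
    # Placeholder only: the threshold-based refinement is not reproduced.
    return FULL_RANGE


csc = correlated_split_complexity(1.0, 0.5, 2.0, 0.0, 1.5)
print(csc, depth_range(csc))  # 2.0 (0, 1)
```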

Early termination(ET) of CU
An early CU splitting termination method is proposed that predicts the splitting depth level of the current CU more accurately by considering both inter-view and spatial-temporal correlations.
For the DR selection algorithm, the predicting CU is too large to predict accurately. For early termination of CU splitting, the referring CUs are therefore 8x8 blocks located to the left of and above the current CU, and their maximum depth is denoted Depth_neighbour. In addition, we consider the depth levels of the corresponding blocks in the forward reference frame, the backward reference frame, and the independent view, which are denoted Depth_t0, Depth_t1, and Depth_col, respectively.
If the following two conditions are true, early CU splitting termination is performed for the current CU. 2) Skip mode is selected as the best prediction mode for the current CU.
where Depth_max is the maximum value of all the depth levels, and uidepth is the splitting depth level of the current CU.
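The early termination test can be sketched as below. The first condition is not legible in the text above, so this sketch assumes it is uidepth >= Depth_max, with Depth_max taken over Depth_neighbour, Depth_t0, Depth_t1, and Depth_col; that reading is an assumption, and the function name and argument layout are hypothetical.

```python
# Sketch of the early CU splitting termination test.
# ASSUMPTION: condition 1 is taken to be uidepth >= Depth_max, where
# Depth_max is the maximum of the four reference depth levels. The text
# only defines the symbols, so this reading may differ from the paper.

def should_terminate_early(uidepth, depth_neighbour, depth_t0, depth_t1,
                           depth_col, best_mode_is_skip):
    """Return True if splitting of the current CU should stop."""
    depth_max = max(depth_neighbour, depth_t0, depth_t1, depth_col)
    reached_reference_depth = uidepth >= depth_max  # assumed condition 1
    return reached_reference_depth and best_mode_is_skip  # condition 2


# Current CU at depth 1, all reference blocks at depth <= 1, SKIP chosen.
print(should_terminate_early(1, 1, 0, 1, 1, True))   # True
# A reference block was split deeper, so splitting continues.
print(should_terminate_early(1, 2, 0, 1, 1, True))   # False
```

Requiring both conditions keeps the test conservative: a CU stops splitting only when its neighbours give no evidence of finer partitioning and the CU itself is well predicted by SKIP mode.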

Overall algorithm
Based on the aforementioned analysis, the proposed overall algorithm incorporates the depth range selection algorithm and the early termination method. A flowchart of the proposed algorithm is shown in Figure 3.

Experimental results
The performance of the proposed algorithm is measured by BD-rate and ΔEncT. BD-rate evaluates the coding gain in terms of bitrate change; positive and negative values represent bitrate increments and decrements, respectively. ΔEncT is the encoding time reduction compared with HTM15.0.
We compare the performance of the proposed algorithm, called "DR&ET", and the fast encoder decision algorithm for texture coding, called "EMD&ECUST" [13], against HTM15.0. The experimental results are presented in Tables 3 and 4. The "video1" and "video2" columns show the BD-rate of the two dependent texture views. The "Video PSNR/video bitrate" column shows the BD-rate performance considering the Y-PSNR of the coded texture views over the bitrate of the texture data.
As can be seen from Table 3, the proposed method reduces the overall encoding time by 56% on average, while the BD-rate loss in the dependent views is 0.52% and 0.57%. Therefore, the proposed DR&ET efficiently reduces encoding time with only a small loss in RD performance. EMD&ECUST performs well enough to have been adopted into the 3D-HEVC reference software, but the proposed DR&ET saves more time, and the BD-rate in the "Video PSNR/video bitrate" column increases by only about 0.5%.
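For reference, the time-saving figure can be computed as sketched below. This assumes the conventional definition of encoding time reduction, (T_reference - T_proposed) / T_reference, expressed in percent; the timings in the example are made-up numbers chosen only to illustrate the arithmetic and are not measurements from the experiments.

```python
# Sketch of the encoding time reduction metric.
# Assumption: conventional definition relative to the reference encoder.
# The example timings are hypothetical, not measured results.

def delta_enc_time(t_reference, t_proposed):
    """Encoding time reduction in percent relative to the reference."""
    if t_reference <= 0:
        raise ValueError("reference encoding time must be positive")
    return 100.0 * (t_reference - t_proposed) / t_reference


# Hypothetical run: reference encoder 1000 s, fast encoder 440 s.
print(delta_enc_time(1000.0, 440.0))  # 56.0
```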

Conclusions
In this paper, we propose a fast CU depth range selection algorithm and an early CU termination algorithm for 3D-HEVC. Experimental results show that the proposed algorithm significantly reduces computational complexity with negligible BD-rate loss, which makes 3D-HEVC more suitable for real-time applications.

Figure 1. Analysis of coding complexity: (a) encoding time comparison between texture and depth maps; (b) proportion of encoding time of the three views.

is the value of depth level, max depth is the maximum value of depth level. The corresponding blocks are shown in Figure 2. i indicates the type of correlated CU, for example, _

Figure 2. Inter-view, spatial, and temporal correlated CUs. Cur: current CU; Left: left CU; Up: upper CU; t: the co-located CU in the previous frame; Col: inter-view correlated CU.

$$CSC = \max\{SC_{col}, SC_{L}, SC_{U}, SC_{R}, SC_{D}\} \qquad (6)$$

According to SC_pre and CSC, we can predict the depth range of the current LCU as follows: First, calculate SC_pre and CSC. Next, predict the depth range of the current CU according to CSC and SC_pre. Last, check the early termination conditions of the current CU: if the conditions are met, CU splitting ends; otherwise, the depth level increases by 1.

Figure 3. Flowchart of the proposed overall algorithm.

Figure 4(a) and (b) illustrate an example of RD and time saving curves under four different QPs compared to HTM15.0. As shown in Figure 4, the proposed method saves more coding time than EMD&ECUST while maintaining RD performance. In addition, Figure 4(b) shows that DR&ET is more effective than EMD&ECUST when QP is low, while the performance of the two algorithms becomes similar as QP increases. In general, the proposed algorithm performs better than EMD&ECUST.

Figure 4. Experimental results of "Balloons" in video 1 under different QP settings. (a) RD curves of "Balloons" in video 1. (b) Time saving curves of "Balloons" in video 1.

Table 1. Probability of selecting low depth level for small CSC.

Table 2. Accuracy of depth range selection.