Quantile Regression Learning with Coefficient-Dependent $\ell_q$-Regularizer

Abstract. In this paper, we focus on conditional quantile regression learning algorithms based on the pinball loss and the $\ell_q$-regularizer with $1 \le q \le 2$. Our main goal is to study the consistency of this kind of regularized quantile regression learning. With concentration inequalities and operator decomposition techniques, we obtain satisfactory error bounds and convergence rates.


1 Introduction
Let $X$ be a compact subset of $\mathbb{R}^n$ and $Y = [-1, 1]$, and let $\rho$ be a Borel probability distribution on $Z = X \times Y$ which governs the relationship between the input data $x$ and the response $y$. The set of samples $\mathbf{z} = \{(x_i, y_i)\}_{i=1}^m \in Z^m$ is drawn independently according to $\rho$. For a quantile level $\tau \in (0, 1)$, the quantile regression function $f_{\rho,\tau}$ is defined pointwise by taking $f_{\rho,\tau}(x)$ to be a $\tau$-quantile of the conditional distribution $\rho(\cdot \mid x)$; see [12].
Let $K$ be a Mercer kernel, i.e., a continuous, symmetric, and positive semi-definite function defined on $X \times X$. The reproducing kernel Hilbert space $\mathcal{H}_K$ is defined to be the completion of $\mathrm{span}\{K_x := K(x, \cdot) : x \in X\}$ with respect to the inner product determined by $\langle K_x, K_u \rangle_K = K(x, u)$ (see [1] for details). Kernel-based quantile regression learning schemes can be described as

$$f_\mathbf{z} = \arg\min_{f \in \mathcal{H}_K} \bigg\{ \frac{1}{m} \sum_{i=1}^m \psi_\tau\big(y_i - f(x_i)\big) + \lambda \|f\|_K^2 \bigg\}, \qquad (2)$$

where $\psi_\tau$ is the pinball loss, $\psi_\tau(u) = \tau u$ for $u > 0$ and $\psi_\tau(u) = (\tau - 1)u$ for $u \le 0$. Here $\mathcal{H}_K$ is taken as the hypothesis space, and the penalty term is based on the norm of functions in $\mathcal{H}_K$. The error bounds and asymptotic convergence of this learning scheme have been discussed; see [18,12] and the references therein.
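To make the role of the pinball loss concrete, the following minimal Python sketch checks numerically the standard fact underlying quantile regression: the minimizer of the average pinball loss over a sample is the $\tau$-quantile of that sample. The sample distribution and the grid search are illustrative choices, not part of the paper.

```python
import numpy as np

def pinball(u, tau):
    """Pinball loss psi_tau(u): tau*u for u > 0, (tau - 1)*u for u <= 0."""
    return np.where(u > 0, tau * u, (tau - 1.0) * u)

# Empirical check: minimizing c -> mean(psi_tau(y - c)) recovers the tau-quantile.
rng = np.random.default_rng(0)
y = rng.normal(size=10_000)
tau = 0.7
grid = np.linspace(-3.0, 3.0, 601)
risks = [pinball(y - c, tau).mean() for c in grid]
print(grid[np.argmin(risks)])   # close to ...
print(np.quantile(y, tau))      # ... the empirical 0.7-quantile
```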
To balance the approximation ability and sparsity of the algorithm (2), the $\varepsilon$-insensitive pinball loss $\psi^\varepsilon_\tau$, which vanishes when its argument is within $\varepsilon$ of zero and coincides with a shifted pinball loss otherwise, is studied in [19]. Now, we restrict our attention to coefficient-based regularization schemes in a data-dependent hypothesis space. The hypothesis space here is determined by the sample:

$$\mathcal{H}_{K,\mathbf{z}} = \bigg\{ f_\alpha = \sum_{i=1}^m \alpha_i K(\cdot, x_i) : \alpha = (\alpha_1, \ldots, \alpha_m) \in \mathbb{R}^m \bigg\}. \qquad (3)$$

Here the kernel $K$ is only asked to be uniformly bounded and continuous. Since we do not require a Hilbert or Banach space norm for functions in the hypothesis space, the penalty term is imposed directly on the coefficient vector $\alpha$. This kind of coefficient-based regularized network was first introduced by Vapnik [15] to design linear programming support vector machines. Taking this regularization scheme, one can more freely choose the kernel function $K$ and different norms of the coefficient vector as the regularizer to fit the data in a certain trend. Coefficient-based regression learning with the $\ell_q$-norm for $1 \le q \le 2$ can be defined as

$$f_\mathbf{z} = f_{\alpha^\mathbf{z}}, \qquad \alpha^\mathbf{z} = \arg\min_{\alpha \in \mathbb{R}^m} \bigg\{ \frac{1}{m} \sum_{i=1}^m \big(y_i - f_\alpha(x_i)\big)^2 + \lambda \sum_{i=1}^m |\alpha_i|^q \bigg\}. \qquad (4)$$

Mathematical analysis of coefficient-based regression learning (4) has been established: a framework of analysis is proposed in [17]; for the $\ell_1$-norm penalty see [20,8,13]; for the $\ell_2$-norm penalty see [7,14]; and for the $\ell_q$-norm penalty see [6].
In quantile regression learning, Li and Sun take more general kernels and the $\varepsilon$-insensitive pinball loss with the $\ell_2$-norm and $\ell_1$-norm regularizers, respectively; see [4,5].
In this paper, we consider the following quantile regression learning scheme with the $\varepsilon$-insensitive pinball loss and a coefficient-based $\ell_q$-regularizer:

$$f_\mathbf{z} = f_{\alpha^\mathbf{z}}, \qquad \alpha^\mathbf{z} = \arg\min_{\alpha \in \mathbb{R}^m} \bigg\{ \frac{1}{m} \sum_{i=1}^m \psi^\varepsilon_\tau\big(y_i - f_\alpha(x_i)\big) + \lambda \sum_{i=1}^m |\alpha_i|^q \bigg\}, \qquad (5)$$

where $f_\alpha = \sum_{i=1}^m \alpha_i K(\cdot, x_i) \in \mathcal{H}_{K,\mathbf{z}}$. In the sequel, the empirical $\tau$-quantile error of a function $f$ is denoted by $\mathcal{E}_\mathbf{z}(f) = \frac{1}{m} \sum_{i=1}^m \psi_\tau(y_i - f(x_i))$, and the generalization error by $\mathcal{E}(f) = \int_Z \psi_\tau(y - f(x)) \, d\rho$. We will prove the asymptotic convergence of this learning scheme, i.e., show how the output function approximates the quantile regression function $f_{\rho,\tau}$ as $m \to \infty$. In fact, we extend the existing results for this learning scheme from $\ell_1$ and $\ell_2$ regularization to the more general $\ell_q$ regularization ($1 \le q \le 2$).
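For readers who want to experiment with scheme (5), the following Python sketch solves the minimization numerically. The Gaussian kernel, the symmetric form of the $\varepsilon$-insensitive pinball loss (one common form; the paper's exact definition may differ in details), and the derivative-free optimizer are illustrative assumptions: the paper only requires $K$ to be bounded and continuous and does not prescribe a solver.

```python
import numpy as np
from scipy.optimize import minimize

def pinball_eps(u, tau, eps):
    """Symmetric eps-insensitive pinball loss: zero on [-eps, eps],
    slope tau above the band and slope (1 - tau) (in -u) below it."""
    return np.maximum.reduce([tau * (u - eps),
                              (tau - 1.0) * (u + eps),
                              np.zeros_like(u)])

def fit_scheme5(X, y, tau=0.5, eps=0.05, lam=1e-2, q=1.5, gamma=1.0):
    """Coefficient-based l_q-regularized quantile regression, scheme (5):
    minimize over alpha:
        (1/m) sum_i psi_tau^eps(y_i - f_alpha(x_i)) + lam * sum_i |alpha_i|^q,
    where f_alpha = sum_j alpha_j K(., x_j) and K is a Gaussian kernel."""
    m = len(y)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-gamma * sq)  # m x m Gram matrix

    def objective(alpha):
        residuals = y - K @ alpha
        return pinball_eps(residuals, tau, eps).mean() + lam * np.sum(np.abs(alpha) ** q)

    # The objective is non-smooth (pinball loss; |.|^q for q = 1),
    # so a derivative-free method is the safest generic choice here.
    alpha = minimize(objective, np.zeros(m), method="Powell").x

    def f_z(X_new):
        sq_new = ((X_new[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-gamma * sq_new) @ alpha

    return f_z
```

This direct approach is adequate for small samples; since the loss is piecewise linear, the cases $q = 1$ and $q = 2$ can instead be posed as a linear and a quadratic program, respectively, which matches the linear programming support vector machines of [15].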
The rest of this paper is organized as follows. In Section 2, we give the assumptions and the main result. In Section 3, we estimate the hypothesis errors and sample errors. In Section 4, we derive the error bound and learning rate by an iteration method.

2 Assumptions and Main Result
The projection operator (see [2]) deals well with heavy-tailed noise, which is helpful in obtaining our main result.

Definition 1. The projection operator $\pi$ on the space of measurable functions on $X$ is defined by

$$\pi(f)(x) = \begin{cases} 1, & \text{if } f(x) > 1, \\ f(x), & \text{if } -1 \le f(x) \le 1, \\ -1, & \text{if } f(x) < -1. \end{cases}$$

It is proved in [11] that, for any measurable function $f$ on $X$, when $\rho$ has a $\tau$-quantile of $p$-average type $p'$ (a noise condition on the conditional distributions $\rho(\cdot \mid x)$ near their $\tau$-quantiles; see [12]), the distance between $\pi(f)$ and $f_{\rho,\tau}$ in $L^{p'}_{\rho_X}$ is controlled by a power of the excess generalization error $\mathcal{E}(\pi(f)) - \mathcal{E}(f_{\rho,\tau})$. By this conclusion, we can bound the error of the learning scheme by bounding the excess generalization error of $\pi(f_\mathbf{z})$.
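In code, the projection of Definition 1 is simply truncation of the predictor's outputs to $[-1, 1]$, the range of $Y$; a one-line sketch:

```python
import numpy as np

def project(f_values):
    """Projection operator pi of Definition 1: clip function values to [-1, 1],
    matching the assumption |y| <= 1."""
    return np.clip(f_values, -1.0, 1.0)
```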

Through introducing a continuous intermediate function $f_\lambda \in \mathcal{H}_K$, we can decompose the excess generalization error as follows.

Proposition 1. Let $\lambda > 0$ and $f_\mathbf{z}$ be given by (5). Then

$$\mathcal{E}(\pi(f_\mathbf{z})) - \mathcal{E}(f_{\rho,\tau}) \le \mathcal{S}_1 + \mathcal{S}_2 + \mathcal{H}_1 + \mathcal{H}_2 + \mathcal{D}(\lambda), \qquad (12)$$

where $\mathcal{S}_1$ and $\mathcal{S}_2$ are sample errors, $\mathcal{H}_1$ and $\mathcal{H}_2$ are hypothesis errors, and $\mathcal{D}(\lambda)$ is the approximation error.

Proof. The left-hand side of (12) can be decomposed into a telescoping sum of differences among the generalization, empirical, and regularized empirical errors of $\pi(f_\mathbf{z})$, $f_\mathbf{z}$, and $f_\lambda$. The item comparing the regularized empirical error of $f_\mathbf{z}$ with that of its competitor is at most zero because $f_\mathbf{z}$ minimizes (5). The last item is also at most zero due to the definition of $f_\lambda$. This proves the desired inequality. □

Let $\{\mu_k\}_{k \in \mathbb{N}}$ be the nonnegative eigenvalues of the integral operator $L_K$ (defined by $L_K f = \int_X K(\cdot, x) f(x) \, d\rho_X(x)$ on $L^2_{\rho_X}$), and let $\{\varphi_k\}_{k \in \mathbb{N}}$ be the associated orthonormal eigenfunctions.
Our approximation condition is given as follows: for some $0 < \theta \le 1$ and a constant $C_\theta > 0$,

$$\mathcal{D}(\lambda) = \inf_{f \in \mathcal{H}_K} \Big\{ \mathcal{E}(f) - \mathcal{E}(f_{\rho,\tau}) + \lambda \|f\|_K^2 \Big\} \le C_\theta \lambda^\theta \quad \text{for all } 0 < \lambda \le 1. \qquad (13)$$

Proposition 2. Assume the approximation condition (13) and let $0 < \lambda \le 1$. Then there holds $\mathcal{E}(f_\lambda) - \mathcal{E}(f_{\rho,\tau}) \le C_\theta \lambda^\theta$ and $\|f_\lambda\|_K \le C_\theta^{1/2} \lambda^{(\theta - 1)/2}$; both estimates follow from (13).
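The one-line reasoning behind Proposition 2, written out under the reconstructed form of (13) above (with $f_\lambda$ a minimizer realizing $\mathcal{D}(\lambda)$): since $f_{\rho,\tau}$ minimizes $\mathcal{E}$, both summands defining $\mathcal{D}(\lambda)$ are nonnegative, so each is separately bounded by $C_\theta \lambda^\theta$:

$$\mathcal{E}(f_\lambda) - \mathcal{E}(f_{\rho,\tau}) \le \mathcal{D}(\lambda) \le C_\theta \lambda^\theta, \qquad \lambda \|f_\lambda\|_K^2 \le \mathcal{D}(\lambda) \le C_\theta \lambda^\theta \ \Longrightarrow\ \|f_\lambda\|_K \le C_\theta^{1/2} \lambda^{(\theta - 1)/2}.$$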
Our sample error is estimated through a concentration inequality, so the capacity of the hypothesis space plays an important role. Covering numbers, which have been well studied in [21], are often utilized to measure this capacity. Let $\mathcal{F}$ be a set of functions on $X$; for $\eta > 0$, the covering number $\mathcal{N}(\mathcal{F}, \eta)$ is the minimal number of balls of radius $\eta$ (with respect to the uniform norm) needed to cover $\mathcal{F}$. The sampling operator $S_\mathbf{x} : \mathcal{H}_K \to \mathbb{R}^m$ is given by $S_\mathbf{x}(f) = (f(x_1), \ldots, f(x_m))$. For any $R > 0$, denote

$$B_R = \bigg\{ f_\alpha = \sum_{i=1}^m \alpha_i K(\cdot, x_i) : \sum_{i=1}^m |\alpha_i| \le R \bigg\},$$

and impose the capacity condition: for some $0 < \mu < 2$ and a constant $c_{\mu,K} > 0$,

$$\log \mathcal{N}(B_1, \eta) \le c_{\mu,K} \, \eta^{-\mu} \quad \text{for all } \eta > 0. \qquad (16)$$

Our main result, which will be proved in Section 4, is given in the following.

Theorem 1. Assume the approximation condition (13) and the capacity condition (16) with some $0 < \mu < 2$, and suppose that $\rho$ has a $\tau$-quantile of $p$-average type $p'$. Then, for an appropriate choice of the regularization parameter $\lambda = \lambda(m)$ and any $0 < \delta < 1$, with confidence $1 - \delta$, the excess generalization error $\mathcal{E}(\pi(f_\mathbf{z})) - \mathcal{E}(f_{\rho,\tau})$ admits a bound (17) decaying polynomially in $m$, with an exponent determined by $\theta$, $\mu$, $p'$, and the regularization exponent $q$.

3 Estimates for Hypothesis Errors and Sample Errors
In this section, we first estimate the hypothesis errors $\mathcal{H}_1$ and $\mathcal{H}_2$ by means of the following probability inequality from [9].
Lemma 1. Let $H$ be a Hilbert space and $\xi$ be a random variable on a probability space $(Z, \rho)$ with values in $H$. Assume $\|\xi\| \le \widetilde{M} < \infty$ almost surely, and denote $\sigma^2(\xi) = E\|\xi\|^2$. Then, for any $0 < \delta < 1$, with confidence $1 - \delta$,

$$\bigg\| \frac{1}{m} \sum_{i=1}^m \xi(z_i) - E(\xi) \bigg\| \le \frac{2 \widetilde{M} \log(2/\delta)}{m} + \sqrt{\frac{2 \sigma^2(\xi) \log(2/\delta)}{m}}.$$

Let us deal with $\mathcal{H}_1$ first: applying Lemma 1 to a suitable $\mathcal{H}_K$-valued random variable built from the sampling operator yields the estimate of Proposition 3 for $\mathcal{H}_1$. For $\mathcal{H}_2$ we have the following bound.

Proposition 4. Suppose that $\rho$ has a $\tau$-quantile of $p$-average type $p'$, and that the capacity condition (16) holds with some $0 < \mu < 2$. Let $R \ge 1$ and $0 < \lambda \le 1$. Then, for any $0 < \delta < 1$, with confidence $1 - \delta$, we have a bound for $\mathcal{H}_2$ in terms of $m$, $\lambda$, $R$, and $\delta$. Here $C_2$ is a constant, appearing in that bound, independent of $m$, $\lambda$, and $\delta$.

Proof. Recall the estimate of $\mathcal{H}_1$: for any $0 < \delta < 1$, with confidence $1 - 2\delta/5$, there holds the bound of Proposition 3 with $2\lambda$ in place of $\lambda$. This inequality ensures the corresponding bound for every sample outside an exceptional set of measure at most $2\delta/5$. By the proof of Proposition 5 in [4], there exists a subset $U_R$ of $Z^m$ with measure at most $\delta/5$ such that the required sample error bound holds for every $\mathbf{z} \notin U_R$. Moreover, from Proposition 3, we know that there exists a subset $U_2$ with measure at most $2\delta/5$ such that the corresponding bound holds for every $\mathbf{z} \notin U_2$. Plugging the above estimates into (20), Proposition 4 holds. □
We next bound $\mathcal{S}_1$ by Proposition 5, proved in [4].

Proposition 5. Under the assumptions of Proposition 4, let $R \ge 1$ and $0 < \lambda \le 1$. Then, for any $0 < \delta < 1$ and all $f \in B_R$, with confidence $1 - \delta$, there holds a uniform sample error bound for $\mathcal{S}_1$. Here $C_1$ is a constant, appearing in that bound, which depends on the constants $\mu$, $\theta$, $c_{\mu,K}$, and $C_\theta$.

4 Error Bound and Convergence Rates by Iteration
In this section, we deduce the error bound and convergence rate by the iteration technique. To obtain a tighter bound, we shall apply an iteration argument that has been widely used in learning error estimates; see [11,16].
For $R \ge 1$, denote by $\mathcal{W}(R)$ the set of samples $\mathbf{z} \in Z^m$ for which $f_\mathbf{z} \in B_R$.

Proposition 6. Assume that (13) and (16) hold. Suppose that $\rho$ has a $\tau$-quantile of $p$-average type $p'$. Let $0 < \lambda \le 1$, $R \ge 1$, and $0 < \delta < 1$. Then, there exists a subset $V_R$ of $Z^m$ with measure at most $\delta$ such that, for any $\mathbf{z} \in \mathcal{W}(R) \setminus V_R$, the excess generalization error $\mathcal{E}(\pi(f_\mathbf{z})) - \mathcal{E}(f_{\rho,\tau})$ satisfies the combined bound. Here $\hat{C}$ and $C_3$ are constants independent of $m$, $\lambda$, and $\delta$.

Proof. From Proposition 3 we know that there exists a subset $V_1$ of $Z^m$ with measure at most $\delta/2$ such that the hypothesis error bounds hold for every $\mathbf{z} \notin V_1$. From Proposition 5 we know that there exists a subset $V_2$ of $Z^m$ with measure at most $\delta/2$ such that the sample error bound holds for every $\mathbf{z} \notin V_2$. Taking $V_R = V_1 \cup V_2$, the measure of $V_R$ is at most $\delta$. With the above bounds combined via the decomposition of Proposition 1, the claimed estimate holds for every $\mathbf{z} \in \mathcal{W}(R) \setminus V_R$, where $\Pi$ denotes the remaining three terms, which are further bounded by Lemma 7. □
By the definition of $f_\mathbf{z}$ and $|y| \le 1$, comparing the value of the objective in (5) at $\alpha^\mathbf{z}$ with its value at $\alpha = 0$, we have $\lambda \sum_{i=1}^m |\alpha_i^\mathbf{z}|^q \le \frac{1}{m} \sum_{i=1}^m \psi^\varepsilon_\tau(y_i) \le 1$, which yields a crude initial radius $R^{(0)}$ such that $f_\mathbf{z} \in B_{R^{(0)}}$ almost surely. Then, for any $0 < \delta < 1$ and arbitrarily small $0 < \eta \le 1$, with confidence $1 - \delta$, there holds the error bound of Proposition 6 with this radius. Let us define a sequence $\{R^{(j)}\}_{j \ge 0}$ by feeding each bound back into Proposition 6: applying the proposition with radius $R^{(j)}$ produces, outside an exceptional set, a smaller admissible radius $R^{(j+1)}$. Since the measure of the union of the exceptional sets remains at most $\delta$, finitely many iterations yield the tightest admissible radius and hence the sharpest error bound.
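To see why the iteration sharpens the radius, consider a toy model in which one round of Proposition 6 turns a valid radius $R^{(j)}$ into a new one via an update of the form $R^{(j+1)} = a\,(R^{(j)})^s + b$ with $0 < s < 1$; the constants $a$, $b$, $s$ and the crude start $R^{(0)} = 10^6$ below are purely illustrative, not the paper's values. A short Python sketch shows the sequence collapsing to a fixed point far below the crude bound:

```python
# Toy illustration of the iteration technique: repeatedly feeding the current
# radius bound back into the estimate shrinks it toward a fixed point.
a, b, s = 2.0, 1.0, 0.5        # illustrative constants, with 0 < s < 1
R = 1e6                        # crude initial radius, e.g. of order lambda**(-1/q)
for j in range(12):
    R = a * R ** s + b         # one application of the (toy) radius update
    print(f"iteration {j}: R = {R:.4f}")
# R converges to the fixed point R* solving R* = a * sqrt(R*) + b
# (about 5.83 here), dramatically tighter than the initial 1e6.
```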
Finally, we are in a position to prove our main result. Combine Lemma 3 with Proposition 6, and replace $\delta$ by $\delta/2$ in both results; then, with confidence $1 - \delta$, we obtain the error bound stated in Theorem 1. Note that the right-hand side of (17) is related to the regularization exponent $q$, which is how the rates extend the known $\ell_1$ and $\ell_2$ results. The proof is complete. □