Typical AdaBoost Algorithms and Their Applications

Boosting is a popular class of algorithms in the field of machine learning, and AdaBoost is the most typical algorithm in the Boosting family. This paper briefly introduces Boosting and the current state of its research, and presents the typical algorithms of each series in turn.

Existing applications mainly focus on classification problems. For example, the work in [7] mainly addresses the two-class single-label problem, the multi-class single-label problem, the multi-class multi-label problem, and the regression problem.
Based on the application of AdaBoost in text categorization, this paper presents the basic idea and application of AdaBoost for the two-class single-label problem, analyzes the AdaBoost.M1 and AdaBoost.M2 algorithms and their application in solving the multi-class single-label problem, and the AdaBoost.MR and AdaBoost.MH algorithms and their application in solving the multi-class multi-label problem.

The AdaBoost algorithm and the two-class single-label problem
Prepare a training set $(x_1, y_1), \ldots, (x_n, y_n)$, where $x_i \in X$, $X$ represents a certain domain or instance space, and each member is a training example with a label $y_i \in \{-1, +1\}$. Firstly, initialize the sample weights to the uniform distribution $D_1(i) = 1/n$. Secondly, in each round $t = 1, \ldots, T$, train the WeakLearn under the distribution $D_t$ to obtain a weak classifier $h_t$ with error rate $\varepsilon_t = \sum_{i:\, h_t(x_i) \neq y_i} D_t(i)$. Thirdly, compute the weight of the weak classifier based on the error rate: $\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}$. Fourthly, update the sample weights: $D_{t+1}(i) = D_t(i)\exp(-\alpha_t y_i h_t(x_i))/Z_t$, where $Z_t$ is a normalization factor chosen so that $D_{t+1}$ remains a probability distribution.
After $T$ rounds of the loop, $T$ weak classifiers are obtained, and they are combined into a strong classifier $H(x) = \mathrm{sign}\big(\sum_{t=1}^{T}\alpha_t h_t(x)\big)$.

AdaBoost.M1 is the most direct multiclass extension of AdaBoost. When AdaBoost.M1 calls the WeakLearn, the error rate $\varepsilon_t$ of each weak hypothesis on the distribution $D_t$ must stay below $1/2$. When the WeakLearn is strong enough to attain this precision on the difficult distributions generated by AdaBoost, that is enough to solve multiclass problems. If, however, the WeakLearn cannot attain a precision of at least 50%, the method fails. Therefore, multiclass problems are generally solved by reducing them to multiple two-value (binary) problems. This idea is the preferred approach of AdaBoost.M2, AdaBoost.MH, AdaBoost.MR, etc. for solving multiclass problems.
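To make the two-class procedure above concrete, the following is a minimal runnable sketch in Python; the decision-stump WeakLearn and all function and variable names are illustrative choices of ours, not part of the original presentation.

import numpy as np

def train_stump(X, y, D):
    # Exhaustive search for the threshold stump with the lowest
    # weighted error under the current distribution D.
    best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, error)
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = D[pred != y].sum()
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def stump_predict(stump, X):
    j, thr, pol, _ = stump
    return np.where(pol * (X[:, j] - thr) >= 0, 1, -1)

def adaboost(X, y, T):
    # Two-class AdaBoost; labels y must be +1/-1.
    n = len(y)
    D = np.full(n, 1.0 / n)                    # uniform initial weights
    ensemble = []
    for _ in range(T):
        stump = train_stump(X, y, D)           # train WeakLearn on D_t
        eps = max(stump[3], 1e-12)             # clamp to avoid log(0)
        if eps >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / eps)  # weak-classifier weight
        pred = stump_predict(stump, X)
        D *= np.exp(-alpha * y * pred)         # reweight the samples
        D /= D.sum()                           # Z_t: renormalize
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    # Strong classifier H(x) = sign(sum_t alpha_t h_t(x)).
    score = sum(a * stump_predict(s, X) for a, s in ensemble)
    return np.sign(score)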

AdaBoost.M2 algorithm
Input an $N$-example sequence $(x_1, y_1), \ldots, (x_N, y_N)$ with labels $y_i \in Y = \{1, 2, \ldots, k\}$, and set $D$ as the distribution on the instances, WeakLearn as the weak learning method, and $T$ as the number of iterations. Initially, the mislabel distribution is uniform: $D_1(i, y) = \frac{1}{N(k-1)}$ for each example $i$ and each incorrect label $y \neq y_i$. AdaBoost.M2 is a special case of AdaBoost.MR. For each sample $x_i$ with correct label $y_i$ and each incorrect label $y$ ($y$ ranges over the other $k-1$ labels besides $y_i$), raise the two-value question: for $x_i$, is the correct label $y$ or $y_i$? The given hypothesis $h$ is used to answer these $k-1$ two-value questions, separating the correct label $y_i$ from the incorrect labels $y$. If $h(x_i, y_i) = 1$ and $h(x_i, y) = 0$, then the answer to the above question is $y_i$; if $h(x_i, y) = 1$ and $h(x_i, y_i) = 0$, the answer is $y$. If $h(x_i, y) = h(x_i, y_i)$, one of the two is selected at random. When $h$ takes its values in $[0, 1]$, $h(x, y)$ can be interpreted as a randomized decision: select a random bit $b(x, y) \in \{0, 1\}$ that equals 1 with probability $h(x, y)$ and equals 0 with probability $1 - h(x, y)$. Hence, the probability of selecting the incorrect answer $y$ is $\frac{1}{2}\big(1 - h(x_i, y_i) + h(x_i, y)\big)$. If the answers to all $k-1$ questions are equally important, the loss of the hypothesis is defined as the average value $\frac{1}{2}\big(1 - h(x_i, y_i) + \frac{1}{k-1}\sum_{y \neq y_i} h(x_i, y)\big)$.
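As an illustration, a small sketch of how this pseudo-loss could be evaluated for a confidence-rated hypothesis with values in $[0, 1]$; the function and array names are our own, not from the original text.

import numpy as np

def pseudo_loss(h, X, y, D, k):
    # Pseudo-loss of a hypothesis h(x, label) -> [0, 1] under the
    # mislabel distribution D, where D[i, wrong] weights the question
    # "is the label of x_i really `wrong` rather than y_i?" and
    # D[i, y[i]] = 0.  This computes
    # (1/2) * sum_i sum_{y' != y_i} D[i, y'] * (1 - h(x_i, y_i) + h(x_i, y')).
    loss = 0.0
    for i, x in enumerate(X):
        for wrong in range(k):
            if wrong == y[i]:
                continue
            loss += D[i, wrong] * (1.0 - h(x, y[i]) + h(x, wrong))
    return 0.5 * loss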

The AdaBoost.MR and AdaBoost.MH algorithms for solving multi-class multi-label problems
Each object can belong to one or more of multiple categories; the single-label classification problem is a particular case of the multi-label one. Suppose $\mathcal{Y}$ is the finite set of labels (classes), and let $k = |\mathcal{Y}|$. In the multi-label case, each instance $x_i \in X$ may carry multiple labels in $\mathcal{Y}$; therefore, a labeled example is a pair $(x, Y)$, where $Y \subseteq \mathcal{Y}$ is the label set assigned to $x$. The purpose of learning is to find a hypothesis that predicts one label of the label set of a given instance, i.e., to find a hypothesis $H: X \to \mathcal{Y}$ that minimizes the probability of $H(x) \notin Y$ on a new example $(x, Y)$.
This measurement is referred to as the one-error of hypothesis $H$, since it measures the probability that even the single predicted label is incorrect. We write $\mathrm{one\text{-}err}_D(H)$ for the one-error made by hypothesis $H$ with respect to a distribution $D$ over observations $(x, Y)$, namely $\mathrm{one\text{-}err}_D(H) = \Pr_{(x,Y)\sim D}[H(x) \notin Y]$.

AdaBoost.MH maintains a distribution $D_t$ over example-label pairs, updated in each round as $D_{t+1}(i, l) = D_t(i, l)\exp(-\alpha_t\, Y_i[l]\, h_t(x_i, l))/Z_t$, where $Y_i[l]$ is $+1$ if $l \in Y_i$ and $-1$ otherwise, and $Z_t$ is a normalization factor; the final hypothesis is $H(x, l) = \mathrm{sign}\big(\sum_t \alpha_t h_t(x, l)\big)$. In effect, AdaBoost.MH operates by producing a set of two-value problems, one for each sample $x$ and each label $y$: for sample $x$, is $y$ one of its correct labels or not? Suppose the present target is to predict exactly the set of correct labels; then the learning algorithm generates a hypothesis that predicts a label set, and the loss depends on how large the difference between the predicted set and the observed set is. Consequently, the loss is $\mathrm{hloss}_D(H) = \frac{1}{k}\,\mathbb{E}_{(x,Y)\sim D}\big[\,|H(x)\,\Delta\,Y|\,\big]$, where $\Delta$ denotes the symmetric difference; this is referred to as the Hamming loss of $H$. To minimize it, the problem is decomposed into $k$ orthogonal binary classification problems: $Y$ can be regarded as a vector of $k$ binary labels (each depending on whether a certain label $y$ is included in $Y$), and likewise $H(x)$ can be regarded as $k$ binary predictions. The loss above is then exactly the average error rate of $H$ on these $k$ binary problems.
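A sketch of the empirical Hamming loss under this set view, again with our own names and with label sets as Python sets:

def hamming_loss(H, samples, k):
    # Average size of the symmetric difference between the predicted
    # label set H(x) and the observed set Y, normalized by k.  This
    # equals the mean error rate of the k binary problems
    # "is label l in the set?".
    total = 0
    for x, Y in samples:
        total += len(H(x) ^ Y)    # ^ is symmetric difference on sets
    return total / (k * len(samples))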

AdaBoost.MR based on ranking loss
Input the training set $(x_1, Y_1), \ldots, (x_m, Y_m)$ and execute $T$ loops: ⑴ use distribution $D_t$ to train the WeakLearn; ⑵ obtain a weak hypothesis $h_t: X \times \mathcal{Y} \to \mathbb{R}$; ⑶ choose $\alpha_t \in \mathbb{R}$; ⑷ update: $D_{t+1}(i, l_0, l_1) = D_t(i, l_0, l_1)\exp\big(\frac{1}{2}\alpha_t\,(h_t(x_i, l_0) - h_t(x_i, l_1))\big)/Z_t$, where $Z_t$ is a normalization factor. While AdaBoost.MH finds a hypothesis predicting the label set connected with each instance, the purpose of AdaBoost.MR is to find a hypothesis which ranks the labels, in the hope that the correct labels reach the highest ranks. Formally, it seeks a hypothesis of the form $f: X \times \mathcal{Y} \to \mathbb{R}$, interpreted as follows: for a given instance $x$, the labels in $\mathcal{Y}$ are sorted by $f(x, \cdot)$; that is, if $f(x, l_1) > f(x, l_2)$, the rank of label $l_1$ is considered higher than that of $l_2$. Pairs $(l_0, l_1)$ with $l_0 \notin Y_i$ and $l_1 \in Y_i$ are the crucial pairs in relation to $(x_i, Y_i)$, and the distribution is non-zero only on the triples $(i, l_0, l_1)$ formed from them.
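The quality of such a ranking can be measured by the fraction of crucial pairs that $f$ orders incorrectly. A sketch with our own names, under the convention (an assumption of ours) that ties count as errors:

def ranking_loss(f, samples, labels):
    # Fraction of crucial pairs (l0 not in Y, l1 in Y) on which the
    # ranking f: (x, label) -> score fails to put the correct label
    # strictly above the incorrect one.
    bad = total = 0
    for x, Y in samples:
        for l0 in labels - Y:     # incorrect labels
            for l1 in Y:          # correct labels
                total += 1
                if f(x, l0) >= f(x, l1):
                    bad += 1
    return bad / total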

Improved algorithm of AdaBoost.MR
Input the training set $(x_1, Y_1), \ldots, (x_m, Y_m)$. When the original AdaBoost.MR method involves many labels, its efficiency in both time and space is unsatisfactory: $|Y_i| \cdot |\mathcal{Y} \setminus Y_i|$ weights need to be maintained for each training example $(x_i, Y_i)$, and every weight must be revised in every round. In the worst case, the space complexity of the algorithm and the time complexity of each iteration are both $O(mk^2)$. In the improved algorithm, only the weights $v_t(i, l)$ for $i \in \{1, 2, \ldots, m\}$ and $l \in \mathcal{Y}$ need to be maintained.
Provided that $(l_0, l_1)$ is a crucial pair in relation to $(x_i, Y_i)$, it holds that $D_t(i, l_0, l_1) = v_t(i, l_0) \cdot v_t(i, l_1)$. Thus this algorithm is equivalent to the original AdaBoost.MR algorithm, while its overhead is only $O(mk)$.
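A sketch of this factorization; the initial values and all names are illustrative only, not the initialization prescribed by the algorithm.

import numpy as np

m, k = 1000, 50                     # examples, labels
v = np.full((m, k), 1.0 / (m * k))  # the only table kept: O(mk) entries

def pair_weight(v, i, l0, l1):
    # The O(mk^2) table D_t(i, l0, l1) is never stored; any entry is
    # recovered on demand as the product v_t(i, l0) * v_t(i, l1).
    return v[i, l0] * v[i, l1]

w = pair_weight(v, 3, 7, 12)        # weight of the triple (i=3, l0=7, l1=12)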

Conclusion
AdaBoost has the advantages of high speed, simple operation, and ease of programming.
There is no parameter to adjust except the number of iterations. It can be flexibly combined with any method of finding weak hypotheses, without requiring any prior knowledge of the WeakLearn. Given sufficient data and a WeakLearn of reliably moderate accuracy, it provides a set of theoretical guarantees for learning. It focuses on finding a weak learning method that is merely better than random prediction, instead of trying to design an accurate algorithm over the whole space.
However, AdaBoost still has its own disadvantages. For example, its real performance on a particular problem clearly depends on the data and the WeakLearn; when data are insufficient, it performs poorly with hypotheses that are too complex or too weak; and it appears to be very sensitive to noise.

References
Freund Y, Schapire R E. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 1997, 55(1): 119-139.