Research on H5N6 Treatment by Comparison with H6N1 and H10N8 Using Decision Tree and Apriori Algorithm

Since 2003, 608 people in 15 countries have been infected with human-infectious avian influenza (AI) viruses, and 359 of them have died. In China in particular, the H6N1 and H10N8 viruses spread widely, and many people were infected and died. Recently, the H5N6 virus emerged in China, and the number of patients has been increasing gradually. Therefore, this research compared the amino acid sequences of the Matrix Protein, Hemagglutinin, Neuraminidase, and Nucleoprotein of H5N6, H6N1, and H10N8, using a decision tree and the Apriori algorithm, to determine their similarity and to devise a treatment. As a result, the Matrix Protein and Nucleoprotein sequences of H5N6 were found to be similar to those of H6N1 and H10N8. This research therefore concluded that treatments targeting those proteins of H6N1 and H10N8 will also be effective against H5N6.


Introduction
Since 1878, when the first case of AI (avian influenza) was reported in northern Italy, a great number of chickens and ducks have been infected with various AI viruses and buried.[1] At the beginning of the outbreak, it was thought that, as the name suggests, only birds could contract the flu, but now humans are also hosts of AI viruses.
Since 2003, 608 people in 15 countries have been infected with human-infectious AI viruses, and 359 of them have died. In China in particular, the number of people who have died from viruses including H10N8 and H6N1 is overwhelming compared with other Northeast Asian nations.[2] Because a considerable number of AI viruses in East Asia originated in China, there is a strong possibility that human-infectious AI viruses arising in China will spread to neighboring countries.[3] In this situation, examining H5N6, a new human-infectious AI virus in China, can contribute to the treatment not only of this virus but also of AI viruses that emerge in the future.
Thus, we compare H5N6 with H10N8 and H6N1 using a decision tree and the Apriori algorithm, and discuss its treatment according to the results.[4]

Materials and methods

Avian viruses
Since the 1990s, many influenza A viruses have circulated in poultry populations in China, which offer abundant gene pools for mutation. All three viruses used in this experiment are subtypes of influenza A virus from China.[5]

H5N6
The first case of human H5N6 infection was a 49-year-old man from Sichuan Province in southwest China, in April 2014. Since the initial case, nine human infections have been reported, including recent cases on 4 January 2016 and 11 January 2016 in China.[6]

H6N1
H6N1 viruses have circulated persistently in poultry and have continued to evolve and accumulate changes, increasing the risk of cross-species infection. Although the H6N1 virus is one of the most common viruses in wild avian species, it has infected only one person, a woman in China, who recovered.[7]

H10N8
Human infection with H10N8 was reported in China in December 2013 and January 2014. The two patients showed signs of pneumonia and died soon afterward despite the administration of antibiotics and antiviral agents. Thanks to later experiments, there have been no further fatal cases.

Decision tree
A decision tree is an algorithm specialized for classifying data and for sequential decision problems. It uses an explicit tree-shaped graph, so it is easy to read and understand.
A decision tree repeatedly splits the remaining data, turning a shared attribute into a node, and then divides the groups into subsets that branch down according to a particular attribute. For generalization and categorization, each record is described mathematically in the form (x, Y) = (x1, x2, x3, ..., xk, Y). The vector x is composed of the input variables, and the dependent variable Y is the target to be understood, classified, or generalized.[9]
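To make the (x, Y) form concrete, the following is a minimal Python sketch of a decision tree built by information gain on categorical inputs. It is not the implementation used in this study: the ID3-style splitting criterion, the function names, and the toy residue rows are all assumptions made for illustration, and attribute indices count from 0 here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    """Return the attribute index whose split gives the highest information gain."""
    base = entropy(labels)

    def gain(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return base - remainder

    return max(attributes, key=gain)

def build_tree(rows, labels, attributes):
    """A leaf is a class label; an inner node is (attribute, {value: subtree})."""
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    for value in {row[attr] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        sub_rows, sub_labels = zip(*subset)
        branches[value] = build_tree(list(sub_rows), list(sub_labels), remaining)
    return (attr, branches)

def classify(tree, row, default=None):
    """Follow the branches matching `row`; return `default` for an unseen value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree

# Hypothetical toy data: x = (x1, x2, x3) are residues, Y is the virus label.
rows = [("M", "S", "L"), ("M", "S", "L"), ("M", "T", "L"),
        ("M", "T", "I"), ("K", "S", "L"), ("K", "S", "I")]
labels = ["H5N6", "H5N6", "H6N1", "H6N1", "H10N8", "H10N8"]

tree = build_tree(rows, labels, attributes=[0, 1, 2])
print(classify(tree, ("M", "T", "I")))
```

Each path from the root to a leaf can be read as a rule of the kind reported later (a position, a residue, and a class), which is why the tree form is convenient for comparing sequences.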

Apriori algorithm
Apriori is one of the standard algorithms for association rule mining. It clarifies correlations among data on the basis of frequency and general rules in the data. Apriori proceeds in two steps. First, it finds the high-frequency groups that meet a previously set minimum support. Second, among those high-frequency groups, it derives the association rules that satisfy a given confidence value. All subsets of a high-frequency group are also high-frequency, and adding a new item to a group can never give it a higher support than before. Apriori therefore works bottom up: it first finds the high-frequency groups of size 1, then those of size 2, 3, ..., k, and stops when no further group can be formed.[10]
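The two steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the code used in the experiments: the function names, the "posN=residue" item encoding, the thresholds, and the toy transactions are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: support} for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Bottom-up start: frequent groups of size 1.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    result = {}
    k = 1
    while current:
        result.update({s: support(s) for s in current})
        k += 1
        # Join: merge frequent (k-1)-groups into candidate k-groups.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
    return result

def association_rules(itemsets, min_confidence):
    """Step 2: return (antecedent, consequent, confidence) for confident rules."""
    rules = []
    for itemset, sup in itemsets.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # The antecedent is a subset of a frequent group, so it is in `itemsets`.
                confidence = sup / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

# Hypothetical transactions: each one is the set of items seen in one window.
transactions = [
    {"pos1=L", "pos2=E", "pos3=R"},
    {"pos1=L", "pos2=E"},
    {"pos1=L", "pos3=R"},
    {"pos2=E", "pos3=R"},
]
itemsets = frequent_itemsets(transactions, min_support=0.5)
for antecedent, consequent, confidence in association_rules(itemsets, min_confidence=0.6):
    print(sorted(antecedent), "->", sorted(consequent), round(confidence, 3))
```

The prune step is where the subset property pays off: a candidate group is discarded without counting its support as soon as one of its smaller subsets is known to be infrequent.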

Experiment method
First, we collected nucleotide sequences of the H5N6, H10N8, and H6N1 viruses from NCBI, the National Center for Biotechnology Information. We designed experiments using the decision tree and the Apriori algorithm with window sizes of 5, 7, and 9 (window5, window7, and window9) for each virus. Using these algorithms, we tracked down patterns with certain frequencies in Matrix Protein (M), Hemagglutinin (HA), Neuraminidase (NA), and Nucleoprotein (NP).
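As a rough sketch of how window5, window7, and window9 inputs could be prepared, the Python below cuts a sequence into fixed-size windows and encodes each window as position and residue items. We assume overlapping windows that advance one residue at a time and positions counted from 1; the function names and the short example string are hypothetical and are not data from this study.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues, in order."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

def window_to_items(window):
    """Encode a window as 'posN=residue' items, with positions counted from 1."""
    return [f"pos{i}={aa}" for i, aa in enumerate(window, start=1)]

# Hypothetical toy sequence of 11 residues, used only to show the mechanics.
toy_sequence = "MSLLTEVETYV"

for size in (5, 7, 9):
    windows = sliding_windows(toy_sequence, size)
    print(f"window{size}: {len(windows)} windows, first = {window_to_items(windows[0])}")
```

Each encoded window can then serve either as one input vector x for the decision tree or as one transaction for Apriori.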

Decision tree
Using the decision tree, we tracked down rules of patterns with certain frequencies in M, HA, NA, and NP. A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V are one-letter codes of amino acids. We used the three viruses as the experimental variables in every fold, and the results show both similarities and differences. We used the sequences in a 10-fold cross-validation experiment, that is, we ran ten experiments (Tables 1-6). When gathering the extracted results, we first selected the rules of patterns of the three viruses that had a frequency of 0.750 or higher and were extracted at least twice. This is because rules with a frequency under 0.750 are considered almost unfixed patterns that are too unclear to be relied on.
The decision tree produced rules covering many kinds of amino acids in HA and NA, so we could find many similar and different amino acids. In addition, the number of high-frequency rules was proportional to the length of the rule (the window size). The rules usually have a frequency of 0.75, but some are higher; for instance, the window9 rules pos8=M and pos1=W have a frequency of 0.833.
Furthermore, contrary to our expectation, no rules were extracted for NP and M. Considering the characteristics of a decision tree, we interpret this result to mean that the amino acid differences among these viruses were not large enough to act as a key factor. Therefore, we can regard these viruses as similar.
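The fold setup and the rule selection step for the decision tree experiment can be sketched as follows. This is only an illustration under stated assumptions: we read "extracted at least two times" as "extracted in at least two folds", we treat 0.750 itself as passing the threshold, and the function names and the per-fold rule lists are made up for the example.

```python
from collections import defaultdict

def k_fold_splits(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of k folds."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test

def select_rules(fold_results, min_frequency=0.750, min_folds=2):
    """Keep rules that reach min_frequency in at least min_folds folds."""
    counts = defaultdict(int)
    for rules in fold_results:
        # Count each qualifying rule at most once per fold.
        for rule in {rule for rule, freq in rules if freq >= min_frequency}:
            counts[rule] += 1
    return sorted(rule for rule, n in counts.items() if n >= min_folds)

# Hypothetical per-fold output: lists of (rule, frequency) pairs.
fold_results = [
    [("pos8=M", 0.833), ("pos1=W", 0.833), ("pos3=K", 0.600)],
    [("pos8=M", 0.750), ("pos3=K", 0.800)],
    [("pos1=W", 0.700)],
]
print(select_rules(fold_results))
```

In this toy input only pos8=M survives, because it is the only rule that meets the frequency threshold in two different folds.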

Apriori algorithm
A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V are one-letter codes of amino acids, and the number indicates frequency. A star mark (*) beside an amino acid in H6N1 and H10N8 means that the rule is the same as that of H5N6. In the M experiment (Table 7), 9 of the 17 rules of H5N6 were the same as those of H6N1, and 8 were the same as those of H10N8. The key factors are L (Leucine) and E (Glutamic acid). We found a similar rate and similar features in the window7 and window9 experiments.
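The comparison behind the star marks amounts to intersecting rule lists and dividing by the number of H5N6 rules. The Python below is a small sketch of that arithmetic; the function names and the toy rule lists are hypothetical, while the counts 9 of 17 and 8 of 17 are the M-experiment figures reported above.

```python
def shared_rules(reference, other):
    """Rules of `reference` that also occur in `other` (the starred rules)."""
    other_set = set(other)
    return [rule for rule in reference if rule in other_set]

def sameness_rate(n_shared, n_reference):
    """Percentage of reference rules that are shared, rounded to one decimal."""
    return round(100 * n_shared / n_reference, 1)

# Hypothetical rule lists, used only to show the intersection step.
h5n6_rules = ["pos1=L", "pos2=E", "pos4=R"]
h6n1_rules = ["pos1=L", "pos2=E", "pos5=K"]
print(shared_rules(h5n6_rules, h6n1_rules))

# M experiment counts from the text: 9 of 17 (H6N1) and 8 of 17 (H10N8).
print(sameness_rate(9, 17), sameness_rate(8, 17))
```

Expressing each comparison as a percentage makes the four protein experiments directly comparable even though the number of H5N6 rules differs between them.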
In the HA experiment (Table 8), only 2 of the 9 rules of H5N6 were the same as those of H6N1, and none were the same as those of H10N8. Even in the window7 and window9 experiments, we could not find any definite key factor.
In the NA experiment (Table 9), none of the 11 rules of H5N6 were the same as those of H6N1, and 2 were the same as those of H10N8. We obtained similar results in the window7 and window9 experiments. Overall, S (Serine) and G (Glycine) appeared in large numbers, but we did not treat amino acids occurring at different positions as a criterion for choosing a key factor.
In the NP experiment (Table 10), 8 of the 9 rules of H5N6 were the same as those of H6N1 and H10N8. We obtained similar results in the window7 and window9 experiments. This experiment gave the highest sameness rate (88.9% for both). The key factors are R (Arginine) and E (Glutamic acid).
According to the decision tree and the Apriori algorithm, the newly emerging H5N6 has Matrix Protein (M) and Nucleoprotein (NP) sequences that are significantly similar to those of H6N1 and H10N8. We therefore concluded that H6N1 and H10N8 treatments targeting M and NP will also be effective against H5N6. The M2 protein is a channel protein through which protons selectively pass, and it plays an important role in the replication cycle of the influenza virus; NP provides important information for viral RNA replication [11], [12].

Discussion and conclusion
This study suggests an efficient way to treat avian influenza viruses: discovering similar sequences among three kinds of viruses and targeting those sequences with one specific treatment, such as M and NP inhibitors. We found similarities among the three influenza viruses, which opens a relevant possibility of developing a treatment based on such inhibitors. If human inhibitors also have relevant similarities, scientists could develop an efficient cure for the disease.
However, we should be cautious, because targeting the wrong inhibitor through misunderstanding could damage normal genes, which could lead to other serious disorders. For example, in nonreplicating cells such as neurons or other brain cells, a single-strand break or another type of damage in the transcribed strand of DNA can block RNA polymerase II from transcribing the gene sequence. Consequently, the body might not be able to make essential proteins that regulate gene expression.[13] Even if a treatment for influenza is devised, the concern mentioned above remains. Researchers working on this disease should note that developing a treatment for influenza is not the only problem; checking for possible malfunction is also important, and attending to both is the ultimate goal for public welfare.