Treatment of various avian influenza viruses based on comparison using decision tree algorithm

Recently, avian influenza (AI) has caused worldwide concern. Among avian influenza virus subtypes, H5N1 is considered the most threatening not only to birds but also to humans, as numerous human cases with high mortality have been reported. Unlike H5N8, which has not been reported to infect humans, H7N9 became infectious to humans through dramatic mutation. As human infections with AI have increased, numerous researchers have been trying to develop effective treatments. Our project group therefore analyzed the similarities and differences among H5N1, H5N8, and H7N9, since this would be useful for finding effective treatments for AI. We compared the protein sequences of the viruses using the Decision Tree Algorithm, which identifies the distinctive factors of a given dataset. This comparison, which indicates the correlation among H5N1, H5N8, and H7N9, should help narrow the range of attempts in developing treatments for avian influenza viruses.


Introduction
Recently, avian influenza (AI) has caused worldwide concern, as many avian influenza virus subtypes have killed not only large numbers of birds but also other species, including humans. Among them, the highly pathogenic influenza A virus subtype H5N1 is considered the most threatening to both birds and humans.
H5N1 has killed millions of poultry in numerous countries throughout Asia, Europe, and Africa, causing enormous damage. AI is generally believed to be infectious only to birds. However, since the first transmission of H5N1 to a human was reported in Hong Kong in 1997, 844 cases had been reported as of December 2015 in Cambodia, Egypt, Indonesia, Viet Nam, and other countries, with a high mortality rate of about 50% [1,2]. Like H5N1, H5N8 is an avian influenza virus. When H5N8 devastated South Korea in 2014, about 200 million 3 thousand poultry were killed [3]. Unlike H7N9, another AI virus that became infectious to humans in 2013 through dramatic mutation [4], H5N8 has not been reported to infect humans. However, H5N8 appears fairly capable of infecting humans, given that H7N9 underwent such a mutation in a relatively short period.
As human infections with AI, which has a considerably high mortality rate, have increased, numerous researchers have been trying to develop effective treatments against H5N1 and H7N9. Various vaccines have been proposed that mainly target the glycoproteins of influenza [5].
In this research, we analyzed the similarities and differences among the HA, NA, NP, and M proteins of H5N1, H5N8, and H7N9 using the Decision Tree Algorithm, since this would be useful for finding effective treatments for AI.

H5N1
H5N1 virus is a subtype of Influenza A virus that causes highly pathogenic avian influenza (HPAI). One of the most disastrous outbreaks of H5N1 occurred from mid-December 2003 to January 2004 [6]. Avian influenza H5N1 has been found over wide areas; human susceptibility to it has been reported in South Korea, Japan, Vietnam, Thailand, Cambodia, Indonesia, Pakistan, Laos, and China [7], threatening human health and resulting in a plunge in agricultural production [8]. Direct contact with pathogenic materials such as the excreta or secretions of poultry appears to cause H5N1 infection in humans. H5N1 has been reported to cause respiratory problems, fever, pneumonia, and relatively severe cytokine release in humans [9].

H5N8
H5N8 virus is a subtype of Influenza A virus. It was first reported in turkeys in Ireland in 1983. In January 2014, the avian influenza that broke out among poultry in South Korea and caused enormous damage was confirmed to be an H5N8 virus [10]. H5N8 viruses cause fever, respiratory problems, and conjunctivitis in poultry.

H7N9
H7N9 virus is a subtype of Influenza A virus that was newly identified as infectious to humans in the outbreak in China in 2013 [11]. Several other H7 viruses, such as H7N2, H7N3, and H7N7, had previously been verified as infectious to humans. Its infection pathways also include direct and indirect exposure to pathogenic materials. H7N9 generally causes deadly pneumonia and fever in humans, while it causes relatively mild illness in poultry. As a result, research is required to distinguish infected poultry from uninfected poultry [12].

Glycoprotein of Influenza A virus; Hemagglutinin (HA) and Neuraminidase (NA)
Hemagglutinin (HA) and neuraminidase (NA) are among the proteins essential for the invasion of host cells by the virus. Influenza A virus has 11 types of viral proteins: PB1, PB1-F2, PB2, PA, HA, NA, M1, M2, NS1, NEP, and NP. Hemagglutinin is a surface glycoprotein that enables the virus to initiate the chemical reactions needed to pass through the plasma membrane of the infected cell. It binds to the monosaccharide sialic acid, promoting attachment to the cell surface [13]. Neuraminidase is also a viral surface glycoprotein, but it is responsible for the escape of the virus from the host cell. It cleaves sialic acid from the host cell, facilitating transmission to other cells [14]. The types of hemagglutinin and neuraminidase a virus carries are used as distinguishing elements for classifying influenza viruses. As a result, the name of an Influenza A virus reflects its HA and NA types, giving names such as H1N1, H5N8, and H7N9.

Decision Tree Algorithm
A decision tree is a common data mining tool that uses tree diagrams. It contains nodes, each of which poses a decision question about the relative degree of influence on a specific result. The Decision Tree Algorithm (ID3 algorithm; Iterative Dichotomiser 3) uses a decision tree to examine the relevance of the given data ($x_1, x_2, x_3, \ldots, x_n$) to the major component ($Y$). The algorithm indicates the relationship between them and extracts rules identifying the subsets of the data that have the lowest entropy H; in other words, those with the highest relevance to the result (Y). This process originates from the decision tree judging which data to accept based on the entropy (H) measure [15].
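The entropy-based selection described above can be illustrated with a minimal sketch. It assumes the standard Shannon entropy and the usual ID3 information-gain criterion; the function names and the toy data are hypothetical and are not taken from our experiments.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr):
    """Reduction in H obtained by splitting the data on attribute index `attr`."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels):
    """ID3 chooses the attribute whose split leaves the lowest entropy."""
    return max(range(len(rows[0])),
               key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: two attributes (x_1, x_2) and a class label Y.
rows = [("A", "K"), ("A", "R"), ("G", "K"), ("G", "R")]
labels = ["H5N1", "H5N1", "H7N9", "H7N9"]

print(best_attribute(rows, labels))  # index of the most informative attribute
```

In this toy case the first attribute separates the two classes completely, so splitting on it leaves subsets with zero entropy, whereas the second attribute carries no information about Y.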
In our research, the Decision Tree Algorithm was used to show the interrelation of the H5N1, H7N9, and H5N8 viruses at the molecular level, that is, in the amino acid sequences of each virus. From the extracted rules, which indicate the degree of difference among the viruses, it is possible to infer the distinguishing characteristics of each one [16].
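The Result section refers to windows of 5, 7, and 9 residues and to positions within them. One plausible way to prepare such input is sketched below, assuming that each protein sequence is cut into overlapping fixed-length windows and that each window position becomes one attribute. This windowing scheme, the function names, and the short sequences are illustrative assumptions, not a record of the exact preprocessing we used.

```python
def sliding_windows(sequence, size):
    """Return every contiguous window of `size` residues in `sequence`."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

def build_dataset(sequences_by_virus, size):
    """Turn labelled sequences into (rows, labels) for a decision tree.

    Each row is a tuple of residues; index k of the tuple corresponds to
    position k + 1 within the window. The label is the virus subtype.
    """
    rows, labels = [], []
    for virus, sequences in sequences_by_virus.items():
        for seq in sequences:
            for window in sliding_windows(seq, size):
                rows.append(tuple(window))
                labels.append(virus)
    return rows, labels

# Made-up short sequences, for illustration only (not real viral proteins).
toy = {"H5N1": ["MEKIVLL"], "H7N9": ["MNTQILV"]}
rows, labels = build_dataset(toy, 5)

for row, label in zip(rows, labels):
    print(label, "".join(row))
```

Under this assumption, a rule such as "position 3 = K" would refer to the third residue of a window rather than to an absolute position in the protein.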

Result
For HA, we generally extracted data with frequency rates of 0.750 or higher. For H7N9 HA under the 5-window and 9-window conditions, however, we extracted data with frequency rates of 0.800 or higher, because considerably more rules were observed than for the other viruses. For NA, we extracted data with frequency rates higher than 0.750.
[...] features in HA. It can also be assumed that position 3 is an important factor differentiating the viruses from each other, since position 3 is observed most frequently. Positions 1 and 4 also seem to be important factors, since they are also observed frequently.
According to Table 2 and Table 3, as with the 5-window condition, the rules extracted from H7N9 were distinctly more numerous than those from H5N1 and H5N8. Furthermore, no rules were extracted for H5N1. This result also supports the view that H5N1 and H5N8 have similar amino acid features in HA, while H7N9 has distinctive amino acid features compared with the others. In Table 2, position 5 can be assumed to be the important factor differentiating the viruses. In Table 3, positions 4 and 9 are the important factors. Although the positions considered important changed with the window condition, the fact that H7N9 has far more rules than H5N1 and H5N8 did not change.
According to Table 4, which shows the results under the 5-window condition, it is noticeable that rules for amino acids at positions 3 and 4 were shared among H5N1, H5N8, and H7N9. Compared with Table 2 and Table 3, the NA sequences generally showed similar numbers of extracted rules for each virus. The rules involved identical positions but different corresponding amino acids. We assume that these viruses showed different amino acid features in NA probably because of the difference in their types of viral neuraminidase.
According to Table 5, which shows the results under the 7-window condition, rules for amino acids at positions 1, 2, and 3 appeared as important factors differentiating the viruses from each other. The relatively equal number of unique rules for each virus also supports the view that H5N1, H5N8, and H7N9 have distinguishable amino acid features in NA.
According to Table 6, which shows the results under the 9-window condition, we could assume that position 6 is the important factor for differentiating the three viruses, because it is observed most frequently.
Contrary to our expectations, no rules were found for NP and M of H5N1, H5N8, and H7N9. This result is due to the absence of amino acid differences large enough to be considered key factors. Thus, we could assume that these viruses have similar amino acid features in their NP and M proteins.

Conclusion
With the Decision Tree Algorithm, we could see the similarities and differences among three typical avian influenza viruses: H5N1, H5N8, and H7N9. In particular, the Decision Tree Algorithm showed that NP and M of H5N1, H5N8, and H7N9 have significantly similar amino acid features. However, the NA proteins of the three viruses differ considerably from one another. In HA, H7N9 has distinctly different amino acid features from H5N1 and H5N8, which are similar to each other.
As a result, we could conclude that H5N1 treatments targeting NP and M would also be effective against H5N8 and H7N9, which means that common treatments could be used against all of H5N1, H5N8, and H7N9. We also concluded that H5N1 treatments targeting HA would be effective against H5N8 but not against H7N9, which indicates that common treatments could be used against both H5N1 and H5N8.
Outbreaks of Influenza A virus have been causing fatal disease in both humans and poultry. When a new type of Influenza A virus emerges, researchers generally search for vaccines and investigate its infection pathways to prevent its spread. If data on such a virus are lacking, finding an appropriate vaccine requires numerous attempts. In our research on Influenza A viruses (H5N1, H5N8, and H7N9), we investigated the similarities and differences between the viruses using the Decision Tree Algorithm, a data mining method. As mentioned above, the results suggest that treatments for each virus could be developed more efficiently in terms of time and cost.
Based on the similarities among the viruses, we can narrow the range of attempts in developing treatments. Even when a new type of Influenza A virus emerges, it is possible to infer its overall characteristics through indirect bioinformatic research on its correlation with known viruses. We hope that our research on Influenza A virus will be widely used in epidemiologic research on various viruses and will contribute to finding effective treatments for them.