Data mining and analysis of bacillus virus

Anthrax occurs naturally in soil and commonly affects domestic and wild animals around the world. Bacillus anthracis is the agent that causes anthrax. Two closely related species have similar DNA sequences, giving three Bacillus species in this study: Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis. This report analyzes the similarities among the three by investigating amino acid frequency, and identifies the differences among them at the gene level. First, the apriori analysis shows that Leucine is the most frequent amino acid in all three. The endemic diseases in which Leucine is also detected are transmitted by parasites and host animals and cause muscle pain; we therefore conclude that Leucine is an amino acid that plays a significant role in causing muscle pain. Second, the decision tree analysis shows only small differences between the classes. The classes represent positions that contain a representative amino acid, and the position selected under each window is what distinguishes them.


Introduction
Bacillus is a genus of rod-shaped (cylindrical), aerobic bacteria. There are 34 species of Bacillus, such as Bacillus cereus, which causes food spoilage and has flagella but, unlike Bacillus anthracis, no capsule, and Bacillus thuringiensis, which produces an insecticidal toxin. Most of them are avirulent; Bacillus anthracis is the only one that causes anthrax. For this reason, Bacillus anthracis has been used in a number of biological weapons. Because each Bacillus species affects nature differently, we attempt to reveal the differences among Bacillus cereus, Bacillus thuringiensis, and Bacillus anthracis. We use a decision tree program and an apriori program as classification tools. Apriori is used to measure frequency, while the decision tree identifies the elements that are unique to each species.

Apriori experiment
We ran an experiment on three Bacillus species: Bacillus anthracis, Bacillus cereus, and Bacillus thuringiensis. We produced the results with the apriori algorithm and the decision tree algorithm. Before the apriori experiment, we prepared the three Bacillus sequences under three window sizes: 13 window, 17 window, and 19 window. Because the sequences differ in length, we split them into windows of these sizes so that similar positions can be compared across the species when sequence patterns repeat. We obtained similar results in each case, so overall we conclude that the amino acid Leucine has the highest frequency and that all three cases are involved in the actions occurring in the body.
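The windowing step can be sketched as follows. This is a minimal illustration, not the procedure used in the experiment: the text does not say whether the windows overlap, so a sliding window with a step of one residue is an assumption here, and `sliding_windows` and `toy_sequence` are hypothetical names and data.

```python
from typing import Dict, List


def sliding_windows(sequence: str, size: int, step: int = 1) -> List[str]:
    """Return every window of `size` residues, advancing `step` residues each time.

    Assumption: overlapping windows (step=1). The source does not specify this.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence) - size + 1, step)]


# Hypothetical toy fragment in one-letter amino acid codes, not real Bacillus data.
toy_sequence = "MKLLEGLAIVLLNDGSLKELLT"

# Build the three window sets used in the text: 13, 17, and 19 residues.
windows_by_size: Dict[int, List[str]] = {
    size: sliding_windows(toy_sequence, size) for size in (13, 17, 19)
}

for size, windows in windows_by_size.items():
    print(size, len(windows), windows[0])
```

A sequence shorter than the window size simply yields no windows, so sequences of different lengths can be passed through the same function.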

We use the apriori algorithm to determine which amino acid has the highest frequency in each of the three Bacillus species. Apriori is an algorithm for frequent itemset mining and association rule learning over transactional databases. We therefore use it to find which amino acid is most abundant in Bacillus (Leucine), why Leucine shows the highest frequency, and how Leucine affects Bacillus and our bodies.
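To make the idea concrete, the following is a minimal sketch of Apriori-style frequent itemset mining, assuming each window is treated as one transaction whose items are the amino acids it contains. The function name `apriori`, the toy windows, and the support threshold are illustrative assumptions, not the program or data used in the experiment.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support (fraction of transactions) >= min_support."""
    if not transactions:
        return {}
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1 candidates: every single item that occurs anywhere.
    current = {frozenset([item]) for t in transactions for item in t}
    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while current:
        scored = {c: support(c) for c in current}
        level = {c: s for c, s in scored.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemset candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
    return frequent


# Hypothetical toy windows (one-letter codes); each window becomes one transaction.
toy_windows = ["LKE", "LK", "LE", "LG"]
transactions = [set(w) for w in toy_windows]

frequent = apriori(transactions, min_support=0.5)

# The most frequent single amino acid is the 1-itemset with the highest support.
singles = {next(iter(s)): sup for s, sup in frequent.items() if len(s) == 1}
most_frequent = max(singles, key=singles.get)
print(most_frequent, singles[most_frequent])
```

In this toy data Leucine (L) appears in every window, so it is the top 1-itemset; larger itemsets such as {L, K} indicate residues that tend to occur in the same window.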

Apriori results
We obtained exactly the same result in every experiment, across all windows of the three Bacillus species: Leucine.

Results analysis
As the tables show, the result for all three Bacillus species is the same: Leucine. Although the figures, which give the exact number of times Leucine was detected, differ slightly, these differences are too small to cause any substantive distinction inside the organism. We therefore conclude that Leucine plays an important role in most Bacillus species. For instance, most effects of amino acids on Bacillus signaling are abolished by lowering the concentration of Leucine.

Leucine and myalgia (How Leucine affects our bodies)
Bacillus anthracis, which is infectious to animals, causes anthrax, an endemic disease, that is, a disease that is regularly transmitted within a particular region. In some cases it spreads by physical contact, so it mostly occurs in certain regions or communities. Patients suffer from myalgia, serious cough, dyspnea, papules, and so on. Not only anthrax but also numerous other endemic diseases, such as malaria, dengue fever, yellow fever, typhoid fever, and cholera, share these symptoms, especially myalgia. The apriori results for those pathogens are the same: Leucine. Therefore, we may conclude that Leucine is the most abundant amino acid in most endemic pathogens and brings about myalgia.
The decision tree experiment adapts Quinlan's C5.0 [3] algorithm and its rule extraction method, using the See5.0 program. Fig. 1 shows the abstract shape of the common decision tree that we used, and Fig. 2 shows part of the decision tree together with the conditions for moving to the next node in a particular situation. We use the decision tree algorithm to find the differences among the three Bacillus species. Because the algorithm reports the positions of amino acids, we used it to determine how the sequences can best be classified by position.

Decision tree
There are two main types of decision tree, according to usage: classification trees, used for classification tasks, and regression trees, used for regression tasks [4]. We use the former. A case is classified by starting at the root of the tree and moving down until a leaf is encountered. At each non-leaf decision node, the outcome of the node's test for the case is determined, and attention shifts to the root of the subtree corresponding to that outcome. When this process finally (and inevitably) reaches a leaf, the class of the case is predicted to be the one recorded at that leaf [3]. Classifying a case therefore traces a path from the root of the tree to one of its leaves, and a case reaches a given leaf only if it satisfies every condition along that path. For this reason, decision trees are closely related to rule induction [5].
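The root-to-leaf procedure and its link to rule induction can be sketched as follows. This is an illustrative sketch only: the `Node` structure, the tested positions, the residues, and the class labels are hypothetical and are not the tree produced by See5.0 in the experiment.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    label: Optional[str] = None       # class label; set only on leaves
    position: Optional[int] = None    # 1-based window position tested at a decision node
    children: Dict[str, "Node"] = field(default_factory=dict)  # residue -> subtree
    default: Optional["Node"] = None  # subtree for residues without their own branch


def classify(node: Node, window: str) -> str:
    """Walk from the root to a leaf, following the branch chosen by each test."""
    while node.label is None:
        residue = window[node.position - 1]
        node = node.children.get(residue, node.default)
    return node.label


def extract_rules(node: Node, conditions: Tuple[str, ...] = ()) -> List[Tuple[List[str], str]]:
    """Turn every root-to-leaf path into one rule: (list of conditions, class)."""
    if node.label is not None:
        return [(list(conditions), node.label)]
    rules: List[Tuple[List[str], str]] = []
    for residue, child in node.children.items():
        rules += extract_rules(child, conditions + (f"pos{node.position}={residue}",))
    if node.default is not None:
        rules += extract_rules(node.default, conditions + (f"pos{node.position}=other",))
    return rules


# Hypothetical tree over 13-residue windows; positions and labels are made up.
tree = Node(
    position=2,
    children={
        "G": Node(position=11,
                  children={"E": Node(label="class 1")},
                  default=Node(label="class 3")),
        "N": Node(label="class 2"),
    },
    default=Node(label="class 3"),
)

print(classify(tree, "AGAAAAAAAAEAA"))
for conditions, label in extract_rules(tree):
    print(" AND ".join(conditions), "->", label)
```

Each extracted rule is simply the conjunction of the conditions on one path, which is why a case reaches a leaf only when it satisfies every condition on the way there.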

Decision tree results
According to Table 1, the three Bacillus species are highly similar, which means that it is hard to find features specific to each one. From this result, we can assume that all three derive from a common ancestor and differ only in a small proportion of mutated proteins (muteins). In Table 5, note that the rules extracted under the 13 window involve the amino acid at position 11. Glutamic acid (E), Glycine (G), and Isoleucine (I) all appear at position 11 in the 13 window, and this common position represents a property of Bacillus. Moreover, the second position also plays an important role, presenting G in class 1, N in class 2, and G and C in class 3. In Table 6, note that under the 17 window the rules were extracted at position 16. In Table 7, under the 19 window, the rules were extracted at position 8. These positions also play an important role in differentiating the subtypes from one another.
Tables 5, 6, and 7 show the rules extracted according to the position of the amino acid.
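One way to see why a tree learner singles out a particular window position is to score each position by how well it separates the classes. The sketch below uses plain information gain as a simplification; learners in the C4.5 family use a related information-based criterion, and the exact criterion inside See5.0 is not described in this report. The windows, labels, and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(windows: List[str], labels: List[str], position: int) -> float:
    """Entropy reduction obtained by splitting on the residue at a 1-based position."""
    groups: Dict[str, List[str]] = {}
    for window, label in zip(windows, labels):
        groups.setdefault(window[position - 1], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical 3-residue windows with made-up class labels, not real data.
toy_windows = ["AGA", "AGC", "ANA", "ANC"]
toy_labels = ["class 1", "class 1", "class 2", "class 2"]

gains = {pos: information_gain(toy_windows, toy_labels, pos) for pos in (1, 2, 3)}
best_position = max(gains, key=gains.get)
print(gains, best_position)
```

In this toy data only the second position separates the two classes, so it receives the highest score and would be the position at which a rule is extracted.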

Conclusion
First, in the apriori analysis, all three Bacillus species contain Leucine the most (among.....). Leucine is also detected in endemic diseases such as malaria [6], dengue fever [7], and typhoid [8], which involve serious cough, dyspnea, papules, and so on [9,10]. Those diseases are transmitted by parasites and host animals, and cause muscle pain. Therefore, we conclude that Leucine is an amino acid that plays a significant role in causing muscle pain. Second, in the decision tree analysis, there are only small differences between the classes. The classes represent positions that contain a representative amino acid, and the position selected under each window is what distinguishes them. We hope to devise a natural (rather than artificial) treatment for endemic diseases. Using the common property that Bacillus species are rich in Leucine, which is also found in endemic diseases, we plan to analyze the similarities further and develop such a treatment.

Table 1.
All experiments: all windows of the three Bacillus species

Table 2.
Analysis of Bacillus anthracis

Table 3.
Analysis of Bacillus cereus

Table 4.
Analysis of Bacillus thuringiensis