Application of Artificial Neural Network (ANN): Development of Central-based ANN (CebaANN)

Today, the number of known protein structures is significantly smaller than the number of known amino acid sequences. This is because the rules relating amino acid sequence to structure are not clear and the number of thermodynamic conditions involved is too large. In some cases protein structures are determined by experiment. However, experiments require too much time and cost to keep up with the growing number of amino acid sequences, so they are inefficient. Empirical methods that predict protein structure theoretically have therefore been developed. We propose the Central-Based Artificial Neural Network (CebaANN) as a method for protein structure prediction. CebaANN can analyze similarity in more detail by giving greater weight to the central part of the input, which affects the outcome most. In our experiment, the prediction accuracy was 85% for the E structure but only 34% overall.


Introduction
Today, the number of known protein structures is significantly smaller than the number of known amino acid sequences. This is because the rules relating amino acid sequence to structure are not clear and the number of thermodynamic conditions involved is too large. In some cases protein structures are determined by experiment. However, experiments require too much time and cost to keep up with the growing number of amino acid sequences, so they are inefficient.
Empirical methods that predict protein structure theoretically have been proposed to overcome this limitation. The first is the ab initio method, based on the thermodynamic hypothesis of Anfinsen. The second uses a protein whose structure has already been determined as a template. The first method can be carried out in a short time, but the accuracy of the resulting structure is not satisfactory. Therefore the second method, which relies on two proteins having similar structures, is usually used.
Because it is not feasible to compare existing protein sequences and structures one by one, algorithms such as artificial neural networks and Support Vector Machines are usually used to analyze the link between sequence and structure and to derive similarity by comparing the result for the general structure with the result for the input structure.
An artificial neural network is an algorithm inspired by biological neural networks and consists of input, hidden, and output nodes. In this paper, we develop CebaANN, an application of ANN, and propose it as an algorithm for protein structure prediction. CebaANN keeps the existing [input - hidden - output] structure. However, it can analyze similarity in more detail by giving greater weight to the central part of the input, which affects the outcome most. In our experiment, the prediction accuracy was 85% for the E structure but only 34% overall.

Artificial neural network
An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in a brain. Here, each circular node represents an artificial neuron and an arrow represents a connection from the output of one neuron to the input of another. In machine learning and cognitive science, artificial neural networks (ANNs) are a family of statistical learning models inspired by biological neural networks (the central nervous systems of animals, in particular the brain) and are used to estimate or approximate functions that can depend on a large number of inputs and are generally unknown. Artificial neural networks are generally presented as systems of interconnected "neurons" which exchange messages between each other. The connections have numeric weights that can be tuned based on experience, making neural nets adaptive to inputs and capable of learning.
ICCAE 2016
For example, a neural network for handwriting recognition is defined by a set of input neurons which may be activated by the pixels of an input image. After being weighted and transformed by a function (determined by the network's designer), the activations of these neurons are then passed on to other neurons. This process is repeated until finally an output neuron is activated. This determines which character was read. Like other machine learning methods (systems that learn from data), neural networks have been used to solve a wide variety of tasks that are hard to solve using ordinary rule-based programming, including computer vision and speech recognition.
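The weighting and transformation step described above can be illustrated with a minimal sketch. It assumes one hidden layer, a single output node, and a logistic transfer function; the function names and the example weights are hypothetical and are not taken from this paper.

```python
import math

def sigmoid(x):
    """Logistic activation, one common choice of transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(inputs, w_ih, w_ho):
    """Propagate inputs through one hidden layer to a single output.

    w_ih[j][i] is the weight from input i to hidden node j;
    w_ho[j] is the weight from hidden node j to the output node.
    """
    # Each hidden node sums its weighted inputs and applies the activation.
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_ih]
    # The output node does the same with the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(w_ho, hidden)))

# Hypothetical example: 3 inputs, 2 hidden nodes, 1 output.
out = forward([1.0, 0.0, 1.0],
              [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
              [1.0, -1.0])
```

Training would then adjust `w_ih` and `w_ho` so that the output moves toward the desired value for each example.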

DSSP classification
Protein structure is the biomolecular structure of a protein molecule. Proteins are polymers, specifically polypeptides, formed from sequences of amino acids. Each unit of a protein is called an amino acid residue because it is the residue of every amino acid that forms the protein by losing a water molecule. By convention, a chain under 40 residues is often identified as a peptide, rather than a protein [1]. To be able to perform their biological function, proteins fold into one or more specific spatial conformations, driven by a number of non-covalent interactions such as hydrogen bonding, ionic interactions, Van der Waals forces, and hydrophobic packing. To understand the functions of proteins at a molecular level, it is often necessary to determine their three-dimensional structure.
The Dictionary of Protein Secondary Structure, in short DSSP, is commonly used to describe the protein secondary structure with single-letter codes. The secondary structure is assigned based on hydrogen bonding patterns such as those initially proposed by Pauling et al. in 1951 (before any protein structure had ever been experimentally determined). There are eight types of secondary structure that DSSP defines:
- G = 3-turn helix (3_10 helix). Min length 3 residues
- H = 4-turn helix (α helix). Min length 4 residues
- I = 5-turn helix (π helix). Min length 5 residues
- T = hydrogen bonded turn (3, 4 or 5 turn)
- E = extended strand in parallel and/or anti-parallel β-sheet conformation. Min length 2 residues
- B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
- S = bend (the only non-hydrogen-bond based assignment)
- C = coil (residues which are not in any of the above conformations)
In this paper, we only use 6 structures (G, H, I, T, E, B).
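As an illustration, the eight codes and the six-structure subset stated above can be held in a small lookup. This is only a sketch: the names `DSSP_CODES`, `USED_CODES`, and `reduce_labels`, and the use of "-" as the 'none' label, are assumptions made for this example.

```python
# The eight DSSP secondary structure codes listed above.
DSSP_CODES = {
    "G": "3-turn helix (3-10 helix)",
    "H": "4-turn helix (alpha helix)",
    "I": "5-turn helix (pi helix)",
    "T": "hydrogen bonded turn",
    "E": "extended strand in beta-sheet conformation",
    "B": "residue in isolated beta-bridge",
    "S": "bend",
    "C": "coil",
}

# The six structures this section says are used.
USED_CODES = ("G", "H", "I", "T", "E", "B")

def reduce_labels(dssp_string, none_label="-"):
    """Replace every code outside USED_CODES with a 'none' label (assumed '-')."""
    return "".join(c if c in USED_CODES else none_label for c in dssp_string)

reduced = reduce_labels("CHHHTSEEB")
```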

Chou-Fasman parameters
The Chou-Fasman method of secondary structure prediction assigns a set of prediction values to each residue and then applies a simple algorithm to those numbers. We converted the short code of each amino acid to a number using the Chou-Fasman parameter table.
CebaANN is an application of the artificial neural network whose structure is changed so that an input value has more influence the closer it is to the center input node. Unlike a typical neural network, in which each input node is connected to every hidden node, in CebaANN the center input node is connected to all hidden nodes, and the number of hidden nodes connected to an input node decreases toward the outside. For this structure, the network must satisfy the following conditions:
- There should be one output node, and (n = the number of input nodes, m = the number of hidden nodes) m should be [(n+1) / …].
- The threshold is (average of input node values / m) * (sum of weights between input and hidden) * (sum of weights between hidden and output).
- The threshold should be constant, so during training the sum of the weights between input and hidden and the sum of the weights between hidden and output should stay constant.
- CebaANN is a supervised (teacher) learning method: if the target really has the structure, the base value is 1, and if not, the base value is 0. The algorithm is therefore applied in two cases. Case 1 is output > threshold with base value 0; Case 2 is output < threshold with base value 1.
Balancing method: In Case 1, follow 3.1.1 (decreasing weights on the route of the maximum input value) and 3.1.2 (increasing weights on the route of the minimum input value), and then compute a new output with the changed weights. Repeat these three steps until output < threshold.
In Case 2, increase the weights on the route of the maximum input value (the opposite of 3.1.1), decrease the weights on the route of the minimum input value (the opposite of 3.1.2), and then compute a new output with the changed weights. Repeat these three steps until output > threshold.
During this process, if more than one input shares the maximum value, use the one closest to the center as the maximum. If more than one input shares the minimum value, choose the minimum in the same way. This is because the influence of an input is larger the closer it is to the center.
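The structure and conditions above can be sketched as follows. This is one possible reading, not the authors' implementation. It assumes m = (n + 1) // 2 (the expression for m is incomplete in the text), that hidden node j receives the inputs within distance j of the center, and that the output is a plain weighted sum with no activation function. All function names are hypothetical.

```python
def connectivity(n):
    """Return (m, mask), where mask[j][i] is True if input i feeds hidden node j.

    Assumptions: m = (n + 1) // 2, and hidden node j sees the inputs within
    distance j of the center. The center input then reaches all m hidden
    nodes and the outermost inputs reach only one.
    """
    m = (n + 1) // 2
    c = n // 2
    mask = [[abs(i - c) <= j for i in range(n)] for j in range(m)]
    return m, mask

def output(x, w_ih, w_ho, mask):
    """Assumed linear output: sum over connected routes of x * w_ih * w_ho."""
    return sum(
        w_ho[j] * sum(w_ih[j][i] * x[i] for i in range(len(x)) if mask[j][i])
        for j in range(len(w_ho))
    )

def threshold(x, w_ih, w_ho, mask):
    """(average input / m) * sum of input-hidden weights * sum of hidden-output weights."""
    m = len(w_ho)
    mean_x = sum(x) / len(x)
    sum_ih = sum(w_ih[j][i] for j in range(m) for i in range(len(x)) if mask[j][i])
    return (mean_x / m) * sum_ih * sum(w_ho)

def update_case(out, thr, base):
    """Return 1 for Case 1, 2 for Case 2, and 0 when no update is needed."""
    if out > thr and base == 0:
        return 1
    if out < thr and base == 1:
        return 2
    return 0

def pick_extreme(x, largest=True):
    """Index of the maximum (or minimum) input; ties go to the position
    closest to the center (the lower index if two are equally close)."""
    c = len(x) // 2
    target = max(x) if largest else min(x)
    ties = [i for i, v in enumerate(x) if v == target]
    return min(ties, key=lambda i: abs(i - c))
```

With n = 9 this gives m = 5: the center input is connected to five hidden nodes and each outermost input to one.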

Decreasing weights on the route of maximum input value
Decrease the input-hidden weight and the hidden-output weight of the route whose product (input-hidden weight * hidden-output weight) is the minimum among the routes through which the maximum input value passes. Then increase each of the other input-hidden weights connected to the same hidden node by 1/(2*[m/2]) of the change amount. As a result, the sum of the input-hidden weights stays constant and the output decreases.
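The sum-preserving redistribution on the input-hidden weights can be sketched as below. The helper `shift_weight` is hypothetical. It spreads the change equally over the other weights feeding the same hidden node, which matches the 1/(2*[m/2]) share when that hidden node has exactly 2*[m/2] other connected inputs; for other counts the equal split is an assumption made so that the sum stays constant.

```python
def shift_weight(weights, k, delta):
    """Lower weights[k] by delta and spread delta equally over the others.

    weights holds the input-hidden weights feeding one hidden node.
    The total of the returned list equals the total of the input list.
    """
    others = len(weights) - 1
    if others == 0:
        # No other weight to compensate with, so leave the weights unchanged.
        return list(weights)
    share = delta / others
    return [w - delta if i == k else w + share for i, w in enumerate(weights)]

# Hypothetical example: five weights feeding one hidden node, change amount 0.04.
old = [0.2, 0.2, 0.2, 0.2, 0.2]
new = shift_weight(old, 2, 0.04)
```

Passing a negative `delta` gives the opposite step, in which the chosen weight is increased and the others are decreased.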

Increasing weights on the route of minimum value
Increase the input-hidden weight and the hidden-output weight of the route whose product (input-hidden weight * hidden-output weight) is the maximum among the routes through which the minimum input value passes, by the same change amount as in the preceding decreasing step (a). Then decrease each of the other input-hidden weights connected to the same hidden node by 1/(2*[m/2]) of the change amount. As a result, the sum of the input-hidden weights and the sum of the hidden-output weights stay constant, and the output decreases.

Experiment (Using CebaANN)
We carried out an experiment to obtain weights that can predict protein structure when a protein sequence is passed through CebaANN.

Making material of experiment
Target protein sequence dataset: RS126. We use 2/3 of the dataset to train the special weights and the rest to measure the percentage of correct predictions.

Separating target protein sequences
Each target protein sequence is separated into amino acid subsequences whose length equals the number of input nodes (nine in this experiment), starting from the first amino acid and advancing the starting position by one each time. We call each such subsequence an input sequence.
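This windowing step can be written in a few lines. A minimal sketch, assuming the sequence is a string of one-letter amino acid codes; the function name `windows` and the example sequence are hypothetical.

```python
def windows(sequence, size=9):
    """Return every contiguous subsequence of the given size, start shifted by one."""
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

# Hypothetical 11-residue sequence: yields three input sequences of length nine.
input_sequences = windows("MKTAYIAKQRQ")
```

A sequence shorter than the window size yields no input sequences.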

Numbering each element
Each element of an input sequence is numbered using the Chou-Fasman parameters. For the G and H structures the P(a) value is used, for the E and B structures P(b), and for the T and S structures P(turn). This is because G and H are helices, E and B are sheet or bridge structures, and T and S are turn or bend structures. We thus obtain three numbered sequences from one input sequence. We call these three sequences an input group.
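The numbering step can be sketched as a table lookup. The four entries below are a small subset of the Chou-Fasman propensities as commonly tabulated (scaled by 100); they are included as assumptions for illustration and should be checked against the parameter table actually used. The names `CHOU_FASMAN` and `input_group` are hypothetical.

```python
# Subset of Chou-Fasman propensities as (P(a), P(b), P(turn)).
# Assumed values for illustration only; verify against the table used.
CHOU_FASMAN = {
    "A": (142, 83, 66),
    "E": (151, 37, 74),
    "G": (57, 75, 156),
    "V": (106, 170, 50),
}

def input_group(window, table=CHOU_FASMAN):
    """Turn one input sequence into three numbered sequences.

    Returns (P(a) values, P(b) values, P(turn) values), one number per residue.
    """
    p_a = [table[residue][0] for residue in window]
    p_b = [table[residue][1] for residue in window]
    p_turn = [table[residue][2] for residue in window]
    return p_a, p_b, p_turn

group = input_group("AVG")
```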

Matching input group with structure
Each input group is matched with the known structure of its input sequence. Exactly one of the six structures, or 'none', is matched with each input group. We call this matched pair a material. Later, when the algorithm runs, the elements of the input groups become the inputs and the matched structure determines the base value.

Hidden-Output weights
Special weights on the routes between the hidden and output nodes.

Correct percent of prediction
Correct percent = (the number of sequences whose predicted structure equals the target structure) / (the number of sequences that have the target structure)
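The ratio above can be computed per target structure as in the sketch below. The function name `correct_percent`, the list-based inputs, and returning `None` when no sequence has the target structure are assumptions made for this example.

```python
def correct_percent(predicted, actual, target):
    """Percentage of sequences with the target structure that were predicted as such."""
    total = sum(1 for a in actual if a == target)
    if total == 0:
        return None  # undefined when no sequence has the target structure
    hits = sum(1 for p, a in zip(predicted, actual) if a == target and p == target)
    return 100.0 * hits / total

# Hypothetical labels: four sequences have structure E, three are predicted correctly.
pct = correct_percent(["E", "E", "E", "H", "H"], ["E", "E", "E", "E", "H"], "E")
```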

Conclusion
Methods for predicting protein structure have advanced considerably. However, an accurate prediction method is still needed, because there are many amino acid sequences whose structure has not yet been determined. Because protein structure is a clue to a protein's function, prediction methods can help many people in diverse ways. CebaANN's prediction works only for the E structure, but its accuracy there is high. This suggests that CebaANN has the potential to become an accurate prediction method. We need to find out why it works only for the E structure and not for the other structures, so we will extend the experiment in future work. That it works only for the E structure is the main limitation of this study, and we will make improvements in that respect. This experiment was meaningful as a first attempt at a new approach.

Central-based artificial neural network (CebaANN)
The materials are classified into six groups by structure, and CebaANN is then run on each of the six groups. During this process, if a material has the targeted structure, the base value becomes 1, and if not, the base value becomes 0. After the run is finished, the special weights of each of the six groups have been obtained. The first element of the input group is run on the G and H groups, the second element of the input group is run on the T and S groups, and the third element of the input group is run on the E and B groups. If all of the outputs are smaller than the threshold, the input has no structure. If some of the outputs are bigger, the structure is decided by ranking, from highest to lowest, the parameter value of the middle amino acid among the structures whose output is bigger. If more than one structure shares the highest parameter value, the structure with the bigger output value is chosen.
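The decision rule above can be sketched as follows. It assumes that the outputs, thresholds, and middle-residue parameter values are already available per structure; the function name `decide` and the dictionary layout are hypothetical.

```python
def decide(outputs, thresholds, middle_params):
    """Pick a structure from per-structure outputs.

    outputs, thresholds, middle_params: dicts keyed by structure code.
    middle_params[s] is the Chou-Fasman parameter of the middle amino acid
    for the class that structure s belongs to.
    Returns None when no output exceeds its threshold (no structure).
    """
    candidates = [s for s in outputs if outputs[s] > thresholds[s]]
    if not candidates:
        return None
    # Highest middle-residue parameter wins; ties go to the larger output.
    return max(candidates, key=lambda s: (middle_params[s], outputs[s]))

# Hypothetical values: H and E exceed the threshold, H has the higher parameter.
choice = decide({"H": 2.0, "E": 3.0, "T": 0.5},
                {"H": 1.0, "E": 1.0, "T": 1.0},
                {"H": 142, "E": 83, "T": 66})
```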

Table 4. Percent of prediction