A Deep Learning Approach for DDoS Attack Detection Using Supervised Learning

This research presents a novel combined learning method for developing a DDoS detection model that exploits the expandable and flexible properties of deep learning. The method can advance current practice and address open problems in DDoS detection. A combination of deep learning with knowledge-graph classification is proposed for DDoS detection: a deep learning algorithm is used to build the classifier model, while the knowledge-graph system makes the model expandable and flexible. The method is verified experimentally on the CICIDS2017 dataset with a total of 53,127 instances, using ten-fold cross-validation. The experimental results indicate that a performance of 99.97% is achieved after integration. Interestingly, knowledge-rich learning for DDoS detection emerges as a basic behavior of DDoS detection and prevention methods. Security professionals are therefore advised to integrate DDoS detection into their internet and network infrastructure.


Introduction
The growth in IoT devices and computing devices has made life easier and more convenient for us thanks to the fast and accurate processing of our information. However, the increased integration and deployment of connected devices also exposes vital resources to DDoS threats [1]. Technological advances in recent years have made it possible to connect a variety of devices to computer networks, which brings various benefits to users. However, as the technologies involved grow, the number of cyberattacks is also increasing, using more sophisticated means to gain unauthorized access to sensitive information, to extort money, or to cause the interruption of services mentioned above. One such technology is the Internet of Things (IoT) [2]. The idea of the Internet of Things includes various devices, sensors, objects, and intelligent nodes that are able to function autonomously and communicate with each other without human intervention. Such IoT devices are able to deliver a number of valuable services and, thanks to sensors and actuators, provide various data in real time. In many cases, however, devices in the field of the Internet of Things in particular contain various software bugs brought in from the factory that make them vulnerable. Such vulnerabilities often allow attackers to perform various cyberattacks and compromise the security of the environment in which IoT devices are located [3]. Several defense mechanisms have been proposed in the past against DDoS attacks in IoT networks. They can be divided into two basic groups: traditional DDoS defenses and IoT-specific DDoS defenses. They differ in terms of location and complexity. While traditional DDoS defenses are applied to the target server and are fundamentally homogeneous, IoT-specific DDoS defenses are applied to IoT devices and are more complex, reflecting the heterogeneity of IoT devices. In both cases, detection techniques are used to detect abnormal activities in the network or host [4].
The rest of the paper is organized as follows: Section 2 presents related works; Section 3 elaborates on the dataset and describes the methodology followed in the research; Section 4 details the experimentation procedures, the results obtained, and the observations from the results; Section 5 presents the conclusion of the work and highlights future work.
MATEC Web of Conferences 348, 01012 (2021), https://doi.org/10.1051/matecconf/202134801012, INBES'2021

Related works
According to [5], network intrusion detection systems have traditionally been rule-based. Nevertheless, machine learning and statistical approaches have also made major contributions [5]. Machine learning has also proven to be effective in two main aspects of network security: feature engineering (i.e., the ability to extract the most important features from network data to assist model learning) [6] and classification. In a security environment, classification tasks usually involve training on both suspicious and benign data in order to create models that can detect known attacks [7]. The authors in [8] pointed out that steps such as collection of network information, feature extraction and analysis, and classification-based detection provide a means for building efficient software-based tools that can detect anomalies in environments such as software-defined networking (SDN). Another study [9] provides a thorough classification of DDoS attacks in terms of detection technology. The study also emphasizes how the network security characteristics of an SDN define the possible approaches to setting up a defense against DDoS attacks. Similarly, [10] have explored this area. Among other approaches to DDoS defense, [4] propose a scheduling-based SDN controller architecture to effectively limit attacks and protect networks under DoS attacks. The growth of cloud computing and IoT has inevitably led to the migration of denial-of-service attacks to cloud computing devices as well. Thus, cloud computing devices must implement efficient DDoS detection systems in order to avoid loss of control and breaches of security [11]. Studies such as [12] aim to tackle this problem by determining the source of a DDoS attack using Trace (powerful trace) source control methods. Trace controls such attack sources from two aspects, packet filtering and malware tracing, to prevent the cloud from becoming a tool for DDoS attacks.
Other studies such as [13] approach the problem of filtering by using a set of security services called filter trees. In that study, XML- and HTTP-based DDoS attacks are filtered out using five filters for detection and resolution. Detection based on classification has also been proposed, and a classifier system for detecting DDoS TCP flooding attacks was created [14]. These classifiers take an incoming packet as input and classify the packet as either suspicious or not. An IP network is often subject to changes such as variations in the flow rate on the network, and in order to deal with such changes, self-learning systems have been proposed that learn to detect and adapt to them [15]. Many of the existing models for DDoS detection have primarily focused on SYN-flood attacks and have not been trained to detect botnet attributes. More studies are thus needed in which models are trained to detect botnets, as botnets have become the main technology for organizing and executing DDoS attacks [16]. Botnet DDoS attacks infect a multitude of remote systems, turning them into zombie nodes that are then used for distributed attacks. To detect botnet DDoS attacks, the authors in [17] used a deep learning algorithm to detect TCP, UDP and ICMP DDoS attacks. They also distinguished real traffic from DDoS attacks, and conducted in-depth training of the algorithm using real cases generated by existing popular DDoS tools and DDoS attack modes. Also, [18] proposed a DDoS attack model and demonstrated it by modelling different allocation strategies. The proposed DDoS attack model is applied to game planning strategies and can simulate different botnet attack characteristics. According to [19] "DDoS detection approaches can operate in one of the following three modes: supervised, semi-supervised and unsupervised mode. 
For the detection approach in supervised mode, it requires a trained dataset (or a classifier) to detect the anomalies, where the trained dataset includes input variables and output classes. The trained dataset is used to get the hidden functions and predict the class of input variables (incoming traffic instances). This mode is similar to a predictive model. For example, Classification techniques comes under the category of supervised data mining" [20]. "For the Approaches that work in the semi-supervised mode, they have incomplete training data i.e., training data is only meant for normal class and some targets are missing for anomaly class" [21]. Unlike supervised and semi-supervised learning, unsupervised machine learning algorithms do not rely on input-output pairs; instead, the algorithm is trained so that it can accurately characterize unknown data points. The following subsections further discuss the unsupervised learning algorithms we used in this work. A recent effort to detect IoT-based attacks proposed MQTT transaction-based features (Mustafa et al., 2019). However, the authors used features based on TCP protocol analysis, which do not provide sufficient information on the MQTT protocol parameters. In contrast, our proposed UDP features are based on unsupervised machine learning, which can successfully detect and distinguish such attacks, including unknown attacks.

Restricted Boltzmann Machines (RBM)
The authors in [24] stated that a Boltzmann machine (BM) is a bidirectionally connected network of stochastic processing units. BMs are commonly used to learn important features of an unknown probability distribution based on samples from that distribution. However, the training process of a BM is usually computationally intensive and tedious. The restricted Boltzmann machine (RBM) attempts to solve the training problem of BMs by imposing key restrictions on the architecture of the BM.
The BM is a fully connected network of bidirectional nodes in which each node is connected to every other node. The RBM, on the other hand, is a relatively smaller network of bidirectional nodes with the restriction that nodes on the same layer are not connected to each other [24]. The restricted Boltzmann machine is a generative model that is used to sample and generate instances from a learned probability distribution. Given the training data, the goal of the RBM is to learn the probability distribution that best fits the training data. The RBM consists of $m$ visible units $V = (V_1, V_2, \ldots, V_m)$ and $n$ hidden units $H = (H_1, H_2, \ldots, H_n)$ arranged in two layers. The visible units lie on the first layer and represent the features in the training data (see Figure 2). Usually, one visible unit represents one feature of an example in the training data. The hidden units model and represent the relationships between the features in the training data. The random variables $(V, H)$ take on values $(v, h) \in [0,1]^{m+n}$ for continuous variables, and the underlying probability distribution in the training data is given by the Gibbs distribution $p(v, h) = \frac{1}{Z} e^{-E(v, h)}$, where $Z$ is the normalizing constant, with the energy function
$$E(v, h) = -\sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} h_i v_j - \sum_{j=1}^{m} b_j v_j - \sum_{i=1}^{n} c_i h_i \qquad (1)$$
Here $w_{ij}$ are real-valued weights associated with $v_j$ and $h_i$, and $b_j$ and $c_i$ are real-valued bias terms associated with the $j$-th visible unit and the $i$-th hidden unit, respectively. The contrastive divergence learning algorithm is one of the successful training algorithms used to approximate the gradient of the log-likelihood and perform gradient ascent to maximize the likelihood [24].
After a successful training, the RBM should be able to represent the underlying probability distribution of the training data and when presented with unseen examples, the RBM should be able to generate similar representations to the example provided.
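The training procedure described above can be illustrated with a minimal sketch of one-step contrastive divergence (CD-1) for a small binary RBM. This is not the implementation used in this work: the class name `TinyRBM`, the learning rate, the layer sizes, and the toy data are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyRBM:
    """Illustrative binary RBM trained with CD-1 (hypothetical helper, not the paper's code)."""

    def __init__(self, n_visible, n_hidden, seed=0):
        self.rng = np.random.default_rng(seed)
        # W[i, j] couples hidden unit i with visible unit j.
        self.W = 0.01 * self.rng.standard_normal((n_hidden, n_visible))
        self.b = np.zeros(n_visible)  # visible biases b_j
        self.c = np.zeros(n_hidden)   # hidden biases c_i

    def hidden_prob(self, v):
        # p(h_i = 1 | v) for each row of v
        return sigmoid(v @ self.W.T + self.c)

    def visible_prob(self, h):
        # p(v_j = 1 | h) for each row of h
        return sigmoid(h @ self.W + self.b)

    def energy(self, v, h):
        # E(v, h) = -sum_ij w_ij h_i v_j - sum_j b_j v_j - sum_i c_i h_i
        return float(-(h @ self.W @ v) - self.b @ v - self.c @ h)

    def cd1_step(self, v0, lr=0.1):
        # Positive phase: hidden activations driven by the data.
        ph0 = self.hidden_prob(v0)
        h0 = (self.rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v1 = self.visible_prob(h0)
        ph1 = self.hidden_prob(v1)
        n = v0.shape[0]
        # Approximate log-likelihood gradient and take a gradient-ascent step.
        self.W += lr * (ph0.T @ v0 - ph1.T @ v1) / n
        self.b += lr * (v0 - v1).mean(axis=0)
        self.c += lr * (ph0 - ph1).mean(axis=0)
        return float(np.mean((v0 - v1) ** 2))  # reconstruction error

    def reconstruct(self, v):
        return self.visible_prob(self.hidden_prob(v))

# Toy data: two binary patterns repeated (assumed, for illustration only).
X = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
rbm = TinyRBM(n_visible=6, n_hidden=2)
errors = [rbm.cd1_step(X) for _ in range(2000)]
```

After training, `rbm.reconstruct(x)` returns the model's reconstruction of an input `x`; the distance between the two is the reconstruction error that a detector could use as an anomaly score.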

K-Means
The K-means algorithm takes the full dataset consisting of multiclass data points and groups the data points into separate clusters to the best of its ability; classification then occurs when a new input is fed in and the model assigns it to one of the computed clusters. Given a set of observations $(x_1, x_2, x_3, \ldots, x_n)$, where each observation is a $d$-dimensional real vector, k-means clustering aims to partition the $n$ observations into $k$ ($\le n$) sets $S = \{s_1, s_2, \ldots, s_k\}$ so as to minimize the within-cluster sum of squares (WCSS) (i.e., variance).
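As an illustration of the partitioning objective, the following is a minimal sketch of Lloyd's algorithm for k-means that also returns the WCSS. The function name `kmeans`, the random initialization from data points, and the toy data are assumptions for this example and do not reproduce the experimental setup of this work.

```python
import numpy as np

def squared_distances(X, centers):
    # Squared Euclidean distance from every point to every center.
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's algorithm; returns (labels, centers, wcss)."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = squared_distances(X, centers).argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dists = squared_distances(X, centers)
    labels = dists.argmin(axis=1)
    # WCSS: sum of squared distances of points to their assigned center.
    wcss = float(dists[np.arange(len(X)), labels].sum())
    return labels, centers, wcss

# Two well-separated toy groups (assumed data, for illustration only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
labels, centers, wcss = kmeans(X, k=2)
```

A new observation can then be assigned to a cluster by taking the index of the nearest entry in `centers`, which is how a clustering model is used as a classifier at prediction time.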

2.3 Expectation-Maximization (EM)
The authors in [25] stated that the EM algorithm is used for solving mixture models that assume the existence of some unobserved data. Mathematically, the EM algorithm can be described as follows: given a statistical model that generates a set of observed data $X$, latent data $Z$, unknown parameters $\theta$, and the likelihood function $L(\theta; X, Z) = p(X, Z \mid \theta)$, the maximum likelihood estimate of the unknown parameters $\theta$ is determined by maximizing the marginal log-likelihood of the observed data $X$ using equation 3. In the expectation step, the expected log-likelihood is computed under the current parameter estimates, while in equation 4 the maximization step is used to select the new parameter value that maximizes the log-likelihood given the estimates from equation 5.
In our research, we differ from current work (e.g. [26]) in two ways. First, we work with unsupervised machine learning methods using both normal and suspicious network data for training. Second, we made use of dimension reduction methods such as K-means clustering with PCA, Expectation-Maximization, the Restricted Boltzmann Machine and the Autoencoder (where K-means and EM are both trained using normal and suspicious data, while the RBM and AE were trained only on suspicious data). All of these methods were used not only for feature engineering [22] but for classification as well.
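For reference, the EM steps described in this subsection are commonly written in the following textbook form. This is given as an assumption about the standard formulation; the correspondence between these expressions and the equation numbers cited in the text is not confirmed by the source, and $\theta^{(t)}$ (the parameter estimate at iteration $t$) and $Q$ are notation introduced here.

```latex
\begin{align*}
  % Marginal likelihood of the observed data, with the latent data integrated out
  L(\theta; X) &= p(X \mid \theta) = \int p(X, Z \mid \theta)\, dZ \\
  % Expectation step: expected complete-data log-likelihood under the current estimate
  Q(\theta \mid \theta^{(t)}) &=
    \mathbb{E}_{Z \sim p(\cdot \mid X, \theta^{(t)})}
    \left[ \log p(X, Z \mid \theta) \right] \\
  % Maximization step: new estimate that maximizes the expected log-likelihood
  \theta^{(t+1)} &= \arg\max_{\theta}\; Q(\theta \mid \theta^{(t)})
\end{align*}
```

Iterating the two steps never decreases the marginal likelihood $L(\theta; X)$, which is why the procedure is suited to mixture models in which the component membership of each observation is unobserved.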

METHODOLOGY
In this research, an expandable and flexible deep learning network intrusion detection system is presented. The system is designed by combining a machine learning model with a knowledge graph. The method and steps followed in this research are described as follows.

An expandable and flexible Deep Learning method
Literature on building predictive models for distributed denial of service (DDoS) attacks is rich, but the state of the art does not address the expandability and flexibility of the intrusion detection model. Scalability is becoming increasingly necessary for today's network intrusion detection [17]. This is because of the rapid growth of large volumes of modern network traffic, which demands urgent monitoring amid constantly changing attack activity. The proposed method therefore adjusts and adapts itself to newly updated network connections, so that the deep learning model automatically learns the new problem when network connection behaviors change. The proposed method is implemented using the Python programming language and the WEKA 3.9 machine learning tool, and WEKA library functions are used for feature selection and classifier building.
The proposed method for expandable and flexible network intrusion detection is presented in Figure 1. It consists of two major modules. In this section, we discuss the details of the proposed approaches. The BASHLITE dataset consists of 110,000 SYN-flood instances and 100,000 UDP-lag attacks. Both Mirai and BASHLITE are open-source malware that can be used for academic research purposes. Figure 1 shows the architecture of the novel network intrusion detection model, representing its main modules and subsystems. As described in Figure 1, the proposed method consists of two main subsystems, the supervised Vector Space Model (VSM) and the knowledge-graph system (KGS), joined by a connector. The learning subsystem is a cooperative outcome of the database, pattern extraction, and update detection modules. It is mainly responsible for learning from the dataset incrementally and adaptively using a machine learning algorithm. The knowledge-graph system, on the other hand, uses the deep learning outcome to detect the type of incoming network traffic, and it automatically adds the novel network connection as an attribute in the original training dataset. To implement the proposed method (see Figure 1), we designed the algorithm shown in Algorithm 1, which incorporates deep learning and a knowledge graph for detecting network intrusion. For the experiment, CICIDS2017 dataset is downloaded from "KDD Cup 1999 Data," http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html (accessed on March 12, 2018).

Preprocessing Section.
As stated by Aggarwal and Sharma [30], the CICIDS2017 benchmark intrusion detection dataset is a refined version of CICIDS2017 in which there are 16,000,000 instances in the 10% training dataset. The CICIDS2017 intrusion dataset incorporates four classes of attacks, namely remote-to-user (R2L), user-to-root (U2R), denial of service (DoS), and probe, which together cover 22 different specific attacks. The CICIDS2017 dataset contains 42 attributes in total. For the dataset to be suitable for experimentation with machine learning algorithms, the data need to undergo a preprocessing step in which data cleaning, data size balancing, data size reduction, and dimensionality reduction (feature reduction) are performed.
Moreover, sampling and feature selection techniques are applied to the CICIDS 2017 intrusion dataset to produce a manageable CICIDS 2017 dataset appropriate for the experiment. Finally, based on the aforementioned activities such as sampling, a total of 52,127 instances are prepared for the experiment. Loss: The mean squared error loss function is used. Optimizer: The Adaptive Moment Estimation (Adam) algorithm is selected.

Attribute Selection.
In building high-performance intrusion detection systems, one of the significant research difficulties is effective attribute selection from intrusion detection datasets. The accuracy of an intrusion detection model is greatly affected by the presence of irrelevant and redundant attributes in the intrusion detection dataset. As described by Lee et al. [8], 41 attributes were constructed for each network connection in the NSL-KDD intrusion detection dataset. To filter the best attributes for constructing a DDoS attack detection model that identifies abnormal network connections in a given dataset, attribute selection methods have been applied. The number of RBM units corresponds to the features present in the training data; for instance, for CICDDoS2019, the number of hidden and visible units for the RBM is 77.
According to Neethu [16], constructing a classification model is one of the main challenges for an intrusion detection system: the goal is to construct effective models that distinguish normal from abnormal behaviors of network connections by observing collected audit data. In addition, one of the main challenges in intrusion detection systems is learning from static intrusion data to construct a classifier.
Thus, if for instance the autoencoder is trained on a dataset comprising only benign packets, whenever a benign packet is presented to the autoencoder, we expect the reconstructed output to be quite similar to the input and therefore the reconstruction error to be low. However, if this same model is presented with a suspicious packet whose features are fairly different from those of a benign packet, then we should expect the reconstruction error to be quite high. The same logic can be applied to the restricted Boltzmann machine. With this formulation established, it is easier to frame the classification problem using the autoencoder and RBM: in our example, a low reconstruction error means the packet is benign, while a high reconstruction error means the packet is suspicious. Using these predictions, the normalized mutual information (NMI) provides a means of evaluating the clustering performance of the algorithm by comparing the correlation between the predicted class and the target class. If the predicted and target data are represented as two separate distributions, then we can also apply the NMI to determine the performance of non-clustering algorithms.
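The decision rule and the NMI evaluation described above can be sketched as follows. This is an illustrative example only: the helper names, the fixed threshold of 0.5, and the toy error values are assumptions, the rule follows the benign-trained example in the text (a model trained on attack data would invert it), and the NMI is normalized by the arithmetic mean of the two entropies, which is one of several conventions in use.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    # Mean squared error per packet (one row per packet).
    return np.mean((x - x_hat) ** 2, axis=1)

def classify_by_error(errors, threshold):
    # 1 = suspicious (high error), 0 = benign (low error).
    return (np.asarray(errors) > threshold).astype(int)

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def normalized_mutual_info(y_true, y_pred):
    """NMI = I(true; pred) / mean(H(true), H(pred))."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mi = 0.0
    for a in np.unique(y_true):
        for b in np.unique(y_pred):
            p_ab = np.mean((y_true == a) & (y_pred == b))
            if p_ab > 0:
                p_a = np.mean(y_true == a)
                p_b = np.mean(y_pred == b)
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    denom = 0.5 * (entropy(y_true) + entropy(y_pred))
    # Both labelings constant: treat them as in perfect agreement.
    return float(mi / denom) if denom > 0 else 1.0

# Toy reconstruction errors for four packets (assumed values).
errors = np.array([0.01, 0.02, 0.90, 0.80])
y_true = np.array([0, 0, 1, 1])
y_pred = classify_by_error(errors, threshold=0.5)
score = normalized_mutual_info(y_true, y_pred)
```

In practice the threshold would have to be chosen from held-out data, for example from the distribution of reconstruction errors on the training class; the source does not state how it was selected.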
In this section, we present the experimental results for each model across all datasets. The results are presented in subsections, with each subsection dedicated to a model. For the Autoencoder and the Restricted Boltzmann Machine, the subsections consist of plots showing the training and test loss, a table summarizing the performance across the datasets, and a detailed discussion of the results. The remaining models do not optimize a loss function, so only the summary tables and a detailed discussion of the results are presented. Performance evaluations are also carried out using the accuracy and the Normalized Mutual Information score.
The innovation of this work lies in the accurate detection of the anomalous behaviour of the nodes. A DDoS attacker tries to affect the network in different ways; the basic nature of a DDoS attack is to flood the network with a large number of packets and thereby exhaust it. The approach applies optimization techniques with unsupervised machine learning to achieve a detection accuracy of 99.93%. This paper focuses on detecting DDoS attacks in IoT networks by classifying incoming network packets on the transport layer as either "Suspicious" or "Benign" using unsupervised machine learning algorithms. In this work, two deep learning algorithms and two clustering algorithms were independently trained for mitigating DDoS attacks. We lay emphasis on exploitation-based DDoS attacks, which include TCP SYN-Flood attacks and UDP-Lag attacks. We use the Mirai, BASHLITE and CICDDoS2019 datasets to train the algorithms during the experimentation phase.
Figure 4 shows the desired behavior of the backpropagation training algorithm, where the training and validation loss decrease steadily and in unison as the training epoch increases. It is important to point out that the autoencoder is trained to reconstruct SYN-Flood data, meaning it should be unable to reconstruct benign data.
We chose the SYN-Flood data for training because there were more instances than the benign data. The same choice is made for the UDP-Autoencoder model, where we train it on the UDP-Lag data instead of on benign UDP data.
Our experiments show that the random forest (RF) gives better accuracy for the normal, DoS, probe, and R2L classes compared to SMO and Bayes Net, and it gives the worst accuracy for detecting the U2R class of attacks. For the U2R class, the SMO and Bayes Net methods give the same performance. There is only a small difference in the accuracy for
ALGORITHM 1: Expandable and flexible deep learning method for DDoS attack detection.
Currently, various deep learning algorithms have become very popular and have attracted more and more interest in recent years for classifying network connections into normal and abnormal [16]. Some of the popular machine learning algorithms used for classifying given intrusion audit data include decision trees, support vector machines, neural networks, genetic algorithms, Naïve Bayes, and fuzzy logic. Since attackers and network attacks are becoming more complicated and continuously change their attack methods and patterns, it is very difficult to detect the many new attacks that come through the network. Therefore, Neethu [16] ... In Tables 2 and 3, none of the classifiers considered so far performs well in detecting all the attacks. To take advantage of the performance of the three classifiers, random forest (RF) is selected for subsequent integration with the knowledge base to arrive at a scalable and adaptive learning approach for intrusion detection.

Discussion of Result
This study investigates whether DDoS detection is feasible through a combination of a Stacked Auto Encoder (SAE) and a Convolutional Neural Network (CNN). To the best of our knowledge, this is the first study to give a practical demonstration of the potential of a hybrid approach for improving DDoS detection. The average accuracy of the three algorithms across the datasets is shown in Figure 2. As presented in Figure 2, the SMO, Bayes Net, and random forest classifiers achieve their best average accuracy, i.e., 99.35%, 98.68%, and 99.71%, respectively, when using supervised learning. An additional problem we faced in this work is the unavailability of real-time data to test the method, so the method is verified on offline data that is openly available online. Overall, the study empirically demonstrates the feasibility of combining deep learning and a knowledge-graph system to develop an expandable and flexible deep learning method for DDoS attack detection. We observed that the deep learning and knowledge-graph systems are essential to each other. Our experimental results show that after integration of machine learning and the knowledge base, 99.89% classification accuracy is achieved on the pre-processed NSL-KDD intrusion dataset.

CONCLUSIONS
This paper presents a novel approach for DDoS attack detection based on a hybrid model of a Stacked Auto Encoder (SAE) and a Convolutional Neural Network (CNN). It is verified experimentally on the CIC-DDoS2019 dataset with a total of 41,749 instances, using ten-fold cross-validation. The experimental results show that a performance of 99.97% is recorded after integration. Once proper formulations are established, the accuracy score can be used to evaluate both models fairly. Although the autoencoder model is clearly the superior model, the DDOS-Detection class we developed provides methods that allow one to perform network packet classification using either the autoencoder model or the Expectation-Maximization model. The simulation results show that the DDOS-Detection tool built around these models can achieve a net accuracy as high as 99.71%. Future studies should aim to replicate the results in a larger system to detect compromised end-points, and should also ensure that the algorithms stay current through retraining approaches that handle abnormalities in network performance.

Data Availability
The dataset used in this work is publicly available as a benchmark for research purposes at https://www.unb.ca/cic/datasets/nsl.html. The preprocessed data obtained to support the findings of this work are available from the authors upon request. All the supporting open-source code for the integration activities is available to the research community under an open-source license.