Research on Intrusion Detection Based on User Portrait Technology

In big data scenarios, intrusion detection technology suffers from problems such as weak data processing capability and difficulty in precisely locating intrusions. Intrusion detection based on user portrait technology exploits the practical value of big data technology to refine the detection granularity and make intrusion detection more intelligent and accurate. Experimental results show that the intrusion detection model based on user portraits detects abnormal users quickly and accurately, improves the effectiveness of intrusion detection, and has good practical value.


An overview of intrusion detection technology based on user profile technology
An intrusion detection system (IDS) is one of the important security measures in network security. It provides real-time, dynamic protection for a system and takes emergency response measures when threats are found. In the approach described here, a user tagging system is established: each user's actions are abstracted into short-text tags, each tag is assigned a weight, and the resulting set of weighted tags forms a virtual user profile. Anomaly indices are then calculated across users to detect abnormal users.
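The tagging idea can be illustrated with a minimal Python sketch. The tag names, the weights in TAG_WEIGHTS, and the choice of anomaly index (one minus the mean cosine similarity to the other users' profiles) are all assumptions made for illustration; the paper does not specify how weights are set or how the index is computed.

```python
import math
from collections import Counter

# Hypothetical tag weights; the paper does not say how weights are chosen.
TAG_WEIGHTS = {"login_day": 1.0, "login_night": 3.0, "file_read": 1.0, "sudo": 4.0}

def build_profile(actions):
    """Turn a list of short-text action tags into a weighted tag vector."""
    counts = Counter(actions)
    return {tag: n * TAG_WEIGHTS.get(tag, 1.0) for tag, n in counts.items()}

def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag vectors stored as dicts."""
    dot = sum(weight * q.get(tag, 0.0) for tag, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)

def anomaly_index(profile, others):
    """Assumed index: 1 minus the mean similarity to the other users' profiles."""
    if not others:
        return 0.0
    mean_sim = sum(cosine_similarity(profile, o) for o in others) / len(others)
    return 1.0 - mean_sim

users = {
    "alice": build_profile(["login_day", "file_read", "file_read"]),
    "bob": build_profile(["login_day", "file_read"]),
    "mallory": build_profile(["login_night", "sudo", "sudo"]),
}
scores = {
    name: anomaly_index(p, [q for other, q in users.items() if other != name])
    for name, p in users.items()
}
```

In this toy data the user whose tags share nothing with the others receives the highest index. A real system would need a principled weighting scheme and a threshold calibrated on known-normal users.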

Overview of user profiling techniques
This paper introduces user portrait technology into the security field: user operation data is collected directly, and users' behavioral habits are extracted through user portraits. User portrait technology builds a mathematical model of real-world users from user data. In general, building a user portrait has three requirements. First, data: the collection and accumulation of user data, such as social attributes, living habits and consumption behavior, is the foundation of a user portrait. Second, a clear business application scenario: user portraits and business applications are inseparable, and it is often necessary to profile and analyze the specific users who meet the business requirements. Third, suitable user modeling algorithms: these mine deeper information about user needs from existing user data and abstract the different data into a tag-based user model. The core work of user portrait technology is labeling users. A label is a highly refined feature marker obtained by analyzing user information, and users can be classified and extracted on the basis of their labels.

Overview of intrusion detection systems
Intrusion detection techniques fall into two categories. Misuse detection (also called abuse detection) uses known intrusion methods and known system weaknesses to identify illegal intrusions. Its detection efficiency is high, but its main disadvantage is that it recognizes only the known intrusion patterns embedded in the system, so unknown intrusions cannot be detected. Anomaly detection checks whether current user behavior deviates from an established profile of normal behavior in order to identify illegal intrusion or unauthorized operations. Its advantage is that it requires no knowledge of system defects and adapts well, but its false alarm rate is high. User behavior in an intrusion detection system is represented mainly as data. According to the data source, intrusion detection systems can be divided into host-based and network-based systems: the former draws on the operating system's audit data, the latter on the packets that flow through the network. Since user behavior is represented as data, the core of the problem is how to process the collected data correctly and efficiently and draw conclusions from it.
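The contrast between the two categories can be sketched in a few lines of Python. The signature strings, the feature (logins per hour) and the three-standard-deviation threshold are illustrative assumptions, not values taken from the paper.

```python
from statistics import mean, stdev

# Hypothetical signatures of known attacks, used for misuse detection.
KNOWN_SIGNATURES = ("/etc/passwd", "' OR 1=1", "../../")

def misuse_detect(request):
    """Flag a request only if it contains a known intrusion pattern."""
    return any(sig in request for sig in KNOWN_SIGNATURES)

def anomaly_detect(value, normal_history, threshold=3.0):
    """Flag a value that lies more than `threshold` standard deviations
    away from the mean of the normal-behavior history."""
    mu = mean(normal_history)
    sigma = stdev(normal_history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Assumed normal profile: logins per hour observed for one user.
logins_per_hour = [4, 5, 6, 5, 4, 6, 5]
```

The misuse detector misses any request whose pattern is not in the signature list, while the anomaly detector needs no attack knowledge but will raise an alarm for any unusual yet legitimate burst of activity, which matches the trade-off described above.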

Construction of intrusion detection system based on user profile technology
Intrusion detection checks the behavior of the current user against the rules in the rule base and adopts different countermeasures according to the result. User behavior changes periodically over time, which causes concept drift and raises the false alarm rate. To address this problem, a sequential user behavior profile is proposed that limits the influence of historical transaction data to a certain extent and expands the diversity of future transactions. Finally, the approach is compared experimentally with other algorithms.
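One possible way to limit the influence of historical data is an exponentially decayed profile that is updated window by window. The sketch below is an assumption made for illustration: the paper does not describe how its sequential profile is built, and the function names, tag names and decay factor are hypothetical.

```python
def update_profile(profile, window_counts, decay=0.5):
    """Blend the previous profile with the newest window of tag counts.

    Older behavior is multiplied by `decay` at every step, so its
    influence fades as new windows arrive.
    """
    tags = set(profile) | set(window_counts)
    return {
        tag: decay * profile.get(tag, 0.0) + (1 - decay) * window_counts.get(tag, 0.0)
        for tag in tags
    }

def sequential_profile(windows, decay=0.5):
    """Fold a time-ordered list of per-window tag counts into one profile."""
    profile = {}
    for counts in windows:
        profile = update_profile(profile, counts, decay)
    return profile

# A user who gradually shifts from daytime to night-time logins.
windows = [
    {"day_login": 10},
    {"day_login": 10},
    {"night_login": 10},
    {"night_login": 10},
]
profile = sequential_profile(windows)
```

After the last window the night-time tag dominates the profile, so a legitimate change of habit is absorbed rather than reported indefinitely as an anomaly. The decay factor controls how quickly this happens, and a value too close to zero would let an intruder retrain the profile within a few windows.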

Detection rate: TP/(TP + FN)
The proportion of attack samples that are correctly recognized as attacks. It is an important index of the ability of the IDS to recognize attacks.

True negative rate: TN/(FP + TN)
The proportion of normal samples in the test set that are correctly identified as normal. It indicates how accurately the IDS identifies normal samples.

Precision: TP/(TP + FP)
The proportion of the samples identified by the IDS as attacks in the test set that are true attacks.

F-score:
A comprehensive evaluation index that combines the detection rate and the precision of the IDS.

Classification accuracy: (TN + TP)/(TN + TP + FN + FP)
The ratio of the number of correctly classified samples in the test set to the total number of samples in the test set. It is an overall index of the ability of the IDS to distinguish normal samples from attack samples.

Omission rate: FN/(TP + FN)
The ratio of the number of attack samples misidentified as normal to the total number of attack samples in the test set. It reflects how many attacks the IDS fails to identify.

False alarm rate: FP/(FP + TN), also known as false positive rate
The ratio of the number of normal samples misidentified as attacks to the total number of normal samples in the test set. It reflects how often the IDS raises an alarm on normal samples.
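The indices above can be computed directly from the four confusion-matrix counts, where TP and FN are attack samples classified as attack and as normal, and FP and TN are normal samples classified as attack and as normal. The following Python sketch is illustrative: the function and key names are hypothetical, and because the source gives no formula for the F-score, the common F1 form (the harmonic mean of the detection rate and the TP/(TP + FP) ratio, i.e. precision) is assumed.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def ids_metrics(tp, fn, fp, tn):
    """Compute the evaluation indices from confusion-matrix counts."""
    detection_rate = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "detection_rate": detection_rate,
        "true_negative_rate": ratio(tn, fp + tn),
        "precision": precision,
        # Assumed F1 form; the source does not state the F-score formula.
        "f_score": ratio(2 * precision * detection_rate, precision + detection_rate),
        "classification_accuracy": ratio(tn + tp, tn + tp + fn + fp),
        "omission_rate": ratio(fn, tp + fn),
        "false_alarm_rate": ratio(fp, fp + tn),
    }

# Made-up counts for illustration: 100 attack samples and 100 normal samples.
metrics = ids_metrics(tp=80, fn=20, fp=10, tn=90)
```

Note that the detection rate and the omission rate always sum to 1, as do the true negative rate and the false alarm rate, so only one index of each pair needs to be reported.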

Content of intrusion detection system based on user profile technology
Network-based IDS offers strengths that host-based intrusion detection cannot provide on its own, and it is often the first IDS chosen because of its low cost and quick response. A network-based IDS can be deployed at a few key points and observe the network traffic of multiple systems, so software does not need to be installed and managed on many hosts. Because fewer monitoring points are needed, the cost is lower. A network-based IDS also inspects all packets passing through it and can find signs of malicious programs and activity, including types of attack that a host-based IDS cannot detect. It can additionally examine payload content to detect the commands and tactics used in specific attacks.
A host-based IDS does not inspect packets in this way, so it cannot detect such behavior or identify the corresponding attack traffic. It is also difficult for an attacker to remove evidence from a network-based IDS, because the IDS uses live network traffic to detect attacks in real time. The captured evidence includes the method of attack, information that can help identify the attacker, and records of other malicious behavior, which leaves the attacker no opportunity to cover their tracks. Host-based systems, by contrast, depend on recorded information that an attacker may tamper with to prevent it from being used for intrusion detection.
A network-based IDS detects suspicious and malicious activity as it occurs and can therefore notify and respond quickly. For example, when a TCP-based denial-of-service attack is launched over the network, the IDS can respond by sending a TCP reset. Host-based monitoring is not fully real-time: suspicious activity is recognized and acted upon only after it has been monitored and recorded during operation. By that time the system may already have been compromised, the host-based IDS itself may have been destroyed, and a large amount of information may have been leaked. Network-based real-time monitoring can issue notifications immediately, so that a rapid response suited to the circumstances of the alert prevents information leakage and the resulting losses. Network-based IDS data also adds valuable information for judging malicious intent: even when the firewall rejects attack attempts, a network-based IDS placed outside the firewall can detect attacks aimed at resources behind it. Detecting such data plays an important role in evaluating the security of the system.

Application steps of user behavior profile in intrusion detection
A host-based intrusion detection system is not as fast as a network-based one, but it has advantages that a network-based system cannot match: better identification and analysis, close attention to particular events, and low cost. The first advantage is determining whether an attack succeeded. Because a host-based IDS uses records of events that have actually occurred, it can determine more accurately whether an attack succeeded. In this respect the host-based IDS complements the network-based IDS well: the network component issues warnings as early as possible, and the host component determines whether the attack succeeded.
A host-based IDS also monitors specific system activities, including user and file access activity: access to files, changes to file permissions, creation of new executable files, attempts to access privileged services, and so on. Host-based systems can therefore detect attacks that network-based products cannot. They are also better suited to encrypted and switched environments than network-based detection systems, because they can be installed on each host. Switches allow a large network to be managed as many smaller network segments, which makes it difficult for a single network-based monitoring point to cover a sufficiently large part of the network.

Method verification and analysis of intrusion detection system based on user profile technology
A user portrait is a virtual representation of a person built on a set of data models. Based on mobile phone sensing data, a user portrait was constructed in terms of age, sex and personality, with sensors and event listeners in the phone collecting data on slide-to-unlock gestures, basic phone information, application usage, and screen state. In data mining, the quality of the training data directly affects the accuracy of the extracted user features and the derived rules. If the data used to build the model includes the behavior of intruders, the detection system will treat that behavior as normal and fail to respond to the intrusion, resulting in missed detections. The training data must therefore contain no intrusions and must be formatted in a form that the data mining algorithm can process.
The system's data source is network-based. A sniffer captures the user's packets, and protocol analysis then discards the payload and keeps only the packet header. After preprocessing, each record consists of seven fields: time, source IP, source port, destination IP, destination port, connection ID, and connection status. Because establishing a TCP connection involves a three-way handshake, the collected training data will include some unsuccessful connections, which would negatively affect the subsequent data mining. Only data that reflects the normal state of the network should therefore be retained. UDP has no handshake, so each UDP packet is treated as a single connection.
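The seven-field record and the filtering of unsuccessful connections can be sketched in Python as follows. The class name, field names and connection status labels are hypothetical; the paper lists the fields but does not specify their encoding or which status values count as normal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConnectionRecord:
    """One preprocessed record holding the seven header-derived fields."""
    time: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    conn_id: int
    status: str  # hypothetical labels: "ESTABLISHED", "SYN_ONLY", "RESET", "UDP"

# Assumption: a completed TCP handshake or a UDP packet counts as normal.
NORMAL_STATUSES = {"ESTABLISHED", "UDP"}

def keep_normal(records):
    """Drop connections that never completed the TCP three-way handshake."""
    return [r for r in records if r.status in NORMAL_STATUSES]

records = [
    ConnectionRecord(0.00, "10.0.0.5", 50432, "10.0.0.1", 80, 1, "ESTABLISHED"),
    ConnectionRecord(0.12, "10.0.0.5", 50433, "10.0.0.1", 8080, 2, "SYN_ONLY"),
    ConnectionRecord(0.30, "10.0.0.5", 53211, "10.0.0.2", 53, 3, "UDP"),
    ConnectionRecord(0.41, "10.0.0.5", 50434, "10.0.0.1", 22, 4, "RESET"),
]
training_set = keep_normal(records)
```

Here only the completed TCP connection and the UDP packet remain in the training set. In practice the status would be derived from the TCP flags seen for each connection during protocol analysis.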
By applying data mining technology to intrusion detection, user behavior characteristics can be extracted and the rules of intrusion behavior summarized from historical data, so that a complete rule base for intrusion detection can be built. For data collection, a network-based detection system takes its data from the network, using available tools such as tcpdump. A Python program collects user behavior data to form user profiles, from which the detection model is built. The model extracts the features of user behavior, learns normal user behavior with machine learning methods, and uses the Mahalanobis distance and the isolation forest algorithm to determine whether the behavior under test is abnormal.
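The Mahalanobis distance step can be illustrated with a short NumPy sketch. The two features (connections per minute and megabytes transferred), the sample values and the threshold of 3.0 are assumptions made for illustration, and the isolation forest stage is not shown.

```python
import numpy as np

def fit_normal_profile(samples):
    """Estimate the mean and inverse covariance of normal-behavior features."""
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    inv_cov = np.linalg.pinv(np.atleast_2d(cov))
    return mean, inv_cov

def mahalanobis(x, mean, inv_cov):
    """Distance of x from the normal profile, scaled by feature covariance."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ inv_cov @ d))

def is_abnormal(x, mean, inv_cov, threshold=3.0):
    """Flag behavior whose distance exceeds the assumed threshold."""
    return mahalanobis(x, mean, inv_cov) > threshold

# Assumed features per observation: [connections per minute, MB transferred].
normal_behavior = [
    [10, 1.0], [12, 1.2], [11, 0.9], [9, 1.1],
    [10, 1.0], [11, 1.1], [12, 1.0], [9, 0.9],
]
mean, inv_cov = fit_normal_profile(normal_behavior)
```

With this profile, an observation such as [60, 9.0] lies far outside the normal cluster while [10, 1.0] lies well inside it. An isolation forest, for example scikit-learn's IsolationForest, could be fitted on the same feature matrix as a second detector, and the two results combined.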

Conclusion
Existing user search portrait technology suffers from high feature dimensionality and sparse matrices. Applying user portrait technology to massive user behavior data still requires research into more sophisticated and mature algorithms. User portrait technology is multidisciplinary and requires the fusion of knowledge from knowledge graphs, natural language processing, machine learning and data mining. By applying data mining technology to intrusion detection, user behavior characteristics can be extracted and the rules of intrusion behavior summarized from historical data, so that a more complete rule base can be established for intrusion detection.