IDENTIFYING THREATS IN A COMPUTER NETWORK BASED ON A MULTILAYER NEURAL NETWORK

Dep. «Electronic Computing Machines», Dnipropetrovsk National University of Railway Transport named after Academician V. Lazaryan, Lazaryan St., 2, Dnipro, Ukraine, 49010, tel. +38 (056) 373 15 89, e-mail ivzhuk@ua.fm, ORCID 0000-0002-3491-5976

Dep. «Electronic Computing Machines», Dnipropetrovsk National University of Railway Transport named after Academician V. Lazaryan, Lazaryan St., 2, Dnipro, Ukraine, 49010, tel. +38 (056) 373 15 89, e-mail viknikpakh@gmail.com, ORCID 0000-0002-0022-099X


Introduction
There have recently been increasingly frequent reports of computer penetrations and attacks on Web servers. Very often, intruders bypass the established protective devices. Attacks are carried out in a very short time, and the variety of threats is constantly increasing, which prevents their detection and prevention with standard protective equipment [2, 3]. Existing approaches are characterized by a number of features that hinder their use: low speed of operation and poor accuracy [1, 13, 15]. To eliminate these shortcomings, neural network technologies are proposed [1-4, 6-12]: the multilayer perceptron, the Kohonen network, and hybrid neural network systems.
Attacks are divided into the following categories [2, 14, 15]: DoS (Back, Land, Neptune, Pod, Smurf, Teardrop), U2R (Buffer_overflow, Loadmodule, Perl, Rootkit), R2L (Ftp_write, Quess_passwd, Imap, Multihop, Phf, Spy, Warezclient, Warezmaster), and Probe (Ipsweep, Hmap, Portsweep, Satan). A DoS attack is characterized by the generation of a large amount of traffic, which leads to overload and lockup of the server; a U2R (User to Root) attack involves a registered user obtaining administrator privileges; R2L (Remote to Local) attacks are characterized by an unregistered user gaining access to a computer from a remote machine; Probe attacks include port scanning in order to obtain confidential information.
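The grouping above can be captured in a small lookup table. The sketch below is purely illustrative (the helper name and structure are our own, not part of any cited tooling); it maps each attack label, as written in the text, to its category:

```python
# Illustrative helper: attack labels (as spelled in the text) grouped
# into the four categories described above.
ATTACK_CATEGORIES = {
    "DoS": ["Back", "Land", "Neptune", "Pod", "Smurf", "Teardrop"],
    "U2R": ["Buffer_overflow", "Loadmodule", "Perl", "Rootkit"],
    "R2L": ["Ftp_write", "Quess_passwd", "Imap", "Multihop",
            "Phf", "Spy", "Warezclient", "Warezmaster"],
    "Probe": ["Ipsweep", "Hmap", "Portsweep", "Satan"],
}

def category_of(attack: str) -> str:
    """Return the category of a known attack label, or 'unknown'."""
    for category, attacks in ATTACK_CATEGORIES.items():
        if attack in attacks:
            return category
    return "unknown"

print(category_of("Neptune"))   # DoS
print(category_of("Ipsweep"))   # Probe
```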
At the present stage, various solutions are offered for the modernization of existing computer networks, in particular, in the information and telecommunication system of the Prydniprovska railway [5,16].

Purpose
To develop a method for detecting threats in a computer network based on network traffic parameters using a multi-layer neural network in the Fann Explorer program.

Methodology
To detect threats we used 19 network traffic parameters (x_i, i = 1…19) [14]: packet type (TCP, UDP, ICMP and others); service type (http, telnet, ftp_data, eco_i, private); flag; number of bytes from source to recipient; number of bytes from recipient to source; number of hot indicators; successful login; number of compromised conditions; number of connections to the host within 2 s; number of connections to one service within 2 s; percentage of connections to the service within 2 s; percentage of connections to different hosts; number of connections to the local host established by the remote side within 2 s; number of connections to the local host established by the remote side that use one service; percentage of connections to this service; percentage of connections to other services; percentage of connections to the host with the source port number; percentage of connections with a REJ-type error for the recipient host; percentage of connections with a REJ-type error for the service.

On the basis of the values of this set of attack features, it is necessary to draw a classification conclusion (y_1, y_2, y_3, y_4, y_5) about the following threats: Back, Buffer_overflow, Quess_password, Ipsweep, Neptune. Detection of threats in a computer network is based on the analysis and processing of data on the parameters of network connections using the TCP/IP protocol stack. As initial data we used the KDDCUP-99 database of 5,000,000 connection records (a connection record is the sequence of TCP packets over a finite period, whose start and end points are clearly defined, during which data is transmitted from the sender's IP address to the recipient's IP address using a defined protocol). As the mathematical tool for solving the problem we took a neural network (NN) of configuration 19-1-25-5, where 19 is the number of neurons in the input layer, 1 is the number of hidden layers, 25 is the number of neurons in the hidden layer, and 5 is the number of neurons in the output layer.
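The 19-1-25-5 configuration amounts to a single forward pass through one hidden layer. The sketch below is an illustrative model only: the weights are random stand-ins, not the trained network from the paper, and the symmetric sigmoid activation is an assumption consistent with the experiments described later:

```python
import math
import random

random.seed(0)

N_IN, N_HIDDEN, N_OUT = 19, 25, 5  # the 19-1-25-5 configuration

def sym_sigmoid(x: float) -> float:
    """Symmetric sigmoid (tanh-shaped), output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0

# Randomly initialized weights stand in for the trained ones.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(N_IN)]
            for _ in range(N_HIDDEN)]
w_out = [[random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
         for _ in range(N_OUT)]

def forward(x: list) -> list:
    """One forward pass: 19 traffic parameters -> 5 threat scores."""
    hidden = [sym_sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in w_hidden]
    return [sym_sigmoid(sum(w * h for w, h in zip(row, hidden)))
            for row in w_out]

scores = forward([0.1] * N_IN)  # a dummy normalized connection record
print(len(scores))  # 5
```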

Findings
Purpose and features of the Fann Explorer program. Fann Explorer is a portable graphical environment for developing, training and testing neural networks. It supports animated training and network creation, and provides an easy-to-use browser-based interface for the fast artificial neural network (Fann) library. FannKernel provides the neural network kernel; since the kernel is multi-threaded, several neural networks can be trained and explored at the same time. The program has fairly wide-ranging functionality for training and researching neural networks. The View menu allows you to open three main windows: Controller (the main panel, which defines all the parameters for training the neural network), Error plot (a graph of the mean square error versus the number of completed training epochs), and Weight Graph. On the Topology tab, you can get acquainted with the main characteristics of the topology (Fig. 1).
The Testing menu allows you to check how well the neural network has been trained and provides the ability to edit the input data. The test results are presented as a graph comparing the mean square error with the reference value, as shown in Fig. 2.
Error Function: Linear is a linear error function that computes the error as the difference between the actual output of the neural network and the expected value set by the operator; Tangential (Tanh) is an error function that amplifies large deviations during training. The idea of the function is that it is better if 10 neurons each have an error of 10% at the output than if one of them has an error of 100%. This function is the default error function, although it can lead to poor training results if the Training rate is set too high.

When increasing the number of neurons in the hidden layer, the NN is trained faster, but in some cases, when the optimal amount is exceeded (> 45), the training rate falls. In addition, model No. 2 is trained much faster than model No. 1. The constructed graphs of the dependence of the number of epochs on the number of neurons in the hidden layer under different training algorithms are presented in Fig. 8.
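One common way to realize the Tanh error idea (and, to our understanding, roughly what the Fann library does internally) is to stretch the linear difference with an inverse-tanh function, so that large deviations are penalized disproportionately. The sketch below illustrates this; the cap constant is an assumption:

```python
import math

def linear_error(desired: float, actual: float) -> float:
    """Linear error: plain difference between expected and actual output."""
    return desired - actual

def tanh_error(desired: float, actual: float, cap: float = 17.0) -> float:
    """Error stretched by atanh, so large deviations dominate.
    The cap (an assumed constant) avoids infinity as |difference| -> 1."""
    d = desired - actual
    if abs(d) >= 0.9999:
        return math.copysign(cap, d)
    return math.atanh(d)

# "Better 10 neurons at 10% error than one at 100%":
ten_small = sum(abs(tanh_error(1.0, 0.9)) for _ in range(10))  # ~1.0
one_big = abs(tanh_error(1.0, 0.0))                            # capped at 17
print(ten_small < one_big)  # True
```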

The graphs of the dependence of the number of training epochs on the number of hidden layers based on different training algorithms are plotted and shown in Fig. 9.
From the figure it can be seen that when the number of hidden layers in the NN increases (> 3), training accelerates only up to a certain point; once there are too many layers, the network begins to slow down. In addition, model No. 4 is trained almost twice as fast as model No. 3, but has a bigger error, which results from an increased training rate.

Originality and practical value
The originality lies in the fact that dependencies of the training time (number of epochs) of the multilayer neural network on the number of hidden layers and hidden neurons were found for different training algorithms. The practical value is that, using the network traffic parameters and the 19-1-25-5 configuration neural network, it will be possible to detect the Back, Buffer_overflow, Quess_password, Ipsweep and Neptune threats on the computer network in real time and carry out appropriate control.

Fig. 3. Fragment of the processed training sample

Creating a neural network in Fann Explorer. Fann Explorer (Fann Artificial Neural Networks) is a portable environment for the development, training and testing of neural networks; it supports animation of the training process, implements multilayer artificial neural networks, and has a multi-threaded kernel that provides the computing. The neural network was created with the participation of the student Mamenko D. V.; the setting of the NN architecture is shown in Fig. 4.

Fig. 4. Setting of the NN architecture

Training of the neural network. The purpose of NN training is to find values of its parameters at which the training error is minimal. In the Fann Explorer program, the network training options are set on the Train tab (Training algorithm). The Incremental algorithm is the standard backpropagation algorithm in which the weighting factors are updated after each training pattern; this means that the weights are updated many times during one training epoch. The Resilient algorithm is a batch training algorithm that provides good results for many tasks; the algorithm is adaptive and does not use the Training rate parameter. The Quick algorithm uses the Training rate parameter and gives good results for many problems. The Batch method is the standard backpropagation algorithm in which the weights are updated after calculating the mean square error for the entire training sample. Since the mean square error is calculated more accurately than in the sequential (incremental) mode, some problems converge faster with this algorithm.
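The difference between the incremental and batch update schedules can be sketched on a toy one-weight model. This illustrates only the update schedules, not the Fann implementation; the learning rate and data are assumptions:

```python
# Toy model y = w * x trained to fit pairs (x, 2x): the true weight is 2.
DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
LR = 0.05  # assumed training rate

def incremental_epoch(w: float) -> float:
    """Update the weight after every single training pattern."""
    for x, target in DATA:
        error = target - w * x
        w += LR * error * x  # gradient step per pattern
    return w

def batch_epoch(w: float) -> float:
    """Accumulate the gradient over the whole sample, then update once."""
    grad = sum((target - w * x) * x for x, target in DATA)
    return w + LR * grad / len(DATA)  # one averaged update per epoch

w_inc = w_bat = 0.0
for _ in range(50):
    w_inc = incremental_epoch(w_inc)
    w_bat = batch_epoch(w_bat)
print(round(w_inc, 2), round(w_bat, 2))  # both approach 2.0
```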

Fig. 2. Testing menu

The functions of Hidden layer activation and Output layer activation are as follows: Symmetric, Asymmetric-Linear, Sigmoid, Stepwise, Threshold. The Training menu manages the NN training process and sets the adaptation parameters of the NN link weight factors, such as the number of training epochs, the target value of the mean square error, and the initial values of the NN link weights via the Initialize button.

Testing of the neural network. Figure 5 shows a graph of the testing of the 19-1-25-5 configuration NN after training and testing on the appropriate samples. The bright line on the graph shows the expected response, while the darker one is the actual response. The test error is 0.05868; the lines almost coincide, so the NN is well suited to the task.
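The activation options listed above differ mainly in their output range. The sketch below implements two of them; the formulas follow common Fann conventions, and the steepness constant is an assumption:

```python
import math

STEEPNESS = 0.5  # assumed activation steepness

def sigmoid(x: float) -> float:
    """Asymmetric sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-2.0 * STEEPNESS * x))

def sigmoid_symmetric(x: float) -> float:
    """Symmetric sigmoid (tanh-shaped): output in (-1, 1)."""
    return math.tanh(STEEPNESS * x)

print(sigmoid(0.0))            # 0.5
print(sigmoid_symmetric(0.0))  # 0.0
```

The symmetric variant is the one assumed in the forward-pass sketch earlier: its zero-centered output range often speeds up training of hidden layers.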

Fig. 5. NN testing graph

Analysis of the results. The NN of 19-1-25-5 configuration was trained according to the standard error backpropagation algorithm, and the resulting training error is shown in Fig. 6.

Fig. 6. Error after NN training

However, when testing on a control sample, which consisted of 25 examples for each of the five threats, the NN showed an error of 0.322. Thus, the NN copes well with the Back, Buffer_overflow and Ipsweep attacks, but it does not recognize the Quess_password and Neptune attacks. From the test graph on the control sample, it can be seen that the first neuron, responsible for detecting the Back attack, detected four of the five threats (the darker line is the expected solution, and the lighter one is the actual one), Fig. 7 (a). The NN also recognized all five threats of the Buffer_overflow type, Fig. 7 (b). The lines on the chart almost coincide, but one of the threats of the Back type is incorrectly assigned by the NN to the Buffer_overflow class. The obtained results of the experimental study are presented in Table 1.

Study of network training time versus the number of hidden neurons. The study was carried out on NNs with different numbers of neurons in the hidden layer: from 10 to 55. Experiments were carried out on the following models: Model No. 1 (Initialize, Resilient algorithm, Sigmoid Stepwise activation), Model No. 2 (Randomize, Batch algorithm, Sigmoid Symmetric activation). The results of the studies are presented in Table 2.
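Deciding which threat a connection record represents comes down to reading the five output neurons. A minimal decoding sketch follows; the winner-take-all rule and the threshold are our assumptions, not the paper's stated procedure:

```python
# The five output neurons correspond, in order, to these threats.
THREATS = ["Back", "Buffer_overflow", "Quess_password", "Ipsweep", "Neptune"]

def decode(outputs: list, threshold: float = 0.5) -> str:
    """Winner-take-all: pick the strongest output neuron,
    or report 'no threat detected' if none is confident enough."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    if outputs[best] < threshold:
        return "no threat detected"
    return THREATS[best]

print(decode([0.9, 0.1, 0.0, 0.2, 0.1]))  # Back
print(decode([0.1, 0.2, 0.1, 0.1, 0.3]))  # no threat detected
```

Under such a rule, the misclassification noted above corresponds to the Buffer_overflow neuron outscoring the Back neuron on one Back-type record.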

Fig. 7. Graph of the resulting neuron of the network: a – the first one; b – the second one

Fig. 8. Dependence of the number of epochs on the number of hidden neurons

Study of training time versus the number of hidden layers. The study was carried out on NNs with a different number of hidden layers: from 1 to 4, with 15 formal neurons in each. Experiments were carried out on the following models: Model No. 3 (Incremental algorithm, Training rate 0.4), Model No. 4 (Resilient algorithm, Training rate 0.8). The initialization algorithm was Randomize, and the activation function was the symmetric sigmoid. The results of the experimental studies are listed in Table 3.

Fig. 9. Dependence of the number of training epochs on the number of hidden layers

Conclusions

1. At the preparatory stage, based on the data of the KDDCUP-99 database, the following samples were generated: training (430 examples), test (200 examples), and control (25 examples).

2. To detect the attacks (Back, Buffer_overflow, Quess_password, Ipsweep, Neptune) in the computer network, an NN of 19-1-25-5 configuration was created in the Fann Explorer program, with 19 network traffic parameters as inputs; training, testing (error 0.1) and evaluation of the obtained results (error 0.3) on the corresponding samples were conducted. In particular, the first neuron, responsible for recognizing the Back attack, detected four of the five threats.

3. Experimental studies of the dependence of the training time (number of epochs) on the number of hidden neurons (from 10 to 55) were conducted on the NN: Model No. 1 (Resilient algorithm), Model No. 2 (Batch algorithm); the NN with the Batch algorithm is trained three times faster than the NN based on the Resilient algorithm. Experimental studies of the dependence of the training time on the number of hidden layers (from 1 to 4) were conducted on the NN: Model No. 3 (Incremental algorithm), Model No. 4 (Resilient algorithm); the NN with the Resilient algorithm is trained almost twice as fast as the NN with the Incremental algorithm.