OPTIMAL ROUTE DEFINITION IN THE NETWORK BASED ON THE MULTILAYER NEURAL MODEL

Dep. «Electronic Computing Machines», Dnipropetrovsk National University of Railway Transport named after Academician V. Lazaryan, Lazaryan St., 2, Dnipro, Ukraine, 49010, tel. +38 (056) 373 15 89, e-mail viknikpakh@gmail.com, ORCID 0000-0002-0022-099X

Dep. «Electronic Computing Machines», Dnipropetrovsk National University of Railway Transport named after Academician V. Lazaryan, Lazaryan St., 2, Dnipro, Ukraine, 49010, tel. +38 (056) 373 15 89, e-mail ihor.tsykalo@gmail.com, ORCID 0000-0002-1629-5873


Introduction
One of the main requirements for routing algorithms is rapid convergence to an optimal solution, dictated by the need for their real-time implementation in protocols under continuous change in the characteristics of network traffic, topology and load of the computer networks used in rail transport. The classic shortest-path graph algorithms used in modern routing protocols cannot meet this requirement. One of the approaches to solving routing problems in computer networks is the use of neural network technology [8, 15, 16]. For example, in [12] it is shown that with the help of a neural network (NN) it is possible to find a solution close to the optimal one for the travelling salesman problem and to find the shortest path on a graph. In [3], the possibility of applying the following neural networks to routing problems is studied: the multilayer perceptron; the RBF network; the Hopfield network. It is established that the most promising means for solving the routing problem are the feedforward neural network and the Hopfield network, which are capable of operating under conditions of dynamic change in the topology of the computer network and the characteristics of the data transmission channels [1, 2]. In particular, when the Hopfield network is used, additional research is required on the transfer functions of the neurons and on the energy of the neural network [18]. In [7], it was found that the Hopfield network finds a satisfactory route that differs from the optimal one by 7-8% on average (in the case of more than 15 nodes). The possibility of using the Hopfield network to find the shortest path on the route graph in the computer network of railway transport is analysed in [5, 6]. In [3], the use of a feedforward neural network created in MATLAB for determining the route in a computer network of five nodes was investigated. However, the integrated computer network of rail transport consists of a much larger number of nodes, which requires additional research. In particular, [20] proposed an intelligent control subsystem based on neural network technology, and [17] proposed a prediction subsystem based on a neuro-fuzzy network.

Purpose
To develop a methodology for determining the optimal route in the unified computer network based on the «MLP34-2-410-34» software model created with the TensorFlow framework.

Methodology
A combined computer network operating on different technologies can be represented as an undirected graph $G(V, W)$, where $V$ is the set of graph vertices, whose number is $N$, each vertex modelling a node (router) of the computer network, and $W$ is the set of graph edges, whose number is $M$. Each graph edge $(i, j)$ is assigned a certain weight corresponding to the bandwidth (the maximum amount of data transmitted by the network per unit time): $c_{ij}$ is the bandwidth of the communication channel between the $i$-th and $j$-th network nodes, Mbps.
To solve the routing problem, it is necessary to find the optimal path between two given routers of the unified computer network. As an example, we will consider a hypothetical computer network whose structure is shown in Fig. 1. Let us introduce the array of variables

$$x_{ij} = \begin{cases} 1, & \text{if the traffic flows through the channel } (i, j),\\ 0, & \text{otherwise,} \end{cases}$$

where $x_{ij}$ denotes the presence of traffic transmitted within the network between the $i$-th and $j$-th vertices.
As the criterion of optimality, the following expression is adopted:

$$F = \sum_{i,j} f_{ij}\, x_{ij} \rightarrow \min, \qquad f_{ij} = 125{,}000{,}000 - c_{ij},$$

which guarantees the search for a path with the maximum bandwidth.
If there is no connection between the nodes of the unified computer network, then $c_{ij} = c_{ji} = 0$ (hence, $f_{ij} = 125{,}000{,}000$).
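By way of illustration, this criterion can be modelled directly with NetworkX. The sketch below is not taken from the paper's software; the edge list and node numbers are hypothetical, and the cost transformation $f_{ij} = 125{,}000{,}000 - c_{ij}$ follows the reconstruction of the criterion given above.

```python
# Illustrative sketch (assumed edge list and node numbers): each channel is
# assigned the cost f_ij = K - c_ij, so that the minimum-cost path found by
# Dijkstra's algorithm is the path with the maximum bandwidth.
import networkx as nx

K = 125_000_000  # constant exceeding any channel bandwidth

# hypothetical fragment of the unified computer network: (i, j, c_ij)
channels = [
    (12, 11, 100_000_000),
    (11, 13, 10_000_000),
    (13, 14, 1_000_000),
    (12, 13, 55_000_000),
    (13, 1, 80_000_000),
]

G = nx.Graph()
for i, j, c in channels:
    G.add_edge(i, j, cost=K - c)  # f_ij = K - c_ij

# shortest path by the f_ij criterion between routers 12 and 1
cost, path = nx.bidirectional_dijkstra(G, 12, 1, weight="cost")
print(path)  # [12, 13, 1] for the bandwidths above
```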

Findings
Neural network as the main mathematical tool for solving the problem. The unified computer network contains 30 routers and 34 communication channels. As an example, let us consider the solution to the problem of determining the optimal route between the nodes «12» and «1». In total, there are 14 unique paths between the indicated nodes.
Path 1: [12, 11, 13, 14, 15, 20, …].

To solve the routing problem, we used the NN whose structure is shown in Fig. 2. The NN input receives the vector of bandwidths of the channels of the unified computer network, X, which characterizes its current state: X = {x_i}, where i = 1, ..., m (m = 34). For example, when using a train sample of 1,400 examples, the number of required neurons in the hidden layer of the NN is estimated accordingly.

Sample preparation (preparatory stage). The sample is formed according to the fixed structure of the unified computer network (see Fig. 1). The input vector X is constructed by randomly generating the channel bandwidth values $c_{ij}$, drawn from a uniform distribution on the segment [100; 100,000,000]. The response vector Y is generated by calculating the optimal path according to the Dijkstra algorithm using the NetworkX library for the Python language (an open-source software library for working with graphs and networks).
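A minimal sketch of this sample-preparation step is given below. The edge list and node numbering are placeholders (the real network has 30 routers and 34 channels), and the $K - c_{ij}$ edge cost follows the reconstruction of the optimality criterion above; none of this is the authors' code.

```python
# Hedged sketch of sample generation: random channel bandwidths -> input vector X,
# Dijkstra-optimal path encoded as a 0/1 channel-indicator vector -> response Y.
import numpy as np
import networkx as nx

K = 125_000_000
# placeholder edge list; the real unified network has 34 channels
edges = [(12, 11), (11, 13), (13, 14), (12, 13), (13, 1), (14, 1)]
edge_index = {e: k for k, e in enumerate(edges)}

def make_example(rng, src=12, dst=1):
    # input vector X: bandwidths drawn uniformly from the segment [100; 100_000_000]
    c = rng.uniform(100, 100_000_000, size=len(edges))
    G = nx.Graph()
    for (i, j), c_ij in zip(edges, c):
        G.add_edge(i, j, cost=K - c_ij)
    # response vector Y: which channels belong to the Dijkstra-optimal path
    _, path = nx.bidirectional_dijkstra(G, src, dst, weight="cost")
    y = np.zeros(len(edges))
    for i, j in zip(path, path[1:]):
        k = edge_index[(i, j)] if (i, j) in edge_index else edge_index[(j, i)]
        y[k] = 1.0
    return c, y

rng = np.random.default_rng(0)
X, Y = map(np.array, zip(*(make_example(rng) for _ in range(1_400))))
print(X.shape, Y.shape)  # (1400, 6) here; (1400, 34) for the full network
```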
Justifying the choice of modelling tools. To solve the routing problem in the unified computer network, the Keras library, together with TensorFlow and NumPy, was selected in the Python programming language [9-11, 13-14, 19].
Keras is an open-source neural network library for Python, capable of running on top of Deeplearning4j, TensorFlow and Theano, designed for rapid experimentation with deep neural networks.
TensorFlow is an open-source software library for machine learning. It is the second-generation machine learning system of Google Brain, released as open-source software.
NumPy is an extension of the Python language that supports large multidimensional arrays and matrices, along with a library of high-level mathematical functions for operating on these arrays.
Python is an interpreted, object-oriented, high-level programming language with strong dynamic typing. High-level data structures, together with dynamic semantics and dynamic binding, make it attractive for rapid application development, as well as a tool for connecting existing components. Python supports modules and module packages, which facilitates modularity and code reuse. The Python interpreter and the standard libraries are available both in compiled and in source form on all major platforms. Python supports several programming paradigms, including object-oriented, procedural, functional and aspect-oriented programming. The overall structure of the MLP34-2-410-34 software model is shown in Fig. 4. In order to be able to unambiguously compare the NN models on two parameters, the probability of optimal responses and the probability of correct responses, we introduce the harmonic mean, which is calculated by the following formula:

Fig. 4. Structure of the MLP34-2-410-34 software model
$$H = \frac{2\, P_{opt}\, P_{cor}}{P_{opt} + P_{cor}},$$

where $P_{opt}$ is the probability of optimal responses and $P_{cor}$ is the probability of correct responses.

Testing the MLP34-2-410-34 program. The resulting feature vector of channel entry into the optimal path is as follows: {0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0}.

The table shows that on small volumes of data the activation functions Sigmoid, ReLU and Leaky-ReLU performed most successfully on the test sample. The best accuracy on the training data was shown by the activation functions Tanh, ReLU and Leaky-ReLU, so they can be considered promising for larger sample sizes. Table 3 shows that on data volumes larger than the average, the Sigmoid, ReLU and Leaky-ReLU activation functions were the most successful on the test sample. Except for the linear activation function, all functions reached maximum accuracy on the training data.

The resulting NN of 34-2-410-34 configuration with the Leaky-ReLU (α = 0.1) activation function in the hidden layer and the linear activation function in the output layer, after training with 49,000 examples for 1,000 epochs, reached an MSE value of 0.0024 on the control sample and determines the optimal path in 86% of cases.
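As an illustration of this configuration, the sketch below builds a comparable Keras model: 34 bandwidth inputs, one hidden layer of 410 Leaky-ReLU units (α = 0.1), 34 linear outputs, MSE loss and the Adam optimizer. The layer API calls and the harmonic-mean helper are our assumptions, not the authors' MLP34-2-410-34 code.

```python
# Hedged Keras sketch of a 34-410-34 MLP with Leaky-ReLU hidden activation
# (alpha = 0.1), a linear output layer, MSE loss and the Adam optimizer.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(34,)),               # vector of 34 channel bandwidths
    layers.Dense(410),
    layers.LeakyReLU(alpha=0.1),            # 'negative_slope' in newer Keras versions
    layers.Dense(34, activation="linear"),  # 0/1 indicator of channels on the optimal path
])
model.compile(optimizer="adam", loss="mse")

# X_train, Y_train would come from the sample-generation step sketched earlier
# model.fit(X_train, Y_train, epochs=1000, batch_size=32, validation_split=0.1)

def harmonic_mean(p_opt, p_cor):
    """Harmonic mean of the probabilities of optimal and of correct responses."""
    return 2 * p_opt * p_cor / (p_opt + p_cor)

print(round(harmonic_mean(0.86, 0.95), 3))  # illustrative values only
```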

Originality and practical value
The NN of 34-2-410-34 configuration with the Leaky-ReLU (α = 0.1) activation function in the hidden layer and the linear function in the output layer was studied. Experiments were conducted for various training optimization algorithms during 100 epochs with a sample size of 49,000 examples. The results of the experiments are shown in Table 6; the training process is illustrated in Fig. 15.
According to the results of the experiments, it is evident that with classical gradient descent the NN learns very slowly, while with stochastic gradient descent it shows a significant improvement. The Adam, AdaMax and Nadam training algorithms showed almost identical results. The study also showed that the required accuracy of the neural network is reached with 410 hidden neurons, and a further increase does almost nothing to improve the results.
3. The efficiency of the NN of 34-2-410-34 configuration was studied on the basis of the harmonic mean for different activation functions (linear, sigmoid, hyperbolic tangent, Softplus, ReLU, Leaky-ReLU) using the Adam algorithm on train samples of different sizes (140, 1,400, 14,000 and 49,000 examples). The study showed that the ReLU and Leaky-ReLU activation functions train most rapidly at all sizes of the train sample and are less subject to overfitting than the other activation functions. The NN of 34-2-410-34 configuration with the Tanh and Softplus activation functions can achieve 100% accuracy on the train sample, but these functions learn more slowly than ReLU and Leaky-ReLU and are strongly subject to overfitting with small train samples (140, 1,400 and 14,000 examples). When the sigmoid activation function is used, the neural network also overfits, but with a large train sample (49,000 examples) it is able to achieve accuracy (83%) close to that of Leaky-ReLU (88%) or ReLU (86%).
4. The efficiency of the NN of 34-2-410-34 configuration with the Leaky-ReLU (α = 0.1) activation function in the hidden layer and the linear function in the output layer was studied. Experiments were conducted with different optimization algorithms (BGD, MB SGD, Adam, Adamax, Nadam) during 100 epochs with a train sample of 49,000 examples. The Adam, Adamax and Nadam algorithms showed almost identical results, with 89, 87 and 90% accuracy, respectively.
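A hedged sketch of such an optimizer comparison is given below; apart from the optimizer names taken from the text, the model construction and all settings are assumptions.

```python
# Hypothetical sketch: training the same 34-410-34 model with different
# optimizers (SGD as a stand-in for the gradient-descent variants, plus Adam,
# Adamax and Nadam) and comparing the validation MSE after 100 epochs.
from tensorflow import keras
from tensorflow.keras import layers

def build_model():
    return keras.Sequential([
        keras.Input(shape=(34,)),
        layers.Dense(410),
        layers.LeakyReLU(alpha=0.1),
        layers.Dense(34, activation="linear"),
    ])

results = {}
for name in ["sgd", "adam", "adamax", "nadam"]:
    model = build_model()
    model.compile(optimizer=name, loss="mse")
    # history = model.fit(X_train, Y_train, epochs=100,
    #                     validation_data=(X_test, Y_test), verbose=0)
    # results[name] = history.history["val_loss"][-1]
```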

Fig. 1. Graph of router connections of the unified computer network

Fig. 3. Generation of sample: a - random; b - balanced

«MLPModel» creates a neural network of 34-2-X-34 configuration (where X is the possible number of hidden neurons) and performs training, testing and control on the corresponding samples, as well as their normalization. «NetworkX» (standard class) builds a graph of the computer network (Graph), calculates the existing paths between stations (all_simple_paths) and the path between the specified stations according to the Dijkstra algorithm (bidirectional_dijkstra). «Matplotlib» (standard class) builds a pie chart and a histogram to show the ratio of the number of examples for each path. «Keras.Model» (standard class) performs compilation in accordance with the given configuration of the neural network (compile) and provides the standard functions (fit, predict) used during training and testing of the neural network. «Metrics» calculates the probability of the optimal and of the correct answers (a possible implementation is sketched below). «TensorFlow» (standard class) is called by the «Keras.Model» class when performing the appropriate calculations.

Fig. 11. The MSE error-epoch dependency graph for train and test samples of 140 examples

Fig. 12. The MSE error-epoch dependency graph for train and test samples of 1,400 examples

Fig. 13. The MSE error-epoch dependency graph for train and test samples of 14,000 examples

Fig. 15. The MSE error-epoch dependency graph for train and test samples for different training algorithms

Study of NN for different numbers of hidden neurons

From the table it is clear that for the train sample the best result of 0.99 is already achieved with 410 neurons in the hidden layer.

Study of NN for various activation functions on train sample of 14,000 examples
Fig. 14. The MSE error-epoch dependency graph for train and test samples of 49,000 examples