NETWORK TRAFFIC FORECASTING IN INFORMATION-TELECOMMUNICATION SYSTEM OF PRYDNIPROVSK RAILWAYS BASED ON NEURO-FUZZY NETWORK

Purpose. The continuous increase in network traffic in the information-telecommunication system (ITS) of Prydniprovsk Railways makes it necessary to determine network congestion in real time and to control the data flows. One possible solution is a method for forecasting the volume of network traffic (inbound and outbound) using neural network technology, which helps to prevent server overload and improve the quality of service. Methodology. The current network traffic in the ITS of Prydniprovsk Railways was analysed and the learning, test and validation sets were prepared. A neuro-fuzzy network (hybrid system) was created in Matlab, and the learning, testing and forecast adequacy analysis phases were carried out on the corresponding sets. Findings. For the fragment Dnipropetrovsk-Kyiv of the ITS of Prydniprovsk Railways, a day-ahead forecast of the network traffic volume was made with the hybrid system created in Matlab; the MAPE values are 6.9% for the volume of inbound traffic and 7.7% for the volume of outbound traffic. The average learning error of the hybrid system was found to decrease with an increase in the number of inputs (from 2 to 4), the number of terms of the input variable (from 2 to 5), and the learning set size (from 20 to 100 examples). The number of terms of the input variable has a significant impact on the average learning error of the hybrid system. The lowest average learning error is provided by the 4-input hybrid system, which ensures more accurate learning of the neuro-fuzzy network by the hybrid method. Originality. The work yields the dependences of the average error of the hybrid system that forecasts the network traffic volume for the fragment Dnipropetrovsk-Kyiv of the ITS of Prydniprovsk Railways on the number of its inputs, the number of input variable terms, and the learning set size, for different learning methods. Practical value. Forecasting the network traffic volume in the ITS of Prydniprovsk Railways will allow network congestion to be identified in real time and the data flows to be controlled.


Introduction
Network traffic parameters are forecast with various methods and techniques that are widely used in the analysis of time series of economic indicators [9][10]. In general, given a set of n discrete values $y(t_1), y(t_2), \dots, y(t_n)$ at successive time points $t_1, t_2, \dots, t_n$, the forecasting problem is to predict the value $y(t_{n+1})$ at a future time point $t_{n+1}$. A forecast usually has an error, and its magnitude depends on the forecasting system used. High forecast efficiency is achieved with neural networks [1], [11][12][13]. The forecasting problem can be solved with the following neural networks: multilayer perceptron (MLP), radial basis function (RBF) network, generalized regression neural network (GRNN), Volterra networks, Elman networks and the ANFIS system, an overview of which is given in [7]. Fuzzy neural networks (hybrid systems) are designed to combine the advantages of neural networks and fuzzy inference. They make it possible to develop and apply models in the form of rules of fuzzy production systems, which are built using the capabilities of neural networks [5]. One such system is the Adaptive-Network-Based Fuzzy Inference System (ANFIS), which is implemented in the Fuzzy Logic Toolbox of Matlab [4]. The main stages of neuro-fuzzy network operation are: formation of the rule base of the fuzzy inference system; fuzzification of input variables; aggregation; activation; accumulation; defuzzification of output variables. The functioning algorithm of such a system is given in [6]. Specifically, [2] proposed a hybrid system for forecasting the suburban passenger flow 24 hours ahead, and [3] built a hybrid model for forecasting the wagon loading volume from the data of the two previous days.

Purpose
To develop a method for forecasting the volume of network traffic (inbound and outbound) using a neuro-fuzzy network (hybrid system) for the considered fragment (Dnipropetrovsk-Kyiv) of the ITS of Prydniprovsk Railways.

Problem statement
The continuous increase in network traffic volume in the ITS of Prydniprovsk Railways requires forecasting in order to prevent network congestion and improve service quality. One possible solution is a network traffic volume forecasting method that would help to avoid such overload (including server overload). The study used real traffic data for the most important fragment (Dnipropetrovsk-Kyiv) of the ITS of Prydniprovsk Railways for the period 21.03-26.03.2016. Inbound and outbound traffic was analysed for long-term dependencies (over hours and days). For illustrative purposes, charts of the network traffic volume were built for the analysed ITS fragment. As an example, Figure 1 shows 24-hour time series of outbound traffic on different days of the week.
Figure 1 shows the behaviour of the network traffic volume over the week: it is about the same on Monday, Tuesday, Thursday and Friday, with regular changes within the day. In particular, the traffic volume is lower and more or less stable from 00:00 to 7:00, significant and unstable from 8:00 to 17:00, and again lower and relatively unchanged from 18:00 to 23:00. On Wednesday the volume of network traffic is the highest, and on weekends it is much lower than on weekdays. The figure shows that the volume of outbound traffic on Wednesday is about 1.3 times higher than on Monday, Tuesday, Thursday and Friday. For the day-ahead forecast of the network traffic volume we selected the interval from 8:00 to 17:00, where the volume varies significantly, restricted to the weekdays (Monday, Tuesday, Thursday, Friday) on which the nature of the traffic is approximately the same. It was thus decided to make a day-ahead forecast of the traffic volume x(t) based on the data of the previous three days: x(t-1), x(t-2), x(t-3).
To make a forecast it is necessary to prepare the learning, test and validation sets. The prepared sets affect the efficiency of the learning and testing processes, as well as the ability of the network to solve the problems it faces during operation. To prepare the sets, we built a dedicated array of 100 examples close to the real data. The first 50 values of this array formed the learning set, and the other 50 values formed the test set. The validation (control) set was formed from the real data of the fourth day, which was not included in the learning or test sets.
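The preparation of input-output examples described above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the function names, the placeholder series and the way the samples are indexed are assumptions made for the example.

```python
def make_lagged_samples(series, n_lags=3):
    """Build (inputs, target) pairs: [x(t-3), x(t-2), x(t-1)] -> x(t)."""
    samples = []
    for t in range(n_lags, len(series)):
        inputs = list(series[t - n_lags:t])  # oldest value first
        samples.append((inputs, series[t]))
    return samples


def split_learning_test(samples, n_learning=50):
    """First n_learning examples form the learning set, the rest the test set."""
    return samples[:n_learning], samples[n_learning:]


# Hypothetical placeholder series (NOT data from the study):
# 103 values give 100 examples with three lagged inputs each.
series = [float(100 + (7 * i) % 13) for i in range(103)]

samples = make_lagged_samples(series)
learning_set, test_set = split_learning_test(samples)
print(len(samples), len(learning_set), len(test_set))
```

Keeping the split by position rather than shuffling preserves the time order of the examples, which is a common choice for time series; whether the study did the same is not stated.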
2 - Creation of neuro-fuzzy network in Matlab. The task of forecasting the traffic (inbound, outbound) on the section Dnipropetrovsk-Kyiv reduces to a time series forecasting problem; a Sugeno-type system is usually selected for such problems. For linguistic assessment, each input variable has two terms (minimum and maximum value) with Gaussian membership functions (gaussmf); the membership function of the output variable is of linear type. The fuzzy inference rules set in the knowledge-base editor are as follows:
if x(t-1)=min and x(t-2)=min and x(t-3)=min, then x(t)=1;
if x(t-1)=min and x(t-2)=min and x(t-3)=max, then x(t)=2;
if x(t-1)=min and x(t-2)=max and x(t-3)=min, then x(t)=3;
if x(t-1)=min and x(t-2)=max and x(t-3)=max, then x(t)=4;
if x(t-1)=max and x(t-2)=min and x(t-3)=min, then x(t)=5;
if x(t-1)=max and x(t-2)=min and x(t-3)=max, then x(t)=6;
if x(t-1)=max and x(t-2)=max and x(t-3)=min, then x(t)=7;
if x(t-1)=max and x(t-2)=max and x(t-3)=max, then x(t)=8.
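The eight rules cover every combination of the two terms over the three inputs, so the rule base can be enumerated mechanically. The sketch below reproduces the listing in the same order; the names `INPUTS`, `TERMS` and `build_rule_base` are hypothetical and do not correspond to any Matlab function.

```python
from itertools import product

INPUTS = ["x(t-1)", "x(t-2)", "x(t-3)"]  # input variables, in rule order
TERMS = ["min", "max"]                   # linguistic terms of each input


def build_rule_base(inputs, terms):
    """Enumerate all term combinations; rule k points to output function k."""
    rules = []
    for index, combo in enumerate(product(terms, repeat=len(inputs)), start=1):
        rules.append((dict(zip(inputs, combo)), index))
    return rules


rules = build_rule_base(INPUTS, TERMS)
for antecedent, output_index in rules:
    condition = " and ".join(f"{name}={term}" for name, term in antecedent.items())
    print(f"if {condition}, then x(t)={output_index}")
```

With more terms or inputs the same enumeration applies, and the number of rules grows as the number of terms raised to the power of the number of inputs.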
The structure of the designed fuzzy inference system is shown in Fig. 3. As shown in Fig. 3, the system has 5 layers. The first layer (input) has three nodes (x(t-3), x(t-2), x(t-1)), to which the input data are fed. The first layer performs separate fuzzification of each variable, determining for each j-th inference rule the membership coefficient according to the applied fuzzification function. The second layer (inputmf) consists of $3 \cdot 2 = 6$ nodes, because each input variable has 2 terms; it aggregates the individual variables $x_i$, determining the resulting membership coefficient for the vector x (the activation level of the inference rule); this layer is nonparametric. The third layer (rule) is the TSK function generator; this is a parametric layer in which the linear weights that determine the inference function of the TSK model are adapted. The fourth layer (outputmf) consists of the membership functions for each fuzzy inference rule (the number of nodes in this layer equals the number of rules, $2^3 = 8$); this layer is nonparametric. The fifth layer (output) is the normalizing layer; it has a single node, which corresponds to the output of the system; this layer is nonparametric.
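To make the data flow through the layers concrete, the following sketch computes the output of a first-order Sugeno (TSK) system with Gaussian membership functions. It is an illustration under stated assumptions, not the system built in the study: the product is assumed as the AND operator, the output is the firing-strength-weighted average of the linear rule outputs, and all parameter values are hypothetical.

```python
import math
from itertools import product


def gaussmf(x, sigma, c):
    """Gaussian membership function exp(-(x - c)^2 / (2 * sigma^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def sugeno_forecast(x, mf_params, consequents):
    """Output of a first-order Sugeno system for the input vector x.

    mf_params[i][k] = (sigma, c) of term k of input i.
    consequents[r] = (p_1, ..., p_n, p_0): linear coefficients of rule r,
    with rules ordered as itertools.product over the term indices.
    """
    n_inputs = len(x)
    n_terms = len(mf_params[0])
    # Fuzzification: membership degree of every input in every term.
    degrees = [[gaussmf(x[i], *mf_params[i][k]) for k in range(n_terms)]
               for i in range(n_inputs)]
    weighted_sum = 0.0
    total_strength = 0.0
    for r, combo in enumerate(product(range(n_terms), repeat=n_inputs)):
        # Aggregation: firing strength as the product of membership degrees.
        strength = 1.0
        for i, k in enumerate(combo):
            strength *= degrees[i][k]
        # Linear (first-order) rule output: p_1*x_1 + ... + p_n*x_n + p_0.
        coeffs = consequents[r]
        rule_output = sum(p * xi for p, xi in zip(coeffs, x)) + coeffs[-1]
        weighted_sum += strength * rule_output
        total_strength += strength
    # Normalization: weighted average of the rule outputs.
    return weighted_sum / total_strength


# Hypothetical parameters (NOT the trained values from the study).
mf_params = [[(0.5, 0.0), (0.5, 1.0)] for _ in range(3)]  # terms: min, max
consequents = [(0.2, 0.3, 0.5, 0.0) for _ in range(8)]    # same for all rules
x = [0.4, 0.6, 0.5]  # normalized x(t-1), x(t-2), x(t-3)

forecast = sugeno_forecast(x, mf_params, consequents)
print(forecast)
```

Because all eight rules share one consequent in this example, the weighted average reduces to that single linear function of the inputs, which gives a simple sanity check of the normalization step.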
3 - Learning of fuzzy neural network. For learning, the hybrid method (hybrid) was selected as the optimization method (optim. method); it combines the least-squares method and the back-propagation gradient descent method. The number of learning iterations (epochs) is 40. As an example, the diagram of the membership function of the first input variable before and after system learning is shown in Fig. 2.
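The least-squares part of hybrid learning relies on the fact that, once the membership function parameters are fixed, the Sugeno output is linear in the consequent coefficients. The sketch below illustrates only this step for a single input with two terms; it is a simplified illustration and not the toolbox implementation. The gradient update of the membership function parameters is omitted, and all names and values are hypothetical.

```python
import math


def gaussmf(x, sigma, c):
    """Gaussian membership function."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))


def design_row(x, mf_params):
    """Row [w1*x, w1, w2*x, w2, ...] built from normalized firing strengths."""
    strengths = [gaussmf(x, sigma, c) for sigma, c in mf_params]
    total = sum(strengths)
    row = []
    for s in strengths:
        w = s / total
        row.extend([w * x, w])
    return row


def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_consequents(xs, ys, mf_params):
    """Least-squares estimate of consequent coefficients (normal equations)."""
    rows = [design_row(x, mf_params) for x in xs]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n)]
    return solve_linear_system(ata, aty)


def predict(x, mf_params, theta):
    """Sugeno output for fixed membership functions and coefficients theta."""
    return sum(a * p for a, p in zip(design_row(x, mf_params), theta))


# Hypothetical setup: targets generated from known coefficients, then refitted.
mf_params = [(0.4, 0.0), (0.4, 1.0)]  # (sigma, c) for terms min and max
true_theta = [2.0, 1.0, -1.0, 3.0]
xs = [i / 19 for i in range(20)]
ys = [predict(x, mf_params, true_theta) for x in xs]

theta = fit_consequents(xs, ys, mf_params)
max_residual = max(abs(predict(x, mf_params, theta) - y) for x, y in zip(xs, ys))
print(max_residual)
```

The normal equations are used here only to keep the example dependency-free; a numerically more robust solver would normally be preferred for larger rule bases.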
4 - Testing of hybrid system. The hybrid system is tested on the test set. The testing results, compared with the system learning results, are shown in Fig. 4.
5 - Analysis of hybrid system adequacy. To assess the quality and accuracy of the forecast made by the created hybrid system, we calculated the MAPE (Mean Absolute Percentage Error) by the formula:

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \frac{\left| Z(t) - Z_1(t) \right|}{Z(t)} \cdot 100\%,$$

where $Z(t)$ is the real data at time point t; $Z_1(t)$ is the predicted data at time point t; N is the number of hours.
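Assuming the standard definition of MAPE (the mean of the absolute errors relative to the real values, expressed in percent), the calculation can be sketched as follows. The function name and the sample values are hypothetical and are not data from the study.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent.

    actual    - real values Z(t), all non-zero
    predicted - forecast values for the same time points
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    relative_errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(relative_errors) / len(actual)


# Hypothetical hourly values (NOT data from the study).
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(round(mape(actual, predicted), 2))
```

Since the real value is in the denominator, the measure is undefined for hours with zero traffic; this does not arise for the busy 8:00 to 17:00 interval but would matter for idle periods.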
The network traffic volume was forecast for the interval from 8:00 to 17:00 (N = 10 hours in total). The MAPE values are 6.9% for the forecast of inbound traffic volume and 7.7% for the forecast of outbound traffic volume. As an example, the actual and predicted volume of outbound traffic in the ITS of Prydniprovsk Railways (Dnipropetrovsk-Kyiv) is shown in Fig. 5.

Findings
1 - The study of the dependence of the average learning error of the hybrid system on the number of its inputs. The average learning error of the created hybrid system was studied for different numbers of inputs: 2, 3, 4. In all the experiments the learning set consisted of 50 examples, the number of epochs was 40, and the system was trained by the hybrid method. From the data obtained, diagrams of the dependence of the average learning error of the hybrid system on the number of its inputs were built for inbound and outbound traffic in the ITS of Prydniprovsk Railways for the considered fragment Dnipropetrovsk-Kyiv; they are presented in Fig. 6.
The figure shows that the lowest average learning error of the hybrid system, $0.27 \cdot 10^{-3} \cdot 10^{6} = 2.7 \cdot 10^{2}$ bursts for inbound traffic and $19 \cdot 10^{2}$ bursts for outbound traffic (the value reported in the Conclusions), is provided by the 4-input hybrid system with a learning set of 50 examples.
2 - The study of the dependence of the average learning error of the hybrid system on the number of terms of its input variable. The study was conducted on the created hybrid system with three input variables; in all the experiments the learning set consisted of 50 examples. The average learning error of the hybrid system was analysed for the following numbers of terms of its input variable: 2, 3, 5.
From the values obtained, diagrams of the dependence of the average learning error of the hybrid system on the number of terms of its input variable were built for different learning methods; they are presented in Fig. 7. The figure shows that when the number of terms increases (from 2 to 5), the average learning error of the hybrid system decreases: from $0.69 \cdot 10^{6}$ to $0.45 \cdot 10^{-5} \cdot 10^{6} = 4.5$ bursts by the hybrid learning method; from $1.36 \cdot 10^{6}$ to $0.83 \cdot 10^{6}$ bursts by the back-propagation method. Thus, learning of the 3-input hybrid system (5 terms for each input variable) is more accurate by the hybrid method than by the back-propagation method.
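For context on the term counts compared here, the size of the rule base can be tallied under two assumptions that the text does not state for every experiment: grid partitioning (every combination of terms forms a rule, as in the eight-rule base of the 3-input system with 2 terms) and first-order consequents (one linear coefficient per input plus a constant). The function name is hypothetical.

```python
def grid_partition_size(n_inputs, n_terms):
    """Return (rules, premise parameters, consequent parameters).

    Assumes Gaussian membership functions (2 parameters each: sigma and c)
    and first-order Sugeno consequents (n_inputs + 1 coefficients per rule).
    """
    n_rules = n_terms ** n_inputs
    premise_params = n_inputs * n_terms * 2
    consequent_params = n_rules * (n_inputs + 1)
    return n_rules, premise_params, consequent_params


for n_terms in (2, 3, 5):
    rules, premise, consequent = grid_partition_size(3, n_terms)
    print(f"terms={n_terms}: rules={rules}, "
          f"premise params={premise}, consequent params={consequent}")
```

Under these assumptions the number of adjustable coefficients grows quickly with the number of terms, which may be worth keeping in mind when comparing learning errors obtained on a learning set of 50 examples.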
3 - The study of the dependence of the average learning error of the hybrid system on the learning set size. Learning sets of different lengths were used for the study: 20, 50, 100 examples. The study was conducted on the hybrid system with three input variables; the learning cycle was 100 epochs. From the values obtained, diagrams of the dependence of the average learning error of the hybrid system on the learning set size were built for the different learning algorithms; they are presented in Fig. 8.
The figure shows that when the learning set size increases (from 20 to 100 examples) for the 3-input hybrid system, its average learning error decreases: from $0.72 \cdot 10^{6}$ to $0.41 \cdot 10^{6}$ bursts by the hybrid learning method; from $2.28 \cdot 10^{6}$ to $1.07 \cdot 10^{6}$ bursts by the back-propagation method. Thus, learning of the hybrid system is more accurate by the hybrid method with a learning set of 100 examples.

Originality and practical value
The originality of the work lies in the obtained dependences of the average error of the hybrid system that forecasts the network traffic volume for the fragment (Dnipropetrovsk-Kyiv) of the ITS of Prydniprovsk Railways on the number of its inputs, the number of input variable terms, and the learning set size, for different learning methods. The practical value is that forecasting the network traffic volume in the ITS of Prydniprovsk Railways will allow network congestion to be identified in real time and the data flows to be controlled.

Conclusions
1. The work presents an analysis of the volume of network traffic (inbound and outbound) in the ITS of Prydniprovsk Railways (Dnipropetrovsk-Kyiv) based on real data. For the day-ahead forecast of the network traffic volume, the interval from 8:00 to 17:00 was selected, where the volume varies significantly, restricted to the days of the week (Monday, Tuesday, Thursday, Friday) on which the nature of the traffic is approximately the same.
2. The learning, test and validation sets were prepared based on actual data for the period 21.03-26.03.2016. The forecast of the network traffic volume in the ITS of Prydniprovsk Railways (Dnipropetrovsk-Kyiv) was made using a neuro-fuzzy network (hybrid system) designed in Matlab. The input of the hybrid system is the network traffic volume for the previous three days; the forecast was made for the interval from 8:00 to 17:00 (N = 10 hours in total); the MAPE values are 6.9% for inbound traffic and 7.7% for outbound traffic.
3. An experimental study was conducted of the dependence of the average learning error of the hybrid system on the number of its inputs (first study), the number of input variable terms (second study) and the learning set size (third study) for different learning methods: hybrid and back-propagation. The number of input variable terms has a significant impact on the average learning error of the hybrid system. In the ITS of Prydniprovsk Railways (Dnipropetrovsk-Kyiv):
- The first study showed that the most accurate volume forecast of the inbound traffic (learning error $2.7 \cdot 10^{2}$) and outbound traffic (learning error $19 \cdot 10^{2}$) is achieved with the 4-input hybrid system with a learning set of 50 examples.
- The second study showed that an increase in the number of terms of the input variable (from 2 to 5) leads to a decrease in the average learning error: from $0.69 \cdot 10^{6}$ to 4.5 bursts by the hybrid method; from $1.36 \cdot 10^{6}$ to $0.83 \cdot 10^{6}$ bursts by the back-propagation method. Thus, learning of the 3-input hybrid system with 5 terms for each input variable is more accurate by the hybrid method.
- The third study showed that an increase in the learning set size (from 20 to 100 examples) for the 3-input hybrid system leads to a decrease in the average learning error: from $0.72 \cdot 10^{6}$ to $0.41 \cdot 10^{6}$ bursts by the hybrid method; from $2.28 \cdot 10^{6}$ to $1.07 \cdot 10^{6}$ bursts by the back-propagation method. Thus, the learning is more accurate by the hybrid method with a learning set of 100 examples.

Fig. 2. Membership function of the first input variable before and after system learning

Fig. 4. Results of learning and testing of neuro-fuzzy network

Fig. 6. Dependence of average error of the hybrid system learning on the number of its inputs