Application of Levenberg-Marquardt Optimization Algorithm Based Multilayer Neural Networks for Hydrological Time Series Modeling

Recently, Artificial Neural Networks (ANNs), which are mathematical modeling tools inspired by the properties of the biological neural system, have been widely used in hydrological time series modeling. These modeling studies generally employ the standard feed forward backpropagation (FFBP) algorithms such as gradient descent, gradient descent with momentum rate, and conjugate gradient. Because the standard FFBP algorithms suffer from long training times and slow convergence, Newton and Levenberg-Marquardt algorithms were developed as alternative approaches and have also been used in applications. In this study, an application of the Levenberg-Marquardt algorithm based ANN (LM-ANN) to the modeling of monthly inflows of Demirkopru Dam, which is located in the Gediz Basin, is presented. The LM-ANN results are also compared with a gradient descent with momentum rate algorithm based FFBP model (GDM-ANN). When the statistics of the long-term and seasonal outputs are compared, the developed LM-ANN model is seen to perform better in predicting the inflows. In addition, owing to its rapid assessment and robustness, the LM-ANN approach can be used for modeling other hydrological components.


Introduction
Applying water resource engineering methods to evaluate water resource potential and to support water resource management decisions, such as drought-flood analysis, irrigation, reservoir performance based on probability of failure, and the development of integrated river basin models under given climate scenarios, requires forecasting of streamflow data and modeling of rainfall-runoff relations. In this context, examining hydrological processes and their causalities deepens our understanding of modeling. In particular, the recent and apparent impacts of climate change have made these models more popular.
A basin can be considered as a system that transforms rainfall into runoff. Because a basin has very complicated and uncertain components, this system is modeled by making simplifying assumptions to obtain the transformation relation. Different classifications of basin models have been presented in the literature, based on system definitions, area-time scales and solution techniques. In general, however, there are three main approaches to representing basin systems: white-box models (physically based distributed models), gray-box models (conceptual models) and black-box models [1]. The white-box and gray-box models aim to simulate the physical mechanism of runoff generation through each of their components, such as surface, subsurface and groundwater flow, infiltration, percolation and evapotranspiration. The relevant parameters of these components for a given basin are determined by different optimization techniques. However, because of uncertainties, data requirements and the complexity of the model parameters, they cannot be used in some applications. For these reasons, the basin may also be represented by black-box models, which associate basin inputs with desired outputs without detailed consideration of the physical processes involved. In this context, conventional statistical models, including regression analyses, curve fitting approaches and stochastic autoregressive models, are commonly used in applications [2][3][4][5][6][7][8][9]. In addition, artificial neural networks (ANNs) are also employed in streamflow modeling [10][11][12][13][14][15]. ANNs can be considered as complex, nonlinear regression models structured between basin inputs (precipitation, temperature, evaporation, etc.) and the basin output, streamflow. Although there are several ANN techniques, feed forward backpropagation (FFBP) algorithm based models are typically used in applications.
A number of ANN studies have been reported in the literature; some of them are summarized here. Minns and Hall (1996) prepared an FFBP algorithm based ANN model using a synthetic data set to forecast streamflows. Campalo et al. (1999) developed an ANN model to analyze and forecast the behavior of the river Tagliamento in Italy [16]. Mendez et al. (2004), Kisi (2005) and Okkan and Mollamahmutoglu (2010a) investigated the performance of ANN and autoregressive models in streamflow prediction [15,17,18] and showed that ANN methods yielded better results than autoregressive models. Cigizoglu (2003) also used an autoregressive model to generate synthetic monthly flows [14]. These generated values were used as training sets for ANNs to forecast the observed monthly mean flows of the Goksu River in the East Mediterranean part of Turkey. The forecasting results were compared with the ANN performance obtained when only a limited number of observed flows were employed in the training data sets; increasing the training data improved the forecasting performance significantly. In addition to FFBP algorithms, Generalized Regression Neural Networks [19,20] and Radial Basis Neural Networks [21][22][23] have also been used in streamflow prediction. Briefly, all of these studies showed that the ANN is probably the most successful black-box tool, capable of modeling complex and uncertain relationships between input and output variables without detailing the physical process.
In this study, an application of the Levenberg-Marquardt algorithm based ANN (LM-ANN) to the modeling of monthly inflows of Demirkopru Dam, which is located in the Gediz Basin, is presented. The LM-ANN results are also compared with a gradient descent with momentum rate algorithm based FFBP model (GDM-ANN).

The Multilayer Neural Networks
Multilayer neural networks are built from single neurons, which are organized in layers (Figure 1).
The first and last layers of a multilayer neural network are called the input and output layers, respectively. The input layer does not perform any computation; it only feeds the input data to the hidden layer, which lies between the input and output layers. In principle, a multilayer neural network can have any number of hidden layers, but in practical applications only one or two are used. The number of hidden layers and the number of neurons in each hidden layer can be determined by trial and error [24][25][26].
A multilayer neural network structure has three important components: weights, a summing function and an activation function. The weights (W) express the importance and effect of the inputs on the neural network model, so the success of the model depends on the precise and correct determination of the weight values. The summing function (net) adds all weighted inputs; that is, each neuron input is multiplied by its weight and the products are summed. After the sum of weighted inputs has been computed for all neurons, the activation function f(.) limits the amplitude of these values.
Activation functions are usually continuous, nondecreasing and bounded. Various types of activation function are possible, but the sigmoid function is generally preferred in applications [26]. This activation function generates outputs between 0 and 1 as the input signal goes from negative to positive infinity:
$$f(\mathrm{net}) = \frac{1}{1 + e^{-\mathrm{net}}} \qquad (1)$$
In addition to the structure and components of multilayer neural networks, the running procedure is also important; it typically involves two phases: forward computing and backward computing.
In forward computing, each layer uses a weight matrix ($W^{(v)}$, for v = 1, 2) associated with all the connections made from the previous layer to the next layer (Figure 1). The hidden layer has the weight matrix $W^{(1)} \in R^{h \times n}$ and the output layer has the weight matrix $W^{(2)} \in R^{m \times h}$. Given the network input vector $x \in R^{n}$, the output of the hidden layer is
$$x_{out,1} = f\left(W^{(1)} x\right) \qquad (2)$$
which is the input to the output layer. The output of the output layer, which is the response (output) of the network, is
$$x_{out,2} = f\left(W^{(2)} x_{out,1}\right) \qquad (3)$$
Substituting Eq. 2 into Eq. 3 for $x_{out,1}$ gives the final output $y = x_{out,2}$ of the network as
$$y = f\left(W^{(2)} f\left(W^{(1)} x\right)\right) \qquad (4)$$
After the phase of forward computing, backward computing, which depends on the algorithm used to adjust the weights, is applied in multilayer neural networks. The process of adjusting these weights to minimize the differences between the actual and the desired output values is called training or learning the network. If these differences (errors) are larger than the desired tolerance, the errors are passed backwards through the weights of the network. In ANN terminology, this phase is also called the backpropagation algorithm. Once the error is reduced to an acceptable level for the whole training set, the training period ends, and the network is then tested on another known input and output data set in order to evaluate the generalization capability of the ANN [24,26].
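The forward computing phase described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a sigmoid activation in both layers, omits bias terms, and uses arbitrary layer sizes and random weights; the function names `sigmoid` and `forward` are hypothetical.

```python
import numpy as np

def sigmoid(net):
    """Logistic sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-net))

def forward(x, W1, W2):
    """Forward computing for one hidden layer (biases omitted, an assumption).

    x  : input vector, shape (n,)
    W1 : hidden-layer weight matrix, shape (h, n)
    W2 : output-layer weight matrix, shape (m, h)
    """
    x_out1 = sigmoid(W1 @ x)       # output of the hidden layer
    x_out2 = sigmoid(W2 @ x_out1)  # output of the output layer = network response
    return x_out2

rng = np.random.default_rng(0)
n, h, m = 3, 4, 1                  # illustrative sizes, not taken from the study
W1 = rng.normal(size=(h, n))
W2 = rng.normal(size=(m, h))
y = forward(np.array([0.2, 0.5, 0.1]), W1, W2)
```

Because the output neuron is also a sigmoid, `y` always lies strictly between 0 and 1, which is why the targets have to be scaled to that range before training.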
Depending on the technique used to train ANN models, different backpropagation algorithms have been developed. In this study, the Levenberg-Marquardt algorithm (LM-ANN) was used for training the network. The Levenberg-Marquardt algorithm is a second-order nonlinear optimization technique that is usually faster and more reliable than the standard backpropagation techniques [27][28][29], and it is similar to Newton's method [30,31].

The Levenberg-Marquardt Algorithm
The Levenberg-Marquardt optimization algorithm represents a simplified version of Newton's method [31] applied to the training of multilayer neural networks [30,32]. For the multilayer neural network shown in Figure 1, network training can be viewed as finding a set of weights that minimizes the error ($e_p$) for all samples in the training set (Q). The performance function is a sum of squares of the errors,
$$E(W) = \sum_{p=1}^{Q} \sum_{k=1}^{m} \left(d_{p,k} - y_{p,k}\right)^{2} = \sum_{p=1}^{Q} e_p^{T} e_p, \qquad e_p = d_p - y_p \qquad (5)$$
where Q is the total number of training samples, m is the number of output layer neurons, W represents the vector containing all the weights in the network, $y_p$ is the network output, and $d_p$ is the desired output.
When training with the Levenberg-Marquardt optimization algorithm, the weight change ΔW can be computed as
$$\Delta W_k = -\left[J_k^{T} J_k + \mu_k I\right]^{-1} J_k^{T} e_k \qquad (6)$$
where J is the Jacobian matrix of the errors with respect to the weights, I is the identity matrix, and µ is the Marquardt parameter, which is updated using the decay rate β depending on the outcome. In particular, µ is multiplied by the decay rate β (0<β<1) whenever E(W) decreases, while µ is divided by β whenever E(W) increases in a new step (k).
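The role of µ becomes clearer from the two limiting cases of the update. The lines below are a short sketch, assuming the standard Levenberg-Marquardt form $\Delta W = -(J^{T}J + \mu I)^{-1} J^{T} e$ with e the stacked error vector, J its Jacobian with respect to W, and $E(W) = e^{T}e$; where the text does not spell out this notation, it is an assumption.

```latex
\begin{align*}
\mu \to 0 &: \quad \Delta W \to -\left(J^{T}J\right)^{-1} J^{T} e
  && \text{(Gauss-Newton step)} \\
\mu \gg \lVert J^{T}J \rVert &: \quad \Delta W \approx -\frac{1}{\mu}\, J^{T} e
  = -\frac{1}{2\mu}\, \nabla E(W)
  && \text{(short gradient-descent step)}
\end{align*}
```

Decreasing µ after a successful step therefore moves the method toward the fast Gauss-Newton behavior, while increasing it after a failed step falls back to a cautious gradient-descent step.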
The LM-ANN training process can be illustrated by the following pseudo-code:
1. Initialize the weights and µ (µ = 0.001 is appropriate).
2. Compute the sum of squared errors over all inputs, E(W).
3. Compute the Jacobian matrix J.
4. Solve Eq. 6 to obtain the weight change ΔW.
5. Recompute the sum of squared errors E(W) using W + ΔW as the trial W, and judge: if the trial E(W) is smaller than E(W) from step 2, accept W = W + ΔW, set µ = µβ (for example, β = 0.1) and go back to step 2; otherwise, set µ = µ/β and go back to step 4.
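The steps above can be sketched in Python as follows. This is an illustrative sketch under several assumptions that are not taken from the study: a network with one hidden layer, a single output and no biases; a forward-difference Jacobian instead of an analytic one; the standard update ΔW = -(JᵀJ + µI)⁻¹Jᵀe; an iteration limit and a cap on µ as stopping rules; and synthetic data. The names `residuals`, `jacobian` and `train_lm` are hypothetical.

```python
import numpy as np

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def residuals(w, X, d, n, h):
    """Error vector e = d - y of an n-h-1 network; w holds all weights."""
    W1 = w[:h * n].reshape(h, n)
    W2 = w[h * n:].reshape(1, h)
    y = sigmoid(W2 @ sigmoid(W1 @ X.T)).ravel()
    return d - y

def jacobian(w, X, d, n, h, eps=1e-6):
    """Forward-difference Jacobian of e with respect to w, shape (Q, len(w))."""
    e0 = residuals(w, X, d, n, h)
    J = np.empty((e0.size, w.size))
    for j in range(w.size):
        w_step = w.copy()
        w_step[j] += eps
        J[:, j] = (residuals(w_step, X, d, n, h) - e0) / eps
    return J

def train_lm(X, d, h, mu=0.001, beta=0.1, max_iter=200, seed=0):
    n = X.shape[1]
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.5, size=h * n + h)   # step 1: initialize weights and mu
    e = residuals(w, X, d, n, h)
    E = e @ e                                   # step 2: sum of squared errors
    history = [E]
    for _ in range(max_iter):
        J = jacobian(w, X, d, n, h)             # step 3: Jacobian matrix
        while mu < 1e10:                        # cap on mu is an assumption
            A = J.T @ J + mu * np.eye(w.size)
            dw = -np.linalg.solve(A, J.T @ e)   # step 4: weight change
            e_trial = residuals(w + dw, X, d, n, h)
            E_trial = e_trial @ e_trial         # step 5: error at the trial weights
            if E_trial < E:                     # accept the step, decrease mu
                w, e, E = w + dw, e_trial, E_trial
                mu = max(mu * beta, 1e-12)      # lower bound keeps A well conditioned
                break
            mu /= beta                          # reject the step, increase mu
        else:
            break                               # no improving step was found
        history.append(E)
    return w, history

# Synthetic example: two inputs in [0, 1], one target in (0, 1).
rng = np.random.default_rng(1)
X = rng.uniform(size=(60, 2))
d = 0.2 + 0.6 * X[:, 0] * X[:, 1]
w, history = train_lm(X, d, h=3)
```

By construction, `history` only records accepted steps, so the stored error values decrease monotonically. For networks with many weights, an analytic Jacobian obtained by backpropagation would be preferable to finite differences.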

Application
The application area covers the basin of Demirkopru Dam, which is located in the Aegean region of Turkey. The study region has typical Mediterranean climate characteristics. The basin, also called the Upper Gediz, has four rivers (Demirci, Deliinis, Selendi and Murat) located upstream of the dam, with a total drainage area of 6590 km² (Figure 2).

Figure 2 The streamflow and the meteorological stations in the study area

LM-ANN modeling was applied to the observed data of 5 selected meteorological stations and 4 streamflow gauging stations (Table 1). In the modeling application, 30 years (January 1977-December 2006) of input-output data were used and divided into training and testing periods in proportions of 2/3 (January 1977-December 1996) and 1/3 (January 1997-December 2006), respectively. Before the input-output data were presented to the ANN, the whole data set was scaled to the range 0-1 so that the different input signals had the same numerical range. The training and testing subsets were scaled using the equation $z_t = (x_t - x_{min})/(x_{max} - x_{min})$, where $x_t$ is the unscaled data, $z_t$ is the scaled data, and $x_{max}$ and $x_{min}$ are the maximum and minimum values of the unscaled data, respectively. Then the output values of the networks, which were in the range 0-1, were converted back to real-scale values using the equation $x_t = z_t (x_{max} - x_{min}) + x_{min}$.
Because of this scaling range, the sigmoid function, which generates outputs between 0 and 1, was selected as the activation function.
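The data split and the scaling and back-transformation equations can be sketched as follows. The inflow series here is a randomly generated placeholder, not the observed record, and taking the minimum and maximum over the whole 30-year series is an assumption about how the extremes were chosen; the function names are hypothetical.

```python
import numpy as np

def minmax_scale(x, x_min, x_max):
    """z_t = (x_t - x_min) / (x_max - x_min)"""
    return (x - x_min) / (x_max - x_min)

def minmax_unscale(z, x_min, x_max):
    """x_t = z_t * (x_max - x_min) + x_min"""
    return z * (x_max - x_min) + x_min

rng = np.random.default_rng(0)
inflow = rng.gamma(shape=2.0, scale=20.0, size=360)  # 30 years x 12 months, placeholder values

n_train = 240                                # first 20 years for training
train, test = inflow[:n_train], inflow[n_train:]

x_min, x_max = inflow.min(), inflow.max()    # extremes of the whole series (assumption)
z_train = minmax_scale(train, x_min, x_max)
z_test = minmax_scale(test, x_min, x_max)
restored = minmax_unscale(z_test, x_min, x_max)
```

If the extremes were taken from the training period only, testing values outside that range would map outside 0-1 and could not be reproduced by a sigmoid output neuron.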
In training, the number of hidden layers, the number of neurons in the hidden layers and the Marquardt parameters were determined after trying various network structures. The network structure providing the best result, i.e., the minimum root mean square error, RMSE (Eq. 7), and the maximum determination coefficient, R² (Eq. 8), was then employed for the testing period.
$$RMSE = \sqrt{\frac{1}{T} \sum_{t=1}^{T} \left(d_t - y_t\right)^{2}} \qquad (7)$$
$$R^{2} = 1 - \frac{\sum_{t=1}^{T} \left(d_t - y_t\right)^{2}}{\sum_{t=1}^{T} \left(d_t - d_{mean}\right)^{2}} \qquad (8)$$
where T is the number of training or testing samples, $y_t$ is the network output, $d_t$ is the observed (desired) data in the t-th time period, and $d_{mean}$ is the mean over the observed periods.
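As a small worked example, the two performance measures can be computed as below. The form of R² used here (one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean) is an assumption based on the definition of the observed mean in the text; some studies use the squared correlation coefficient instead. The four data values are invented for illustration.

```python
import numpy as np

def rmse(d, y):
    """Root mean square error between observed d and predicted y."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.mean((d - y) ** 2))

def r2(d, y):
    """Determination coefficient: 1 - SS_residual / SS_total (assumed form)."""
    d, y = np.asarray(d, dtype=float), np.asarray(y, dtype=float)
    return 1.0 - np.sum((d - y) ** 2) / np.sum((d - d.mean()) ** 2)

d_obs = np.array([10.0, 20.0, 30.0, 40.0])   # invented observed values
y_sim = np.array([12.0, 18.0, 33.0, 39.0])   # invented network outputs

print(rmse(d_obs, y_sim))  # sqrt(18 / 4) = 2.1213...
print(r2(d_obs, y_sim))    # 1 - 18 / 500 = 0.964
```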
The modeling study started with network input data consisting of the concurrent monthly rainfall and temperature, with the corresponding inflows at Demirkopru Dam as the output of the network. With two inputs and one output, the maximum determination coefficients (R²) and the minimum root mean square errors (RMSE) obtained were 70.94 % and 37.74 (10⁶ m³) for the training period and 61.17 % and 46.09 (10⁶ m³) for the testing period. The number of neurons in the hidden layer was varied between 2 and 20, and the network with 12 neurons gave the best performance on the testing data. To improve the performance of the LM-ANN model, antecedent rainfalls were included in the input. The best performance in terms of determination coefficients was obtained with 9 neurons in the hidden layer, using the concurrent monthly rainfall, temperature and three antecedent rainfalls as input. With these inputs, determinations of 93.31 % and 82.19 % were obtained for the training and testing periods, respectively. When two and three antecedent rainfalls were added to the model, the performance for the training period improved, but in terms of root mean square errors the performance for the testing period deteriorated. Thus, the model (Model I) that used concurrent rainfall, temperature and one antecedent rainfall as input was found to be sufficient and more suitable than the others.
To improve the performance even further, antecedent inflow values had to be used as input. The best performance (Model II) was obtained using one antecedent inflow, in addition to the concurrent rainfall, temperature and one antecedent rainfall, as input to the network. With these inputs and 3 neurons in the hidden layer, determinations of 93.74 % and 84.16 % and root mean square errors of 17.71 (10⁶ m³) and 25.77 (10⁶ m³) were obtained for the training and testing periods, respectively. When two antecedent rainfalls were added to Model II, the performance for the testing period again deteriorated. The results of all of these experiments are summarized in Table 2. The best performances of the LM-ANN models (I and II) during the training and testing periods are shown in Figure 3 and Figure 4. The results of the suitable models are provided in Figure 5 as box-plots in order to compare the minimum, maximum and median values of the observed and predicted monthly inflows in the testing period. Furthermore, the mean values of the observed and predicted monthly inflows in the testing period are shown as a bar diagram in Figure 6.
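The input sets of Model I and Model II differ only in which antecedent (lagged) values are appended to the concurrent rainfall and temperature. A minimal sketch of how such input matrices could be assembled from monthly series is given below; the helper `build_inputs` and the toy series are hypothetical and only illustrate the lag alignment.

```python
import numpy as np

def build_inputs(rain, temp, inflow, rain_lags=1, flow_lags=1):
    """Assemble rows [P(t), T(t), P(t-1), ..., Q(t-1), ...] with target Q(t).

    The first max(rain_lags, flow_lags) months are dropped because their
    antecedent values are not available.
    """
    start = max(rain_lags, flow_lags)
    cols = [rain[start:], temp[start:]]
    cols += [rain[start - k:len(rain) - k] for k in range(1, rain_lags + 1)]
    cols += [inflow[start - k:len(inflow) - k] for k in range(1, flow_lags + 1)]
    return np.column_stack(cols), inflow[start:]

# Toy series whose values encode the month index, so the alignment is easy to check.
months = 12
rain = np.arange(months, dtype=float)
temp = np.arange(months, dtype=float) + 100.0
inflow = np.arange(months, dtype=float) + 200.0

X1, q1 = build_inputs(rain, temp, inflow, rain_lags=1, flow_lags=0)  # Model I style inputs
X2, q2 = build_inputs(rain, temp, inflow, rain_lags=1, flow_lags=1)  # Model II style inputs
```

With the toy series, the first row of `X2` is `[1, 101, 0, 200]` and its target is `201`: rainfall and temperature of month 1, rainfall and inflow of month 0, and the inflow of month 1 as output.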
When the box-plots were examined in terms of the median values of the observed and predicted monthly inflows, all models fitted well. However, for some months (especially November and December), the results of Model II, which used one antecedent inflow as input, were much better than those of Model I.
When the box-plots were compared in terms of the extreme (maximum and minimum) values of the observed and predicted monthly inflows, the results differed between months. For example, the extreme values of March were fitted by Model I, whereas the November and December results of Model II were much better than those of Model I, which also holds for the monthly means (see Figure 6).
In the study, the LM-ANN Model I results were also compared with those of the gradient descent with momentum rate algorithm based FFBP approach (GDM-ANN). The best performance of the GDM-ANN model was obtained with 3 neurons in the hidden layer. Using the same inputs as Model I, determinations of 73.12 % and 62.20 % were computed for the training and testing periods, respectively (see Figure 7).
• It may be required to use antecedent inflow values as input.
• Further, the time required for training the LM-ANN was only a fraction of the time taken by the GDM-ANN algorithm.
• This study also showed that the LM-ANN is one of the most successful black-box techniques, capable of rainfall-runoff modeling without detailing the physical process, and that it can also be used for modeling other hydrological components owing to its rapid assessment and robustness.

Figure 5 Box-plots of the observed and the predicted monthly inflows of Model I-II in the testing periods.

Figure 6 Bar diagram of the observed and the predicted monthly mean inflows for the testing periods.

Table 1 Selected meteorological and streamflow gauging stations in the study area