Using genetic algorithms for estimating Weibull parameters with application to wind speed

Article history: Received: 31 October 2018; Accepted: 26 June 2019; Available Online: 31 January 2020

Renewable energy has become a prominent subject for researchers, since fossil fuel reserves are decreasing and are unlikely to meet future energy demand. Wind holds an important place among renewable energy resources, and there is extensive research on wind speed modeling. One of the most commonly used distributions for wind speed modeling is the Weibull distribution, owing to its simplicity and flexibility. The maximum likelihood (ML) method is the most frequently used technique in Weibull parameter estimation. Iterative techniques such as Newton-Raphson (NR) use random initial values to obtain the ML estimators of the parameters of the Weibull distribution. Therefore, the success of the iterative techniques depends heavily on the initial value selection. To address this initial value problem, a genetic algorithm (GA) is considered for obtaining the estimators of the model parameters. The ML estimators obtained using the GA and NR techniques are compared with the method of moments (MoM) estimators via Monte Carlo simulation and wind speed applications. The results show that the ML estimators obtained using GA are superior to the MoM estimators and to the ML estimators obtained using NR.


Introduction
Throughout history, population growth and the inadequacy of existing energy resources have driven the search for alternative energy resources. In recent decades, renewable energy has been studied extensively because of the decline in fossil fuel reserves and growing environmental awareness. As a clean and inexhaustible resource, wind has become an important energy source, distinguished among other renewable energy forms such as geothermal, hydro, solar, and biomass energy. Converting the kinetic energy carried by wind into electrical energy is a clean and economical way to produce energy. Once a wind plant is set up, its maintenance cost is relatively low compared to other energy plants. However, wind turbines and their installation are expensive; therefore, the wind energy potential of a region should be carefully estimated to determine the proper turbine type. Wind speed is the key factor in determining the wind energy potential of a region [1-4]. Statistical distributions are used to model wind speed and estimate energy potential. The Weibull distribution is one of the most commonly used distributions in wind energy studies due to its simplicity and flexibility [1,2,4-10]. Various techniques are used in Weibull parameter estimation. Sohoni et al. [2] estimated the Weibull parameters using the method of moments (MoM). Seguro and Lambert [5] employed the MoM, the maximum likelihood (ML) method and the modified maximum likelihood (MML) method. They found that the ML method is more appropriate for data sets in time series format, and they recommended the MML method for data sets in frequency distribution format. Akgül et al. [6] compared the least squares method, the ML method and the MML method.
Although they found the ML method to be the most efficient overall, they noted that the ML and MML methods have similar efficiency for large data sets, whereas the MML method has less computational complexity. Arslan et al. [8] compared the MoM, the L-moments (L-Mom) method and the ML method, and showed that the L-Mom method is more efficient for small data sets, whereas the ML method is more efficient for larger data sets. Kaplan [10] found that the graphical method is more efficient than the MoM in Weibull parameter estimation. Kollu et al. [11] and Akpınar and Akpınar [12] used the ML method to estimate the Weibull parameters in their studies. Teimouri et al. [13] compared their proposed L-moment estimator with several methods, including the ML method, the method of logarithmic moments, the percentile method and the MoM. They found that their proposed method and the ML method are the most efficient estimators. Akdağ and Dinler [14] proposed the power density method and found it superior to commonly used methods, including the MoM and the ML method. Saleh et al. [15] compared five different methods and recommended the mean wind speed method and the ML method for fitting the Weibull distribution. Azad et al. [16] found the MoM and the ML method to be the most efficient among several methods. Recently, Usta et al. [17] proposed a new moment-based estimation approach for the Weibull parameters. Previous studies thus show that the ML method is one of the most frequently used parameter estimation methods for the Weibull distribution. Because the likelihood equations of the Weibull distribution are nonlinear and have no closed-form solution, numerical methods such as Newton-Raphson (NR) must be employed. However, the success of such iterative techniques depends heavily on the selection of the initial values.
This study departs from the literature by addressing the initial value problem with a genetic algorithm (GA) for the ML estimation of the Weibull parameters. GA is a heuristic search algorithm that works with a set of candidate solutions drawn from a search space instead of a single starting point. GA is a useful approach for solving optimization problems and has been applied in various studies, such as signal control optimization [18] and optimization of the mixture parameters of high-performance concrete [19]. In parameter estimation, GA was previously used for the negative binomial gamma mixture distribution [20], the skew-normal distribution [21] and nonlinear regression [22]. Parameter estimation of the Weibull distribution using GA was introduced by Thomas et al. [23] for a data set of breakdown times of an insulating fluid, where GA showed comparably good performance in maximizing the log-likelihood function. With this motivation, the applicability of GA to wind speed data modeling is examined. To the best of our knowledge, this is the first time GA has been used to estimate the parameters of the Weibull distribution in wind speed distribution modeling. Observations were obtained from an existing wind farm and from different meteorological stations. The efficiency of ML estimation using GA was compared with that of ML estimation using NR and with the MoM. Mean absolute error (MAE), bias and the Kolmogorov-Smirnov (K-S) test were used as decision criteria. The remainder of this paper is structured as follows: Section 2 gives basic information about the Weibull distribution, Section 3 describes the parameter estimation methods in detail, Section 4 presents the simulation experiments and the wind speed data analysis, and Section 5 concludes the paper.

Weibull distribution
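The probability density function (pdf) and cumulative distribution function (cdf) of the Weibull distribution are respectively given by:

$$f(v; k, c) = \frac{k}{c}\left(\frac{v}{c}\right)^{k-1}\exp\left[-\left(\frac{v}{c}\right)^{k}\right], \quad v \ge 0,\ k > 0,\ c > 0 \tag{1}$$

$$F(v; k, c) = 1 - \exp\left[-\left(\frac{v}{c}\right)^{k}\right] \tag{2}$$

where $v$ is the wind speed, and $k$ and $c$ are the Weibull shape and scale parameters, respectively. The shape parameter is dimensionless, whereas the scale parameter has the same unit as the wind speed. Probability density plots for some different parameter values are given in Figure 1.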
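As an illustration, the pdf and cdf can be evaluated with a few lines of Python. This is a minimal sketch using only the standard library; the function names `weibull_pdf` and `weibull_cdf` and the symbols `k` (shape) and `c` (scale) are assumptions made here for illustration and are not taken from the software used in this study.

```python
import math

def weibull_pdf(v, k, c):
    """Weibull density at wind speed v for shape k > 0 and scale c > 0."""
    if v <= 0:
        # Simplification: the density is reported as 0 at v = 0 as well,
        # although it equals 1/c for k = 1 and is unbounded for k < 1.
        return 0.0
    z = v / c
    return (k / c) * z ** (k - 1) * math.exp(-(z ** k))

def weibull_cdf(v, k, c):
    """Probability that the wind speed does not exceed v."""
    if v <= 0:
        return 0.0
    return 1.0 - math.exp(-((v / c) ** k))

# Shape values used in the simulation study, with the scale fixed at 1.
for k in (0.5, 1, 3, 6):
    print(k, weibull_pdf(1.0, k, 1.0), weibull_cdf(1.0, k, 1.0))
```

At $v = c$ the cdf equals $1 - e^{-1} \approx 0.632$ for every shape value, which is a quick sanity check for any implementation.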

Method of moments estimation
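MoM is based on equating the sample moments with the theoretical moments of the respective distribution. To estimate the parameters of the Weibull distribution, the coefficient of variation of the sample is calculated and set equal to the theoretical coefficient of variation as follows [8]:

$$\frac{s}{\bar{v}} = \frac{\sqrt{\Gamma\left(1 + \frac{2}{k}\right) - \Gamma^{2}\left(1 + \frac{1}{k}\right)}}{\Gamma\left(1 + \frac{1}{k}\right)}, \qquad \bar{v} = \frac{1}{n}\sum_{i=1}^{n} v_i, \qquad s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(v_i - \bar{v}\right)^{2} \tag{3}$$

where $n$ is the number of data points and $\Gamma$ is the gamma function. Once the shape parameter is obtained from Equation (3), the scale parameter can be calculated by:

$$\hat{c} = \frac{\bar{v}}{\Gamma\left(1 + \frac{1}{\hat{k}}\right)} \tag{4}$$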
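The MoM procedure can be sketched in Python as follows. The sketch assumes that the shape estimate is found by bisection, which works because the theoretical coefficient of variation decreases as the shape parameter grows. It also assumes the sample standard deviation with divisor n - 1. The names `weibull_cv` and `weibull_mom` and the bracketing interval are hypothetical choices, not part of the original study.

```python
import math
import statistics

def weibull_cv(k):
    """Theoretical coefficient of variation of a Weibull variable (free of the scale)."""
    # Work on the log scale so that small k does not overflow the gamma function.
    log_ratio = math.lgamma(1 + 2 / k) - 2 * math.lgamma(1 + 1 / k)
    return math.sqrt(math.expm1(log_ratio))

def weibull_mom(speeds, k_lo=0.05, k_hi=100.0, tol=1e-10):
    """Method-of-moments estimates (k_hat, c_hat) from positive wind speeds."""
    mean = statistics.mean(speeds)
    cv = statistics.stdev(speeds) / mean      # sample coefficient of variation
    if not (weibull_cv(k_hi) < cv < weibull_cv(k_lo)):
        raise ValueError("sample CV is outside the bracketed range")
    lo, hi = k_lo, k_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > cv:              # CV too large means k is too small
            lo = mid
        else:
            hi = mid
    k_hat = 0.5 * (lo + hi)
    c_hat = mean / math.gamma(1 + 1 / k_hat)  # scale from the sample mean
    return k_hat, c_hat
```

For the exponential case, `weibull_cv(1.0)` returns 1, since the mean and standard deviation coincide when the shape parameter equals 1.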

Maximum likelihood estimation
The ML method is based on the maximization of the log-likelihood function of the underlying distribution. The log-likelihood function of the Weibull distribution is given as follows:

$$\ln L(v; k, c) = n \ln k - nk \ln c + (k - 1)\sum_{i=1}^{n} \ln v_i - \sum_{i=1}^{n}\left(\frac{v_i}{c}\right)^{k} \tag{5}$$

Taking the derivative of the log-likelihood function with respect to each parameter and equating it to zero gives the ML estimators of the shape and scale parameters as follows:

$$\hat{k} = \left[\frac{\sum_{i=1}^{n} v_i^{\hat{k}} \ln v_i}{\sum_{i=1}^{n} v_i^{\hat{k}}} - \frac{1}{n}\sum_{i=1}^{n} \ln v_i\right]^{-1} \tag{6}$$

and

$$\hat{c} = \left(\frac{1}{n}\sum_{i=1}^{n} v_i^{\hat{k}}\right)^{1/\hat{k}} \tag{7}$$

The ML estimator of the shape parameter is defined by a nonlinear equation; therefore, it must be solved by numerical techniques such as the NR algorithm, the Nelder-Mead algorithm, the simulated annealing algorithm or GA. In this study, we used the NR algorithm and the GA in the maximization of the log-likelihood function given in Equation (5).
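The step from the log-likelihood to the two estimating equations can be written out explicitly. The derivation below is a worked sketch that assumes the log-likelihood $\ln L = n \ln k - nk \ln c + (k-1)\sum \ln v_i - \sum (v_i/c)^k$, with $k$ denoting the shape and $c$ the scale parameter (symbols assumed here).

```latex
\begin{align*}
\frac{\partial \ln L}{\partial c}
  &= -\frac{nk}{c} + \frac{k}{c}\sum_{i=1}^{n}\left(\frac{v_i}{c}\right)^{k} = 0
  \;\Longrightarrow\;
  c^{k} = \frac{1}{n}\sum_{i=1}^{n} v_i^{k}, \\
\frac{\partial \ln L}{\partial k}
  &= \frac{n}{k} - n\ln c + \sum_{i=1}^{n}\ln v_i
     - \sum_{i=1}^{n}\left(\frac{v_i}{c}\right)^{k}\ln\frac{v_i}{c} = 0 . \\
\intertext{Substituting $c^{k} = \frac{1}{n}\sum_i v_i^{k}$ into the last sum gives}
\sum_{i=1}^{n}\left(\frac{v_i}{c}\right)^{k}\ln\frac{v_i}{c}
  &= \frac{n\sum_{i} v_i^{k}\ln v_i}{\sum_{i} v_i^{k}} - n\ln c , \\
\intertext{so the terms in $\ln c$ cancel and the shape parameter satisfies}
\frac{1}{k}
  &= \frac{\sum_{i} v_i^{k}\ln v_i}{\sum_{i} v_i^{k}} - \frac{1}{n}\sum_{i=1}^{n}\ln v_i .
\end{align*}
```

Because $k$ appears on both sides of the last equation, no closed-form solution exists, which is why an iterative or heuristic search is needed.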

Newton-Raphson algorithm
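The steps of the NR algorithm are summarized in [21] as follows:
1. Determine the initial values $k^{(0)}$ and $c^{(0)}$ for $k$ and $c$.
2. Compute the gradient vector $\mathbf{g}$ and the Hessian matrix $\mathbf{H}$ of the log-likelihood function at the current values $k^{(m)}$ and $c^{(m)}$.
3. Compute the values of $k$ and $c$ at the $(m+1)$th iteration by using the following equation:

$$\begin{pmatrix} k^{(m+1)} \\ c^{(m+1)} \end{pmatrix} = \begin{pmatrix} k^{(m)} \\ c^{(m)} \end{pmatrix} - \mathbf{H}^{-1}\left(k^{(m)}, c^{(m)}\right)\,\mathbf{g}\left(k^{(m)}, c^{(m)}\right) \tag{8}$$

4. Repeat the iterations until the convergence criterion is satisfied.

NR is a powerful, fast-converging algorithm; however, its success depends on the initial guess. Therefore, we considered the GA for maximizing the log-likelihood function of the Weibull distribution.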
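The NR steps can be sketched in Python as shown below. This is a minimal illustration rather than the routine used in this study, which relied on the "maxLik" package of R. The function name `weibull_ml_newton`, the stopping rule and the error handling are assumptions. The gradient and Hessian are the analytical derivatives of the Weibull log-likelihood with respect to the shape `k` and the scale `c`.

```python
import math

def weibull_ml_newton(speeds, k0, c0, tol=1e-10, max_iter=100):
    """Newton-Raphson maximization of the Weibull log-likelihood from (k0, c0)."""
    n = len(speeds)
    k, c = k0, c0
    for _ in range(max_iter):
        logs = [math.log(v / c) for v in speeds]   # ln(v_i / c)
        pows = [math.exp(k * t) for t in logs]     # (v_i / c)^k
        s0 = sum(pows)
        s1 = sum(p * t for p, t in zip(pows, logs))
        s2 = sum(p * t * t for p, t in zip(pows, logs))
        # Gradient of the log-likelihood.
        g_k = n / k + sum(logs) - s1
        g_c = (k / c) * (s0 - n)
        # Hessian of the log-likelihood (symmetric 2 x 2 matrix).
        h_kk = -n / k ** 2 - s2
        h_cc = (k / c ** 2) * (n - (k + 1) * s0)
        h_kc = (s0 - n) / c + (k / c) * s1
        det = h_kk * h_cc - h_kc ** 2
        # Newton step: inverse Hessian times gradient, written out for 2 x 2.
        step_k = (h_cc * g_k - h_kc * g_c) / det
        step_c = (h_kk * g_c - h_kc * g_k) / det
        k, c = k - step_k, c - step_c
        if k <= 0 or c <= 0:
            raise ArithmeticError("iterate left the parameter space; try other initial values")
        if abs(step_k) + abs(step_c) < tol:
            return k, c
    raise ArithmeticError("no convergence within max_iter iterations")
```

The two exceptions make the dependence on the starting point visible: a poor choice of `k0` and `c0` can push the iterates outside the admissible region or prevent convergence.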

Genetic algorithm
GA is a heuristic search algorithm, motivated by the principles of the biological evolution of species, that is used here to obtain the estimators of the model parameters. Unlike conventional optimization techniques, GA works with a set of initial solutions, each of which is called a chromosome. A flowchart of GA is presented in Figure 2. The steps of the GA in this study are summarized as follows:
1. A range of possible solutions (search space) was defined arbitrarily for both the shape and scale parameters. A sensitivity analysis was carried out with initial population sizes of 6, 10, 15 and 20. The most efficient outcomes were obtained with an initial population size of 6; therefore, the initial population size was set to 6.
2. Each set of possible solutions is evaluated using the fitness function. The log-likelihood function of the Weibull distribution is the fitness function in this study.
3. The best solution in each iteration is kept as a parent chromosome.
4. New offspring are produced by crossover and mutation with rates of 0.8 and 0.1, respectively. The size of the population, including the original parents and the crossover and mutation offspring, is equal to the initial population size in step 1.
5. The new population is evaluated as in step 2.
Steps 3-5 are repeated. The algorithm stops when the decision criterion is satisfied or the maximum number of iterations is reached. A flowchart of the study is given in Figure 2.
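The steps above can be sketched as a small real-valued GA in Python. The population size and the crossover and mutation rates follow the text. The selection rule (binary tournament), the crossover operator (random convex combination), the mutation operator (uniform redraw of one parameter), the fixed number of generations and the search bounds are assumptions made for illustration. The study itself used the "GA" package of R, whose operators may differ.

```python
import math
import random

def weibull_loglik(params, speeds):
    """Weibull log-likelihood, used as the fitness function."""
    k, c = params
    n = len(speeds)
    return (n * math.log(k) - n * k * math.log(c)
            + (k - 1) * sum(math.log(v) for v in speeds)
            - sum((v / c) ** k for v in speeds))

def ga_maximize(fitness, bounds, pop_size=6, p_cross=0.8, p_mut=0.1,
                generations=200, seed=0):
    """Maximize `fitness` over the box `bounds` with a simple elitist GA."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_pop = [ranked[0]]                      # keep the best chromosome
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents (assumed operator).
            p1 = max(rng.sample(ranked, 2), key=fitness)
            p2 = max(rng.sample(ranked, 2), key=fitness)
            child = list(p1)
            if rng.random() < p_cross:              # arithmetic crossover
                w = rng.random()
                child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
            if rng.random() < p_mut:                # uniform mutation of one gene
                j = rng.randrange(len(bounds))
                child[j] = rng.uniform(*bounds[j])
            # Clamp to the search space to guard against rounding error.
            child = [min(max(x, lo), hi) for x, (lo, hi) in zip(child, bounds)]
            next_pop.append(child)
        population = next_pop
    return max(population, key=fitness)

# Hypothetical usage on simulated data with an arbitrary search space.
data_rng = random.Random(42)
speeds = [data_rng.weibullvariate(1.0, 3.0) for _ in range(100)]
best = ga_maximize(lambda p: weibull_loglik(p, speeds),
                   bounds=[(0.1, 10.0), (0.1, 10.0)])
print(best)
```

Because the best chromosome is always carried over, the best fitness value never decreases from one generation to the next. No initial value has to be supplied, only the bounds of the search space.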

Monte Carlo simulations
In order to compare the parameter estimation methods for the Weibull distribution, a Monte Carlo simulation was conducted in which the shape parameter was set to 0.5, 1, 3 and 6 and the scale parameter was fixed at 1. The parameter sets used in the simulation can also be seen in Figure 1. The simulation was repeated 1000 times for each of the sample sizes of 20, 50, 100 and 500. The MoM estimates were used as the initial values for NR. For the GA, the population size was set to 6, and the crossover rate and mutation rate were fixed at 0.8 and 0.1, respectively. The ML estimates using NR and GA were obtained via the "maxLik" [24] and "GA" [25] packages of the R software. Mean absolute error (MAE) and bias were chosen as the criteria for comparing the efficiencies of the parameter estimation methods. MAE and bias for the parameters $k$ and $c$ are given by:

$$\mathrm{MAE}(\hat{\theta}) = \frac{1}{N}\sum_{j=1}^{N}\left|\hat{\theta}_j - \theta\right| \tag{9}$$

and

$$\mathrm{Bias}(\hat{\theta}) = \frac{1}{N}\sum_{j=1}^{N}\hat{\theta}_j - \theta \tag{10}$$

where $\theta$ stands for $k$ or $c$, $\hat{\theta}_j$ is the estimate obtained in the $j$th replication and $N$ is the number of replications. Smaller values of the absolute bias and of the MAE indicate higher efficiency. The parameter estimates, absolute bias and MAE for each parameter estimation method can be seen in Table 1, where the best results are highlighted in bold. The simulation results show that the GA approach was more efficient than NR and MoM in the estimation of the shape and scale parameters according to the MAE and bias criteria. For the sample sizes of 20, 50 and 100, the GA approach provided the best efficiency for the shape parameter in each simulation scenario in terms of MAE and bias. For the sample size of 500, GA also provided the best efficiency for the shape parameter in each simulation scenario according to MAE.
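The computation of MAE and bias over repeated samples can be sketched in Python as follows. To keep the example self-contained, the ML estimates are obtained here by bisection on the profile likelihood equation for the shape parameter. This is a swapped-in technique chosen for brevity, not the NR or GA routines compared in this study. The function names, the bracketing interval and the small number of replications are likewise assumptions.

```python
import math
import random

def mae_and_bias(estimates, true_value):
    """Mean absolute error and bias of a list of estimates."""
    errors = [e - true_value for e in estimates]
    n = len(errors)
    return sum(abs(e) for e in errors) / n, sum(errors) / n

def weibull_ml_profile(speeds, k_lo=0.05, k_hi=20.0, tol=1e-9):
    """ML estimates via bisection on the profile score equation for k."""
    logs = [math.log(v) for v in speeds]
    mean_log = sum(logs) / len(logs)

    def score(k):
        pows = [v ** k for v in speeds]
        weighted = sum(p * t for p, t in zip(pows, logs)) / sum(pows)
        return weighted - 1 / k - mean_log       # increasing in k

    if not (score(k_lo) < 0 < score(k_hi)):
        raise ValueError("root is not bracketed")
    while k_hi - k_lo > tol:
        mid = 0.5 * (k_lo + k_hi)
        if score(mid) < 0:
            k_lo = mid
        else:
            k_hi = mid
    k_hat = 0.5 * (k_lo + k_hi)
    c_hat = (sum(v ** k_hat for v in speeds) / len(speeds)) ** (1 / k_hat)
    return k_hat, c_hat

def simulate(k_true, c_true, n, reps, seed=0):
    """Return ((MAE, bias) for k, (MAE, bias) for c) over `reps` samples of size n."""
    rng = random.Random(seed)
    k_hats, c_hats = [], []
    for _ in range(reps):
        # random.weibullvariate takes the scale first and the shape second.
        sample = [rng.weibullvariate(c_true, k_true) for _ in range(n)]
        k_hat, c_hat = weibull_ml_profile(sample)
        k_hats.append(k_hat)
        c_hats.append(c_hat)
    return mae_and_bias(k_hats, k_true), mae_and_bias(c_hats, c_true)

print(simulate(k_true=3.0, c_true=1.0, n=50, reps=200))
```

The MAE is never smaller than the absolute bias, because positive and negative errors cancel in the bias but not in the MAE.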
In the estimation of the scale parameter for the sample sizes of 20, 50 and 100, GA provided the highest efficiency according to at least one of the decision criteria in almost every simulation scenario. For the sample size of 100, GA was the most efficient method in each simulation scenario according to MAE and bias. Overall, GA is a very efficient method for small, moderate and large sample sizes. MAE and absolute values of the biases are also presented in Figures 3-6.

Figure 3 presents the MAE values for the shape parameter k. GA was more efficient than NR and MoM in all simulation scenarios, and NR was the second-best method. The MAE values decreased as the sample size increased. However, the MAE values increased as the value of the shape parameter increased. Figure 4 shows the MAE values for the scale parameter c. GA was the most efficient method for the sample sizes of 20, 100 and 500, whereas MoM was the most efficient for the sample size of 50. The MAE values decreased as the value of the shape parameter increased. Similarly, the MAE values decreased as the sample size increased. Figure 5 presents the absolute value of the bias for the shape parameter k. GA gave the most efficient results, and MoM gave better results than NR on some occasions. Similar to the MAE values, the absolute values of the bias decreased as the sample size increased. However, the absolute values of the bias increased as the value of the shape parameter increased. Figure 6 shows the absolute values of the bias for the scale parameter c. GA was more efficient than the other methods most of the time, and NR was the second-best method. The absolute values of the bias decreased as the value of the shape parameter and the sample size increased.

The geographical coordinates of the stations, the selected observation periods and the data collection process are given in Table 2. Accordingly, the wind speed data for the Belen station were recorded at 10-min intervals. The wind speed data observed at the other stations were collected on an hourly basis.
The descriptive statistics, including the mean, standard deviation, minimum and maximum, for the data sets used in this study are presented in Table 3. It can be seen from Table 3 that the highest average and maximum wind speeds were observed at the Datça station.

Table 2. Geographical coordinates of the stations, selected period of observations and data collection process.

The Weibull distribution is fitted on a monthly basis for the Belen, Gökçeada and Datça data sets. To test statistically whether the monthly data sets come from the Weibull distribution, the K-S test is applied separately to each data set. The K-S test is used to test whether a sample comes from a population with a specific distribution. The K-S test statistic is the maximum absolute difference between the empirical distribution function $F_0(v)$ and the theoretical distribution function $F(v)$ [26].
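The K-S statistic can be computed directly from its definition, as in the Python sketch below. The function names and the symbols `k` and `c` for the shape and scale parameters are assumptions made for illustration. The sketch returns only the test statistic and does not compute a p-value.

```python
import math

def weibull_cdf(v, k, c):
    """Theoretical Weibull cdf with shape k and scale c."""
    return 1.0 - math.exp(-((v / c) ** k)) if v > 0 else 0.0

def ks_statistic(sample, cdf):
    """Largest absolute gap between the empirical cdf of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical cdf jumps from (i - 1)/n to i/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

For a fitted model with hypothetical estimates `k_hat` and `c_hat`, the statistic would be obtained as `ks_statistic(speeds, lambda v: weibull_cdf(v, k_hat, c_hat))`. Smaller values indicate a closer agreement between the data and the fitted Weibull distribution.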

After the K-S test, the monthly data sets that follow the Weibull distribution (p-value > 0.05) are selected for further analysis. The parameter estimates and K-S test results for the Belen, Gökçeada and Datça data sets are presented in Tables 4-6, respectively. Table 4 shows that GA provides the best fit in terms of the K-S test for the Belen data set. Table 5 shows that GA provides the best fit in 14 of 17 months in terms of the K-S test results for the Gökçeada data set, while MoM provides the best fit in 3 months. Table 6 shows that GA provides the best fit in 12 of 14 months for the Datça data set; MoM is the second-best estimator, providing the best fit in 2 months.

Conclusion
In this paper, we obtained the ML estimators of the parameters of the Weibull distribution using the GA and NR techniques and compared them with the MoM estimators. The efficiencies of the parameter estimation methods were evaluated based on the MAE, bias and K-S test criteria. The results of the Monte Carlo simulation and the real wind speed data analysis show that the ML estimator using GA is more efficient than the ML estimator using NR and the MoM estimator in Weibull parameter estimation. Furthermore, the data sets were observed in different geographical regions with different weather characteristics, and GA showed superiority across these data sets. Finally, arbitrary search spaces were used in this study, which can be seen as a limitation. In future work, we will focus on developing a data-based search space for GA in Weibull parameter estimation.