An improved differential evolution algorithm with a restart technique to solve systems of nonlinear equations

ABSTRACT


Introduction
Difficult nonlinear problems arise in many fields of science and engineering. They demand effective computational methods and have driven the development of various computational intelligence methods [1,2] such as genetic algorithms, particle swarm optimization, and differential evolution. These methods are recognized as alternatives to analytical approaches and have found applications in many areas, for example, data clustering in data science [3,4], weight training of artificial neural networks [5,6], and numerical treatment of nonlinear systems [7][8][9][10][11]. This research focuses on solving systems of nonlinear equations, a problem that is common and important in both theoretical and applied settings [12][13][14][15][16]. It is a key part of solving complex problems involving the decision variables of nonlinear models. New systems of nonlinear equations frequently emerge from computational problems and range from moderate problems with a few variables to hard ones with many variables.
For example, physical models expressed as nonlinear partial differential equations become large systems of nonlinear equations when discretized [17]. The systems may also have strongly dependent variables or a large number of local solutions, which makes them much more difficult to solve, so an analytical approach aiming at a closed-form solution is usually impractical or impossible. There are two practical approaches to solving a system of nonlinear equations: local methods, which solve the original system directly, and global methods, which transform the system into an optimization problem (with box constraints on the variables) and solve that equivalent problem instead. Local methods are iterative procedures that require an approximate solution as a starting point and use local information (the derivatives or gradients) of the system's equation functions to compute a better approximation at each iteration. They include the various Newton-type methods [12] and can produce solutions with good computational speed and quality if the functions satisfy certain analytical properties and an approximate solution sufficiently close to a true solution is available. Since these critical requirements of the local methods are often not fulfilled, derivative-free global solution methods are needed.

Related work
Only a few global methods have been proposed for solving nonlinear systems in the literature. In 1998, Karr et al. [18] presented a hybrid scheme using a genetic algorithm to locate initial guesses, which are then supplied to a Newton method. Their results on one selected test problem, finding the nodes and weights for Gauss-Legendre quadrature, showed that the genetic algorithm can effectively locate an initial guess from which the Newton method converges to an accurate solution. In 2008, Grosan and Abraham [19] applied a multi-objective optimization approach, using a genetic algorithm and keeping the nondominated solutions in an external set. Several problems were tested, and solutions of varying quality, both in the number of distinct solutions and in accuracy, were reported. In 2009, Hirsch et al. [20] used the continuous GRASP optimization method, a multi-start local search procedure, to find all roots of nonlinear systems. To find different solutions, the objective function is adaptively modified to create an area of repulsion (or penalty region) around solutions already found, and continuous GRASP is run multiple times. The method showed promising results on four selected nonlinear systems from the literature. In 2011, Jaberipour et al. [21] used a particle swarm optimization algorithm with a new particle-update rule and a mechanism to replace some of the worst particles. Several test problems, covering both single nonlinear equations and small systems, were tested, and the solutions and their accuracies were reported. In 2012, Pourjafari and Mojallali [22] proposed a hybrid two-phase root finder for nonlinear systems using an invasive weed optimization algorithm and a clustering technique, also aiming to locate all roots of the system.
Their approach gives successful results on several constructed single-variable nonlinear equations with many local solutions and on three small real-world problems with a few distinct solutions. In 2013, Oliveira and Petraglia [23] proposed a stochastic optimization method known as fuzzy adaptive simulated annealing (fuzzy ASA) to find many solutions of a nonlinear system. Fuzzy ASA is run several times to explore different regions during different activations, and the method stops when solutions of the predefined accuracy are found; high-accuracy solutions on several test problems were reported. In 2016, Raja et al. [24] presented a memetic algorithm (GA-SQP), a hybrid of a genetic algorithm and a local search based on sequential quadratic programming (SQP), for solving nonlinear systems. Several GA-SQP variants were proposed and tested on six application problems. The results showed that the hybrid approaches give higher-precision solutions and outperform several methods: simulated annealing (SA), pattern search (PS), the Nelder-Mead method (NM), and the Levenberg-Marquardt (LM) algorithm. In 2018, Zhang et al. [25] proposed a modified cuckoo search algorithm (CSA) for solving nonlinear equations and nonlinear systems. Four application systems were used to evaluate its performance. By setting a high-precision tolerance as the termination condition, highly accurate solutions were obtained, and the CSA was shown to give more accurate solutions than the GA-SQPs. Also in 2018, Raja et al. [10] presented a hybrid of particle swarm optimization and the Nelder-Mead method (PSO-NMM) to solve nonlinear benchmark models. PSO-NMM exploits the strength of PSO as an efficient global search method to find good initial solutions and then applies the Nelder-Mead simplex method to refine them for rapid local convergence.
They showed that for moderate to difficult nonlinear systems, hybridization with the Nelder-Mead method can enhance convergence and yield high-precision solutions.

Innovative contribution
The development of computational intelligence and evolutionary optimization methods for various complex real-world and synthetic nonlinear systems makes clear that global methods have become essential tools and deserve more attention from researchers. This direction also makes solving nonlinear systems an important field of global optimization, since it supplies plenty of challenging test problems. In addition, research contributions in this direction still lack common ground for comparing and establishing results. In this work, we apply the differential evolution algorithm (DE) to solve nonlinear systems. DE is a popular and efficient global optimization method introduced by Storn and Price during 1995-1997 [26,27], and, surprisingly, our extensive literature review found no explicit research contributions on using DE to solve these problems. We propose an improved differential evolution algorithm with a restart technique (DE-R) for solving them and offer a basic ground for applying DE in this direction. The main features of the proposed algorithm can be summarized as follows:
• A mixing strategy that combines an xbest-mutation with the basic DE mutation, utilizing the current best vector to enhance the search and the convergence to an optimal solution.
• The incorporation of a restart technique to prevent premature convergence and stagnation during the evolutionary search.
• Its performance, in terms of both convergence speed and the achievement of high-precision solutions, is evaluated on several nonlinear benchmark problems of varying difficulty.
The remainder of the paper is organized as follows. The basic differential evolution algorithm is presented in the next section, and the proposed algorithm is described in Section 3. Section 4 gives details of the experimental design and lists all the benchmark problems. In Section 5, the performance of the DE-R method is compared with that of the basic DE algorithms based on the setting of the value to reach, and with that of the PSO and PSO-NMM methods [10] based on the setting of the maximum number of function evaluations. Finally, the conclusions are given in the last section.

The differential evolution algorithm (DE)
The basic or classic DE algorithm [26,27] for solving a continuous optimization problem is a stochastic search method using a population of real vectors with four main operations: initialization, mutation, crossover, and selection. The pseudo code of the basic DE is illustrated in Table 1. Its main features are the differential mutation, the binomial crossover, and the greedy selection. First, the initial population is generated uniformly in the feasible region. For each generation and each target vector x_i, three random population vectors x_r1, x_r2, x_r3, distinct from each other and from the target vector, are used to generate a mutant vector xm = x_r1 + F (x_r2 − x_r3) by adding the scaled difference of two vectors to a third, with scaling factor F. Then some components of the target vector are exchanged with those of the mutant vector according to the crossover rate CR to produce the trial vector. The trial vector replaces the target vector in the selection step if it is fitter. For more than two decades, DE has been shown to be one of the most efficient methods for continuous optimization problems [28][29][30]. However, DE's performance depends on five main factors: the population size NP, the control parameters F and CR, the dimension D, the objective function f to be solved, and the amount of computation allowed [31,32]. There are numerous research contributions on modifying DE to improve its performance on practical problems, mainly by adapting the control parameters and adjusting the basic mutation operation [33][34][35][36][37][38][39][40][41][42].
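As an illustration, the basic DE loop described above can be sketched as follows (a minimal Python sketch, not the authors' implementation; the function name basic_de and its defaults are ours, and bound handling after mutation is omitted for brevity):

```python
import numpy as np

def basic_de(f, bounds, NP=50, F=0.5, CR=0.9, max_nf=100000, vtr=1e-20, seed=0):
    """Minimal basic DE (DE/rand/1/bin) sketch."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    D = len(lo)
    pop = rng.uniform(lo, hi, size=(NP, D))          # uniform initialization
    fit = np.array([f(x) for x in pop])
    nf = NP
    while nf < max_nf and fit.min() > vtr:
        for i in range(NP):
            # pick r1, r2, r3 distinct from each other and from the target index i
            r1, r2, r3 = rng.choice([j for j in range(NP) if j != i],
                                    size=3, replace=False)
            xm = pop[r1] + F * (pop[r2] - pop[r3])   # differential mutation
            # binomial crossover with one forced component IC
            IC = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[IC] = True
            xc = np.where(mask, xm, pop[i])
            fc = f(xc)
            nf += 1
            if fc < fit[i]:                          # greedy selection
                pop[i], fit[i] = xc, fc
    best = fit.argmin()
    return pop[best], fit[best], nf
```

For example, minimizing a simple sphere function f(x) = sum x_i^2 with this sketch converges quickly toward the origin.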

An improved differential evolution algorithm with a restart technique (DE-R)
We propose an improved differential evolution algorithm with a restart technique (DE-R) as a general-purpose method for solving nonlinear systems through their equivalent transformed objective functions. Let a nonlinear system have the form f_i(x) = 0, i = 1, ..., m, where the f_i are nonlinear equation functions (linear functions are also allowed) and x = (x_1, x_2, ..., x_n) is a real vector. We want to find a solution x* such that f_i(x*) = 0 simultaneously for all i. The problem is transformed into the corresponding optimization problem by defining the objective function f as the sum of the absolute values or the squares of all f_i.
For the smoothness of f, we use the sum of the squares, so the objective function is f(x) = f_1(x)^2 + f_2(x)^2 + ... + f_m(x)^2.
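For instance, the transformation for a hypothetical two-equation system can be written as follows (an illustrative sketch; this system is not one of the paper's test problems):

```python
def residuals(x):
    """Hypothetical system: x1^2 + x2^2 - 4 = 0 and x1 - x2 = 0."""
    x1, x2 = x
    return [x1 ** 2 + x2 ** 2 - 4.0, x1 - x2]

def objective(x):
    """Sum-of-squares merit function f(x) = sum_i f_i(x)^2; it is zero exactly at a root."""
    return sum(fi * fi for fi in residuals(x))
```

Here a root of the system, such as (sqrt(2), sqrt(2)), makes every residual vanish and drives the objective to zero, while any non-root gives a strictly positive value.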

The DE algorithm
(1) Inputs: objective function to be minimized (f), problem dimension (D), and population size (NP).
(2) Initialization: generate the initial population uniformly at random in the feasible region.
(3) Mutation: for each target population vector x_i, construct the mutant vector xm = x_r1 + F (x_r2 − x_r3), where r1, r2, and r3 are randomly generated distinct indices (in the range 1 to NP) that are also different from the target index i, and F is a scaling factor.
(4) Crossover: construct the trial vector (the candidate vector) xc by replacing some components of x_i with the corresponding components of xm: for each component j, set xc_j = xm_j if rand() ≤ CR or j = IC, and xc_j = x_i,j otherwise, where rand() is a uniform random number in [0, 1], CR is a crossover rate, and IC is a randomly fixed index from 1 to D.
(5) Selection: replace x_i with xc if f(xc) < f(x_i).

To optimize the objective function with a small number of function evaluations, we tend to use a small population size. But optimizing with small populations tends to lose diversity and leads easily to premature convergence or stagnation [31,32,43,44]. We can also accelerate convergence by utilizing the population's best vector xbest, but this again increases the chance of those convergence problems. DE-R utilizes the xbest information while preventing premature convergence and stagnation by incorporating an xbest-mutation and a restart technique.
The pseudo code and the flowchart of the proposed DE-R method are presented in Table 2 and Figure 1, respectively. After initialization, DE-R creates a mutant vector using a mixing scheme of two mutation operations: the basic mutation xm = x_r1 + F (x_r2 − x_r3) and the xbest-mutation [28] xm = xbest + F_1 (x_r1 − x_r2) + F_2 (x_r3 − x_r4), which uses the current best vector xbest, where x_r1, x_r2, x_r3, and x_r4 are random population vectors distinct from each other and from the target vector x_i. For each target vector, one of the two mutation operations is randomly chosen and applied, in a 50% : 50% proportion. Instead of using a fixed value for the scaling factors, DE-R uses random values in the range [0.5, 0.7] for F, F_1, and F_2. This mixing proportion and scaling-factor range were chosen from preliminary experiments. For the crossover operation, a fixed crossover rate CR = 0.9 is used to generate the trial vector, and then the same greedy selection as in basic DE is applied. To prevent premature convergence or stagnation, the restart technique re-initializes a randomly chosen fraction PR of the population vectors, replacing them with newly generated vectors as in the initialization, every N_RS generations. This incorporated restart operation periodically supplies a small amount of new content to the evolving population. Note that the xbest vector is kept, updated, and used in the xbest-mutation throughout the entire optimization process.
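The mixed mutation step can be sketched as follows (our own Python illustration; the exact form of the xbest-mutation shown here, xbest plus two scaled difference vectors, is our reading of the description, and the function name is ours):

```python
import numpy as np

def mixed_mutation(pop, i, xbest, rng):
    """Sketch of DE-R's mutation: basic DE/rand/1 or an xbest-based mutation,
    each chosen with probability 0.5; scale factors drawn uniformly from [0.5, 0.7]."""
    NP = len(pop)
    others = [j for j in range(NP) if j != i]        # indices distinct from the target
    if rng.random() < 0.5:
        # basic mutation: xm = x_r1 + F (x_r2 - x_r3)
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        F = rng.uniform(0.5, 0.7)
        return pop[r1] + F * (pop[r2] - pop[r3])
    else:
        # xbest-mutation: xm = xbest + F1 (x_r1 - x_r2) + F2 (x_r3 - x_r4)
        r1, r2, r3, r4 = rng.choice(others, size=4, replace=False)
        F1, F2 = rng.uniform(0.5, 0.7, size=2)
        return xbest + F1 * (pop[r1] - pop[r2]) + F2 * (pop[r3] - pop[r4])
```

Drawing the scale factors afresh on every call keeps the step sizes varied without any parameter tuning per problem.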

The improved DE algorithm with a restart (DE-R)
(1) Inputs and control parameters: objective function f, problem dimension D, population size NP, crossover rate CR, restart rate PR, restart period N_RS, value to reach VTR, and maximum number of function evaluations maxnf.
(2) Initialization: generate the initial population uniformly at random in the feasible region; record the best vector xbest and its objective value fbest.
(3) Mutation: for each target vector x_i, apply one of the following two operations, each chosen with probability 50%.
(3a) The basic mutation: randomly generate distinct indices r1, r2, and r3 (in the range 1 to NP) that are also different from the target index i, and construct the mutant vector xm = x_r1 + F (x_r2 − x_r3), where F is randomly generated in [0.5, 0.7].
(3b) The xbest-mutation: randomly generate distinct indices r1, r2, r3, and r4 (in the range 1 to NP) that are also different from the target index i, and construct the mutant vector xm = xbest + F_1 (x_r1 − x_r2) + F_2 (x_r3 − x_r4), where xbest is the current best solution and F_1, F_2 are randomly generated in [0.5, 0.7].
(4) Crossover: construct the trial vector (the candidate vector) xc by replacing some components of x_i with the corresponding components of xm: for each component j, set xc_j = xm_j if rand() ≤ CR or j = IC, and xc_j = x_i,j otherwise, where xm is the mutant vector from step (3), CR is the crossover rate, rand() is a uniform random number in [0, 1], and IC is a randomly fixed index from 1 to D.

(5) Selection:
Apply the greedy selection and check for an update of xbest.
If f(xc) < fbest, then update xbest with xc and fbest with f(xc).
(6) Restart: apply the restart technique every N_RS generations by randomly choosing PR × NP distinct population vectors to be re-initialized as in the initialization, where PR is the restart rate.
(7) Stopping condition: repeat steps (3)-(6) until fbest is less than VTR or maxnf is reached, then report the obtained best solution.
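The restart step (6) can be sketched as follows (a minimal Python illustration under our own naming; the re-initialized vectors are re-evaluated, which costs a few extra function evaluations):

```python
import numpy as np

def restart(pop, fit, f, lo, hi, PR, rng):
    """Sketch of the restart step: re-initialize PR * NP randomly chosen distinct
    population vectors uniformly within the bounds (applied every N_RS generations)."""
    NP, D = pop.shape
    k = max(1, int(round(PR * NP)))                  # number of vectors to restart
    idx = rng.choice(NP, size=k, replace=False)      # distinct positions to replace
    pop[idx] = rng.uniform(lo, hi, size=(k, D))      # fresh uniform vectors
    fit[idx] = [f(x) for x in pop[idx]]              # re-evaluate the new vectors
    return pop, fit, k                               # k extra evaluations were spent
```

Because xbest is stored separately, a restart never discards the best solution found so far; it only injects diversity into the evolving population.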
[Figure 1: flowchart of the DE-R method — after initialization (generation G = 1, function evaluations nf = 0), each target vector x_i passes through the mutation step, which generates the mutant vector xm, and the crossover step, which generates the candidate vector xc from x_i and xm.]

Experimental design
To evaluate the performance of the DE-R algorithm, two experiments with different settings and measurements are performed. The first experiment assesses the proposed DE-R algorithm against the basic DE algorithms on 10 nonlinear benchmark models by setting the value to reach (VTR), while the second experiment compares the DE-R algorithm with the PSO and PSO-NMM methods [10] by setting the maximum number of function evaluations (maxnf).

Experiment 1: Performance comparison of the DE-R with the basic DE algorithms
The DE-R algorithm and the basic DE algorithms are tested on 10 selected nonlinear systems consisting of 6 real-world problems (case studies 1-6) and 4 synthetic problems (case studies 7-10). Their definitions, parameters, variable bounds, and solutions (for the synthetic problems) are listed as follows.
Case study 1: Neurophysiology application [23,45]. The system consists of the following six equations in six variables [23], with −10 ≤ x_i ≤ 10.

Case study 2: Robot kinematics application [46]. This problem concerns the indirect-position problem, which is to find the desired position and orientation of the robot hand and the relative joint displacements. It can be reduced to the following system of eight equations in eight variables.

Case study 3: Automotive steering application [20,47]. This problem describes the kinematic synthesis of a trailing six-member mechanism for automotive steering. The system contains the following three equations in three unknowns, with the constants ϕ_i and ψ_i given below.

Case study 4: Economics modeling application [23,45]. This problem arises in economics modeling. It can be extended to general dimension n as follows, where c_i = 0 for all i as in [23] and −10 ≤ x_i ≤ 10.

Case study 5: Chemical equilibrium application [23,46,48]. This problem describes a chemical equilibrium system. It concerns the combustion of propane in air to form ten products, which is modeled by ten equations in ten variables.
To solve this problem, the system can be reduced to the following system of five equations in five variables, where −100 ≤ x_i ≤ 100 and the constants used in the system are given below.

Case study 6: Combustion application [23,45]. This problem is a typical chemical equilibrium problem representing a combustion process.
The system consists of ten equations in ten unknowns as follows.

Case study 7: Rosenbrock function [49]. This function is well known for testing the performance of optimization methods. The two-dimensional function has its global minimum inside a long, narrow, parabolic-shaped flat valley. It is unimodal for dimensions 2 and 3 and has 2 minima for higher dimensions. The high-dimensional function is highly nonseparable and is used as one of the difficult test functions. The Rosenbrock function can be written in the form of a system of nonlinear equations in general dimension n. In this work, we set n = 10 and −100 ≤ x_i ≤ 100. The global solution is (1, 1, ..., 1).

Case study 8: SINQUAD function [50]. This function is multimodal and nonseparable, and it is one of the test functions from CUTE: Constrained and Unconstrained Testing Environment. It can be scaled up to arbitrary dimension n and can be written in the form of the following system.
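For concreteness, one standard way of writing the Rosenbrock function as residual equations is sketched below (an assumption on our part, since the paper's displayed system is not reproduced here; this formulation has its root at (1, ..., 1), matching the stated global solution):

```python
def rosenbrock_residuals(x):
    """One common residual form of the chained Rosenbrock function (assumed,
    not the paper's exact system). All residuals vanish at x = (1, ..., 1)."""
    n = len(x)
    res = []
    for i in range(n - 1):
        res.append(10.0 * (x[i + 1] - x[i] ** 2))    # valley-shape term
        res.append(1.0 - x[i])                       # distance-to-one term
    return res

def rosenbrock_objective(x):
    """Sum of squared residuals, equal to the classical Rosenbrock function value."""
    return sum(r * r for r in rosenbrock_residuals(x))
```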
Case study 9: Proposed function 1. This function can be written as a system of three nonlinear equations in n variables, where n ≥ 3. The first two equations give the intersection of two n-spheres, which is an (n − 1)-sphere. The third equation selects the two intersection points where x_1 = 0.05. This system is expected to be highly nonseparable, since the search points must lie on the (n − 1)-sphere. We set n = 10 and −100 ≤ x_i ≤ 100. There are two solutions, (0.05, a, a, ..., a) and (0.05, −a, −a, ..., −a), where a = sqrt((100 − 0.05^2)/(n − 1)).

Case study 10: Proposed function 2. This function can be written as a system of three nonlinear equations in n variables, where n is even. The system has one obvious solution at (n, n, ..., n). We set n = 10 and −100 ≤ x_i ≤ 100.

For the first experiment, the proposed DE-R is tested and compared with two basic DE algorithms. The basic DEs follow the settings recommended in [27]: NP in the range of 5D to 10D, F = 0.5, and CR = 0.9. Since the maximum D over all test problems is 10, NP = 50 and NP = 100 are chosen, and we denote these basic DEs as DE59-50 and DE59-100, respectively.
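The two stated solutions of proposed function 1 can be verified numerically with a quick check (our own sketch; we assume one of the spheres is the origin-centered sphere of radius 10, which is consistent with the solution formula):

```python
import math

def proposed1_solutions(n):
    """The two stated roots of proposed function 1: (0.05, ±a, ..., ±a)
    with a = sqrt((100 - 0.05**2) / (n - 1))."""
    a = math.sqrt((100 - 0.05 ** 2) / (n - 1))
    return [(0.05,) + (a,) * (n - 1), (0.05,) + (-a,) * (n - 1)]

def on_sphere(x, radius=10.0, tol=1e-12):
    """Check that x lies on the origin-centered sphere of the given radius."""
    return abs(sum(xi * xi for xi in x) - radius ** 2) < tol
```

Both points satisfy x_1 = 0.05 and lie on the radius-10 sphere, since 0.05^2 + (n − 1) a^2 = 100 by construction.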
The DE-R method, denoted DE59-50-R, uses NP = 50, F in [0.5, 0.7], and the same CR = 0.9. The settings and features of DE59-50, DE59-100, and DE59-50-R are summarized in Table 3. Each algorithm is run 30 times for each problem. The maximum number of function evaluations maxnf is set to 1,000,000, and the VTR (value to reach) for fbest is set to 10^−20, which guarantees that each |f_i(xbest)| is less than 10^−10. If the fbest value falls below the VTR before reaching maxnf, the run is counted as successful and the number of function evaluations used (nf) is recorded. We report the number of successful runs (NS), the mean number of function evaluations (Mean nf), and the percentage standard deviation (%SD).

Experiment 2: Performance comparison of DE-R with PSO and PSO-NMM methods
The second experiment compares the performance of the DE-R algorithm with that of PSO and PSO-NMM using the same settings as in [10]. The objective function f is the mean square of the equation functions, f(x) = (1/m) Σ_{i=1}^{m} f_i(x)^2. We choose the following PSO and PSO-NMM variants that obtained good results in [10]:
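The mean-square objective can be sketched as a small wrapper (our own illustration; residuals is a hypothetical function returning the list of f_i values for a given x):

```python
def mean_square_objective(residuals):
    """Build f(x) = (1/m) * sum_i f_i(x)^2 from a residual function.
    Dividing by m only rescales the objective; its roots are unchanged."""
    def f(x):
        r = residuals(x)
        return sum(ri * ri for ri in r) / len(r)
    return f
```

Because the mean square differs from the sum of squares only by the constant factor 1/m, both formulations vanish at exactly the same solutions of the system.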

Results and discussion
This section presents the performance comparison of DE-R with the basic DE algorithms and with the PSO and PSO-NMM methods. In each table, the best values are indicated in bold. The discussion for each experiment is given below.

Performance comparison of the DE-R with the basic DE algorithms
The performance comparison of DE59-50, DE59-100, and DE59-50-R is shown in Table 4, Figure 2, and Figure 3. The table presents the NS, Mean, and %SD of each method for all 10 test systems. For the ability and stability of solving each problem, the number of successful runs out of the total 30 runs is considered first. From Table 4, the DE59-50 algorithm successfully solves 6 out of 10 problems. It cannot solve problems 6, 8, 9, and 10, giving no successful runs out of 30 for these problems. It can solve problem 5 (Chemical equilibrium application) and problem 7 (Rosenbrock problem) but gives high Means and %SDs. However, DE59-50 gives the best results for the first four problems, which appear to be relatively easy. We conclude that DE59-50 is good for easy problems but cannot be recommended for difficult problems, because its population size is too small. The DE59-100 algorithm can solve all 10 problems, which shows that the increased population size of 100 increases the solving ability of the DE algorithm with F = 0.5 and CR = 0.9. However, DE59-100 gives very high Mean values, indicating rather slow convergence. The proposed DE59-50-R algorithm successfully solves all 10 problems. It also gives the best results (smallest Means) for the last 6 problems, which appear to be the difficult ones, and the second-best results for the first four easy problems. This shows that the incorporated mutation strategy and restart technique prevent premature convergence and stagnation while still giving fast convergence on difficult problems, as clearly shown in Figure 2 and Figure 3.
Some solutions obtained by DE59-50-R for all the application systems and the SINQUAD function are reported; we show only 4 different solutions for each problem. Since we set VTR = 10^−20, the absolute value of each f_i(x) must be less than 10^−10. Table 5 shows some solutions of the Neurophysiology application, for which our method gives 30 different solutions in 30 runs. Some solutions of the Robot kinematics application are shown in Table 6. The authors in [46] claimed that there are 16 solutions; our results show that the proposed method gives 10 different solutions in 30 runs. All |x_i| are not equal but have roughly the same order. Some solutions of the Automotive steering application are shown in Table 7, and some solutions of the Economics modeling application in Table 8. Each variable can be positive or negative, and the absolute values of x_10 are much smaller than the others, which have the same order. There are four real solutions of the Chemical equilibrium application reported in [48]; our method finds all four, as shown in Table 9. For the Combustion application, the proposed method gives 30 different solutions in 30 runs. Some solutions are presented in Table 10, which shows that the absolute values of x_5, x_6, x_9, and x_10 are bigger than those of the other components, which are rather small. For the synthetic test problems, the number of solutions for each of the Rosenbrock function and proposed functions 1 and 2 is at most 2, and all these solutions are found. Thus we show only the solutions of the SINQUAD function, which has 2^(n−1) + 1 solutions; some of them are shown in Table 11. The proposed method is also run with successively tighter VTR levels down to 10^−40, and the solutions reached at each VTR level are recorded within the same run in order to investigate their behaviors and accuracies. The results are shown in Tables 12, 13, and 14, respectively; one set of three solutions (one at each accuracy level) from the same run is reported for each problem. From Table 14, we can see the same trend

Performance comparison of DE-R with PSO and PSO-NMM methods
The performance comparison of DE-R, the PSO variants, and the PSO-NMM variants on 4 nonlinear benchmark problems is presented in Table 15 and Table 16. The DE-R here is the same as DE59-50-R in the first experiment, except that it uses the maxnf setting described in Section 4.2.
Both tables report the Min, Mean, and SD of fbest based on the corresponding maxnf values. From the results, the first two problems are relatively easy, whereas the last two are more difficult, with the Chemical equilibrium

Conclusions
In this paper, we have proposed an efficient improvement of the differential evolution algorithm, using a mixing scheme of two mutation operations and a restart technique, for solving nonlinear systems. The designed algorithm has the advantage of integrating both global and local search techniques to balance exploration and exploitation. It successfully solves all ten selected test problems of varying difficulty and outperforms the two basic differential evolution algorithms that use the recommended settings from the literature. It also outperforms the compared methods recently developed in the literature. This performance results from the proper modification of the basic DE algorithm and shows that DE with a restart technique is a promising tool for solving complex systems of nonlinear equations. Future work could investigate designing and applying differential evolution algorithms to more complicated nonlinear problems in high dimensions, such as those derived from difficult nonlinear ODEs and PDEs, and those from learning models of artificial neural networks.