Minimization over randomly selected lines

This paper presents a population-based evolutionary optimization method for minimizing a given cost function. The mutation operator of this method selects randomly oriented lines in the cost function domain, constructs quadratic functions interpolating the cost function at three distinct points on each line, and uses the extrema of these quadratics as mutated points. The crossover operator modifies each mutated point based on components of two points in the population, instead of one point as is usual in other evolutionary algorithms. The stopping criterion of the method depends on the number of almost degenerate quadratics. We demonstrate that the proposed method with these mutation and crossover operations achieves faster and more robust convergence than the well-known Differential Evolution and Particle Swarm algorithms.


Introduction
Population-based evolutionary optimization methods try to minimize a given cost function by using a set of points [1,2,3]. They are called evolutionary because they create variations in the population through mutation and crossover operations and then select the variations that improve on the previous generation. In general they create variations based only on cost function evaluations, so they are easy to use for cost functions whose gradient and Hessian are not available due to complexity, discontinuity, or non-numeric structure. This paper presents a new mutation operation based on quadratic functions and a new crossover operation for creating variations.
Quadratic functions are frequently used in both deterministic and stochastic optimization methods.
Newton's method uses a quadratic approximation of the cost function at the current iterate and assigns its minimizer as the next best guess [4]. The quasi-Newton algorithm in [5] finds a descent direction based on the BFGS (Broyden-Fletcher-Goldfarb-Shanno) Hessian approximation and performs an inexact line search using quadratic and cubic function models. The line search in [6] also uses quadratic interpolation to find a minimizer over the line. The stochastic Controlled Random Search method [7] has a mutation operator that randomly chooses three points in the population and fits a quadratic to these points. A similar mutation operator for Differential Evolution (DE) is used in [8], which constructs the mutated vector using either the DE mutation scheme or quadratic interpolation, based on a fixed mutation probability.
This work is supported in part by the US National Science Foundation under grant DMR-0520547.
The proposed mutation operator tries to learn the cost function surface by fitting quadratics over one-dimensional slices of the search space. These slices are defined by lines passing through pairs of points in the population. The points are paired randomly in order to approximate the function as uniformly as possible. Using the minima of convex quadratics as mutated points improves convergence rates, as these minima often indicate regions of the search space with smaller function values. We note that this mutation approach differs from DE's mutation [9,10,11], which depends only on the points in the population, whereas the proposed approach uses both the points and the function values at these points. The crossover operation involves two points in the population to improve robustness and efficiency. It replaces randomly selected entries of a mutated vector with the corresponding entries of two other vectors in the population. The method's stopping criterion halts the search when at least a pre-specified number of quadratics are almost degenerate.

Formulation
Consider a set of n-dimensional real vectors X = {x_1, x_2, ..., x_{n_P}} with n_P elements. The algorithm pairs each point x_i with another randomly chosen point x_j in X. The paired points (x_i, x_j) are called the parent points. For instance, a population of five points x_1, x_2, x_3, x_4, and x_5 is illustrated as white circles in Figure 1 for the two-dimensional Rosenbrock function, where random pairing resulted in the pairs (x_1, x_5), (x_2, x_1), (x_3, x_4), (x_4, x_2), and (x_5, x_4). The parent point x_i is also called a target point since it competes with a trial point x̃_i, a variational point constructed by the mutation and crossover operators described in this section. The pairing process results in n_P pairs, where the i-th pair has x_i as its first entry, thus guaranteeing that each point in X is a target point.
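The random pairing step can be sketched as follows; this is a minimal illustration, and the function name `random_pairs` is hypothetical, not from the paper:

```python
import random

def random_pairs(n_P):
    """Pair each index i with a random j != i, so every point in the
    population is the first (target) entry of exactly one pair.

    Returns a list of (i, j) index pairs into the population X.
    """
    pairs = []
    for i in range(n_P):
        j = random.choice([k for k in range(n_P) if k != i])
        pairs.append((i, j))
    return pairs
```

Because i ranges over the whole population, every point is guaranteed to appear once as a target, while partners j may repeat (as in the example pairing above, where x_4 is a partner twice).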
The parametric form of the line passing through x_i and x_j can be written as x_i + µ(x_j − x_i), where µ is a real number. For instance, the five lines passing through the paired points are illustrated in Figure 1. The algorithm randomly chooses a number µ_k and finds a third point x_k = x_i + µ_k p_i on the line, where p_i = x_j − x_i. Note that a step µ_k p_i from x_i toward x_j is taken if µ_k > 0; otherwise a step in the opposite direction is taken. When x_k is far from x_i and x_j, the resulting quadratic model may have large mismatches with the underlying cost function. Therefore we choose relatively small step sizes by drawing µ_k uniformly from the union of two intervals. Note that µ_k chosen from this distribution satisfies µ_k ≠ 0 and µ_k ≠ 1, so x_k ≠ x_i and x_k ≠ x_j, and a unique quadratic passing through (x_i, f(x_i)), (x_j, f(x_j)), and (x_k, f(x_k)) exists. In Figure 1 the five sampled points are represented by yellow circles.
Consider the function ϕ(µ) = f(x_i + µ p_i), the one-dimensional cross section of f(x) along the line passing through x_i and x_j. Since the function values ϕ(0) = f(x_i), ϕ(1) = f(x_j), and ϕ(µ_k) = f(x_k) are known, the coefficients of the interpolating quadratic φ(µ) = aµ² + bµ + c follow from the conditions c = ϕ(0), a + b + c = ϕ(1), and aµ_k² + bµ_k + c = ϕ(µ_k), which give c = ϕ(0), a = [ϕ(µ_k) − ϕ(0) − µ_k(ϕ(1) − ϕ(0))] / (µ_k(µ_k − 1)), and b = ϕ(1) − ϕ(0) − a. The critical point of φ(µ) is µ* = −b/(2a) and the corresponding point is x* = x_i + µ* p_i. If the quadratic is convex (a > 0), then x* is assigned as the mutated point. When the quadratic is concave, it has a maximizer instead of a minimizer. In this case the step µ* p_i from x_i leads to the maximizer, so the step −µ* p_i away from the maximizer is taken, i.e. x* = x_i + (b/(2a)) p_i. This handling of the concave case is in contrast to our previous study [12], which allows steps toward the maximizer under a condition. For numerical stability we treat the model as convex if a > 10⁻⁶ and concave if a < −10⁻⁶. When |a| ≤ 10⁻⁶, the quadratic is considered degenerate and x_i remains in the next generation. Figure 1 shows a concave quadratic over the line through (x_1, x_5) and four convex quadratics over the other lines.
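The mutation step above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the paper's implementation: the function name `mutate` is hypothetical, and µ_k is passed in as an argument since its sampling distribution is specified separately:

```python
import numpy as np

EPS = 1e-6  # degeneracy threshold for the quadratic coefficient a

def mutate(f, x_i, x_j, mu_k):
    """One mutation step: fit phi_hat(mu) = a*mu^2 + b*mu + c through
    (0, f(x_i)), (1, f(x_j)), (mu_k, f(x_k)) on the line x_i + mu*(x_j - x_i),
    then step to the extremum (toward it if convex, away if concave).
    Returns None when the quadratic is nearly degenerate (|a| <= EPS)."""
    p = x_j - x_i
    x_k = x_i + mu_k * p
    c = f(x_i)
    f_j, f_k = f(x_j), f(x_k)
    # Solve a + b = f_j - c and a*mu_k**2 + b*mu_k = f_k - c for a, b.
    a = ((f_k - c) - mu_k * (f_j - c)) / (mu_k * (mu_k - 1.0))
    b = (f_j - c) - a
    if a > EPS:       # convex: step to the minimizer mu* = -b/(2a)
        return x_i + (-b / (2.0 * a)) * p
    if a < -EPS:      # concave: step away from the maximizer
        return x_i + (b / (2.0 * a)) * p
    return None       # degenerate: keep x_i in the next generation
```

If f is itself quadratic along the line, a single mutation lands exactly on the line's minimizer; for general f the step is only as good as the local quadratic model, which is why small |µ_k| values are preferred.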
The crossover operator constructs the trial vector x̃_i by replacing some entries of x* with the corresponding entries of either x_i or x_j by the following rule: x̃_i^k = x*^k if r_k ≤ ρ_CR, x̃_i^k = x_i^k if ρ_CR < r_k ≤ (1 + ρ_CR)/2, and x̃_i^k = x_j^k otherwise, where the superscript k denotes the k-th entry of the vector, r_k is drawn from Uniform[0,1], the uniform distribution between 0 and 1, and ρ_CR denotes a pre-fixed crossover constant chosen between 0 and 1. This rule means that, on average, 100·ρ_CR percent of the trial vector x̃_i is determined by x*, half of the remaining entries by x_i, and the other half by x_j. For instance, if ρ_CR = 0.8, the contributions of the x*, x_i, and x_j vectors are 80, 10, and 10 percent on average, respectively.
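The crossover rule can be sketched with a vectorized NumPy helper. This follows the prose description above; the function name and its signature are hypothetical:

```python
import numpy as np

def crossover(x_star, x_i, x_j, rho_cr=0.9, rng=None):
    """Build a trial vector: each entry comes from x_star with probability
    rho_cr, otherwise from x_i or x_j with equal probability."""
    rng = rng or np.random.default_rng()
    r = rng.uniform(size=x_star.shape)  # one r_k per entry
    return np.where(r <= rho_cr, x_star,
                    np.where(r <= (1.0 + rho_cr) / 2.0, x_i, x_j))
```

Unlike DE's binomial crossover, which mixes the mutated vector with a single target vector, the non-x* entries here are split between two population members.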
The selection operator selects x̃_i as the i-th member of the next generation if f(x̃_i) < f(x_i). Otherwise, the target vector x_i remains in the next generation.
The above mutation, crossover, and selection operations are performed until a stopping criterion is satisfied. The stopping criterion of this method depends on the number of almost degenerate quadratics. The method stops when the number of quadratics n_D satisfying |a| < ϵ_D and |b| < ϵ_D is equal to or larger than n_Dmax, where ϵ_D is a small positive number and n_Dmax is a number such that 1 ≤ n_Dmax ≤ n_P. This means that if at least n_Dmax quadratics are approximately constant functions, then the search stops. For cost functions on which this criterion is never satisfied, the method also stops if the improvement in the function value is smaller than a positive number ϵ_NI over at least n_NImax generations [13].
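The degeneracy-based stopping test can be sketched as a simple count over the current generation's quadratic coefficients; `should_stop` is a hypothetical helper name, with the default n_Dmax = 0.2·n_P taken from the Figure 2 defaults:

```python
def should_stop(quad_coeffs, eps_D=1e-4, n_D_max=None):
    """Stop when at least n_D_max of this generation's quadratics have
    |a| < eps_D and |b| < eps_D, i.e. are approximately constant.

    quad_coeffs: list of (a, b) pairs, one per line fitted this generation.
    """
    if n_D_max is None:
        n_D_max = max(1, int(0.2 * len(quad_coeffs)))  # default ratio 0.2
    n_D = sum(1 for a, b in quad_coeffs if abs(a) < eps_D and abs(b) < eps_D)
    return n_D >= n_D_max
```

Requiring both |a| and |b| to be small means the fitted cross sections are flat, which indicates the population has collapsed into a region where the function no longer varies appreciably along the sampled lines.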
Since minimization is based on extrema over randomly selected lines, we refer to this approach as the Random Lines (RL) method and summarize it in Figure 2.

Performance Evaluation
Performance of a stochastic optimization method is often determined by its robustness and efficiency.
When the same function is minimized multiple times, a robust method finds the global minimizer more frequently, and an efficient method finds the minimizer after a smaller number of function evaluations. We minimize each cost function 20 times and count the total number of successes n_S as a measure of robustness. Since the stopping criterion based on the absolute values of the quadratic coefficients is only relevant to the RL method, for a systematic comparison of all methods we consider a run successful when the method finds a point x_best satisfying f(x_best) − f(x_opt) < ϵ_opt, where x_opt is the known global minimizer and ϵ_opt = 10⁻⁵. A run is unsuccessful if the improvement in the function value is less than ϵ_NI = 10⁻¹² for n_NImax = 50 generations. To compare efficiency, the average number of function evaluations n_FE for minimizing each cost function is also determined.
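The success test used in this comparison is a one-line check; `is_success` is a hypothetical name for illustration:

```python
def is_success(f_best, f_opt, eps_opt=1e-5):
    """A run succeeds when the best function value found is within
    eps_opt of the known global minimum value f_opt."""
    return f_best - f_opt < eps_opt
```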
Since performance may vary for cost functions with different properties, we perform the comparison over the 50 cost functions listed in Table 1. The test suite includes separable and nonseparable, normally-scaled and ill-scaled, unimodal and multimodal cost functions that are often used in the optimization literature [14,15,17]. To compare the scalability of the methods to higher dimensional problems, we minimize the 2-, 10-, and 20-dimensional forms of the 16 extendible cost functions listed in Table 1.
The RL method is compared with the DE [9] and the constricted Particle Swarm (PS) [19,20] methods.

Table 1. The cost functions used in the performance comparison. Columns: Cost Function, Search Space, Reference. The initial population is generated by uniformly choosing points from the hypercube defined by α_k ∈ [l_1, l_2] for k = 1, 2, ..., n, where [l_1, l_2] is listed under the search space columns. Dimensions of the cost functions are listed in parentheses; the extendible functions are indicated by the (n) symbol.

All methods have the same population size of n_P = 20n. The DE and RL methods use the crossover constant ρ_CR = 0.9. Since the DE/rand/1/bin variant of the DE algorithm is robust and fast convergent, it is used in many studies [9,14,21,22]; therefore it is also used in this section. The scaling factor for DE is ρ_F = 0.5. The constricted Particle Swarm method uses the default constriction constants c_1 = 2.8 and c_2 = 1.3. RL uses the default parameters specified in Figure 2 except n_P. The initial generations are constructed by uniformly drawing points from the hypercubes specified in Table 1. Table 2 presents performance results for the cost functions with fixed dimensions and for the 2-dimensional forms of the extendible functions. RL is the most robust method, achieving 985 successes over 1000 runs, while PS and DE converge in 932 and 910 runs respectively. Since the number of function evaluations for a single cost function can be an outlier, the total number of function evaluations may not indicate efficiency without bias. We therefore count the number of functions for which an algorithm requires the fewest function evaluations. In this sense RL is the most efficient method, requiring the fewest function evaluations for 29 functions, followed by DE with 14 functions and PS with 7.
To compare scalability to higher dimensional cost functions, results for the 10- and 20-dimensional forms of the extendible functions are listed in Tables 3 and 4 respectively. Results for PS are not presented in these tables since its robustness decreases substantially for these cost functions. Table 3 shows that RL is more robust than DE: they converge in 303 and 260 of 320 runs respectively. Notice that DE achieves no successes for 3 functions, whereas RL achieves at least 6 successes for every cost function. The RL method is more efficient for 12 cost functions, while this number is 1 for DE. When the problem dimension increases to 20, the robustness of DE decreases, as its n_S drops from 260 to 206. In contrast, the robustness of RL slightly increases, as its n_S rises from 303 to 308. RL achieves at least 10 successes for every 20-dimensional function, while DE has no successes for 4 functions. RL is also more efficient than DE for 11 functions, and DE is more efficient for 1 function.
In the next experiment, the RL method described above remains the same, but its crossover operation given by (2) is replaced with a crossover operation similar to DE's crossover. Notice from Table 5 that the RL crossover slightly increases robustness: there are 13, 4, and 5 more successes for the cost functions in Tables 2, 3, and 4 respectively. The RL crossover also requires fewer function evaluations than the DE crossover for 39, 11, and 11 functions in Tables 2, 3, and 4 respectively.
To determine suitable values for the stopping criterion parameters ϵ_D and n_Dmax of the RL method, 12 cost functions are selected and each is minimized 20 times. These cost functions are Ackley(2), Alpine(2), Bard(3), Branin(2), Camel6(2), Exponential(2), Goldstein(2), Rosenbrock(2), Schwefel12(2), Schwefel221(2), Schwefel222(2), and Sphere(2). Let f_error be the difference between the function value f_best at the converged point x_best and the function value f_opt at the global minimizer. Note that f_best ≤ f(x_i) for all x_i ∈ X. Table 6 shows the average f_error and the associated average number of function evaluations for different ϵ_D and n_Dmax values, where n_Dmax = ρ_ratio · n_P, ρ_ratio is a number satisfying 0 < ρ_ratio ≤ 1, and n_P = 20n. Notice that the accuracy of the solution and the number of function evaluations increase with increasing ρ_ratio and decreasing ϵ_D. For instance, the average error is less than 10⁻⁴ for ϵ_D ≤ 10⁻³. Similarly, the error is smaller than 10⁻² for all ρ_ratio ≥ 0.3. Even though accuracy figures may vary considerably across cost functions, ϵ_D = 10⁻⁴ and ρ_ratio = 0.2 are good initial choices for achieving small errors within a small number of function evaluations. To obtain more accurate results, one can further increase ρ_ratio and/or decrease ϵ_D.

Conclusion
The paper proposes an evolutionary optimization method with new mutation and crossover operations. The mutation operation uses quadratic interpolating functions over randomly selected lines in the cost function domain. The crossover operation replaces entries of a mutated vector with the corresponding entries of either of two parent vectors. The stopping criterion is satisfied when a pre-specified number of quadratics are almost degenerate. Numerical performance evaluation over many types of cost functions demonstrates that the proposed method yields promising results compared to the well-known DE and PS algorithms. It is also shown that the method scales successfully to higher dimensional functions.

Figure 1. The contours of the 2-dimensional Rosenbrock function with five random lines. The minimizer of this function, x_opt = [1, 1]^T, is marked by the star. White circles represent parent points, yellow circles represent randomly sampled points, and red diamonds represent the extrema of the quadratics passing through three distinct points on the lines.

Figure 2. The Random Lines (RL) method. The default parameter values are µ_1 = 0.3, µ_2 = 0.7, ρ_CR = 0.9, ϵ_D = 10⁻⁴, n_Dmax = 0.2 n_P, ϵ_NI = 10⁻¹², n_NImax = 50, n_P = 10n. The symbols X_C and X_+ denote the current and next generations. The keyword "continue" means to skip the rest of the for loop and continue with the next i. Comments are written after two slash characters. Increase n_P to increase robustness. Increase n_Dmax and/or decrease ϵ_D to increase the accuracy of the solution.

Table 2. The total number of successes n_S and the average and standard deviation of n_FE for the 50 cost functions.

Table 3. The total number of successes n_S and the average and standard deviation of n_FE for the 10-dimensional cost functions.

Table 4. The total number of successes n_S and the average and standard deviation of n_FE for the 20-dimensional cost functions.

Table 5. Results for the RL method with the RL crossover given by (2) and with DE's crossover operator. The last column specifies the number of cost functions for which a method requires fewer function evaluations than the other.

Table 6. The average number of function evaluations ave(n_FE) and the average error in function values ave(f_error) after RL finds at least n_Dmax = ρ_ratio · n_P quadratics satisfying |a| < ϵ_D and |b| < ϵ_D.