A randomized adaptive trust region line search method

Article History: Received 12 December 2019; Accepted 31 May 2020; Available 27 July 2020.

Abstract: Hybridizing the trust region, line search and simulated annealing methods, we develop a heuristic algorithm for solving unconstrained optimization problems. We carry out numerical experiments on a set of CUTEr test problems to investigate the efficiency of the suggested algorithm. The results show that the algorithm is practically promising.


Introduction
As the most basic nonlinear optimization problem with continuous variables, unconstrained optimization naturally arises in many disciplines such as regression, image and signal processing, physical systems, optimal control and so on. Moreover, based on penalization schemes, constrained nonlinear programming problems can be reformulated as unconstrained problems [1]. Generally, the problem can be defined as minimization of an objective function that depends on real variables without any restriction on their values. Among the efficient tools for solving unconstrained optimization problems are the trust region (TR) methods and the line search (LS) techniques [1]. In each iteration of a TR method, a neighborhood is defined around the available approximation of the solution, called the trust region, and then an approximation of the objective function is minimized within the region to achieve the new estimation. The term used for the method originates from the fact that a local approximation is trusted as the predictor of the objective function behavior. By contrast, in each iteration of an LS method a search direction is defined at the available approximation of the solution and then the objective function is minimized along the given direction to achieve the new estimation. As is known, an LS method often requires more iterations to find a minimizer of the objective function than a TR method does, while it computes the successive approximations of the solution more quickly.
To evaluate the acceptability level of the local approximate model of the objective function in an arbitrary iteration of a TR method, a ratio is often defined by dividing the reduction achieved in the objective function by the reduction predicted by the local approximations at the recent iterates. When the TR ratio is small, the approximate model is found to be a poor predictor of the objective function behavior. In such a situation, the model should be resolved in a smaller region. However, when the TR ratio is large enough, the approximate model is found to be a locally suitable predictor of the objective function behavior. So, the generated estimation of the solution should be accepted and the region can be enlarged in the next iteration. It is worth noting that to decrease the computational cost of the TR methods, the LS techniques can be effectively employed in the case where the TR ratio is small, as an alternative to resolving the approximate model in a reduced neighborhood. A review of the literature reveals an abundance of studies on the TR methods; see for example [2][3][4] and the references therein.
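The accept/shrink/enlarge logic described above can be sketched as a simple radius-update rule. The thresholds and scaling factors below are illustrative defaults, not values taken from a specific method:

```python
def update_radius(rho, delta, mu=0.1, eta=0.75, shrink=0.5, grow=2.0):
    """Classical TR radius update driven by the ratio rho:
    shrink the region when the model predicts poorly, enlarge it
    when the prediction is very accurate, keep it otherwise."""
    if rho < mu:        # poor predictor: resolve in a smaller region
        return shrink * delta
    if rho > eta:       # very good predictor: the region can be enlarged
        return grow * delta
    return delta
```
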
Here, based on the simulated annealing strategy, we develop a randomized TR-LS method. The method is discussed in detail in the next section. We provide a test bed to shed light on the advantages of our heuristic algorithm in Section 3. Finally, in Section 4 we present concluding remarks.

A randomized trust region line search algorithm
Consider the unconstrained optimization problem min x∈R n f (x), in which the objective function f : R n → R is assumed to be continuously differentiable. The iterative formula of an optimization algorithm is generally of the form x k+1 = x k + s k , where s k is the step taken from x k . In a TR method, s k is often an approximate solution of the following subproblem, built on a local quadratic approximation of the objective function:

min s∈R n q k (s) = g kᵀ s + (1/2) sᵀ B k s, subject to ||s|| ≤ ∆ k , (1)

where g k = ∇f (x k ), B k is an approximation of the Hessian ∇²f (x k ) and ∆ k > 0 is the TR radius. In an LS method, the step has the form s k = α k d k , where d k is a search direction and α k > 0 is called the step length.
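For illustration, subproblem (1) can be solved approximately along the steepest-descent direction. The following sketch computes the classical Cauchy point; it is only a simplified stand-in for the double dogleg solver used later in the paper:

```python
import numpy as np

def cauchy_point(g, B, delta):
    """Approximately solve the TR subproblem
    min g^T s + 0.5 s^T B s subject to ||s|| <= delta
    by minimizing the model along -g, clipped to the TR boundary."""
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    if gBg > 0:
        # Unconstrained model minimizer along -g, capped at the boundary.
        tau = min(gnorm**3 / (delta * gBg), 1.0)
    else:
        # Nonconvex along -g: step to the boundary.
        tau = 1.0
    return -tau * (delta / gnorm) * g
```

For instance, with B = I and a radius large enough, the Cauchy point reduces to the full steepest-descent step −g; with a small radius it lands exactly on the TR boundary.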
To describe our randomization scheme, we use the framework of the TR-LS algorithm proposed in [2]. Firstly, we adopt the adaptive choice of the TR radius suggested in [5], that is,

∆ k = ( −g kᵀ q k / q kᵀ B k q k ) ||q k ||, (2)

in which B k is a positive definite quasi-Newton approximation of the Hessian and q k ∈ R n is a vector parameter satisfying the angle condition

−g kᵀ q k ≥ τ ||g k || ||q k ||, (3)

for some constant τ ∈ (0, 1]. To evaluate local consistency between the objective function and the quadratic model (1), we apply the following traditional TR ratio [1]:

ρ k = ( f (x k ) − f (x k + s k ) ) / ( q k (0) − q k (s k ) ). (4)

Now, for a prespecified constant µ ∈ (0, 1), if ρ k is large enough in the sense that ρ k > µ, then we set x k+1 = x k + s k . Otherwise, to avoid resolving the TR subproblem (1), we either set x k+1 = x k + s k with a specific probability which depends on the value of ρ k , or (similar to the approach of [2]) we use the Armijo-type LS procedure proposed by Wan et al. [6] as follows:

Line search 2.1. Let L k be an approximation of the Lipschitz constant of the gradient and set α k = t^i k , where i k is the smallest nonnegative integer i for which the step length t^i satisfies the following inequality:

f (x k + t^i s k ) ≤ f (x k ) + σ t^i ( g kᵀ s k − (1/2) t^i r L k ||s k ||² ),

where t ∈ (0, 1), σ ∈ (0, 1/2), and r ∈ [0, +∞) are real constants.
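Line search 2.1 can be approximated by a plain Armijo backtracking scheme. The sketch below drops the Lipschitz-constant refinement of Wan et al. [6] and keeps only the sufficient-decrease test; the default parameter values are illustrative:

```python
import numpy as np

def armijo_backtracking(f, x, g, s, t=0.5, sigma=1e-4, max_iter=50):
    """Return the largest step length t^i (i = 0, 1, ...) along s that
    satisfies the plain Armijo sufficient-decrease condition; a
    simplified stand-in for Line search 2.1."""
    fx = f(x)
    alpha = 1.0
    for _ in range(max_iter):
        if f(x + alpha * s) <= fx + sigma * alpha * (g @ s):
            return alpha
        alpha *= t
    return alpha
```
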
As seen, the distinct feature of our algorithm is that we may accept a trial step s k even when ρ k < µ, in contrast to the classical TR algorithms, in which such trial steps are rejected and the subproblem (1) is resolved with a smaller radius, or an LS strategy is employed. So, we need to define a reasonable probability for the mentioned randomized part of the algorithm. In this context, we apply the probabilistic approach of the simulated annealing (SA) strategy, one of the earliest and most popular metaheuristic techniques of optimization. The method originates from the annealing process of materials, which involves careful control of the cooling schedule [7]. SA is a local search algorithm capable of escaping from local optima by means of random hill-climbing moves in the search process [8,9]. It is very efficient in practice [9,10] and well-developed in theory [11,12].
To provide a detailed description of the SA method [8], note that, similar to the TR technique, at iteration t of the method a neighborhood N t is defined around the iterate x t . Then, a neighbor y ∈ N t is randomly selected. If y is better than x t (often from the cost function point of view, i.e. f (y) < f (x t )), then we move to y in the sense that we set x t+1 = y. However, when x t is better than y, we move to y with the probability

p t = exp( −d(x t , y)/T ), (5)

and stay at x t otherwise, where T is a positive constant commonly called the temperature and d(x t , y) is a nonnegative function which measures the unfitness of the feasible solution y in contrast to x t . The temperature T controls the likelihood of cost increases in the sense that when T is small, cost increases are highly unlikely, while when T is large, the value of d(x t , y) has an insignificant effect on the probability p t and any particular transition. In order to guarantee global convergence with probability one, the temperature needs to be decreased logarithmically with the iteration number t [13], which makes the process too slow. In practice, the temperature is usually updated by

T ← λT, (6)

with a prespecified constant 0 ≪ λ < 1 [11].
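As a concrete illustration of the acceptance rule (5) combined with the geometric cooling rule (6), here is a minimal SA local search; the objective and the neighborhood function passed to it below are hypothetical examples:

```python
import math
import random

def simulated_annealing(f, x0, neighbor, T0=1.0, lam=0.9, iters=200, seed=0):
    """Basic SA local search: sample a neighbor, always accept improvements,
    accept a worse point with probability exp(-d/T) as in (5), then cool
    the temperature by T <- lam * T as in (6)."""
    rng = random.Random(seed)
    x, fx, T = x0, f(x0), T0
    best, fbest = x, fx
    for _ in range(iters):
        y = neighbor(x, rng)
        fy = f(y)
        d = fy - fx                      # unfitness of y relative to x
        if d <= 0 or rng.random() < math.exp(-d / T):
            x, fx = y, fy                # hill-climbing move may be accepted
            if fx < fbest:
                best, fbest = x, fx
        T = lam * T                      # geometric cooling
    return best, fbest
```

For instance, minimizing f(x) = (x − 3)² from x0 = 0 with a uniform ±0.5 neighborhood typically drives the best observed cost well below its initial value of 9.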
In order to allow probable moves to some inferior solutions as well as to reduce the effect of unsuccessful iterations (with ρ k < µ), we apply the SA scheme in our algorithm. In this context, when at the kth iteration of the algorithm the TR ratio is negative or a small positive number near zero, we may still accept the trial step s k . More exactly, if ρ k < µ, then we set x k+1 = x k + s k with the following probability:

p ρ k = exp( −(µ − ρ k )/T ), (7)

and stay at x k otherwise, where T is the temperature. Considering (4) and (5), here we set d(x k , y) = µ − ρ k with y = x k + s k . As seen, the given probability is small when ρ k ≪ µ or the temperature T is small. Based on the above preliminaries, we are now in a position to describe the algorithm in detail.
Algorithm 2.1 (RTRLS).
Step 0: Choose an initial point x 0 ∈ R n , an initial temperature T = T 0 > 0 and the algorithm parameters; set k = 0.
Step 1: If the stopping criterion holds at x k , then stop.
Step 2: Compute the TR radius ∆ k by (2).
Step 3: Solve the subproblem (1) to find the trial step s k .
Step 4: Compute ρ k by (4). If ρ k ≥ µ, then set x k+1 = x k + s k and goto Step 6; otherwise, with the probability p ρ k given by (7), set x k+1 = x k + s k and goto Step 6, and with the probability 1 − p ρ k goto Step 5.
Step 5: Find the step length α k using Line search 2.1 and set x k+1 = x k + α k s k .
Step 6: Compute the new Hessian approximation B k+1 by a quasi-Newton updating formula. Set k = k + 1, decrease the temperature T and goto Step 1.
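Under strong simplifications (a fixed identity Hessian approximation in place of the quasi-Newton update, a Cauchy-point solve of subproblem (1), and plain Armijo backtracking instead of Line search 2.1), the steps above can be sketched as:

```python
import math
import random
import numpy as np

def rtrls_sketch(f, grad, x0, mu=0.1, T0=1.0, lam=0.9,
                 tol=1e-6, max_iter=500, seed=0):
    """A simplified sketch of Algorithm 2.1 (RTRLS)."""
    rng = random.Random(seed)
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                   # B_k = I for simplicity (no update)
    T = T0
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:      # Step 1: stopping test
            break
        q = -np.linalg.solve(B, g)       # q_k = -B_k^{-1} g_k
        delta = (-(g @ q) / (q @ B @ q)) * np.linalg.norm(q)   # radius (2)
        # Step 3: Cauchy-point approximate solution of subproblem (1)
        gBg = g @ B @ g
        gn = np.linalg.norm(g)
        t_c = min(gn**3 / (delta * gBg), 1.0) if gBg > 0 else 1.0
        s = -t_c * (delta / gn) * g
        # Step 4: TR ratio (4) and randomized acceptance (7)
        pred = -(g @ s + 0.5 * s @ B @ s)
        rho = (f(x) - f(x + s)) / pred
        if rho >= mu or rng.random() < math.exp(-(mu - rho) / T):
            x = x + s                    # accepted, possibly by the SA rule
        else:
            # Step 5: plain Armijo backtracking along s
            alpha, fx = 1.0, f(x)
            while f(x + alpha * s) > fx + 1e-4 * alpha * (g @ s) and alpha > 1e-12:
                alpha *= 0.5
            x = x + alpha * s
        T = lam * T                      # Step 6: cooling
    return x
```

On a simple convex quadratic this sketch drives the gradient norm below the tolerance within a few hundred iterations.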
Note that if the temperature is decreased logarithmically, then, based on the classical convergence properties of SA [13] and the convergence analysis conducted in [5], Algorithm 2.1 is globally convergent with probability one.

Numerical experiments
Here, we present some numerical results obtained by applying MATLAB 7.14.0.739 (R2012a) implementations of RTRLS (Algorithm 2.1) and the efficient accelerated nonmonotone TR-LS algorithm proposed in [2] (in which Andrei's initial choice of the step length is employed [14]), here called AccTRLS. The runs were performed on a set of 84 unconstrained optimization test problems from the CUTEr collection [15], with minimum dimension equal to 50, as specified in [3], on a computer with an Intel(R) Core(TM)2 Duo 2.00 GHz CPU and 1.50 GB of RAM.
For both algorithms, we adopted the parameter values suggested in [2] as well as the same stopping criteria. In addition, for RTRLS we set T 0 = ||g 0 || and in Step 4, we decreased T by (6) with λ = 0.9, found to be appropriate. Among the wide scope of choices of q k satisfying (3), here we set q k = −B k⁻¹ g k . Similar to the approach of [2], to compute the Hessian approximation we used the scaled memoryless DFP formula, whose inverse can be efficiently determined in a memoryless form [1]. Also, we used the double dogleg method [1] to solve the subproblem (1). Efficiency comparisons were drawn using the Dolan-Moré performance profile [16] on the number of iterations, number of objective function evaluations, number of gradient evaluations and the running time. The performance profile gives, for every ω ≥ 1, the proportion p(ω) of the test problems on which each considered algorithmic variant has a performance within a factor of ω of the best. Figures 1-4 illustrate the results of comparisons. As seen, generally RTRLS outperforms AccTRLS. It is worth noting that on 64% of the test problems RTRLS achieves the solution faster than AccTRLS. Thus, in general our randomized strategy based on the SA method turns out to be practically promising. Especially, it can be employed as an alternative to the acceleration/nonmonotone schemes used in the TR-LS algorithms.
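For reference, the Dolan-Moré profile value p(ω) described above can be computed as in the following minimal sketch; solver failures would be encoded as infinite cost:

```python
import numpy as np

def performance_profile(costs, omegas):
    """Dolan-More performance profile. costs[i, j] is the cost of solver j
    on problem i (np.inf for failures). Returns, for each omega, the
    fraction of problems where each solver is within a factor omega
    of the best solver on that problem."""
    costs = np.asarray(costs, dtype=float)
    best = costs.min(axis=1, keepdims=True)   # best cost per problem
    ratios = costs / best                     # performance ratios r_{i,j}
    return np.array([[np.mean(ratios[:, j] <= w) for j in range(costs.shape[1])]
                     for w in omegas])
```

For example, with two solvers on three problems, p(1) is the fraction of problems where each solver is the (tied) best, and p(ω) rises toward 1 as ω grows.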

Conclusions
Employing simulated annealing aspects in a recent adaptive trust region line search method, a heuristic algorithm has been suggested for unconstrained optimization. The method can also be considered as a randomized version of the trust region line search algorithm. Numerical experiments showed that the proposed randomization scheme can enhance the efficiency of classical trust region line search algorithms; especially, it can serve as an alternative to the acceleration/nonmonotone approaches used in these algorithms. As future work, one can investigate the possible employment of other metaheuristic algorithms in the trust region line search methods. In addition, the effect of such randomized schemes on backtracking line search techniques can be studied.