Minimization of Molecular Potential Energy Function Using newly developed Real Coded Genetic Algorithms

. The problem of finding the global minimum of molecular potential energy function is very challenging for algorithms which attempt to determine global optimal solution. The principal difficulty in minimizing the molecular potential energy function is that the number of local minima increases exponentially with the size of the molecule. The global minimum of the potential energy of a molecule corresponds to its most stable conformation, which dictates the majority of its properties. In this paper the efficiency of four newly developed real coded genetic algorithms is tested on the molecular potential energy function and their supremacy is established over other existing algorithms. The minimization of the function is performed on an independent set of internal coordinates involving only torsion angles. Computational results with up to 100 degrees of freedom are presented.


Introduction
Finding the most stable conformation of a molecule is a captivating problem as it is highly complex and its complexity increases with the increase in number of atoms.Amongst many different spatial configurations for a given molecule, the most stable one is of particular importance as it dictates most of its properties.Experimental evidence [1] shows that in the majority of the cases the most stable conformation corresponds to the one involving the global minimum of potential energy.So it can be formulated as a global optimization problem.It is an eminently challenging global optimization problem as the number of local minima increases exponentially with the size of the molecule [2].These local minimizers correspond to metastable states of the molecule and the global minimizer defines the energetically most favorable molecular conformation.Many optimization methods have already been applied to this problem, such as branch and bound methods, smoothing methods, simulated annealing, and genetic algorithms.References [2][3][4] provide an overview about these and other methods for molecular conformation problems.Further, a function [5] has been developed to test methods applied to global minimization of potential energy of molecules.In literature, many researchers for example [6][7][8][9] have applied computational intelligence methods for minimizing the potential energy function.
The aim of the present paper is to investigate the effect of newly developed RCGAs on a highly complex molecular potential energy problem and to check the efficiency of the new operators of RCGAs and to look for the their contribution in the success of an algorithm.In this paper the potential energy problem with up to 100 degrees of freedom is solved using four newly developed real coded genetic algorithms: WX-PM, WX-LLM, LX-LLM (Communicated to Applied Mathematics and Computation) and LX-PM [10].This paper is organized as follows: Section 2 describes the molecular potential energy function mathematically; Section 3 describes the real coded genetic algorithms applied for solving the problem; and Section 4 presents the performance evaluation criteria used in this paper for evaluating the performance of all algorithms used.Computational results and discussions are presented in Section 5 and conclusions in Section 6.

Problem Discussion
In a simplified molecular model consists of a linear chain of n beads centered at 1 ,..., n xx in a 3- dimensional space for every pair of consecutive beads i x and


be the bond angle corresponding to the relative position of the third bead with respect to the line containing the previous two.Likewise, for every four consecutive beads  be the angle, called the torsion angle, between the normals through the planes determined by the beads all of which can be understood from Figure 1.The potential energy of a system of atoms is explained through force field potentials, where a force field is a mathematical function which returns the energy of a system as a function of the conformation of the system.Now the forces can be written in terms of potential energy functions of various structural features such as bond lengths, bond angle, non bonded interactions etc.
Here the force field potentials corresponding to bond lengths, bond angles and torsion angles will be defined respectively as where ij r is the Euclidean distance between the beads i x and j x .The general problem is to minimize the total molecular potential energy , leading to the optimal spatial position of the beads.Using the parameters defined in [5] potential energy function takes the following form where 1,..., 3 in  and n is the number of beads in the given system.The problem is thus reduced to the existence of only one global minimum is guaranteed.As given in [6] it can also be shown that for all value of n the difference between the global minimum value E * and second best value of (3) i.e.E 2 always satisfies the following relation Although many simplifications have been done in the function E despite these, the problem remains very difficult because of the large diversity of local minimizers possible.It can be seen from the fact that corresponding to 20 beads, the number of local minimizers will be 131072 2 17  .

Real Coded Genetic Algorithm
Genetic algorithms are population based heuristics which are used to determine solution of non-linear optimization problems.GAs mimics the Darwin's principal of survival of fittest.GA uses three basic operations: selection, crossover and mutation in moving from one generation to another.GAs which make use of the real-encoding of chromosomes are termed as Real Coded GAs (RCGAs).Four different RCGAs are used in this paper, which use two real coded crossover operators WX and LX [11] and two mutation operators LLM and PM [10].The schema of RCGAs is given in Figure 2.
More details of operators used in all the algorithms used here are defined in the following subsections: where is the distance between the parents and  is a random number following Laplace distribution in case of LX and Weibull distribution in case of WX.

Mutation
A mutated solution M is created in the vicinity of the solution P as follows: Generate a random number where L andU are the lower and upper bounds of decision variable, and  is a random number following Power distribution in case of PM and Log Logistic distribution in case of LLM.

Selection Technique
A selection technique in a GA is simply a process that favors the selection of better individuals in the population for the mating pool.All RCGAs used in this paper uses tournament selection.

Computational Steps
Computational steps of algorithms used are as follows: 1. Generate a suitably large initial set of random points within the domain

Performance Evaluation Criteria
For evaluating the performance of each method the following are recorded: is the known global minimum of the problem is considered to be successful.For each problem size, MNFE represents minimum and MXFE represents the maximum number of function evaluations needed to achieve the threshold in the 100 runs performed.Also, AFE represents the average number of function evaluations and SDFE represents the standard deviation of the successful runs out of all the 100 runs performed.
The Performance Index (PI) given by Bharti [12] and used in [11] is utilized to compare the relative performance of all the four RCGAs simultaneously.This index gives prescribed weighted importance to SR, AFE and computational time.For each of the algorithms the value of PI is computed as follows: , 0 , 0 , , 0, 0 0, 0 1, 2,...
and Sr i = number of successful runs of i th problem.Tr i = total number of runs of i th problem.At i = average time used by an algorithm in obtaining the solution of i th problem.Mt i = minimum of average time used by all the algorithms in obtaining the solution of i th problem.Af i = average number of function evaluations of successful runs used by an algorithm in obtaining the solution of i th problem Mf i = minimum of average number of function evaluations of successful runs used all algorithms in obtaining the solution of i th problem N = total number of problems considered.are the weights assigned by the user to the average execution time, average number of function evaluations and the percentage of success respectively.The same methodology is adopted as given in [13] by assigning equal weights to two of these terms ( ) at a time so that, PI become function of a single variable.The resulting cases are as follows: All the RCGAs have the same termination criteria i.e. if either the maximum number of generation (5000) is reached or known global minimum is attained.

Computational Results and Discussion
In this section we present numerical results obtained for the energy function E .Since we are using a probabilistic technique, 100 independent runs are performed, each time using a different seed for the generation of random number.The parameter setting for all algorithms is given in Table 2. Computational results are presented in Table 3 in terms of function evaluations: average, minimum, maximum, and standard deviations of successful runs as well as the computational time of all new RCGAs.These are compared with the existing results of Bansal et al. [14] and Barobosa et al. [6].In Table 3 rHYB [2] denotes the staged hybrid GA with a reduced simplex and a fixed limit for simplex iterations and qPSO [14] is a hybrid PSO in which quadratic approximation operator is hybridized with PSO.In case of qPSO, no run is found to be successful, i.e. the difference between the best solution found by the algorithm and the known global minima is always greater than 0.01.
It is quite clear from the Table 3 that the newly developed RCGAs have outperformed both rHYB and qPSO.Table 3. Computational results for simplified molecular model for n=20 to 100 using WX-PM, LX-PM, WX-LLM, LX-LLM, rHYB [6] and qPSO [14].[14] To further analyze the relative performance of all RCGAs in terms of average function evaluations, a graphical representation in the form of box plot is shown in Figure 3, where the best performer is marked with star.The average function evaluation is directly proportional to the computational cost of the method.It is clear that LX-LLM is the best performing algorithm and so in terms of computational time (see Figure 4).

Conclusions
In this paper, newly developed RCGAs are successfully applied to a scalable simplified energy function.Although the energy function is taken in simplified form yet it keeps the main difficulty that the number of local minima of the function grows exponentially with problem size, making it difficult to find the global minima.Computational tests are performed with degrees of freedom varying from 20 to 100.It is clear that LX-LLM is the best performing algorithm as it is less computationally expensive (in terms of AFE and time) as well as more reliable (in terms of SR) amongst all other RCGAs applied for obtaining the global minimum.Also the consolidated effect of all factors can be seen together in the plots for PI and it is very clear that LX-LLM is performing the best.
Although LX-LLM is the best, from this it can not be concluded that Laplace crossover (LX) is better than Weibull crossover (WX) because in that case LX-PM should have performed better than algorithms which use WX.This fact compels us to notice the importance of mutation operators in the efficiency of a genetic algorithm.Then again, mutation alone can not be credited for the success of an algorithm.It is the appropriate combination of the crossover and mutation operators which guides the search of a genetic algorithm in an effective manner.In other words, exploration of the search space should be backed with appropriate diversity of the population.So here in the case of molecular potential energy function this job is done by LX-LLM.
Finally, LX-LLM has successfully obtained the global minimum of molecular potential energy function, it is observed that LX-LLM is an efficient search algorithm which is it not limited to the cases considered here but can also be applied to some other and more complex functions.Methods Applied to Global Minimization of Potential Energy of Molecules.Numerical Algorithms, 35 (2-4), 287-300 (2004).

Figure 1 .
Figure 1.a) Euclidean distance, b) Bond angle, c) Torsion angle of Coordinate Set of Atomic Chain

where 1 ijc
is bond stretching force constant, 2 ij c is angle bending force constant and 3 ij c is the torsion force constant.The constants 0 ij r and 0 ij  represent the "preferred" bond length and bond angle respectively and 0 ij  is the phase angle that defines the position of the minima.

4 E
pair of atoms separated by k covalent bonds.In addition to the above, there is also a potential which characterizes the 2-body interactions between every pair of beads separated by more than two covalent bonds along the chain.We use the following function to represent

2 P
obtained after selection, in the following manner: Generate a random number

2 )
Average computational time of successful runs (in seconds).3)Average number of function evaluations of successful runs (AFE).4) Minimum number of function evaluations of successful runs (MNFE).5) Maximum number of function evaluations of successful runs (MXFE).6) Standard Deviation of function evaluations of successful runs (SDFE).7) Performance Index (PI).A run in which the algorithm finds a solution found when the algorithm terminates and opt f

Figure 3 .
Figure 3. Box plot of Average function evaluations (AFE) for simplified molecular model.

Figure 4 .
Figure 4. Box plot of computational time (in sec) for simplified molecular model.
[5]imization of Molecular Potential Energy Function Using newly developed Real Coded GAs These local minimizers correspond to a state which is not truly stationary but is almost stationary called the metastable state of the molecule.Lavour and Maculan[5]have shown that the number of local minimizers of the function (3) is N ii in    .As E is a nonconvex function it involves numerous local minimizers even for small value of n.
[6] program is coded in C++ and executed on a Pentium IV with 1.66GHz speed and 512 MB of RAM.The potential energy function E is minimized in the specified search space [0, 5] n , where n is the total number of beads in a system.Table1reproduces the global minimum values of[6]attained for the function E corresponding to different chain sizes i.e. corresponding to n equal to 20, 40, 60, 80 and 100.top edge indicated.

Table 2 .
Parameter setting of all RCGAs used for finding the Global minimum of E