Simultaneous model spin-up and parameter identification with the one-shot method in a climate model example

We investigate the One-shot optimization strategy, introduced in this form by Hamdi and Griewank, with respect to its applicability and efficiency for identifying parameters in models of the earth's climate system. Parameters of a box model of the North Atlantic Thermohaline Circulation are optimized with respect to the fit of the model output to data given by another model of intermediate complexity. Since the model is run into a steady state by pseudo time-stepping, efficient techniques are necessary to avoid extensive recomputation or storage when gradient-based local optimization algorithms are used. The One-shot approach simultaneously updates state, adjoint and parameter values. For the required partial derivatives, the algorithmic/automatic differentiation tool TAF was used. Numerical results are compared to results obtained by the BFGS and L-BFGS quasi-Newton methods.


Introduction
Parameter optimization is an important task in all kinds of climate models, and in models that simulate parts of the climate system, such as ocean or atmosphere models. Some processes are still not well understood, some are too small-scale in time or space, and others are simply beyond the scope of the model. All these processes are parameterized, i.e. simplified model functions (parameterizations) are used. These necessarily include many parameters that are, most of the time, only heuristically known. A main task thus is to calibrate the models by optimizing the parameters with respect to data from measurements or from other (more complex) models.
Similar to many engineering applications in fluid mechanics, in geophysical flows (e.g. ocean models) an optimization is at first performed for steady states of the equations before proceeding to transient problems. This means that only the stationary solution enters the cost or objective function to be minimized. Moreover (and this is the second point where engineering and geophysical flow problems are similar), the computation of steady states is often performed by running a transient model into the steady state. This strategy is called pseudo time-stepping, since the time variable may be regarded as a kind of iteration counter.
It is well known from optimal control of differential equations that the classical adjoint technique (which allows the representation of the gradient of the cost) leads to a huge amount of recomputation, storage or both. This problem looks even more frustrating in the pseudo time-stepping context, since here only the final, numerically converged state is relevant for the cost. Nevertheless, a classical adjoint technique would need all intermediate iterates.
If the number of parameters to be optimized is small, a sensitivity equation approach is also reasonable. On the discrete level this is comparable to the application of the forward mode of Automatic or Algorithmic Differentiation (AD).
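To illustrate the forward mode mentioned above, the following minimal sketch propagates a derivative alongside a function value via dual numbers and operator overloading. This is only an illustration of the principle, not the tool (TAF) used in the paper; the class and the toy function are invented for this example.

```python
# Minimal forward-mode AD via dual numbers (illustrative sketch only, not TAF):
# each value carries its derivative with respect to one chosen input.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        # product rule for the derivative part
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def f(u):
    # toy model output, e.g. a hypothetical parameterization g(u) = u^2 + 3u
    return u * u + 3 * u

y = f(Dual(2.0, 1.0))   # seed du/du = 1
print(y.val, y.dot)     # f(2) = 10, f'(2) = 2*2 + 3 = 7
```

For one parameter, one forward sweep yields the full sensitivity; for many parameters, one sweep per parameter is needed, which is why the reverse mode becomes attractive later in the paper.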
Here, the sensitivity equation has the same temporal integration direction (namely forward) as the original pseudo time-stepping. Nevertheless, it is worthwhile to investigate how the two iterations for state and sensitivity, which are coupled for a nonlinear model, are performed.
Griewank described in [1] the differences between two-phase approaches (where the iteration for the state is first run to the steady state or fixed point, and the sensitivity is computed afterwards) and piggy-back approaches (where both iterations are combined into one). Christianson in [2] proposed to perform the sensitivity iteration with the converged state instead of using its iterates. Giering, Kaminski and Vossbeck in [3] used the so-called Full Jacobian approach, where they directly differentiated the steady state equation to obtain an equation for the gradient.
The approach used here is called the One-shot approach. It was developed in this form by Hamdi and Griewank and can be seen as an extension of the piggy-back strategy, aiming for optimality and feasibility simultaneously with so-called bounded retardation. This means that the number of One-shot iterations should not exceed by too much the number of fixed point iteration steps that are necessary for the computation of feasible states alone. Theoretical results were published in [4], [5]; an engineering application was presented by Özkaya and Gauger in [6].
The idea of a simultaneous solution of state equations and parameter correction is not new. In [7], S. Ta'asan uses a pseudo-time embedding for the state and adjoint state equations, and the design equation is solved as an additional boundary condition. This still results in a differential algebraic equation, which requires some strategy to solve the design equation alone.
In [8], the authors construct a system of ODEs only, which is solved by a time-stepping method in the spirit of reduced SQP methods. They develop a preconditioner working on the whole system of state, costate and design equations.
In the One-shot approach used here, the idea is that for fixed parameters there is a given (not necessarily (pseudo-) time-stepping) strategy to solve the state equations. This strategy is assumed to be given and is not altered. In each iteration step, the update of the state is augmented by an update of the adjoint state and a kind of quasi-Newton step for the design correction, with the distinctive feature that the required preconditioner controls the convergence of the whole system. Here, the preconditioner is a square matrix whose size is only the number of parameters.
Since the assumptions in the theoretical analysis of the One-shot method are very strict, and the computation of the preconditioner seems at first glance laborious and expensive, the intention of this paper is to check the applicability of the One-shot strategy for real world problems and possibly to propose simplifications. We compare numerical results to those of the gradient-based BFGS and limited-memory BFGS (L-BFGS) methods. We set aside the comparison to genetic or so-called intelligent search algorithms, see e.g. [9], because the aim of the One-shot approach according to the authors of [4] and [5] is to offer an alternative to local gradient-based optimization techniques. Genetic algorithms usually demand a high number of function evaluations, which we want to avoid because of the costly computation of steady states needed for each function evaluation.
In this paper, we apply the One-shot approach to a box model of the North Atlantic. This problem differs from the application in [6] in that the parameters enter in a nonlinear fashion, resulting in so-called non-separable adjoints, where the adjoint is no longer only the sum of a term in the state and a term in the design.
The outline of this paper is the following. In section 2 we recall the One-shot optimization strategy according to [4] and [5]. We apply the One-shot method to an example in earth system modeling in section 3; there, we describe the Rahmstorf 4-box-model and the optimization problem, and present numerical results. Section 4 draws conclusions.

One-shot Optimization Strategy
In this section, we recall the One-shot optimization strategy according to [4] and [5], its quintessence and its differences to conventional optimization methods, and we derive and explain the One-shot iteration step. First of all, we describe the mathematical problem behind the parameter optimization problem.

Problem formulation
Parameters u of a model describing physical, biological, chemical or other real life phenomena are usually determined by fitting the model output y = y(u) to observed data denoted by y_data. This data can also be taken from other, more comprehensive models.
The fitting procedure then is a mathematical optimization problem with a least-squares cost functional with some regularization term under the constraint that model equations, namely c(y, u) = 0, are fulfilled.
In climate modeling, model equations are usually partial and/or ordinary differential equations solved by an iterative process.
The problem becomes more difficult with respect to uniqueness of minima and computation of derivative information if the quantity to be fitted to the data g_data is computed from a functional g(y, u). In the finite dimensional case or the discretized version, where y ∈ Y ⊂ R^n, u ∈ U ⊂ R^m and g : Y × U → R^l, the cost function is the sum of the squared differences

  J(y, u) = (1/2) ∑_{i=1}^{l} (g_i(y, u) − g_data,i)² + (α/2) ∥u − u_guess∥²₂ .

Here, the objective function is J : Y × U → R, y ∈ Y is the state, and u ∈ U is the design or parameter vector to be optimized. With the help of the regularization term (α/2) ∥u − u_guess∥²₂, the parameters u are kept in an acceptable or presumed range around the parameter values u_guess, whose elements u_i,guess can for example be taken as mean values of some maximum and minimum values. We assume J to be C^{2,1}, which means twice continuously differentiable in y and once in u. We further assume the Jacobian of c with respect to y, denoted c_y, to always be invertible, such that, with the mean value theorem, there exists only one y* with c(y*, u) = 0 for fixed u.
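The regularized least-squares cost above can be evaluated directly once the model output g(y, u) is available. The following sketch uses invented numbers purely to show how the misfit and the regularization term combine:

```python
# Hedged sketch of the regularized least-squares cost (all values illustrative):
# J = 1/2 * sum_i (g_i - g_data_i)^2 + alpha/2 * ||u - u_guess||^2
def cost(g_vals, g_data, u, u_guess, alpha):
    misfit = 0.5 * sum((gi - di) ** 2 for gi, di in zip(g_vals, g_data))
    reg = 0.5 * alpha * sum((ui - gi) ** 2 for ui, gi in zip(u, u_guess))
    return misfit + reg

# misfit = 0.5*(0.25 + 0.25) = 0.25, regularization = 0.5*0.1*0.04 = 0.002
J = cost([1.0, 2.0], [1.5, 2.5], [0.2], [0.0], alpha=0.1)
print(J)  # 0.252
```

A larger α pulls the optimal u towards u_guess, which is exactly the trade-off studied in the numerical section.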

One-shot iteration and its properties
In practice, finding an analytical solution for a feasible state y* with c(y*, u) = 0 is often impossible. That is why an iterative method is usually called upon.
For the One-shot strategy, we assume that there is a given fixed point iteration, also called model spin-up, which has already been found reliable and successful in the search for the feasible state y* for given parameters u. Included step size or preconditioner strategies can be carried over and have no influence on the One-shot iteration. Thus, there is a given contraction, (pseudo-) time-stepping strategy or fixed point iteration G for which y* satisfies y* = G(y*, u) = lim_{k→∞} G(y_k, u).
The fundamental idea of the One-shot approach is to reformulate the condition c(y, u) = 0 into the fixed point equation y = G(y, u). The iteration function G : Y × U → Y is assumed to be C^{2,1} with contraction factor ρ < 1, i.e. for a suitable inner product norm ∥•∥ we have for G_y, denoting the Jacobian of G with respect to y, that

  ∥G_y(y, u)∥ ≤ ρ < 1  for all y ∈ Y, u ∈ U,   (1)

from which follows

  ∥G(y₁, u) − G(y₂, u)∥ ≤ ρ ∥y₁ − y₂∥  for all y₁, y₂ ∈ Y.   (2)

With the contraction property of G we can infer from the Banach fixed point theorem that, for fixed u, the sequence y_{k+1} = G(y_k, u) converges to a unique limit y* = y*(u).
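The Banach fixed point argument can be seen numerically in a one-line toy contraction; G and all values below are invented for illustration only:

```python
# Illustrative spin-up: G(y,u) = 0.5*y + 0.5*u is a contraction with rho = 0.5,
# so by the Banach fixed point theorem y_k -> y* = u for any starting value y0.
def G(y, u):
    return 0.5 * y + 0.5 * u

def spin_up(y0, u, tol=1e-12, max_steps=1000):
    y = y0
    for k in range(max_steps):
        y_next = G(y, u)
        if abs(y_next - y) < tol:   # stop when the iterates stall numerically
            return y_next, k + 1
        y = y_next
    return y, max_steps

y_star, steps = spin_up(y0=10.0, u=3.0)
print(round(y_star, 6), steps)   # converges to the fixed point y* = u = 3
```

The error shrinks by the factor ρ = 0.5 in every step, so roughly 40 steps suffice for 12 digits here; the paper's box model contracts far more slowly (ρ close to 1), which is what makes its spin-ups expensive.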
The assumptions on the model function c and the contraction G are very strict and rarely analytically or even numerically provable. However, we will see in our numerical example that the One-shot strategy converges even under weaker assumptions on the contraction G. In our example of the 4-box ocean model, only the Ćirić or quasi-contraction property, see [10], is fulfilled. With the help of the fixed point reformulation, the optimization problem can be written as

  min_{y,u} J(y, u)  s.t.  y = G(y, u).   (P)
A conventional optimization strategy performs the following steps. In the outer loop, do in the k-th iteration step:
• Perform a complete model spin-up (inner loop) with parameter values u_k and obtain an admissible state y_k = y(u_k).
• Evaluate the cost function J(y_k, u_k) and derivative information, and compute updated parameter values u_{k+1}.
End the outer loop when a sufficient optimality condition is satisfied.
Of course, adjusting the parameters demands further full model spin-ups and/or expensive derivative information, for whose computation again full model spin-ups are necessary.
The main idea of the One-shot strategy is to adjust model parameters already during the model spin-up.
Using the method of Lagrange multipliers, in the finite dimensional case the Lagrangian associated with problem (P), with the Lagrange multiplier or adjoint state ȳ ∈ Ȳ, is

  L(y, ȳ, u) = J(y, u) + ȳᵀ (G(y, u) − y),

where we introduce the shifted Lagrangian N as

  N(y, ȳ, u) = J(y, u) + ȳᵀ G(y, u),  so that  L(y, ȳ, u) = N(y, ȳ, u) − ȳᵀ y.   (3)

A Karush-Kuhn-Tucker (KKT) point (y*, ȳ*, u*) fulfilling the first order necessary optimality condition must satisfy

  y* = G(y*, u*),
  ȳ* = N_y(y*, ȳ*, u*)ᵀ = J_y(y*, u*)ᵀ + G_y(y*, u*)ᵀ ȳ*,
  0 = N_u(y*, ȳ*, u*)ᵀ = J_u(y*, u*)ᵀ + G_u(y*, u*)ᵀ ȳ*.

Motivated by this system of equations, the following coupled full step iteration, called the One-shot strategy according to the authors of [4], [5], is derived to reach a KKT point. Do in the k-th iteration step

  y_{k+1} = G(y_k, u_k),
  ȳ_{k+1} = N_y(y_k, ȳ_k, u_k)ᵀ,   (4)
  u_{k+1} = u_k − B_k⁻¹ N_u(y_k, ȳ_k, u_k)ᵀ,

until there is (numerically) no change in (y_k, ȳ_k, u_k). Here, B_k is a design space preconditioner which must be selected to be symmetric positive definite. As mentioned above, we do not want to introduce additional preconditioners for the updates of y and ȳ, because of the assumption that the model spin-up has already been found reliable and successful in the search for steady states.
The contractivity (2) ensures that the first equation in the coupled iteration step (4) converges ρ-linearly for fixed u. Although the second equation exhibits a certain time-lag, it converges with the same asymptotic R-factor (see [11]). As far as the convergence of the coupled iteration (4) is concerned, the goal is to find B_k such that the spectral radius of the coupled iteration (4) stays below 1 and as close as possible to ρ. In subsection 2.3, we recall the formula for appropriate preconditioners B_k according to the authors of [4], [5].
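A scalar toy instance makes the coupled iteration (4) concrete. The model, cost and the scalar preconditioner B below are all invented for illustration; B is simply chosen large enough to keep the spectral radius of the coupled iteration below 1:

```python
# Hedged toy instance of the coupled One-shot update (4); all numbers illustrative.
# Model: fixed point G(y,u) = 0.5*y + 0.5*u (steady state y* = u, rho = 0.5).
# Cost:  J(y,u) = 0.5*(y - d)^2 + 0.5*alpha*(u - u_guess)^2.
# Shifted Lagrangian N = J + ybar*G gives
#   N_y = (y - d) + 0.5*ybar,   N_u = alpha*(u - u_guess) + 0.5*ybar.
d, alpha, u_guess = 1.0, 0.1, 0.0
B = 5.0                    # scalar design-space preconditioner (chosen large enough)
y, ybar, u = 0.0, 0.0, 0.0
for k in range(3000):
    y_next = 0.5 * y + 0.5 * u                              # state update
    ybar_next = (y - d) + 0.5 * ybar                        # adjoint update N_y
    u_next = u - (alpha * (u - u_guess) + 0.5 * ybar) / B   # design update
    y, ybar, u = y_next, ybar_next, u_next

# KKT point: y* = u* and alpha*(u* - u_guess) + (u* - d) = 0, i.e. u* = d/(1+alpha)
print(round(u, 6))   # ~0.909091
```

With B = 1 this toy system actually diverges, which illustrates why the choice of the preconditioner B, discussed next, controls the convergence of the whole coupled iteration.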

Required derivatives and automatic differentiation
For the One-shot update (4), and also later in the computation of the preconditioners B_k, a lot of derivative information is needed. The costs for its calculation are small compared to those of a conventional approach, because the derivatives only depend on the previous iteration step. The storing or recomputation of intermediate partial derivatives, as for example ∂y/∂u for the computation of derivatives of J or N with respect to u, is not necessary, which is one of the main differences and advantages compared to traditional optimization techniques.
Applying a tool for automatic/algorithmic differentiation (AD) can reduce costs even more and, most importantly, AD computes exact derivatives without any approximation errors.
AD is a software technology to compute the derivative of a function at a cost of only a small multiple of the cost of evaluating the function itself. With the help of source code transformation or operator overloading, an AD tool provides the user with a computer program containing the derivatives. Such tools are for example TAF or ADiMat, which use the source code transformation approach to generate Fortran or Matlab subroutines calculating function values and derivative information in one call, see [12] and [13], or ADOL-C, which uses the operator overloading concept in C/C++ codes, see [14].
Regarding the One-shot optimization strategy, we need gradients (namely J_y, J_u) and vector-Jacobian products, which can be obtained cheaply with the reverse mode of AD. For the calculation of the preconditioner B, second derivatives and full Jacobians are also needed; these are calculated by applying first the reverse mode and then the forward mode. In our testings, we apply the (commercial) AD tool TAF for Fortran subroutines.

Preconditioner B and the doubly augmented Lagrangian
In this section, we explain the choice of the preconditioners B_k according to [4] and [5]. For the sake of simplicity, we omit the iteration index k, using the notation B.
For the derivation of the preconditioner B, we introduce the doubly augmented Lagrangian L^a, which is the Lagrangian of the original problem augmented by the errors in primal and dual feasibility,

  L^a(y, ȳ, u) = (α_L/2) ∥G(y, u) − y∥² + (β_L/2) ∥N_y(y, ȳ, u)ᵀ − ȳ∥² + L(y, ȳ, u).

Here α_L > 0 and β_L > 0 are weighting coefficients. The authors of [4] prove that, under certain conditions on α_L and β_L (see below), stationary points of problem (P) are also stationary points of L^a and that L^a is an exact penalty function. This leads to the idea to choose B as an approximation to the Hessian of L^a, i.e. B ≈ ∇_uu L^a.
In [4], it is proven that descent of the augmented Lagrangian is provided for any preconditioner B fulfilling B ⪰ B₀, i.e. B − B₀ positive semidefinite, for a certain matrix B₀ depending on α_L, β_L and σ. The authors of [4] propose to choose α_L and β_L such that B₀⁻¹ is as large as possible. Minimizing the resulting bound (5) as a function of α_L and β_L and replacing σ according to (6) yields, under an additional assumption involving ∥N_yy∥ (with ∥N_yy∥ ≠ 0), explicit formulas for α_L and β_L, which enter the preconditioner (7). As mentioned above, we aim for B ≈ ∇_uu L^a. It turns out that at a stationary point of L^a, where primal and dual feasibility hold, the Hessian ∇_uu L^a can be expressed in terms of N and its derivatives. As L^a is an exact penalty function, we have ∇_uu L^a ≻ 0 in a neighbourhood of the solution of the constrained optimization problem. Assuming N_uu ≻ 0 then implies that the preconditioner fulfills (5), and thus the step ∆u_k = −B⁻¹ N_u(y_k, ȳ_k, u_k)ᵀ of the coupled iteration (4) yields descent on L^a.

BFGS update to avoid computation of full Jacobians and 2nd order derivatives
In the calculation of the preconditioner B, full Jacobians and second derivatives are needed. On the one hand, these can also be calculated by algorithmic differentiation; on the other hand, a possibility to avoid this is the application of a low-rank BFGS update for the inverse approximation H_k of B_k. In view of the relation B ≈ ∇_uu L^a, we use a secant equation based on differences of N_u in the update of H_{k+1}. The step multiplier η is required to satisfy the second Wolfe condition, which guarantees the curvature condition (8). A simpler procedure could skip the update whenever (8) does not hold, by either setting H_{k+1} to the identity or to the last iterate H_k. Provided (8) holds, we use the standard BFGS update formula for H_{k+1}. The weights α_L, β_L of L^a require norms of second order derivatives. In [5], the authors propose simpler approximations according to two different approaches: in the first version the norms are replaced by cheap surrogates, while in the second approach ∥N_yy∥₂ is computed via the power iteration.
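The curvature-safeguarded update can be sketched in one dimension, where the BFGS inverse update collapses to the secant quotient s/y. The toy gradient below is an invented stand-in for N_u, not the paper's model:

```python
# Hedged 1-D sketch of an inverse BFGS update with a curvature safeguard:
# the secant condition H_{k+1} * (g_{k+1} - g_k) = u_{k+1} - u_k is imposed;
# if the curvature s*y <= 0, the update is skipped and H is kept as is.
def grad(u):
    # gradient of the toy cost f(u) = 0.5*4*(u - 2)^2 (illustrative stand-in
    # for N_u; the paper's secant pairs come from the shifted Lagrangian)
    return 4.0 * (u - 2.0)

u, H = 0.0, 1.0              # start with the identity "matrix"
g = grad(u)
for k in range(20):
    s = -H * g               # quasi-Newton design step
    u_new = u + s
    g_new = grad(u_new)
    y = g_new - g
    if s * y > 1e-12:        # curvature condition, cf. the second Wolfe condition
        H = s / y            # in 1-D, BFGS reduces to the secant quotient
    u, g = u_new, g_new
    if abs(g) < 1e-10:
        break

print(round(u, 6))  # minimizer u* = 2
```

On this quadratic, the secant quotient recovers the exact inverse Hessian 1/4 after one update, so the iteration terminates in two steps; the safeguard matters precisely in the nonconvex regions where (8) can fail.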
For the BFGS update, the calculation of R_k requires a pure design step (a step with fixed primal and dual variables y and ȳ, respectively), which might be computed at high cost. We will pay attention to this fact in our numerical example.

Application in Earth System Modeling
To exemplify the benefit of the One-shot optimization strategy in the case of climate research, we present the application to a 4-box-model of the Atlantic Thermohaline Circulation. The 4-box-model described in [15] simulates the flow rate of the Atlantic Ocean known as the 'conveyor belt', carrying heat northward and having a significant impact on the climate in northwestern Europe. Temperatures T_i and salinity differences S_i in four different boxes i = 1, ..., 4, namely the southern, northern, tropical and the deep Atlantic, are the characteristics inducing the flow rate. The surface boxes exchange heat and freshwater with the overlying atmosphere, which causes a pressure-driven circulation, compare figure 1. In [16] a smooth coupling of the two possible flow directions is proposed. In the resulting time dependent ODE system, T*_i, i = 1, 2, 3 are so-called restoring temperatures, which can be seen as counterparts of the three surface temperatures. Further model parameters are physical, relaxation and coupling constants, among which there are well-known fixed parameters and tunable parameters. See [15] for an explanation of the occurring constants, fixed parameters and tunable parameters.

The optimization problem
As mentioned in the introduction, in climate modeling an optimization is at first performed for steady states, which means in this example for temperatures and salinities that no longer change in time. Given fresh water fluxes (f_{1,i})_{i=1}^{l}, corresponding to different warming scenarios, the aim is to fit the overturning values m_i = m(y(f_{1,i}), u), computed from the stationary temperatures and salinities (T₁, T₂, S₁, S₂)_i obtained by the model spin-up for f_{1,i}, to data m_{d,i} from the more complex model Climber2, see [17]. The control parameters are u = (T*₁, T*₂, T*₃, Γ, k, a). Here, Γ is a thermal coupling constant in the computation of the thermal relaxation constants λ_i, i = 1, 2, 3. All other parameters occur in the model description of the previous subsection. Using the notation of section 2, the state is y = (y_i)_{i=1}^{l}. If F(y, u) denotes the right-hand side of the ODE system of the model, we get the least-squares problem min_{y,u} J(y, u) subject to the steady state condition F(y, u) = 0. The regularization term incorporates a prior guess u_guess for the parameters; the larger α, the more the parameters u are kept close to u_guess.
The difficulty here is that m : R^{8l} × R⁶ → R^l is not injective. There are several combinations of steady/feasible T₁, T₂, S₁, S₂ and the parameter u(5) = k that produce the same overturning m. The smaller α, the more likely the different optimization strategies are to find completely different optimal parameters with almost the same function values J(y*, u*).
In [15] the authors apply explicit Euler time stepping with a fixed step size of one year, i.e. ∆t = 1, to run the model into a steady state; otherwise, known model constants scaled to a time span of one year would have to be adjusted. Thus G defined in section 2 here represents one full Euler step y_{k+1} = G(y_k, u) = y_k + F(y_k, u), operating on all freshwater fluxes f_{1,i} together, i.e. for fixed u we have G(•, u) : R^{8l} → R^{8l}.
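The structure of such an Euler spin-up can be sketched generically. The right-hand side F below is an invented relaxation toward a u-dependent steady state, not the box-model equations:

```python
# Illustrative pseudo time-stepping spin-up with the explicit Euler step
# y_{k+1} = y_k + dt * F(y_k, u); F is a toy stand-in for the box-model
# right-hand side, relaxing every state component towards u.
def F(y, u):
    return [u - yi for yi in y]          # steady state: every component equals u

def euler_spin_up(y, u, dt=0.5, tol=1e-10, max_steps=100000):
    for k in range(max_steps):
        dy = [dt * fi for fi in F(y, u)]
        y = [yi + di for yi, di in zip(y, dy)]
        if max(abs(di) for di in dy) < tol:   # steady state reached numerically
            return y, k + 1
    return y, max_steps

y_star, steps = euler_spin_up([10.0, -4.0], u=2.0)
print([round(v, 6) for v in y_star], steps)
```

In the paper, one such spin-up to the steady state takes thousands of Euler steps, and a conventional optimizer must repeat it for every trial parameter vector u_k.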
In this example, the contractivity of G is not given in general, i.e. ρ in (1) exceeds 1 for several steps; on average, however, it is less than 1. Here, for the explicit Euler sequence y_{k+1} = G(y_k, u) = y_k + F(y_k, u), the quasi-contraction property [10] holds for some 0 ≤ q < 1. In our testings, G converges for fixed u but different starting values y₀ to the same stationary y*.

Numerical results and discussion
In our numerical testing, we compare the two versions of the One-shot method, with full computation of the preconditioner B on the one hand and with a BFGS update of B on the other hand, to a traditional BFGS quasi-Newton optimization approach. Furthermore, we compare results to values obtained by the limited-memory BFGS (L-BFGS) algorithm implemented by Zhu, Byrd, Nocedal and Morales, see [18], version 3.0 from 2011, first without and finally with box constraints on the control parameters (L-BFGS-B), because we find that the optimal parameter values computed by the BFGS and L-BFGS methods are far away from actual real world values. In the three different BFGS approaches, for each parameter value u_k during the optimization process the box model has to be run into a steady state; in our example, that takes between 4,000 and 15,000 Euler steps. Compared to more complex climate models, here the evaluation of an Euler time step is not expensive. However, during the optimization process a large number of explicit Euler time steps accumulates, and for the derivative calculation a huge amount of recomputation, storage or both is necessary. That becomes obvious in the calculation of derivatives using automatic differentiation: whereas for the BFGS method in the reverse mode it is necessary to store all Euler steps until a steady state is reached, in the One-shot method the required derivatives depend on the current values only, i.e. on only one Euler step.
For a better initialization, especially of the adjoint, we propose an update of only the state and the adjoint state for the first 500 iteration steps.
The One-shot-BFGS strategy demands a linesearch procedure, otherwise the method fails. Here, we applied a simple strategy that repeatedly halves the step length until the resulting step reduces the cost function.
We perform our numerical testings on a SUN-W-Ultra-SPARC-IIIi CPU 1.3GHz machine.

Influence of a rare update of the weighting coefficients of the preconditioners B_k on the optimization
In the first version, we calculate the preconditioners B_k defined in (7) in every iteration, including all first and second order derivatives; the weighting coefficients α_L, β_L and σ are adjusted as well. We find that the weights do not change significantly from iteration to iteration. As one can see in Table 1, an update performed only every several time steps does not significantly influence the optimization, but it does reduce the computational time needed. Therefore, we prefer the version with a calculation of α_L, β_L and σ every 1,000 iterations.

Effect of the weighting factor α on the numerical results
In the following, we focus on the effect of the weighting factor α in front of the penalty term ∥u − u_guess∥. For the last parameter a we chose the additional factor 0.01, because a is of a higher order of magnitude than the other parameters and can vary more. Here, in the example of the 4-box-model, without any regularization, i.e. α = 0, the One-shot method and the L-BFGS method without constraints do not converge or fail. The BFGS method and the L-BFGS method with box constraints terminate with parameter values u* where ∥J_u(y(u*), u*)∥ is still very large, but the algorithms cannot find descent directions.
We recall from section 3.1 that the considered optimization problem has several local minima which might be of the same quality regarding the data fit, even though the obtained model parameters are completely different. The larger the weighting factor α, the less the obtained parameters vary.
In our testings with α > 0, we compare the optimal value of the cost function, the data fit weighted by the number of observations, the number of iteration steps, the number of needed Euler steps, and the computational time in minutes. Furthermore, we take a look at the quality of optimality, which means for the One-shot strategies the norm of L_{(y,ȳ,u)}(y*, ȳ*, u*) and for the BFGS methods the norm of J_u(y(u*), u*). The numerical results are collected in tables 2 and 3 and illustrated in figures 2 and 3.
Not surprisingly, one generally observes that the smaller α, the better the data fit becomes.
We observe that the qualities of the methods vary for different α. Especially for large fresh water fluxes f_{1,i}, the outputs of the different optimization strategies differ strongly. These are the f_{1,i} for which the model switches the flow direction of m during the model spin-up.
Comparing the original One-shot and the One-shot-BFGS methods, the presumption that the One-shot-BFGS strategy might be rather time consuming due to the additional pure design steps is confirmed. Here, in an example with a very small number of parameters to be optimized, the One-shot-BFGS approach is not recommended. However, in problems with a large number of design variables, the One-shot-BFGS approach might be an alternative. The computed data fit can be regarded as equally good in this example.
For α = 10 the strategies show almost no difference in their results, neither in the fit of the data nor in the computed optimal parameters. Concerning computational time and the number of Euler steps, the original One-shot strategy performs best.
For α = 1 and α = 0.1 the One-shot strategy shows difficulties in performance. We suspect that here the balance between keeping parameters close to u_guess and reducing the misfit has a disadvantageous influence on the One-shot method. However, the BFGS method also does not perform well for α = 0.1.
For smaller α, we observe significant differences. The unconstrained BFGS strategies find the best fit, but with parameter values (u*₁, u*₂, u*₃) that are not acceptable in this real world problem. L-BFGS-B computes results similar to those of the One-shot method, but needs far more Euler steps and therewith a much longer computational time.
We observe that the parameters computed by the One-shot method stay in acceptable ranges without any box constraints. The computed parameters are to some extent similar to those of the L-BFGS-B method.
One main goal of the One-shot strategy was to achieve so-called bounded retardation of the speed of convergence compared to the number of time steps needed to run the model into a steady state. Since the explicit Euler time-stepping does not converge quickly, and the ratios θ_k = ∥y_{k+1} − y_k∥ / ∥y_k − y_{k−1}∥ even exceed 1 for several steps k, one cannot expect the One-shot method to converge very fast in this special example. The average value of θ in a pure model spin-up with parameters taken from [15] is 0.992, and for the One-shot strategy (α = 0.1) it is θ = 0.9999884.
Furthermore, the number of One-shot iteration steps was intended not to exceed the number of Euler steps of a single model spin-up by too much. Especially for parameter sets near the computed solution, a model spin-up with fixed parameters needs 12,000 to 15,000 Euler steps. For α ∈ {10, 0.01, 0.001}, where the One-shot strategy shows good performance, the observed number of iterations is about 10 to 40 times larger than the number of Euler steps for one single spin-up. Considering that the BFGS strategies need at least about 30 optimization steps, each requiring further function evaluations and model spin-ups, this factor is not very large. Even in those cases where the One-shot strategy does not show quick convergence, the number of iterations is still not too far away from the number of Euler steps required by the BFGS strategies. In applications where the fixed point iteration G is more expensive than the Euler time-stepping applied in this example, the One-shot strategy might then catch up in the required computational time.

Conclusions
We have successfully applied the One-shot method according to Hamdi and Griewank, [4], [5], to a parameter optimization problem in ocean modeling. We have analyzed its applicability and find that the One-shot strategy is a promising approach for optimizing models whose spin-ups, using (pseudo-) time stepping or a fixed point iteration, consume much time and computational cost. Our numerical example was the parameter optimization of the Rahmstorf 4-box-model of the North Atlantic, with steady states achieved via an explicit Euler spin-up. Optimization results of the original One-shot strategy and of the One-shot-BFGS method, with a BFGS update of the preconditioner of the parameter correction step, are compared to a classical BFGS quasi-Newton method and to the L-BFGS method with and without box constraints on the parameters.
We observed that the One-shot-BFGS strategy does not show good performance in this example with only 6 parameters. The original version with full computation of the preconditioner performs well for large and for very small weighting factors α in front of the penalty term. Further analysis of why the One-shot method has difficulties in finding optimal values for the weights α ∈ {1, 0.1} could be valuable.
We have found that the One-shot method can be applied even though contractivity is not given in general, and that fixing the contraction factor ρ to a number close to 1 is adequate. Furthermore, the computation of the weights of B is not mandatory in each iteration step.
Considering examples with more expensive model spin-ups, the One-shot method might on the one hand even gain (or at least catch up in those examples with slow convergence) in computational time, and on the other hand be the only applicable alternative among derivative-based optimization methods, because the derivatives depend on one spin-up step only instead of on the whole spin-up, which is the main difference and advantage compared to standard methods. The application to earth system models involving nonlinear PDEs and/or a higher spatial resolution, with computationally more expensive model solvers and periodic solutions, will be of great interest for future investigations to demonstrate the efficiency of the One-shot approach.
Table 2. Results of the optimization comparing values of the cost functional, the weighted data fit, the number of iterations of the optimization procedure, the number of needed Euler steps, the quality of optimality (∥L_{(y,ȳ,u)}(y*, ȳ*, u*)∥ for methods 1-2 and ∥J_u(y(u*), u*)∥ for methods 3-5, respectively) and the computational time in minutes.
Here, for some positive a, m⁺ = m / (1 − e^{−am}) almost coincides with the meridional volume transport or overturning m = k(β_m (S₂ − S₁) − α_m (T₂ − T₁)) for positive m and is almost zero for negative m. The term m⁻ = −m / (1 − e^{am}) becomes almost zero for positive m and −m for negative m. That means the summands including m⁺ and m⁻ are activated or deactivated depending on the flow direction. The deviation from the physically correct model becomes smaller the larger a is. Several model parameters are involved, the most important being the freshwater fluxes f₁ containing atmospheric water vapor transport and wind-driven oceanic transport; they are used to simulate global warming in the model and are chosen in the interval [−0.2, 0.15].
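The switching behaviour of m⁺ and m⁻ is easy to check numerically; the smoothing parameter a = 10 below is an arbitrary illustrative choice:

```python
import math

# Numerical check of the smooth flow-direction switching: with a large
# smoothing parameter a, m_plus(m) ~ m for m > 0 and ~ 0 for m < 0,
# while m_minus(m) ~ 0 for m > 0 and ~ -m for m < 0.
a = 10.0

def m_plus(m):
    return m / (1.0 - math.exp(-a * m))

def m_minus(m):
    return -m / (1.0 - math.exp(a * m))

print(m_plus(1.0), m_minus(1.0))    # ~1 and ~0: positive-flow terms active
print(m_plus(-1.0), m_minus(-1.0))  # ~0 and ~1: negative-flow terms active
```

Note that both expressions have a removable singularity at m = 0 (each tends to 1/a there), which is what makes the coupling smooth rather than a hard switch.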

Figure 2. Results of the optimization comparing the One-shot strategies with the BFGS quasi-Newton methods for α = 0.1 (left) and the differences to the Climber data (right).

Figure 3. Results of the optimization comparing the One-shot strategies with the BFGS quasi-Newton method for α = 0.001 (left) and the difference to the Climber data (right).

Table 1. Effect on the optimization of a rare update of the weights α_L, β_L and σ for α = 0.1. We compare the values of the cost functional, the weighted data fit, the number of iterations and the computational time in minutes.
Legend of methods: 1 One-shot, 2 One-shot-BFGS, 3 BFGS, 4 L-BFGS, 5 L-BFGS-B.