Solution of Constrained Dynamic Optimization Problems by the Fractional Penalty Method

Abstract. This article discusses the application of the fractional penalty method to solving dynamic optimization problems with state constraints. The main theory supporting the method is stated in several theorems and a corollary, which give sufficient conditions for its application: if all the conditions in the theorems are met, the resulting solution converges to the analytic solution. Several examples are given to support the theory, and numerical simulation shows that the accuracy of the method is quite good. Hence, this method can serve as an alternative for solving dynamic optimization problems with state constraints.


Introduction
Dynamic optimization, also known as optimal control, plays an important role because many problems in engineering, industry, the social sciences, economics, finance, biology, and medicine can be formulated in its terms. In general, a dynamic optimization problem is a matter of choosing a policy/control that optimizes an objective function. In choosing a policy/control, one must also account for how the observed system changes over time. The rules governing the system are usually written as a differential equation or a difference equation, depending on whether the system is formulated in continuous or discrete form. In addition, constraints may limit the policy/control variables, the state variables, or a mixture of the two. This makes dynamic optimization problems very complex. Except in special cases such as linear or quadratic problems, analytic solutions are therefore difficult to obtain, and numerical methods become an important alternative for solving dynamic optimization problems.

Theoretical Framework
One important numerical method for constrained dynamic optimization problems is the penalty method. It is widely used because it is very simple and easy to implement. Broadly speaking, the method works as follows: if, at any time, the system state resulting from a chosen policy/control violates the constraints, a penalty is added to the objective function; conversely, if the resulting state does not violate the constraints, no penalty is added. At each time, the method therefore favours a policy/control whose resulting state does not violate the constraints. The penalties commonly used for dynamic optimization problems are linear, as in [1]-[4]. This paper presents an alternative penalty method for constrained dynamic optimization problems, using penalties raised to a fractional power, which can be seen as an extension of the linear form.
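The basic penalty mechanism described above can be sketched in a few lines of code. The following is a minimal illustration, assuming a simple rectangle-rule discretisation of the objective; the names `running_cost`, `constraint`, and `penalty` are placeholders for illustration, not functions from the paper:

```python
def penalized_cost(running_cost, constraint, penalty, xs, us, ts):
    """Discretised objective: the running cost plus, at each time step,
    a penalty whenever the state constraint h(x, u, t) > 0 is violated."""
    total = 0.0
    for k in range(len(ts) - 1):
        dt = ts[k + 1] - ts[k]
        total += running_cost(xs[k], us[k], ts[k]) * dt
        total += penalty(constraint(xs[k], us[k], ts[k])) * dt
    return total

# Toy example: running cost x^2 + u^2, constraint x >= 0 written as -x <= 0,
# and a linear penalty weighted by gamma = 10.
cost = lambda x, u, t: x * x + u * u
h = lambda x, u, t: -x            # h <= 0 means the constraint holds
pen = lambda v: 10.0 * max(0.0, v)

ts = [0.0, 0.5, 1.0]
feasible = penalized_cost(cost, h, pen, [1.0, 1.0], [0.0, 0.0], ts)    # no penalty
violating = penalized_cost(cost, h, pen, [-1.0, -1.0], [0.0, 0.0], ts)  # penalised
```

A feasible trajectory pays only the running cost, while a violating trajectory additionally pays the penalty at every step where the constraint fails.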

Methodology
This research is a literature review supported by numerical experimental results. The literature review is used to develop theory guaranteeing that the method developed yields the correct solution, while numerical experiments, through simulation, are used to verify the proposed hypothesis. By comparing the simulation results with previously known solutions, the accuracy of the developed method can be assessed. The experiments use example problems whose analytic or numerical solutions are known.

Discussion
In this section, the basic convergence theory for the fractional penalty method applied to the constrained dynamic optimization problem is presented. The analysis uses Pontryagin's minimum principle, which can be seen in [5], [6], and the results are tested against numerical examples that support the theory developed.

Problem Formulation
The problem discussed in this paper is a dynamic optimization problem with state constraints, described below. The objective functional

    J(u) = φ(x(T)) + ∫₀ᵀ f₀(x(t), u(t), t) dt

is minimized subject to the initial value problem

    ẋ(t) = f(x(t), u(t), t),  x(0) = x₀,

and the state constraints

    h(x(t), u(t), t) ≤ 0.

In this problem, x ∈ ℝⁿ refers to the state vector, u ∈ ℝᵐ refers to the control vector, and t refers to time. The constant T > 0 is the end time of observation. The functions f: ℝⁿ⁺ᵐ⁺¹ → ℝⁿ, φ: ℝⁿ → ℝ, f₀: ℝⁿ⁺ᵐ⁺¹ → ℝ and h: ℝⁿ⁺ᵐ⁺¹ → ℝʳ are known and twice continuously differentiable with respect to all their arguments.
The constrained dynamic optimization problem above can be solved more easily if it is converted into one without constraints. The step commonly performed is to add the constraints to the objective function as a penalty term. If, at some time, the state and the policy/control satisfy the constraints, the penalty term equals zero; conversely, if the constraints are violated, the term takes a large value, working against the goal of minimizing the objective function. Thus, the method selects a policy/control that, at each time, results in a state meeting the given constraints. In this case, the penalty term is raised to a fractional power, giving the objective

    J_γ(u) = φ(x(T)) + ∫₀ᵀ [ f₀(x(t), u(t), t) + γ(t)ᵀ h^{1/s}(x(t), u(t), t) ] dt,

which must still satisfy the initial value problem

    ẋ(t) = f(x(t), u(t), t),  x(0) = x₀.

The vector γ = (γ₁, γ₂, …, γᵣ) is the penalty factor, whose elements are functions of time with values always greater than zero (γᵢ(t) > 0, ∀ i = 1, 2, …, r, ∀ t). The vector-valued function h^{1/s} = (h₁^{1/s}, h₂^{1/s}, …, hᵣ^{1/s}) is defined componentwise by

    hᵢ^{1/s}(x, u, t) = (max{0, hᵢ(x, u, t)})^{1/s},

whereas the constant s is a member of the set of natural numbers.
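The fractional-power penalty described above can be written directly in code. A minimal sketch (the function name and defaults are illustrative, not from the paper):

```python
def fractional_penalty(h_value, gamma=1.0, s=2):
    """Contribution gamma * (max{0, h})^(1/s) of a single constraint value.

    The penalty is zero when h <= 0 (constraint satisfied).  For a
    violation, the fractional power 1/s makes the penalty rise steeply
    near zero, so even small violations are punished noticeably; s = 1
    recovers the ordinary linear penalty.
    """
    return gamma * max(0.0, h_value) ** (1.0 / s)
```

For example, a violation of 0.25 with s = 2 is penalised by its square root, 0.5, i.e. twice the size of the violation itself, whereas a linear penalty (s = 1) would charge only 0.25.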
The following theorem states that solving the unconstrained dynamic optimization problem obtained by the fractional penalty method is equivalent to solving the constrained dynamic optimization problem.

Theorem 1
Suppose (x*, u*) is a stationary point of the constrained dynamic optimization problem and H is its Hamiltonian function. Proof: For s = 1, the linear penalty case, the proof can be seen in (Xing, 1994). The proof is extended to the case s > 1, s ∈ ℕ, as follows.
Suppose u* + δu is a policy/control within the admissible range and x* + δx, with δx(0) = 0, is the state corresponding to that policy/control, so that δx → 0 as ‖δu‖ → 0.
In other words, it is proven that (x*(t), u*(t)) is a local solution of the unconstrained optimization problem above.

Corollary 1
Under the same assumptions as in the theorem, (x*(t), u*(t)) is also a local solution of the constrained optimization problem.
As noted above, the penalty term hᵢ^{1/s}(x, u, t) is not differentiable where its value is 0. Therefore, standard optimization algorithms that involve derivatives cannot be used directly. Because of that, in practice the term is replaced by a differentiable version with a smoothing constant ε > 0 (small enough). The non-differentiability of h^{1/s} where its value is 1 causes no problem, because what must be ensured is the convergence of the smoothed version to h^{1/s} where the value is 0, that is, where the constraint is active.
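The paper's exact smoothed formula is not reproduced above, so the sketch below uses one common smoothing chosen as an assumption, not as the paper's definition: max{0, h} is replaced by (h + √(h² + ε²))/2, which is strictly positive and smooth everywhere, and which converges to max{0, h} as ε → 0.

```python
import math

def smoothed_fractional_penalty(h_value, gamma=1.0, s=2, eps=0.01):
    """Smooth surrogate for gamma * (max{0, h})^(1/s).

    Since (h + sqrt(h^2 + eps^2)) / 2 is strictly positive and
    infinitely differentiable, raising it to the power 1/s no longer
    produces a kink at h = 0, so gradient-based optimizers can be used.
    As eps -> 0 the surrogate converges to the exact fractional penalty.
    """
    smooth_max = 0.5 * (h_value + math.sqrt(h_value * h_value + eps * eps))
    return gamma * smooth_max ** (1.0 / s)
```

The price of smoothness is a small positive penalty even for satisfied constraints, which vanishes as ε is driven toward zero.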

Numeric Simulation
The theoretical results described above are now verified using numerical simulation. Two example problems are used in the simulation. The software used is MISER 3.3, a program for solving constrained dynamic optimization/optimal control problems; its algorithm is based on the control parameterization method in [7]. For this study, some of the subprograms in MISER 3.3 required modification. Further explanation of MISER 3.3 can be seen in [8].
For the first example, the relevant matrix is positive definite, so all assumptions of Theorem 1 are fulfilled. With the state constraint required to hold at every time, the fractional penalty method with various values of s and smoothing constant ε = 0.01 converges to the analytic solution, as shown in Table 1. More specifically, for s = 2 and γ = 1, more detailed simulation results can be seen in Table 2, where the numerical optimal state and policy/control show only very small differences from the analytic optimal state and policy/control. Figure 1 shows the optimal state and policy/control as functions of time; the numerical simulation result is very close to the analytic solution.
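The control parameterization approach underlying MISER 3.3 can be illustrated on a toy problem. The sketch below is not the paper's example: it minimises ∫₀¹ (x² + u²) dt for ẋ = u, x(0) = 1 under the state constraint x(t) ≥ 0.7, using piecewise-constant controls, forward-Euler simulation, the smoothed fractional penalty, and a naive coordinate search; all of these choices, including the parameter values, are assumptions made for illustration.

```python
import math

S, GAMMA, EPS = 2, 50.0, 1e-4   # fractional power, penalty weight, smoothing
N, T, X0 = 8, 1.0, 1.0          # control pieces, horizon, initial state
DT = T / N

def penalty(h):
    """Smoothed GAMMA * (max{0, h})^(1/S)."""
    smooth_max = 0.5 * (h + math.sqrt(h * h + EPS * EPS))
    return GAMMA * smooth_max ** (1.0 / S)

def rollout(us):
    """Forward-Euler states of x' = u starting from X0."""
    xs = [X0]
    for u in us:
        xs.append(xs[-1] + u * DT)
    return xs

def cost(us):
    """Penalised running cost along the Euler rollout."""
    x, total = X0, 0.0
    for u in us:
        total += (x * x + u * u + penalty(0.7 - x)) * DT  # constraint x >= 0.7
        x += u * DT
    return total

def coordinate_search(us, step=0.5, sweeps=40, shrink=0.7):
    """Improve one control piece at a time, shrinking the step each sweep."""
    us = list(us)
    best = cost(us)
    for _ in range(sweeps):
        for k in range(N):
            for delta in (step, -step):
                trial = us[:k] + [us[k] + delta] + us[k + 1:]
                c = cost(trial)
                if c < best:
                    us, best = trial, c
        step *= shrink
    return us, best

optimal_u, optimal_cost = coordinate_search([0.0] * N)
```

Because the unconstrained optimal state decays below 0.7 near the end of the horizon, the penalty is what keeps the computed trajectory feasible; a derivative-based parameterization method such as MISER's would replace the coordinate search with gradient steps on the same penalised cost.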
For the second example, the analytic solution is difficult to obtain, so the best numerical solution, as in [8], whose objective value is 0.1736, is used as the comparison for the numerical simulation performed here.
The numerical simulation results for this constrained example, for various values of s with smoothing constant ε, are shown in Table 3. Figure 3 shows that the values obtained through the simulations do not differ greatly from the values obtained from [7]: the difference is less than 0.01.
More specifically, for s = 3 and γ = 1, detailed numerical simulation results are given in Table 4. Figures 2, 3, and 4 illustrate, in turn, the optimal control and optimal state functions obtained from the simulation. Similar pictures for the comparison solution can be seen in [7].

Conclusion
The conclusion of this research is that, theoretically, the fractional penalty method can be used to solve dynamic optimization problems with state constraints. This result is reinforced by numerical simulations showing that solutions obtained with the fractional penalty method do not differ much from the analytic solution or from comparison solutions obtained by other methods. Thus, the fractional penalty method is effective as an alternative method for solving dynamic optimization problems with state constraints.