This article presents simple software for solving the quadratic optimal control problem for linear and nonlinear dynamic systems.
Introduction
In this article, I'd like to present a compact and simple-to-use software tool for optimal control of dynamic systems. Optimal control is a very broad topic. Its foundation was laid in the 1940s to 1960s with works by Norbert Wiener, John Ragazzini, Lotfi Zadeh, Richard Bellman, Lev Pontryagin, Rudolf Kalman, and many others. Optimal control has numerous practical applications. Here, we will focus on the following classic optimal control problem. Our goal is to transfer a given dynamic system from its initial state to some final state using external control inputs. We want the system parameters to follow their prescribed trajectories as closely as possible during this transition. This is a fundamental task of optimal control with many practical applications: for example, starting an electric motor while limiting the initial jump of electric current to an acceptable value, promptly damping undesirable oscillations in electronic and mechanical systems, and countless sophisticated applications ranging from spacecraft control to missile interception.
Dealing with this problem and even its statement requires a lot of math. I will confine myself to the necessary minimum required to describe the problem and understand how to use the software. Some additional recursive formulas are given in an appendix without proof.
Problem Statement
Let's formalize our problem. A dynamic system is described with the following set of differential equations:
ẋ = f(x, m, t)        (1)
where t is time, x is the state vector consisting of parameters characterizing system behavior, ẋ is the vector of the first derivative of x with respect to time, and m is the control vector. Normally, the dimension of the control vector does not exceed the dimension of the state vector. In this statement, vector m is independent of vector x, and may be considered as control of the open-loop system. The open-loop control system with input control vector m and state vector x is depicted below:
Figure: Open-Loop Control System
We'd like to obtain the control policy m(t) on the time interval from 0 to the final time tf that minimizes the cost function:
J = ∫₀^tf [(r − x)ᵀ Q (r − x) + (u − m)ᵀ Z (u − m)] dt        (2)
where superscript T denotes transposition of a vector or matrix, r is the desirable state vector, u is the desirable control vector, and Q and Z are weight matrices for the deviations between the desirable and actual state and control vectors respectively. In most cases, Q and Z are diagonal matrices (but not necessarily!). Each of their diagonal elements defines the relative weight of the squared difference between the desirable and actual values of a parameter at a given moment of time. The matrices may be time-dependent; e.g., it is common practice to increase the relative weight of the differences at the final time to ensure a desirable final state. The problem formulated here is often referred to as a "tracking problem."
Linear Dynamic Systems
Before we come to a solution, let's discuss a special case of dynamic systems. The most well-studied class of the systems is linear dynamic systems, i.e., systems in which dynamics is described with a set of linear differential equations. For such a system, equation (1) may be rewritten as follows:
ẋ = A(t)x + B(t)m        (3)
where A(t) and B(t) are matrices. First, we will provide a solution for this kind of system, and then extend it to nonlinear dynamic systems.
Discrete Time
Since we are going to offer a numeric solution, we will deal with the system at discrete moments of time. So we will reformulate our problem for discrete time t = k ⋅ Δt, where Δt is a small time interval (the sampling interval) and k is the time step, 0 ≤ k < N, tf = (N−1) ⋅ Δt. Now equation (1) may be rewritten for discrete time:
xk+1 = f(xk, mk, k)        (4)
and the linear system is defined as:
xk+1 = Fk xk + Hk mk        (5)
where Fk and Hk are matrices obtained from A(t) and B(t) as follows:
Fk = exp(A(k⋅Δt)⋅Δt) ≅ I + A(k⋅Δt)⋅Δt,  Hk = A⁻¹[exp(A(k⋅Δt)⋅Δt) − I]B ≅ B(k⋅Δt)⋅Δt, where I is the identity matrix.
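The first-order (Euler) approximations above are what one typically implements; when A is time-invariant, the exact zero-order-hold matrices can also be computed cheaply. Below is a minimal NumPy sketch of mine (the function and variable names are not from the article's code) that sums the matrix-exponential series, so no inversion of A is needed:

```python
import numpy as np

def discretize(A, B, dt, terms=25):
    """Zero-order-hold discretization: F = exp(A*dt),
    H = A^{-1} (exp(A*dt) - I) B, both computed via the
    matrix-exponential power series (no inversion of A needed)."""
    n = A.shape[0]
    F = np.eye(n)                 # partial sum of A^k dt^k / k!
    S = np.eye(n) * dt            # partial sum of A^k dt^(k+1) / (k+1)!
    term_F = np.eye(n)
    term_S = np.eye(n) * dt
    for k in range(1, terms):
        term_F = term_F @ A * (dt / k)
        F = F + term_F
        term_S = term_S @ A * (dt / (k + 1))
        S = S + term_S
    return F, S @ B

# Double integrator: A = [[0,1],[0,0]], B = [[0],[1]], dt = 1
F, H = discretize(np.array([[0.0, 1.0], [0.0, 0.0]]), np.array([[0.0], [1.0]]), 1.0)
print(F)  # [[1. 1.], [0. 1.]] -- here the Euler formula I + A*dt is exact
print(H)  # [[0.5], [1. ]] -- while H = B*dt would miss the dt^2/2 term
```

For this particular A (nilpotent), the exponential series terminates, which makes the exact values easy to check by hand.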
Cost function (2) in discrete time will be reshaped to:
J = Σk=0…N−1 [(rk − xk)ᵀ Qk (rk − xk) + (uk − mk)ᵀ Zk (uk − mk)]        (6)
Solution for the Linear Case
A numeric solution to the problem may be obtained based on Bellman's principle of optimality. The results are quite bulky and are presented in the appendix. For general understanding, it is essential to know that the control policy mk is calculated in two major runs. First, an inverse calculation run is performed, starting at the final time step N−1 and going down to the initial time 0. The inverse run, using the system matrices F and H, the desirable state r and control u vectors, and the weight matrices Q, Z, yields a vector ck and matrix Lk. Then the direct run is carried out from time 0 to N−1, obtaining the vector mk at each time step k as:
mk = ck − Lk xk        (7)
Both inverse and direct calculation runs are shown in the picture below:
Figure: Calculation scheme
Vectors cN−k and matrices LN−k are known from the inverse run and depend on the system matrices F and H, the user-defined desirable vectors, and the weights. They also depend on auxiliary matrices PN−k and vectors vN−k that participate in the inverse run, starting from zero initial conditions. On each step of the direct run, we obtain the control vector mk based on the state vector xk for that time, then calculate the next state xk+1 using equation (5). This procedure is repeated until the last time moment N−1, giving us both the open-loop control policy mk and the states of the system xk vs. time. Statement (7) provides us with closed-loop control as depicted below:
Figure: Closed-Loop Control System
where vector c constitutes closed-loop control and matrix L provides feedback from state vector x. These parameters allow the designer to generate vector m based on the actual state of the system, thus making the system more stable and robust.
Discussion of the Solution for the Linear Case
For linear systems, the control policy mk minimizing the cost function (6) is obtained in just one iteration consisting of one inverse and one direct run. As stated above, this policy depends on the system matrices and on user-chosen parameters, namely the desirable vectors and weight matrices. These parameters are intuitive enough that control policy design reduces to playing with them. The user chooses a set of parameters and calculates the control policy and the corresponding state trajectories. If she/he is not satisfied with the result, the calculation is repeated for another set of desirable vectors and weight matrices. It is also interesting to observe that if the system matrices F, H and the weight matrix Q are time-invariant, then the feedback matrix L will also be time-invariant, which considerably simplifies the design of the system's controller. In the linear case, the system matrices F, H are normally time-invariant. So, to attain time-invariant feedback, the user should choose a constant weight matrix Q, i.e., control without strict formal achievement of the desirable final state at the final moment of time. In practice, by a proper choice of Q, we can achieve the desirable final state in an acceptable time, even before our final time. Although it is convenient to implement the control policy using state vector coordinates, in real life not all of those coordinates are usually available for measurement. Various techniques have been developed to overcome this difficulty; different kinds of state observers may be used to estimate unmeasured coordinates from the coordinates available for measurement.
Nonlinear Systems
The nonlinear case is much more complex. Unlike the linear case, it has no general solution covering all kinds of systems. Here, we can try to reduce the nonlinear optimization problem to a linear one using the quasilinearization approach. To use the same math solution, we have to present the general dynamic system description given by equation (1) in a linearized form similar to equation (5). To achieve this, we consider an iterative calculation process. We can linearize (1) to the form of (5) at a particular point (xk, mk) of the i-th trajectory:
xk+1 ≅ Fki xk + Hki mk        (8)
where:
Fki = I + [∂f/∂x]⋅Δt,  Hki = [∂f/∂m]⋅Δt, with the partial derivatives evaluated at (xki, mki)        (9)
The linearized system matrices F and H are matrices of the appropriate partial derivatives of the functions f at point k of the i-th trajectory. State and control vectors on the (i+1)-th trajectory may be presented as:
xki+1 = xki + Δxk,  mki+1 = mki + Δmk        (10)
Equation (8) may be formulated for increments Δx and Δm. In this case, in cost function (6), new vectors r and u correspond to:
rk − xki,  uk − mki        (11)
respectively. This allows us to reduce the nonlinear problem to a sequence of quasilinear iterations, each of which is depicted in the computational scheme figure above. The algorithm consists of the following steps:
- Assuming for iteration i = 0 some vectors xk and mk (normally all zeros except the initial condition xk=0, but the choice is up to the user), calculate all the linearized Fki and Hki for the i-th trajectory according to (9).
- Calculate the new rki and uki for the i-th iteration according to (11).
- Calculate Δmki and Δxki using the inverse and direct runs, as in the linear case.
- Calculate xki+1 and mki+1 for the (i+1)-th iteration according to (10).
- Check the condition to stop iterations. This condition may be based, e.g., on the difference between the values of the cost function on successive iterations, or may simply be a fixed number of iterations. If the condition is not satisfied, steps 1-5 are repeated until it is. Once the stop condition is satisfied, the policy mk and states xk may be considered the final solution.
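The steps above can be sketched in a few dozen lines of Python. The sketch below is my own minimal rendition applied to the Van der Pol oscillator treated later in the article, under stated assumptions: exact gradients, a fixed iteration count as the stop condition, and the (i+1)-th state trajectory obtained by rolling the updated control through the true nonlinear dynamics. It is not the article's implementation:

```python
import numpy as np

# Parameters of the Van der Pol example treated later in the article
mu, dt, N = 1.0, 0.1, 81
Q, Z = np.eye(2), np.eye(1)
r, u = np.zeros(2), np.zeros(1)
x_init = np.array([1.0, 0.0])

def f(x, m):
    return np.array([x[1], mu * (1 - x[0] ** 2) * x[1] - x[0] + m[0]])

def rollout(ms):
    """Roll a control sequence through the true nonlinear dynamics."""
    xs = [x_init]
    for k in range(N - 1):
        xs.append(xs[-1] + dt * f(xs[-1], ms[k]))
    return np.array(xs)

def cost(xs, ms):
    return sum((xs[k] - r) @ Q @ (xs[k] - r) + (ms[k] - u) @ Z @ (ms[k] - u)
               for k in range(N - 1))

ms = np.zeros((N - 1, 1))            # iteration i = 0: zero control
for it in range(5):                  # fixed iteration count as stop condition
    xs = rollout(ms)
    # Step 1: linearized matrices along the i-th trajectory, eq. (9)
    Fs = [np.eye(2) + dt * np.array([[0.0, 1.0],
                                     [-(2 * mu * x[0] * x[1] + 1),
                                      mu * (1 - x[0] ** 2)]])
          for x in xs[:-1]]
    Hs = [dt * np.array([[0.0], [1.0]])] * (N - 1)
    # Steps 2-3: inverse run on increments with shifted targets, eq. (11)
    P, v = np.zeros((2, 2)), np.zeros(2)
    cs, Ls = [None] * (N - 1), [None] * (N - 1)
    for k in range(N - 2, -1, -1):
        F, H = Fs[k], Hs[k]
        rt, ut = r - xs[k], u - ms[k]    # targets for the increments
        W = Z + H.T @ P @ H
        Winv = np.linalg.inv(W)
        L = Winv @ H.T @ P @ F
        c = Winv @ (Z @ ut + H.T @ v)
        P = Q + F.T @ P @ (F - H @ L)
        v = Q @ rt + (F - H @ L).T @ v - L.T @ Z @ ut
        cs[k], Ls[k] = c, L
    # Step 3 (direct run) and step 4: update the control, eq. (10)
    dx = np.zeros(2)
    for k in range(N - 1):
        dm = cs[k] - Ls[k] @ dx
        dx = Fs[k] @ dx + Hs[k] @ dm
        ms[k] = ms[k] + dm
xs = rollout(ms)
```

With the weights of the Van der Pol example, a handful of such iterations damps the oscillations and reduces the cost function well below its zero-control value.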
Obtaining exact formulas for the partial derivatives of the functions f may be tedious work, and becomes impossible when the functions f are defined numerically, e.g., as a table. In such cases, it is possible to use a simple numeric approximation: give a small increment to the appropriate coordinate of the state (or control) vector and compute the resulting value of the function f. The code sample provides appropriate examples.
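Here is a minimal sketch of such an approximation in Python (forward differences; the Van der Pol right-hand side is the article's example, while the increment delta and the function names are my own choices):

```python
import numpy as np

mu = 1.0

def f(x, m):
    """Van der Pol right-hand side (the nonlinear example below)."""
    return np.array([x[1], mu * (1 - x[0] ** 2) * x[1] - x[0] + m[0]])

def numeric_jacobian_x(f, x, m, delta=1e-6):
    """Approximate A = df/dx by giving a small increment to each
    state coordinate in turn (forward differences)."""
    f0 = f(x, m)
    J = np.zeros((len(f0), len(x)))
    for j in range(len(x)):
        xp = x.copy()
        xp[j] += delta
        J[:, j] = (f(xp, m) - f0) / delta
    return J

x, m = np.array([1.0, 0.5]), np.array([0.0])
A_num = numeric_jacobian_x(f, x, m)
# Exact partial derivatives for comparison (the same formulas as
# gradientsA in the C# sample below):
A_exact = np.array([[0.0, 1.0],
                    [-(2 * mu * x[0] * x[1] + 1), mu * (1 - x[0] ** 2)]])
```

The numeric and exact Jacobians agree to several decimal places at this point; the same trick applies to the derivatives with respect to the control vector.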
Method Limitations
Like any numeric approach, this one has its limitations. One limitation is rather obvious and should be considered for both linear and nonlinear cases: the time sampling interval should be small enough to guarantee convergence of the calculations. The desirable values r, u and the weight factors Q and Z should be reasonable to achieve intuitively optimal system behavior. In the case of nonlinear systems, we should clearly understand that this is a "game without strict rules," and thus there is no guarantee that the technique is applicable to a particular case. The method is sensitive to the accuracy of computing the linearized system matrices. More complex methods may be used to estimate the applicability of this approach to a particular nonlinear system, but a trial-and-error approach seems more practical for engineers. In some ways, dealing with nonlinear systems is still "art rather than science."
Constant Feedback
As in the linear case, control with constant feedback may be considered for nonlinear systems. To obtain constant feedback in the nonlinear case, the weight matrices Q, Z and the initial trajectories of state x and control m should be time-invariant (except for the initial state x0), and an acceptable result should be achieved in a single iteration (since the system matrices F and H calculated according to (9) depend on the previous iteration's trajectory). Normally, engineers adopt a rather informal approach to system optimization, in which delivering the minimum of our cost function is not the ultimate goal but a means to attain acceptable system behavior. So it may be wise in some cases to sacrifice further cost function minimization for the sake of feedback simplicity. It is also possible that in some cases constant feedback may to some extent compensate for nonlinearity. Of course, like everything with nonlinear systems, this is not possible in all cases. An appropriate numeric example is discussed below.
Usage of Software
.NET
The software is written in plain C# without any additional libraries. In the last update, it was converted to .NET 6 (the DLLs target .NET Standard 2.0). The .NET demo can be run on Windows and Linux alike with the command
dotnet Test.dll
Project MatrixCalc provides arithmetic operations with vectors and matrices, including transposition and inversion. Project SequentialQuadratic supports the calculations for system optimization and simulation. Project Test uses the static methods and interfaces of SequentialQuadratic to solve particular problems. DLL SequentialQuadratic has several internal types responsible for different optimization problems (linear with constant weights and desirable values, linear with time-varying weights and desirable values, and the corresponding nonlinear ones). Instances of these types are created with the overloaded static methods SQ.Create() and exposed to the caller via a set of appropriate interfaces: IBase, IBaseLQExt, ITrackingLQ and INonLinearQ. To perform the optimization procedure, the user calls the appropriate method, providing input and getting the required interface. Then the method RunOptimization() should be called to actually carry out the optimization calculations, followed by the method Output2Csv() to write results in comma-separated format to an Excel-compatible *.csv file. To simulate the system with a given control policy, the method Simulate() is called. A sample of the code for optimization of a nonlinear system (the Van der Pol oscillator) is shown below:
public static void VanDerPol()
{
    const string TITLE = "Van der Pol";
    Console.WriteLine($"Begin \"{TITLE}\"");
    var lstFormatting = new List<OutputConfig>
    {
        new OutputConfig { outputType = OutputType.Control, index = 0, title="m0", scaleFactor=1 },
        new OutputConfig { outputType = OutputType.State, index = 0, title="x0" },
        new OutputConfig { outputType = OutputType.State, index = 1, title="x1" },
    };
    const int dimensionX = 2;
    const int dimensionM = 1;
    const double dt = 0.1;
    const int N = 51;
    const int maxIterations = 17;
    var r = new Vector(dimensionX) { [0] = 0, [1] = 0 };
    var u = new Vector(dimensionM) { [0] = 0 };
    var Q = new Matrix(dimensionX, dimensionX) { [0, 0] = 1, [1, 1] = 1 };
    var Z = new Matrix(dimensionM, dimensionM) { [0, 0] = 1 };
    var xInit = new Vector(dimensionX) { [0] = 1, [1] = 0 };
    var mu = 1;
    var functions = new CalcDelegate[]
    {
        (k, deltaT, m, x) => x[1],
        (k, deltaT, m, x) => mu * (1 - Sq(x[0])) * x[1] - x[0] + m[0]
    };
    var gradientsA = new CalcDelegate[dimensionX, dimensionX];
    gradientsA[0, 0] = (k, deltaT, m, x) => 0;
    gradientsA[0, 1] = (k, deltaT, m, x) => 1;
    gradientsA[1, 0] = (k, deltaT, m, x) => -(2 * mu * x[0] * x[1] + 1);
    gradientsA[1, 1] = (k, deltaT, m, x) => mu * (1 - Sq(x[0]));
    var gradientsB = new CalcDelegate[dimensionX, dimensionM];
    gradientsB[0, 0] = (k, deltaT, m, x) => 0;
    gradientsB[1, 0] = (k, deltaT, m, x) => 1;
    (SQ.Create(functions, gradientsA, gradientsB, Q, Z, r, u, xInit, dt, N,
        (currCost, prevCost, iteration) => iteration > maxIterations)
        .RunOptimization()
        .Output2Csv("VanDerPol_ExactGrads", lstFormatting) as INonLinearQ)
        .OutputExecutionDetails(isExactGradient: true);
    var delta = 0.1;
    (SQ.Create(functions, new Vector(dimensionM, delta), new Vector(dimensionX, delta),
        Q, Z, r, u, xInit, dt, N,
        (currCost, prevCost, iteration) => iteration > maxIterations)
        .RunOptimization()
        .Output2Csv("VanDerPol_NumGrads", lstFormatting) as INonLinearQ)
        .OutputExecutionDetails(isExactGradient: false);
    Console.WriteLine($"End \"{TITLE}\"{Environment.NewLine}");
}
Python
I provide a slightly simplified version of the code written in Python 3. Currently, it contains one linear and one nonlinear example. This code runs in both Windows and Linux (I tested it under Ubuntu 20.04) environments. To run the code, you need to install the latest version of Python 3 with its numpy and scipy packages, unzip the source code, and start the file Program.py: in Linux with
python3 Program.py
and in Windows with
python Program.py
The Python version supports visualization of transients. To activate this feature, please install the matplotlib package and remove all the #1 comment marks in the file Program.py.
Numeric Examples
Linear System
Consider a mechanical system: a mass M moved by a force. According to Newton's second law, the force equals the product of the mass and the acceleration, which is the second derivative of the displacement. Let's denote the displacement as x0, the velocity as x1, and the force as m0. The system is described by the following second-order matrix differential equation:
ẋ = Ax + Bm, where:

A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},  B = \begin{bmatrix} 0 \\ 1/M \end{bmatrix},  M = 1,  xk=0 = \begin{bmatrix} 0 \\ 0 \end{bmatrix}

Cost function: ∫₀⁴⁰ [(10 − x0)² + 3x1²] dt

Q = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix},  Z = \begin{bmatrix} 0 \end{bmatrix},  r = \begin{bmatrix} 10 \\ 0 \end{bmatrix},  u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 1, N = 41.
Figure: Transients of the linear system

The closed-loop input and feedback factors obtained as a result of the calculations are: input 4.48; feedbacks: −0.448 from x0 and −1.22 from x1.
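These numbers are easy to verify with a few lines of Python: simulating the discrete double integrator (Δt = 1, M = 1) under the constant control law m = 4.48 − 0.448⋅x0 − 1.22⋅x1 drives the displacement to the desired value r0 = 10. A minimal sketch (the variable names are mine):

```python
import numpy as np

# Discrete double integrator, dt = 1, M = 1 (see the model above)
F = np.array([[1.0, 1.0],
              [0.0, 1.0]])
H = np.array([0.0, 1.0])
c = 4.48                       # closed-loop input reported above
L = np.array([0.448, 1.22])    # feedback factors (taken with minus sign)

x = np.zeros(2)
for _ in range(40):
    m = c - L @ x              # m = 4.48 - 0.448*x0 - 1.22*x1
    x = F @ x + H * m

print(x)  # the displacement x0 settles at the desired value 10
```

Note that at the steady state x1 = 0 and m = 0, the law gives 4.48 = 0.448⋅x0, i.e., x0 = 10 exactly.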
Van der Pol Oscillator
This is a nonlinear system. Our goal is smooth and fast damping of ongoing oscillations.
ẋ0 = x1
ẋ1 = μ⋅x1⋅(1 − x0²) − x0 + m0,  μ = 1

xk=0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}

Cost function: ∫₀⁸ (x0² + x1² + m0²) dt

Q = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},  Z = \begin{bmatrix} 1 \end{bmatrix},  r = \begin{bmatrix} 0 \\ 0 \end{bmatrix},  u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 0.1, N = 81.
Figure: Transients of the controlled Van der Pol oscillator after 3 iterations
In the above case with chosen desirable state and control vectors and weight matrices, we observe good convergent iterative calculations.
Rayleigh Equation with Scalar Control
This is another nonlinear oscillating system. Our goal is the same as in the previous case: smooth and fast damping of ongoing oscillations.
ẋ0 = x1
ẋ1 = −x0 + (1.4 − 0.14⋅x1²)⋅x1 + 4⋅m0

xk=0 = \begin{bmatrix} -5 \\ -5 \end{bmatrix}

Cost function: ∫₀⁵ (x0² + m0²) dt

Q = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},  Z = \begin{bmatrix} 1 \end{bmatrix},  r = \begin{bmatrix} 0 \\ 0 \end{bmatrix},  u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 0.01, N = 501.
Figure: Transients after 3 iterations
In this case, even the first iteration provides an acceptable result, but the value of the cost function does not decrease monotonically for the chosen optimization parameters.
Nonlinear System with Constant Feedback
This is a nonlinear system that can be adequately translated from its given initial state to zero state using constant feedback.
ẋ0 = x1
ẋ1 = −x0⋅[π/2 + arctan(5⋅x0)] − 5⋅x0² / [2⋅(1 + 25⋅x0²)] + 4⋅x1 + 3⋅m0

xk=0 = \begin{bmatrix} 3 \\ -2 \end{bmatrix}

Cost function: ∫₀¹⁰ (5x1² + m0²) dt

Q = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix},  Z = \begin{bmatrix} 1 \end{bmatrix},  r = \begin{bmatrix} 0 \\ 0 \end{bmatrix},  u = \begin{bmatrix} 0 \end{bmatrix}

Δt = 0.01, N = 1001.
Figure: Transients for a single iteration

Figure: m0 vs. x1 for a single iteration
The above figures depict the transients calculated with our software for the chosen weight matrices after a single iteration. The last figure shows the relation between the controlled state coordinate x1 and the control m0.
Two-Link Robotic Manipulator
The application of the described technique to optimal control of a two-link robotic manipulator is presented in this CodeProject post. The implementation may be found in the code.
Quadcopter
This CodeProject article presents optimal control of a quadcopter. The code there is slightly updated.
More numeric examples are provided in the code sample.
Conclusions
The article briefly explains the numeric solution of the sequential quadratic optimal control problem for linear and nonlinear systems. The quasilinearization approach is used to reduce the nonlinear problem to a sequence of linear-quadratic problems for a linearized system. Compact and simple-to-use software is provided along with examples of its usage.
Appendix
The parameters c and L in the recursive equation (7) for calculation of the control vector m in the inverse run may be obtained from the dynamic programming (Bellman) equation as a set of recursive operations (the format of this article does not permit reproducing the appropriate mathematical transformations), where P and v are an auxiliary matrix and vector calculated recursively; the recursion starts from k = N − 1 with zero initial conditions P = 0, v = 0. Note that these calculations require inverting an auxiliary matrix W that depends on the transfer matrix H, the weight matrices, and the matrix P.
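For reference, one standard form of these recursions can be derived from the Bellman equation for cost (6) and written down directly in Python. The notation and sign conventions below are my own and may differ from the article's code; in particular, a small positive Z is used in the demo instead of the article's Z = 0 so that W stays invertible on the first backward step:

```python
import numpy as np

def lq_tracking(F, H, Q, Z, r, u, x0, N):
    """Inverse (backward) and direct (forward) runs for the tracking
    problem: minimize the sum over k of
        (x_k - r)^T Q (x_k - r) + (m_k - u)^T Z (m_k - u)
    subject to x_{k+1} = F x_k + H m_k, with the recursion started
    from zero conditions P = 0, v = 0 as in the appendix."""
    n = F.shape[0]
    P, v = np.zeros((n, n)), np.zeros(n)
    cs, Ls = [None] * N, [None] * N
    for k in range(N - 1, -1, -1):       # inverse run
        W = Z + H.T @ P @ H              # auxiliary matrix to be inverted
        Winv = np.linalg.inv(W)
        L = Winv @ H.T @ P @ F           # feedback matrix L_k
        c = Winv @ (Z @ u + H.T @ v)     # closed-loop input c_k
        P = Q + F.T @ P @ (F - H @ L)
        v = Q @ r + (F - H @ L).T @ v - L.T @ Z @ u
        cs[k], Ls[k] = c, L
    x, xs, ms = x0, [x0], []
    for k in range(N):                   # direct run, m_k = c_k - L_k x_k
        m = cs[k] - Ls[k] @ x
        x = F @ x + H @ m
        ms.append(m); xs.append(x)
    return np.array(ms), np.array(xs), cs, Ls

# Mass example from the article (dt = 1, M = 1), with Z = 0.01 instead of 0
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[0.0], [1.0]])
ms, xs, cs, Ls = lq_tracking(F, H, np.diag([1.0, 3.0]), np.array([[0.01]]),
                             np.array([10.0, 0.0]), np.zeros(1), np.zeros(2), 41)
# The displacement settles at the desired r0 = 10, and away from the final
# time the feedback matrix L is practically constant, in line with the
# constant-feedback discussion above.
```

Since Q, Z, F and H are constant here, the backward Riccati-type recursion converges quickly, which is why the gains far from the final time are essentially time-invariant.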
- 9th January, 2015: Initial version
- 9th June, 2021: Update
- 11th June, 2022: An error fixed in the code