Optimal Output Regulation for General Linear Systems via Adaptive Dynamic Programming

1 Introduction

In this study, we consider an adaptive optimal output regulation problem for general linear systems. The purpose is to obtain both optimal feedback control gain and optimal feedforward control gain, which appear in the optimal controller and can help realize asymptotic and disturbance rejection. First, adaptive dynamic programming technique is used to solve the optimal feedback control gain. In reality, the problem of solving the optimal solution of a regulator equation can be transformed into a certain minimum norm problem in a Hilbert space. With the help of the projection theorem in the Hilbert space, the optimal solution of the regulator equation can be explicitly expressed.

2 Problem Formulation

In this study, the dynamics of the plant is described by

$\begin{aligned} &\dot{x}=Ax+Bu+Ew,\\ &\dot{w}=\Gamma w,\\ &e=Cx+Du+Fw.\qquad (1) \end{aligned}$

Let $J_\Gamma={diag}(J_1,J_2,\cdots,J_l)\in \mathbb{R}^{q_m\times q_m}$ be the Jordan form corresponding to the matrix $\Gamma$ in (1),
where

$J_i=\begin{pmatrix}\lambda_i(\Gamma)&1&0&\cdots&0&0\\0&\lambda_i(\Gamma)&1&\cdots&0&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\vdots\\0&0&0&\cdots&\lambda_i(\Gamma)&1\\0&0&0&\cdots&0&\lambda_i(\Gamma)\end{pmatrix}\in \mathbb{R}^{q_{i}}$ , $q_{1}+q_{2}+\cdots+q_{l}=q_m.$

There must exist a nonsingular matrix $T\in \mathbb{R}^{q_m\times q_m}$ , such that $T^{-1}\Gamma T=J_{\Gamma}$ .
Denote $w_1=T^{-1}w\in \mathbb{R}^{q_m}$ .

Considering the dynamics (1), one has

$\dot{w}_1=J_\Gamma w_1. \qquad (2)$

The linear output regulation problem is solved by designing a controller as follows

$u=-Kx+Lw_1, \qquad (3)$

where $K$ is a feedback gain matrix, $L=U+KX$ is a feedforward gain matrix to be designed,
and $X$ , $U$ satisfy the following regulator equation

$\begin{aligned} XJ_\Gamma=&AX+BU+ET,\\ 0=&CX+DU+FT. \qquad (4) \end{aligned}$

Let $\hat{x}=x-Xw_1$ and $\hat{u}=u-Uw_1$ , then one has

$\begin{aligned} \dot{\hat{x}} =&A\hat{x}+B\hat{u}, \qquad (5) \end{aligned}$

$\begin{aligned} e =&C\hat{x}+D\hat{u}. \qquad (6) \end{aligned}$

The optimal output regulation problem is solved if and only if there exist an optimal feedback control gain $K^*$ and an optimal feedforward control gain $L^*$ such that $K^*$ and $L^*$ are respectively the minimizer of these two predefined performance indices.

To solve the optimal feedback control gain matrix $K^*$ , we construct the following problem.
Problem 1

$\begin{aligned} & \underset{(\hat{x},\hat{u})}{\text{minimize}} & & J(\hat{x},\hat{u})=\frac{1}{2}\int\limits_{t}^\infty \hat{x}^TQ\hat{x}+\hat{u}^TR\hat{u}d\tau \qquad (7) \\ & \text{subject to } & & (5), \end{aligned}$

where $Q=Q^T\geq 0$ , $R=R^T>0$ , and $(A,\sqrt{Q})$ is observable.
Since $L=U+KX$ , equally, $L^*=U^*+K^*X^*$ , which means that the computation of $L^*$ can be solved by the computation of $U^*, K^*$ and $X^*$ . With the optimal feedback gain matrix $K^*$ from Problem 1, to solve the optimal feedforward control gain matrix $L^*$ , Problem 2 is formally stated as follows.

Problem 2

$\begin{aligned} & \underset{(X,U)}{\text{minimize}} & & \bar{J}(X,U)=\text{Tr}(X^TX+U^TU) \\ & \text{subject to } & & (4). \end{aligned}$

3 Optimal Feedback Gain Matrix

In this section, a learning strategy is proposed, which can properly approximate the positive definite matrix $P^*$ and the optimal feedback control gain $K^*$ .

Let $A_{j}=A-BK_{j}$ . Then the system (1) can be written as

$\begin{aligned} \dot{x}&=A_{j}x+B(K_{j}x+u)+Ew. \qquad (8) \end{aligned}$

Let $Q_{j}=Q+(K_{j})^TRK_{j}$ . Then, one has

$\begin{aligned} &x^T(t+\delta t)P_{j}x(t+\delta t)-x^T(t)P_{j}x(t)\\ =&\int_t^{t+\delta t}\Big[-x^TQ_{j}x+2\big(u+K_{j}x\big)^TRK_{j+1}x+2w^TE^TP_{j}x\Big]d\tau. \qquad (9) \end{aligned}$

By Kronecker product representation, the expression in (9) can be further transformed as

$\begin{aligned} x^TQ_{j}x&=(x^T\otimes x^T)\text{vec}(Q_{j}),\\ \big(u+K_{j}x\big)^TRK_{j+1}x&=\Big[\Big(x^T\otimes x^T\Big)\Big(I_{n}\otimes(K_{j}R)\Big)+\Big(x^T\otimes u^T\Big)\Big(I_{n}\otimes R\Big)\Big]\text{vec}(K_{j+1}),\\ w^TE^TP_{j}x&=(x^T\otimes w^T)\text{vec}(E^TP_{j}). \qquad (10) \end{aligned}$

Furthermore, for positive integer $s$ , considering (10), let

$\begin{aligned} \delta_{xx}&=\Big[\text{vecv}(x(t_1))-\text{vecv}(x(t_0)), \cdots, \text{vecv}(x(t_s))-\text{vecv}(x(t_{s-1}))\Big]^T,\\ \Gamma_{xx}&=\Big[\int_{t_0}^{t_1}x\otimes xd\tau, \int_{t_1}^{t_2}x\otimes xd\tau, \cdots, \int_{t_{s-1}}^{t_s}x\otimes xd\tau\Big]^T,\\ \Gamma_{\bar{x}}&=\Big[\int_{t_0}^{t_1}\text{vecv}(x)d\tau,\cdots, \int_{t_{s-1}}^{t_s}\text{vecv}(x)d\tau\Big]^T,\\ \Gamma_{xu}&=\Big[\int_{t_0}^{t_1}x\otimes ud\tau, \int_{t_1}^{t_2}x\otimes ud\tau, \cdots, \int_{t_{s-1}}^{t_s}x\otimes ud\tau\Big]^T,\\ \Gamma_{xw}&=\Big[\int_{t_0}^{t_1}x\otimes wd\tau, \int_{t_1}^{t_2}x\otimes wd\tau, \cdots, \int_{t_{s-1}}^{t_s}x\otimes wd\tau\Big]^T, \qquad (11) \end{aligned}$

where $t_0 \lt t_1 \lt \cdots \lt t_s$ are positive integers. Considering (10) and (11), one can change (9) into the following form

$\Psi_{j}\begin{pmatrix}\text{vecs}(P_j)\\ \text{vec}(K_{j+1})\\ \text{vec}(E^TP_{j})\end{pmatrix}=\Phi_{j}, \qquad (12)$

where $\Psi_{j}=\Big[\delta_{xx}, -2\Gamma_{xx}\Big(I_{n}\otimes \big((K_{j})^TR\big)\Big)-2\Gamma_{xu}(I_{n}\otimes R), -2\Gamma_{xw}\Big]$ and $\Phi_{j}=-\Gamma_{xx}\text{vec}(Q_j)$ .

Assume that $\text{rank}\Big(\big[\Gamma_{xx}, \Gamma_{xu}, \Gamma_{xw}\big]\Big)=\frac{n(n+1)}{2}+(m+q_m)n$ holds, we note that $\Psi_{j}$ is of full column rank. Therefore, the solution of (12) can be written as follows

$\begin{pmatrix}\text{vecs}(P_j)\\ \text{vec}(K_{j+1})\\ \text{vec}(E^TP_{j})\end{pmatrix}=(\Psi_{j}^T\Psi_{j})^{-1}\Psi_{j}^T\Phi_{j}. \qquad (13)$

It follows from (13) that both $P_j$ and $K_{j+1}$ can be calculated. We obtain that

$B=P_{j}^{-1}K_{j+1}^TR, \qquad (14)$

which can help compute the matrix $B$ . Let $j\leftarrow j+1$ . The stopping criterion is $||P_{j+1}-P_j||\leq \varepsilon$ , where $\varepsilon$ is a small positive constant and $j\geq 1$ .

4 Optimal Feedforward Gain Matrix

Having computed the approximation of the optimal feedback gain matrix $K^*$ in the above subsection, our next step is to compute the approximation of the optimal feedforward gain matrix $L^*$ , which is the solution of Problem 2.

To obtain the solutions of the regulator equation (4), we first rewrite (4) as the following form

$\begin{aligned} &\begin{pmatrix}I_{n}&0_{n\times m}\\0_{p\times n}&0_{p\times m}\end{pmatrix}\begin{pmatrix}X\\U\end{pmatrix}J_\Gamma-\begin{pmatrix}A&B\\C&D\end{pmatrix}\begin{pmatrix}X\\U\end{pmatrix}=\begin{pmatrix}ET\\FT\end{pmatrix}. \qquad (15) \end{aligned}$

Furthermore, (15) can be rewritten as the following standard linear equation by using the properties of the Kronecker product

$\bar{\mathcal{A}}\bar{z}=b, \qquad (16)$

where $\bar{\mathcal{A}}=J_\Gamma^T\otimes \mathcal{E} -I_q\otimes\bar{A}\in \mathbb{R}^{\Big(q_m(n+p)\Big)\times \Big(q_m(n+m)\Big)}$ , $\mathcal{E}=\begin{pmatrix}I_{n}&0_{n\times m}\\0_{p\times n}&0_{p\times m}\end{pmatrix}$ , $\bar{A}=\begin{pmatrix}A&B\\C&D\end{pmatrix}$ , $\bar{z}=\text{vec}\begin{pmatrix}X\\U\end{pmatrix}\in \mathbb{R}^{q_m(n+m)}$ , and $b=\text{vec}\begin{pmatrix}ET\\FT\end{pmatrix}\in \mathbb{R}^{q_m(n+p)}$ . Therefore, the matrix $\bar{\mathcal{A}}=\text{blockdiag}(\hat{\mathcal{A}}_{1},\hat{\mathcal{A}}_{2},\cdots,\hat{\mathcal{A}}_{l})$ is a block diagonal matrix with its $j$ th $(j=1,2,\cdots,l)$ block diagonal element as

$\hat{\mathcal{A}}_{j}=\begin{pmatrix}\lambda_j(\Gamma)\mathcal{E}-\bar{A}&0&\cdots&0&0\\ \mathcal{E}&\lambda_j(\Gamma)\mathcal{E}-\bar{A}&\cdots&0&0\\ \vdots&\vdots&\vdots&\vdots&\vdots\\0&0&\cdots&\mathcal{E}&\lambda_j(\Gamma)\mathcal{E}-\bar{A}\end{pmatrix}, \qquad (17)$

where $\hat{\mathcal{A}}_{j}\in \mathbb{R}^{\Big(q_j(n+p)\Big)\times \Big(q_j(n+m)\Big)}$ .

The matrix $\bar{\mathcal{A}}$ is of full row rank. Let $\bar{\mathcal{A}}_{j}$ denotes the $j$ th row of the matrix $\bar{\mathcal{A}}$ for $j=1,2,\cdots,(n+p)q_m$ .
Obviously, $\bar{\mathcal{A}}_{j}$ and $\bar{\mathcal{A}}_{k}$ for $j,k=1,2,\cdots,(n+p)q_m$ , $j\neq k$ are independent vectors, and the condition (16) can be understood as
$(\bar{\mathcal{A}}_{1},\bar{z})=b_{1}, \cdots, (\bar{\mathcal{A}}_{(n+p)q_m},\bar{z})=b_{(n+p)q_m},$ where $b_{j}$ denotes the $j$ th element of the column vector $b$ for $j=1,2,\cdots,(n+p)q_m$ .

The detailed expression of the optimal solution of (4) can be written as

$\bar{z}=\sum\limits_{j=1}^{(n+p)q_m}\alpha_{j}\bar{\mathcal{A}}_{j}, \qquad (18)$

which has the minimum form and these coefficients $\alpha_1,\cdots,\alpha_{(n+p)q_m}$ satisfy

$\begin{aligned} \langle \bar{\mathcal{A}}_{1},\bar{\mathcal{A}}_{1}\rangle\alpha_{1}+\cdots+\langle \bar{\mathcal{A}}_{(n+p)q_m},\bar{\mathcal{A}}_{1}\rangle\alpha_{(n+p)q_m}=&b_{1},\\ \cdots\quad\quad\cdots\quad\quad\cdots\quad\quad \cdots\quad\quad \cdots&\\ \langle \bar{\mathcal{A}}_{1},\bar{\mathcal{A}}_{(n+p)q_m}\rangle\alpha_{1}+\cdots+\langle \bar{\mathcal{A}}_{(n+p)q_m},\bar{\mathcal{A}}_{(n+p)q_m}\rangle\alpha_{(n+p)q_m}=&b_{(n+p)q_m}. \qquad (19) \end{aligned}$

Denote

$\hat{\mathcal{A}}=\begin{pmatrix}\langle \bar{\mathcal{A}}_{1},\bar{\mathcal{A}}_{1}\rangle&\cdots&\langle \bar{\mathcal{A}}_{(n+p)q_m},\bar{\mathcal{A}}_{1}\rangle\\ \vdots&\vdots&\vdots\\ \langle \bar{\mathcal{A}}_{1},\bar{\mathcal{A}}_{(n+p)q_m}\rangle&\cdots&\langle \bar{\mathcal{A}}_{(n+p)q_m},\bar{\mathcal{A}}_{(n+p)q_m}\rangle\end{pmatrix}.$ $\hat{\mathcal{A}}$ is a positive definite matrix. Denote $\alpha={col}(\alpha_{1},\alpha_{2},\cdots,\alpha_{(n+p)q_m})$ . Then one has $\alpha=\hat{\mathcal{A}}^{-1}b$ .
Based on the variable $\alpha$ , we compute the optimal solution of Problem 2, $\begin{pmatrix}X^*\\ U^*\end{pmatrix}$ , and then compute the approximated optimal feedforward control gain $L_j^*=U^*+K_j^*X^*$ .

5 Simulation

In this section, a simulation example is given to prove the correctness of our theory results.

The dynamics is given by (1), where $A=\begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix}$ , $B=\begin{pmatrix}0\\0\\1\end{pmatrix}$ , $C=\begin{pmatrix}1&0&0\end{pmatrix}$ , $D=1$ .
$F=\begin{pmatrix}5\sqrt{3}&5&0&0\end{pmatrix}$ . The system matrix of the exosystem is $\Gamma=\begin{pmatrix}0&-2\pi&0&0\\2\pi&0&0&0\\0&0&0&-3\pi\\0&0&3\pi&0\end{pmatrix}$ and $w(0)={col}(1,0,1,0)$ . Let $Q=I_3$ and $R=1$ . The stopping criterion $||P_{j}-P_{j-1}||\leq 0.001$ is satisfied after six iterations, the approximated optimal feedback gain $K_j^*$ is $K_j^*=\begin{pmatrix}1.0076&2.3614&2.4362\end{pmatrix},$ and the approxiate optimal feedforward control gain $L_j^*$ is $L_j^*=\begin{pmatrix}-4.6071 + 5.6847i&-4.6071 - 5.6847i&0&0\end{pmatrix}.$ The eigenvalues of the matrix $A-BK_j^*$ , namely, $-0.6480 + 0.6810i, -0.6480 - 0.6810i, -1.1402 + 0.0000i$ , all have negative real parts.
Figure 1 shows the evolution of the regulated output.

Figure 1. Evolution of the regulated output $e(t)$

6 Conclusions

In this study, we have addressed the adaptive optimal output regulation problem for general linear systems. To obtain the optimal controller, both optimal feedback control gain matrix and the optimal feedforward control gain matrix are computed by using the adaptive dynamic programming, output regulation theory and the projection theorem in the Hilbert space. Our further work will focus on the optimal output regulation problem for nonlinear systems.

Optimal Output Regulation for General Linear Systems via Adaptive Dynamic Programming

1 Introduction

2 Problem Formulation

3 Optimal Feedback Gain Matrix

4 Optimal Feedforward Gain Matrix

5 Simulation

6 Conclusions

|| Go to the Math & Research main page

You might like!

Network design and control: Shape and topology optimization for the turnpike property for the wave equation

Reinforcement Learning and LQR with special control structure: switched and multilevel systems

FAU MoD Lecture: Data Driven Modeling for Scientific Discovery and Digital Twins

Exact Controllability of Stochastic First-Order Multi-Dimensional Hyperbolic Systems

Optimal Output Regulation for General Linear Systems via Adaptive Dynamic Programming