Generating Skewed-Normally distributed random values

Reading Time: 3 minutes

This weekend I had to generate some random sample data, but instead of it being selected from a normal distribution, I needed the data to have some skew. A distribution is said to be skewed if there are more samples on one side of the mean than on the other. To get a feeling for what skew means, have a look at the following figures.

Histogram of an (almost) symmetrical sample
Histogram of a skewed sample

Now, it is quite easy to generate normally distributed random data, for example using the norm class from scipy.stats. There also is a class for a skewed-normally distributed variable, scipy.stats.skewnorm. However, the variable used to express skew is a bit unintuitive there.

So I searched for another solution, and came across a technical report by D. Ermak and J. Nasstrom from 1995 [1]. Ermak and Nasstrom use the sum of two uniformly distributed random variables with overlapping ranges to generate a skewed quasi-uniformly distributed variable, and then add many of these together to generate a skewed quasi-normally distributed variable.

The paper is worth a read, but here is a summary of the thought process. Going back a bit to basic statistics, we find that the normal distribution is somewhat special in that generally, the distribution of the sum of many identically distributed random variables approaches the normal distribution. This result is known as the Central Limit Theorem. It is also considered the reason that normally distributed random variables are so widespread in nature.

So, if we were to consider the sum of a bunch of — for example — uniformly distributed random variables, then the distribution of that sum would also approach the normal distribution. This is one way of generating approximately normally distributed random data.

Ermak and Nasstrom instead generate random data from what they call a double-block distribution. The probability density function of such a distribution is shown in the following figure.

Double-Block Distribution according to Ermak and Nasstrom, 1995

The idea here is to use two uniformly-distributed random variables with differing means. The sum of these two will have a skewed distribution. Adding many of these skewed random variables up will give us an approximated skewed quasi-normal distribution in the same way that summing up identically distributed random variables will give us an approximate normal distribution.

Ermak and Nasstrom go to the lengths of calculating the required parameters for the two uniformly-distributed random variables in order to get a specific variance and skew, and then provide an algorithm for drawing samples from such a double-block distribution as well as for generating samples from the skewed quasi-normal distribution.

I have coded this up using SciPy in Python. To generate N samples with given mean, standard deviation and skewness, run gen_skewed_continuous(N,mean,stdev,skewness). The optional parameter oversample can be used to modify the number of skewed-uniformly-distributed variables to be added.

import numpy as np
import scipy.stats as stats

def gen_skewed_block(N, var, skew):
  # Determine block parameters
  a = np.sqrt(5)
  offset = np.sqrt(skew**2 + 243 * (var**3) / 32)
  m1 = 2 / (9 * var) * (skew - offset)
  m2 = 2 / (9 * var) * (skew + offset)
  p1 = m2 / (2 * a * m1 * (m1 - m2))
  p2 = m1 / (2 * a * m2 * (m1 - m2))
  d1 = -a * m1
  d2 = a * m2
  # Uniform random number distribution
  rv = stats.uniform()
  # Get pairs of random numbers
  r = rv.rvs((N, 2))
  # Calculate block numbers
  v1 = 2 * d1 * (r[:, 1] - 0.5) + m1
  v2 = 2 * d2 * (r[:, 1] - 0.5) + m2
  values = np.where(r[:, 0] < 2 * d1 * p1, v1, v2)
  return values

def gen_skewed_continuous(N, mean, std, skewness, oversample=10):
  # Determine moments
  m2 = std**2
  m3 = skewness * std**3
  # Generate skewed block random values
  rv1 = gen_skewed_block(N * oversample, m2, m3 * np.sqrt(oversample))
  # Reshape them so we have oversample number of values in a row
  rv2 = rv1.reshape(-1, oversample)
  # Sum them together and scale them to approximate a continuous distribution
  rv3 = rv2.sum(axis=1) / np.sqrt(oversample) + mean
  return rv3
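Here is a quick usage sketch (the parameter values are arbitrary): draw a large sample and compare the sample moments with the requested ones.

np.random.seed(42)
samples = gen_skewed_continuous(100000, 1.0, 2.0, 0.5)
print("mean:    ", np.mean(samples))
print("std:     ", np.std(samples))
print("skewness:", stats.skew(samples))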

Now, if you think about it, there might also be a way of simplifying this by directly using two non-identically normally distributed random variables, saving the effort of calculating the sums. I’ll have to think about that one.

[1] [doi] D. L. Ermak and J. S. Nasstrom, “A method for generating skewed random numbers using two overlapping uniform distributions,” 1995.
[Bibtex]
@TechReport{Ermak.Nasstrom1995,
author = {D.L. Ermak and J.S. Nasstrom},
title = {A method for generating skewed random numbers using two overlapping uniform distributions},
year = {1995},
month = {feb},
doi = {10.2172/32589},
publisher = {Office of Scientific and Technical Information ({OSTI})},
}

Making sense of Maxwell’s Equations

Reading Time: < 1 minute

I’m currently learning about modelling of electromagnetic systems, and in that process, I’m trying to wrap my head around Maxwell’s Equations. Maxwell’s Equations can be used to determine electric and magnetic fields, which are important for machines using them for work, such as electrical motors.

There are some aspects of these equations that resemble fluid dynamics, with electric fields having their source in electric charges, while magnetic fields are free of sources. Much of this is expressed using curl and divergence, which are differential operators used to characterise vector fields.

I came across a pretty good video by 3Blue1Brown of Khan Academy fame introducing the concepts of curl and divergence, and also showing the connection to fluid and electrodynamics:

https://www.youtube.com/watch?v=rB83DpBJQsE

The other video, which goes over Maxwell’s Equations and explains them pretty well is by DrPhysicsA:

https://www.youtube.com/watch?v=AWI70HXrbG0

Be careful, though: At or around minute 18:13, DrPhysicsA explains how Maxwell’s Second Law (or Gauss’ Law for Magnetism) breaks down when you place your control surface such that only one pole of the magnet is inside and the other is outside. According to DrPhysicsA, that would lead to non-zero flow through the control surface, violating Gauss’ Law for Magnetism.

But actually, that is not the case. One has to consider that the field also flows inside the magnet: there is flow between the north and south poles inside the magnet, in the opposite direction to the flow outside. Thus, the flow inside the magnet and the flow outside the magnet through the control surface still cancel out. Gauss’ Law for Magnetism therefore very elegantly states that magnetic poles always come in pairs.

Lift, Drag and Thrust — Aerodynamics Primer

Reading Time: 12 minutes

Ever wondered how lift is generated on the wing of an aircraft or how a propeller generates thrust? There’s a legend that goes like this: The particles of the air want to stay together, but as the particle that goes along the upper part of the wing needs to go a longer way, it needs to be faster, and according to Bernoulli’s Principle, faster air means lower pressure, leading to lift. Nice explanation, isn’t it? So very romantic, the particles that just won’t let themselves be separated.

It’s just that…it’s completely wrong! Air particles aren’t married to each other, and it may even be that they are completely mixed up — in case of turbulent flow — with the wing still generating lift! In addition, Bernoulli’s Principle does not account for friction in the air, and thus by itself cannot explain drag.

For our further analysis of our quadrocopter model, we need to model the behaviour of the propeller and the engine that drives it. Modelling the behaviour of the propeller involves the use of the formulae for thrust F_T and power P

(1)   \begin{eqnarray*} F_T &=& c_T \rho D^4 n^2 \\ P &=& c_P \rho D^5 n^3 \end{eqnarray*}

and working with these. Specifically, we need to understand the nature of the thrust and power coefficients c_T and c_P.

So, let’s have a look at how lift and drag on an airfoil come to be, how we can use dimensional analysis to get the typical formulae for describing them, and where to find what’s missing. Here’s the route for today’s trekking tour:

Reaction Forces: When a Ball meets a Wall

Let’s first recap a bit of physics basics. Remember Newton’s Third Law of Motion?

For every action, there is an equal and opposite reaction.

Let’s imagine we are sitting in a car at rest, with the motor running. If we now press the accelerator, the engine will provide power that is transferred to the wheels, which in turn will exert a force on the street. Now, the wheel is obviously accelerated to turn so that its bottom moves towards the back of our car — the opposite direction of where we want to go!

So it cannot be the wheel that provides our forward acceleration, because the wheel enacts a force in the wrong direction. Instead, it’s the street surface which makes us move: The force accelerating the wheel makes the wheel enact that force on the street surface, and the street surface will enact an opposite force of equal size on the wheel, accelerating us forward.

In this example, the action is the force acting from the wheel on the street surface, and the reaction is the force acting from the street surface on the wheel and — in consequence — on the whole car. And essentially, that also explains lift and drag. So, thanks for reading, have a good night! …

A ball being thrown against a wall at an angle (viewed from above).

Of course, it’s a bit more involved. For a better understanding, let’s imagine a ball being thrown at a wall, as indicated in the figure to the right. When the ball hits the wall, it cannot proceed further in the same direction, so its velocity vector and with it its momentum must change. The change of velocity is indicated in the figure by \Delta v.

As we know from Newton’s Second Law of Motion, a change of velocity requires a force acting on the ball. It is reasonable to assume that this force is enacted at the contact area of the ball to the wall.

Again, applying Newton’s Third Law of Motion, there must be an opposite force with the same magnitude in the system. But where is it?

Well, there is only one other object touching the contact area, and that is the wall. Thus, the reverse force must be acting on the wall. If we hadn’t fixed the wall in our reference system, we would see it being propelled backwards.

But, how big is this force? Well, in our example, we cannot tell exactly. However, we know the change of the force over time in a qualitative manner:

  1. When the ball first touches the wall, the force is practically non-existent.
  2. Then, when the ball is compressed more and more, it acts against being compressed with increasing force, until it is maximally compressed and at rest.
  3. From then on, it is accelerated again in its new direction, and as it decompresses in the process, the force decreases again.

The total change in momentum until the time t is the integral of this force over time (assuming we are in a vacuum and there is no internal friction in the ball):

(2)   \begin{equation*} \Delta \vec{p}\left(t\right) = \int_0^t \vec{F}\left(\tau\right) d\tau \end{equation*}

If you want to check the validity of this equation, just consider that \Delta \vec{p} = m \Delta \vec{v} and \vec{F} = m \vec{a} = m \frac{d}{dt} \vec{v}. If you look closely, you can see that Conservation of Momentum follows from Newton’s Third Law of Motion, according to which for the total forces we have \vec{F} = 0, and thus for the total momentum in the system, \frac{d}{dt} \vec{p}=0 must hold.

We call \vec{F}\left(t\right) the reaction force, as it is a reaction to the change of momentum of part of the system (the ball).

However, although we cannot exactly determine the force, we can derive something about the change of momentum. Looking at the velocity triangle in the figure above, we see that \Delta v is proportional to the magnitudes of v_1 and v_2 (just try scaling the triangle in your head). We know that the latter are equal, so the change in velocity is proportional to the magnitude of v_1. The change in momentum is also proportional to the mass of the ball, so we have

(3)   \begin{equation*} \Delta \vec{p} \sim m v_1 \end{equation*}

Now, if we were to constantly throw balls onto the wall, all having the same mass and initial velocity, we could determine the mean force acting on the wall. Let’s assume that over a time interval of length \Delta t we would throw a mass of \Delta m onto the wall. Then the change in momentum over that time would be

(4)   \begin{equation*} \Delta \vec{p}_{tot} \sim \Delta m v_1 \end{equation*}

Let’s further assume that we throw the balls at a constant frequency, so that we get a mean mass-flow rate of \dot{m} \approx \frac{\Delta m}{\Delta t}. The mean force over the same time would then be

(5)   \begin{equation*} \vec{F} = \frac{\Delta \vec{p}_{tot}}{\Delta t} \sim \frac{\Delta m}{\Delta t} v_1 = \dot{m} v_1 \end{equation*}

If we now think of the balls as being a gas of density \rho moving at velocity v_1 through a tube with cross-section A, the mass flow rate is given by

(6)   \begin{equation*} \dot{m} = \rho A v_1 \end{equation*}

If we plug all of this together, we get the force to be

(7)   \begin{equation*} \vec{F} \sim \rho A \left(v_1\right)^2 \end{equation*}

which already looks eerily like the usual formula for lift and drag:

(8)   \begin{equation*} F = c \rho A v^2 \end{equation*}

Reaction Forces on a Wing

Let’s have a look at the cross-section of a wing and how the air flows around it:

Schematic Airflow around an asymmetric airfoil

The drawing above is only a schematic, but there are some important aspects of the airflow visualised there. What we see is that the air flows in a laminar fashion around the airfoil (that’s what we call the cross-section of a wing). An airflow is laminar if all the sheets of air are nicely separated and there is no turbulence.

We also see that the air on the left side is quite parallel to the horizontal plane, while on the right side it briefly flows downward, before assimilating to the free-flow again. As we have now learned, this indicates a change in momentum, which means that there is a force in play. As there should be: We would expect the wing to generate lift, an upwards force, so the resulting change in momentum should be downwards.

But where does this change in momentum come from? Think about what would happen if there was no change in momentum: The air would simply flow through the wing. The air cannot do that, so it must change direction to avoid the wing. It will change direction again when the wing “moves away” from the flow, as the pressure of the air flowing beside it will push it that way. That change of direction implies a change of momentum, and that change of momentum must come with a force acting on the air.

So finally, we know where lift and drag come from: They represent the sum of all these forces along the wing, or the net force, and these forces result from the air having to follow the slope of the airfoil. There is a general convention to define lift and drag according to the direction of the free-stream:

  • Lift is the part of the force perpendicular to the direction of the free-stream, with positive lift pointing upwards, and
  • drag is the part of the force parallel to the direction of the free-stream, with positive drag pointing in the direction of the free-stream flow.

Again, we ask ourselves: How big are these forces? And again, we cannot give an exact number just from this basic model — specifically not if we consider friction — but we can characterise them quite well using dimensional analysis.

Dimensional Analysis of Lift and Drag

Now it’s time to use what we learned in a previous article about dimensional analysis — but this time we apply it to our wing. From our previous considerations, we already have identified a few parameters that may influence lift and drag:

  • the density of the air in the free-stream \rho_\infty,
  • the size and form of the wing, represented by its projected surface area A, and
  • the free-stream velocity of the air v_\infty.
Angle of Attack on an Airfoil

Geometrically, the so-called angle of attack \alpha will also play a role, as shown in the figure above. This is the angle between the direction of the free stream and the chord line of the airfoil — the theoretical line from the leading to the trailing edge.

If we are working at velocities close to the speed of sound, we need to consider compressibility. Thus, another parameter is the speed of sound in the free stream a_\infty. The speed of sound may differ between the free stream and around the wing, as it depends on the density of the air, and this may well be influenced by the flow around the wing. Therefore, we only consider the free-stream speed of sound — the speeds of sound elsewhere in the stream would be expressible based on this.

Further, we also need to consider internal friction in the air — otherwise there will be no drag at all, according to a finding that is known as D’Alembert’s Paradox. The friction will be represented by the free-stream viscosity \mu_\infty. The friction between two sheets of fluid is proportional to the size of the contact surface and the gradient of velocity between both. The viscosity is the proportionality constant.

So, we finally have the following parameters influencing lift and drag, with their dimensions (M for mass, L for length, and T for time):

Quantity | Symbol | Dimension
Lift/Drag | F | \frac{ML}{T^2}
Angle of Attack | \alpha | 1
Area of the Wing | A | L^2
Free-Stream Velocity | v_\infty | \frac{L}{T}
Density of the Air | \rho_\infty | \frac{M}{L^3}
Viscosity | \mu | \frac{M}{LT}
Speed of Sound | a_\infty | \frac{L}{T}

Obviously, the lift and drag forces are functions of the other parameters:

(9)   \begin{equation*} F = f\left(\alpha,A,v_\infty,\rho_\infty,\mu,a_\infty\right) \end{equation*}

The dimensions used in these parameters are mass M, length L and time T. We’ll use the density \rho_\infty, the free-stream velocity v_\infty and the wing area A to represent them:

Dimension | Symbol | Representation
Mass | M | \rho \sqrt{A^3}
Length | L | \sqrt{A}
Time | T | \frac{\sqrt{A}}{v_\infty}

Using these, we find the following expressions with equivalent dimensions:

Quantity | Dimension | Equivalent Expression
Lift/Drag | \frac{ML}{T^2} | \rho A {v_\infty}^2
Angle of Attack | 1 | 1
Viscosity | \frac{M}{LT} | \rho \sqrt{A} v_\infty
Speed of Sound | \frac{L}{T} | v_\infty
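In case you want to reproduce this table, here is a small sketch of how such equivalent expressions can be found by treating the dimension exponents over (M, L, T) as vectors and solving a linear system for the basis \rho, v_\infty, A (the symbol names in the snippet are mine):

import numpy as np

# Dimension exponent vectors over (M, L, T)
dims = {
  "rho": np.array([1, -3, 0]),   # density M/L^3
  "v":   np.array([0, 1, -1]),   # velocity L/T
  "A":   np.array([0, 2, 0]),    # area L^2
  "F":   np.array([1, 1, -2]),   # force ML/T^2
  "mu":  np.array([1, -1, -1]),  # viscosity M/(LT)
  "a":   np.array([0, 1, -1]),   # speed of sound L/T
}

# Basis quantities rho, v and A as columns of a matrix
B = np.column_stack([dims["rho"], dims["v"], dims["A"]])

for name in ["F", "mu", "a"]:
  e = np.linalg.solve(B, dims[name])
  print(f"{name} ~ rho^{e[0]:g} * v^{e[1]:g} * A^{e[2]:g}")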

We could now use \frac{\mu}{\rho \sqrt{A} v_\infty} as the dimensionless quantity to represent friction. If you are interested in aerodynamics, that expression may look familiar to you. The Reynolds Number is often used to represent the influence of friction in fluid flows, but it looks slightly different:

(10)   \begin{equation*}  {Re}_\infty := \frac{\rho_\infty l v_\infty}{\mu_\infty} \end{equation*}

So, first of all, numerator and denominator are reversed — larger numbers indicate less friction — and instead of \sqrt{A} we have the so-called characteristic length l in there. In aerodynamics, the chord length c is usually used for l, so we might not want to deviate from that. Thus, we will use the Reynolds Number as a dimensionless representation of viscosity.

For the speed of sound, we’d get \frac{a_\infty}{v_\infty}, which is the inverse of the Mach Number. So, we’d rather use the Mach Number M_\infty:=\frac{v_\infty}{a_\infty} directly. Thus, our relationship has the following form:

(11)   \begin{equation*} \frac{F}{\rho {v_\infty}^2 A} = C\left(\alpha,{Re}_\infty,M_\infty\right) \end{equation*}

with

(12)   \begin{eqnarray*} {Re}_\infty &:=& \frac{\rho c v_\infty}{\mu_\infty} \\ M_\infty &:=& \frac{v_\infty}{a_\infty} \end{eqnarray*}

This is the formula in the form that you’ll find in most aerodynamics literature. The value of C\left(\alpha,{Re}_\infty,M_\infty\right) is called the coefficient of lift when calculating lift and the coefficient of drag when calculating drag.

So, where we originally had six parameters to vary in an experiment (\alpha, c resp. A, v_\infty, a_\infty, \rho_\infty and \mu_\infty), we are now left with three parameters to vary (\alpha, {Re}_\infty and M_\infty), and we have a formula that will help us deal with the rest of the variation without additional experiments.
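To get a feeling for the magnitudes involved, here is a tiny example with made-up flight conditions (roughly sea-level air and a small, slow wing):

rho = 1.225     # air density in kg/m^3
mu = 1.81e-5    # dynamic viscosity in kg/(m s)
a = 343.0       # speed of sound in m/s
v = 15.0        # free-stream velocity in m/s
c = 0.15        # chord length in m

Re = rho * c * v / mu   # roughly 1.5e5
M = v / a               # roughly 0.04, so compressibility hardly matters here
print(Re, M)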

Propeller Aerodynamics

On our multicopter, we don’t have wings. We have propellers, which could be described as rotating wings. Lift turns to thrust, drag turns to torque, torque turns to required power.

The performance of a propeller is determined by a relationship between these quantities:

  • Thrust force F,
  • propeller torque Q,
  • propeller power (due to torque) P,
  • rotational frequency of the propeller n,
  • propeller diameter D,
  • velocity of the air in free-stream v_\infty,
  • density of the air in free-stream \rho_\infty,
  • viscosity of the air in free-stream \mu_\infty,
  • speed of sound in free-stream a_\infty.

Now, here is an exercise for you: See if you can find the usual descriptions of F, Q and P based on dimensional analysis:

(13)   \begin{eqnarray*} F &:=& c_t\left(J,Re,M\right) \rho D^4 n^2 \\ Q &:=& c_q\left(J,Re,M\right) \rho D^5 n^2 \\ P &:=& c_p\left(J,Re,M\right) \rho D^5 n^3 \end{eqnarray*}

You will again come across the Reynolds Number Re and the Mach Number M. I’ll give you a few hints for these:

  • The Reynolds Number for propellers is usually based on the chord and tangential velocity at 75% of propeller radius for the characteristic length and velocity. The symbol used for the former is typically c_{\frac{3}{4}}. Yes, you’ll have to add that to the list of dimensional parameters.
  • The Mach Number is typically based on the velocity of the propeller tip.

In addition, you’ll find the advance ratio relevant:

(14)   \begin{equation*} J := \frac{v_\infty}{n D} \end{equation*}

With some basic geometry knowledge you should be able to figure these out. If you get stuck, I recommend looking at this paper by Robert Deters, Gavin Ananda and Michael Selig of the University of Illinois – Urbana-Champaign (UIUC) [1].
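If you have a measurement of thrust and torque at a known operating point, turning it into the dimensionless coefficients is a one-liner each. The numbers below are made up, and I assume P = \omega Q = 2 \pi n Q for the power:

import numpy as np

rho = 1.225          # air density in kg/m^3
D = 0.254            # propeller diameter in m (10 inch)
n = 6000.0 / 60.0    # rotational frequency in rev/s
v_inf = 5.0          # free-stream velocity in m/s
F = 4.5              # measured thrust in N
Q = 0.09             # measured torque in N*m

c_t = F / (rho * D**4 * n**2)
c_q = Q / (rho * D**5 * n**2)
c_p = 2 * np.pi * c_q          # from P = 2*pi*n*Q and P = c_p*rho*D^5*n^3
J = v_inf / (n * D)
print(c_t, c_q, c_p, J)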

Where to find out more

Now that we have found out in general what the aerodynamic coefficients can do for us, we’d of course like to know how to get them. Well, there’s always measurement. If you don’t want to do it yourself, the University of Illinois – Urbana-Champaign (UIUC) provides a pretty extensive source of data for airfoils and propellers.

Or perhaps you also want to go back to the origins of flight, to the systematic experiments with large numbers of airfoils at NACA, the National Advisory Committee for Aeronautics, a predecessor of today’s NASA [2]?

However, we can also try to model and simulate the behaviour of airflow around objects. If we assume friction to be negligible (non-viscous or inviscid flow), there’s a whole theory of fluid dynamics modelled around that assumption: potential flow. (Actually, that also requires that the flow is free of rotation.) This is quite a powerful approach as it allows us to model fluid flow as a potential field (similar to gravity), and many complicated flows can be described by superposition of basic flows such as uniform flow, flow sources/sinks or circular vortices.

Also, thin airfoil theory makes heavy use of potential flow. It gives us some pretty good first-order estimates of the lift coefficient and the coefficient of the so-called pitch moment for thin airfoils. An airfoil is considered thin if its thickness is small compared to its chord length.

One of the most well-known results from that is the estimated gradient of the lift coefficient of a thin 2D airfoil for small angles of attack (with the angle of attack \alpha being given in radians):

(15)   \begin{equation*} \frac{\partial}{\partial \alpha} c_l = 2 \pi \end{equation*}

However, modelling flow around airfoils using potential flow has one important drawback: there is no drag at all! As drag due to friction is not modelled, it cannot be considered. But even if we sum up the forces due to change in momentum over the airfoil, there is no component along the direction of the free-stream flow.

That means: In potential flow, airfoils generate lift, but no drag. This realisation is known as D’Alembert’s Paradox, after the 18th-century French mathematician Jean le Rond d’Alembert.

Still, potential flow gets us pretty far towards a solution for viscous flow: We can approximate a first solution, and then add corrections, e.g., using boundary layer models. The boundary layer is the area around an object within which the velocity gradient is large enough for friction to matter. By first determining a potential flow solution, the thickness of the boundary layer can be estimated. Then, another potential flow solution is found, increasing the thickness of the airfoil by that of the boundary layer. This is an iterative process which goes on until a good, stable solution is found. Using that solution, it is then possible to derive the amount of drag produced inside the boundary layer.

Then, there’s induced drag, which comes from the fact that our wings are not infinitely wide — although we can approximate infinitely wide wings by making them long and slender. This is what is done with sailplanes, which have wings with pretty high aspect ratios. To actually estimate the lift and drag of a real wing, we can use lifting line theory, which again uses a simple model based on potential flow to transform lift and drag data on 2D-airfoils into lift and drag data for a finite wing.

And for the really complicated cases, we can try to explicitly solve the Navier-Stokes equations numerically using computational fluid dynamics (CFD). With these, we can create a virtual wind tunnel, and approximate the air flow quite well. However, there is a lot of computational power involved, and the setup requires much more intricate models of our objects than the other approaches.

Of course there are also tools available for the simpler methods. Modelling the performance of 2D-airfoils in inviscid and viscous, subsonic flow is supported by the well-known XFOIL software. The tool XFLR5 extends this to wing design using lifting-line and vortex sheet methods.

Conclusion

So, now we know where lift, drag, propeller thrust, torque and power come from, and how we can characterise them. These formulae allow us to do a good bit of work, and at least get some pretty good estimates for the performance of a wing — if we know the coefficients. They do not allow us to directly derive these coefficients, but of course there are different methods to approximate them for individual forms of wings: measurements, thin-airfoil theory, potential flow, boundary-layer methods, lifting-line theory or CFD.

Now we are well-equipped for modelling the aerodynamics part of our multicopter engines. Next time, we’ll look into modelling our engine with a DC motor.

[1] [doi] R. W. Deters, G. K. A. Krishnan, and M. S. Selig, “Reynolds number effects on the performance of small-scale propellers,” in 32nd AIAA applied aerodynamics conference, 2014.
[Bibtex]
@InProceedings{Deters.Krishnan.ea2014,
author = {Robert W. Deters and Gavin Kumar Ananda Krishnan and Michael S. Selig},
title = {Reynolds Number Effects on the Performance of Small-Scale Propellers},
booktitle = {32nd {AIAA} Applied Aerodynamics Conference},
year = {2014},
month = {jun},
publisher = {American Institute of Aeronautics and Astronautics},
doi = {10.2514/6.2014-2151},
url = {https://m-selig.ae.illinois.edu/pubs/DetersAnandaSelig-2014-AIAA-2014-2151.pdf},
}
[2] E. N. Jacobs, K. E. Ward, and R. M. Pinkerton, “The characteristics of 78 related airfoil sections from tests in the variable-density wind tunnel,” National Advisory Committee for Aeronautics, Washington, DC, Technical Report 460, 1933.
[Bibtex]
@TechReport{Jacobs.Ward.ea1933,
author = {Jacobs, Eastman N. and Ward, Kenneth E. and Pinkerton, Robert M.},
title = {The characteristics of 78 related airfoil sections from tests in the variable-density wind tunnel},
institution = {National Advisory Committee for Aeronautics},
year = {1933},
type = {techreport},
number = {460},
address = {Washington, DC},
month = nov,
abstract = {An investigation of a large group of related airfoils was made in the NACA variable-density wind tunnel at a large value of the Reynolds number. The tests were made to provide data that may be directly employed for a rational choice of the most suitable airfoil section for a given application. The variation of the aerodynamic characteristics with variations in thickness and mean-line form were systematically studied.},
keywords = {AIRFOILS; AERODYNAMIC CHARACTERISTICS; WIND TUNNEL TESTS; THICKNESS; AIRFOIL PROFILES; AERODYNAMIC COEFFICIENTS; LIFT DRAG RATIO; ASPECT RATIO; ANGLE OF ATTACK; HIGH REYNOLDS NUMBER; TABLES (DATA); GRAPHS (CHARTS)},
url = {https://ntrs.nasa.gov/search.jsp?R=19930091108},
}

Dimensional Analysis of a DC-Motor

Reading Time: 9 minutes

For the construction of my first quadrocopter, I bought some brushless DC motors, some propellers, a set of drivers for brushless motors and a battery, built a geometrically matching frame and started working with that. This approach quite probably has not led to a good design: with a better design, I might get more flight time with the same payload.

However, there are so many parameters to tweak: battery capacity, voltage, size and mass of the engines, size and pitch of the propellers, frame geometry, etc. All of these are interrelated by complicated relationships, most of which are not exactly known to us. Varying all the parameters to find some optimum may prove both time-consuming and costly: We’d have to order different parts with different parameters, measure them in a whole lot of different situations, and then find the optimal solution.

However, there’s a pretty clever tool to reduce the set of variables to consider: Dimensional Analysis. The very short version is: Any relationship between a set of physical quantities can be represented by a smaller set of physical quantities and their dimensionless relationship to the remaining quantities. This allows us to massively simplify experiment design.

In this article, we’ll have a look at the basics of dimensional analysis using the so-called Buckingham \Pi Theorem [1], and try to use a combination of physical insight and dimensional analysis to characterise the performance of a DC motor in comparison to some of its basic parameters such as size or the strength of the magnetic material used. So, this is the plan for today’s tour:

Dimensional Analysis: An example

Schematic view of a mathematical pendulum

Let’s have a look at a mathematical pendulum. We have a mass m fixed to an arm or rope of length l, and the strength of local gravity shall be given by the gravitational acceleration g. We want to determine T, the period of the pendulum. Some basic physical insight tells us that T may depend on the mass, the length of the arm, local gravity and the angle of initial deflection \alpha_0:

(1)   \begin{equation*} T = f\left(m,l,g,\alpha_0\right) \end{equation*}

Dimensional analysis allows us to get a general picture of these influences. Let’s try to express T as a dimensionless multiple of some product of powers of m, l and g. Note that we do not consider \alpha_0 in this product, as it has no dimension itself and thus no power of \alpha_0 can contribute to making the product dimensionless:

(2)   \begin{equation*} k_T = T m^{c_m} l^{c_l} g^{c_g} \end{equation*}

Solving this equation so that k_T is dimensionless leads us to

(3)   \begin{equation*} k_T = T \sqrt{\frac{g}{l}} \end{equation*}

Now, we can write our relationship from Eq. 1 in the following way:

(4)   \begin{equation*} T \sqrt{\frac{g}{l}} = \bar{f}\left(m,l,g,\alpha_0\right) \end{equation*}

However, on the left-hand side, we have a dimensionless number, so the function \bar{f} must also map its parameters to dimensionless numbers. \bar{f} cannot depend on any product of powers of m, l and g, as such a product would not be dimensionless: No power of l and g can cancel the mass dimension present in any non-zero power of m, and the same is true for any other combination. There also cannot be a function that only depends on the value of the dimensioned parameter, as then the value of the function would change if we used feet as unit for the length instead of meters.

Thus, the function on the right can actually only depend on dimensionless quantities, which leaves \alpha_0. So, we finally get

(5)   \begin{equation*} T \sqrt{\frac{g}{l}} = T\left(\alpha_0\right) \end{equation*}

So, dimensional analysis tells us that the period of the pendulum is proportional to \sqrt{\frac{l}{g}}, with a factor that depends only on the initial deflection angle \alpha_0. We arrived at this conclusion only by analysing the dimensions of the quantities involved.

We might not know T\left(\alpha_0\right) in general, but we could measure it for some set of values of g, l and \alpha_0, and could then determine it for any other set of values of g and l if only we keep \alpha_0 constant. We say that \alpha_0 determines the similitude of two problems with different parameters.

If we were to look into the actual equations of motion, we’d get the same result — and we’d also get the form of T\left(\alpha_0\right). Depending on the complexity of the equations involved, this could prove very tedious.
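If you just want to convince yourself numerically, here is a small sketch that integrates the pendulum’s equation of motion with SciPy and checks that T \sqrt{\frac{g}{l}} stays the same when l and g change while \alpha_0 is kept fixed (the parameter values are arbitrary):

import numpy as np
from scipy.integrate import solve_ivp

def pendulum_period(l, g, alpha0):
  # theta'' = -(g/l) * sin(theta), starting at rest at theta = alpha0
  def rhs(t, y):
    theta, omega = y
    return [omega, -(g / l) * np.sin(theta)]
  # The first zero crossing of theta happens at a quarter period
  def crossing(t, y):
    return y[0]
  crossing.terminal = True
  crossing.direction = -1
  sol = solve_ivp(rhs, [0.0, 100.0], [alpha0, 0.0],
                  events=crossing, rtol=1e-9, atol=1e-12)
  return 4.0 * sol.t_events[0][0]

for l, g in [(1.0, 9.81), (2.5, 9.81), (1.0, 1.62)]:
  T = pendulum_period(l, g, alpha0=0.5)
  print(l, g, T * np.sqrt(g / l))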

The Buckingham Pi Theorem

What we just applied is known as the Buckingham \Pi Theorem, named for Edgar Buckingham, an American physicist living from 1867 to 1940. Buckingham was not the first to notice and apply the principle, but still today it is named for him.

The theorem essentially states that, if there is a physically meaningful relationship between physical quantities, then there must be a function describing this relationship which only depends on dimensionless ratios of these quantities. He also described a procedure to find these ratios.

Let’s have a look at the proof idea: Assume that we have n physical quantities, which we shall name q_1, \ldots, q_n, and there is some — possibly unknown — relationship between them, which we express in the form

(6)   \begin{equation*} F\left(q_1,\ldots,q_n\right) = 0 \end{equation*}

Obviously, the value of F does not have a dimension. If it did, we could simply divide by that dimension and still have a similar equation without dimension. Thus, F also cannot depend on dimensioned quantities. If it did, its value would differ depending on the units used, even if the quantities were the same.

As a consequence, we can select k of the quantities, and express the others as dimensionless multiples of these. Let’s say, that q_1,\ldots,q_k were the selected quantities. Then we can write Eq. 6 as follows:

(7)   \begin{equation*} F\left(q_1,\ldots,q_k,\frac{q_{k+1}}{{q_1}^{r_{k+1,1}}\ldots{q_k}^{r_{k+1,k}}},\ldots\right) = 0 \end{equation*}

Now, as argued before, F cannot depend on dimensioned quantities, so F must actually be independent of the q_1,\ldots,q_k and can only depend on the dimensionless ratios:

(8)   \begin{equation*} F\left(\frac{q_{k+1}}{{q_1}^{r_{k+1,1}}\ldots{q_k}^{r_{k+1,k}}},\ldots,\frac{q_{n}}{{q_1}^{r_{n,1}}\ldots{q_k}^{r_{n,k}}}\right) = 0 \end{equation*}

Finding the Basis Quantities

Now, how do we find which quantities to use as our basis q_1,\ldots,q_k? It may help to note that term “basis” is used intentionally here, as these quantities must form a basis for the vector space of dimensions used within q_1,\ldots,q_n. In our pendulum example, we had quantities of time, length, mass and acceleration, which is length per time squared. Thus, we have three dimensions and four variables (or “vectors” in linear algebra parlance), and we can select three of these four to represent all the units.

There are some useful rules for selecting them, but they do not uniquely determine the selection:

  • If one of your variables is a dependent variable — such as the period in our pendulum example — you should not select it as basis variable. Otherwise it will also occur in the ratios for the independent variables, and that would not help us in characterising the relationship properly.
  • Independent variables should be independent of each other. For example, for a propeller you should not include the diameter, the mean chord and the aspect ratio, as the latter is defined by the former two. Instead, use only two of these.
  • Each dimension that occurs in at least one variable must also occur in at least one other variable. Otherwise we cannot create a fraction of variables in which the dimension does not occur.

Specifically the third rule also may give us important insight into the problem, such as when our problem only contains a single variable referencing a specific dimension. In that case our relationship either does not actually depend on that quantity, or we are missing another quantity that influences the relationship. The missing dimension may guide us as to where to look.

How does a DC-Motor work?

Now, let’s use our new knowledge to analyse the performance of a DC motor and its relationship with some of its basic parameters. This will allow us to make better decisions on selecting motors in the future.

So, how does a DC motor work? Essentially, it turns electrical energy into mechanical energy via a magnetic field. For that, it uses two magnetic fields, leading to a magnetic force. That force results in a moment around the axis of the motor. When the motor turns around its axis, one of the fields needs to change its orientation so that the motor keeps turning. This is achieved either mechanically in brushed motors or electronically in brushless motors.

Configuration of a Brushless DC-Motor with coils in the center and permanent magnets on the shell.

The figure above shows the basic mechanical configuration of a Brushless DC (BLDC) motor in outrunner configuration with permanent magnets for one part of the field, and the other field being generated by the coils in the middle. By switching the coils on and off, the inner field can be rotated. In an inrunner configuration, the fixed coils would be outside, and the static magnets would be on the axis in the middle.

For BLDC motors, special brushless controllers are required that electronically determine the angular position of the magnets and switch the coils on and off as required. There are different setups for these controllers, some using Hall sensors to determine the position of the magnets, others monitoring the induction voltage on the unconnected coil and using that to determine the proper timing for switching.

Schematic of the Electrical Configuration of a DC Motor

The figure to the right shows the basic electrical configuration of a DC motor, indicating the parasitic inductance and resistance of the coils in the motor. In addition, the turning motor also acts as a generator, inducing a counter-voltage, often called counter electromagnetic force (EMF) or back-EMF.

The torque acting on the axis is approximately proportional to the current flowing through the coils, and the reverse voltage generated is proportional to the rotational velocity. The proportionality factors are usually named k_t:=\frac{Q}{I} for the torque coefficient and k_v:=\frac{U}{\omega} for the voltage coefficient.

Efficiency Considerations for the DC-Motor

On the electrical side, we have losses in the winding resistance. On the magnetic-mechanical side, losses may come from constantly changing the magnetic field inside the metal of the stator. In addition, there may be aerodynamic losses from the turning motor.

The efficiency factor \eta is found as the quotient \frac{P_{out}}{P_{in}}, where P_{in} is the power injected into the system, and P_{out} is the usable power provided by the system.

On the electrical side, we have

(9)   \begin{equation*} \eta_{el} = \frac{k_v \omega I}{k_v \omega I + R I^2} = \frac{1}{1 + \frac{R I}{k_v \omega}} \end{equation*}

On the mechanical side, we have

(10)   \begin{equation*} \eta_{mech} = \frac{k_t I \omega}{k_v \omega I} = \frac{k_t}{k_v} \end{equation*}
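As a small illustration with made-up motor parameters (the values below are not from any particular motor):

import numpy as np

R = 0.08                        # winding resistance in Ohm
I = 10.0                        # motor current in A
k_v = 0.008                     # back-EMF coefficient in V*s/rad, U = k_v * omega
k_t = 0.0075                    # torque coefficient in N*m/A
omega = 2 * np.pi * 8000 / 60   # 8000 rpm in rad/s

eta_el = 1.0 / (1.0 + R * I / (k_v * omega))
eta_mech = k_t / k_v
print(eta_el, eta_mech)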

Dimensional Analysis of the DC-Motor

We are reaching the final goal of this article: determining the influence of dimensional parameters on the performance of a DC motor. Specifically, we want to check the influence of some relevant dimensional parameters on the voltage and torque coefficients.

From our previous insight, we assume that k_v and k_t depend on

  • the geometric size of the motor, represented by the motor diameter D and its height h,
  • the number of poles p, and
  • the number of windings n.

Let’s have a look at the dimensions of our quantities. We denote the dimension of a quantity by square brackets. The dimension of D and h is a length: \left[D\right]=\left[h\right]=L. The dimension of k_v is voltage per rotational frequency, which is voltage times time. Voltage is energy per charge, or energy times time per current. So finally, we arrive at \left[k_v\right] = \frac{M L^2}{I T^2}, with M denoting mass, I denoting current and T denoting time.

It is quite obvious that we have a dimension \frac{M}{I T^2} which occurs in only one quantity, namely in k_v. Obviously, the relationship must also include additional quantities. One reasonable quantity to consider is the strength of the magnets, which is given by their remanence B_r. This happens to have the dimension \frac{M}{I T^2}, just what we need.

Now we have all the dimensions we need — L and \frac{M}{I T^2}. Let’s recapitulate the dimensions of all the quantities:

Quantity | Symbol | Dimension
Back-EMF Coefficient | k_v | \frac{M L^2}{I T^2}
Motor Size | D | L
Motor Height | h | L
Remanence | B_r | \frac{M}{I T^2}

The back-EMF coefficient k_v is our dependent variable, so we cannot use that for expressing the dimensions. Thus, we use D and B_r:

Dimension | Expression
L | D
\frac{M}{I T^2} | B_r

Now, we can express the unit of k_v using these variables:

(11)   \begin{equation*} \left[ B_r D^2 \right] = \frac{M L^2}{I T^2} \end{equation*}

So, as a consequence \frac{k_v}{B_r D^2} is dimensionless. Similarly, the units of D and h are the same, so that \lambda:=\frac{h}{D} is a dimensionless quantity. We call \lambda the aspect ratio of the motor.

Using the Buckingham \Pi Theorem, the following relationship must hold:

(12)   \begin{equation*} \frac{k_v}{B_r D^2} = f_v\left(n,p,\lambda\right) \end{equation*}

We know that the dimensions of k_v and k_t must be the same — otherwise \eta = \frac{k_t}{k_v} cannot be dimensionless. Also, it is reasonable to assume that k_t also depends on the same parameters as k_v. In consequence, the following relationship must also hold:

(13)   \begin{equation*} \frac{k_t}{B_r D^2} = f_t\left(n,p,\lambda\right) \end{equation*}

And as a final finding, we can thus also express the efficiency in a similar manner:

(14)   \begin{equation*} \eta_{mech} = \frac{k_t}{k_v} = \frac{f_t\left(n,p,\lambda\right)}{f_v\left(n,p,\lambda\right)} =: f_\eta\left(n,p,\lambda\right) \end{equation*}

These relationships enable us to compare motors with different sizes and magnet strengths, as long as the number of windings and the aspect ratio are the same. To determine the relationships expressed by f_t, f_v and f_\eta, we only need to determine \frac{k_v}{B_r D^2} and \frac{k_t}{B_r D^2} for motors with different values of n, p and \lambda. We do not need to explicitly vary D and B_r.
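As a small sketch of what this buys us (all numbers made up): suppose we have measured k_v for one motor and want an estimate for a geometrically similar motor (same n, p and \lambda) with stronger magnets and a larger diameter.

k_v_ref = 0.008   # measured back-EMF coefficient in V*s/rad
B_r_ref = 1.2     # magnet remanence in T
D_ref = 0.028     # stator diameter in m

# The dimensionless group k_v / (B_r * D^2) is the same for both motors
f_v = k_v_ref / (B_r_ref * D_ref**2)

B_r_new = 1.4
D_new = 0.035
k_v_new = f_v * B_r_new * D_new**2
print(k_v_new)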

Conclusions

In this article, we learned about dimensional analysis, a very powerful tool which allows us to reduce the complexity of the analysis of physical relationships, just by looking at the dimensions. This method is widely used in aerodynamics and is the basis, for example, for the common formulae for lift and drag of wings and airfoils, or for thrust, torque and power for propellers.

Thus, to get a rough estimate of the performance of one system, we can scale up the performance data we have on another system, as long as the similarity parameters — which are the non-dimensional numbers to our coefficient parameters — are close enough. This strongly simplifies the dimensioning of a system, and allows us to get at least a rough ballpark setup for our design. We may later on refine the design and get some actual measurements, but we save a lot of up-front effort by getting to an initial design this way.

[1] [doi] E. Buckingham, “On physically similar systems; illustrations of the use of dimensional equations,” Phys. rev., vol. 4, p. 345–376, 1914.
[Bibtex]
@Article{Buckingham1914,
author = {Buckingham, E.},
title = {On Physically Similar Systems; Illustrations of the Use of Dimensional Equations},
journal = {Phys. Rev.},
year = {1914},
volume = {4},
pages = {345--376},
month = oct,
doi = {10.1103/PhysRev.4.345},
issue = {4},
numpages = {0},
publisher = {American Physical Society},
url = {https://babel.hathitrust.org/cgi/pt?id=uc1.31210014450082&view=1up&seq=905},
}

Rank Considerations for the Observer-Kalman System Identification Procedure

Reading Time: 4 minutes

Last time, we looked in detail at the derivation of the OKID procedure for finding the impulse response of a system from arbitrary input-output data. However, there are some specifics to consider when collecting data, and we can derive these by looking at the rank of the matrices involved.

In this article, we’ll reconsider how the quality and uniqueness of the solution for an ordinary least-squares problem is affected by the ranks of the matrices involved. Based on that, we’ll derive a formula for determining the number of measurements we need to find a good estimate of the Markov parameter matrix.

The Result

Let’s jump to the result real quick here and then look at how it is derived. Assume that we want to determine the Markov parameters

(1)   \begin{equation*} \mathbf{M} =   \begin{bmatrix}     \mathbf{C} \mathbf{A}^{l-1} \mathbf{B} &     \ldots &     \mathbf{C} \mathbf{B} &     \mathbf{D}   \end{bmatrix} \end{equation*}

of order l for a system with m inputs and p outputs.

Schematic of a Multiple-Input, Multiple-Output (MIMO) System

The number of measurement points required is given by

(2)   \begin{equation*}   n = om + \left[o\left(m+p\right)+1\right] l \end{equation*}

Here, o is the oversampling factor, which gives the number of data points we have per entry in the Markov parameter matrix. We need this oversampling to average out measurement error due to noise, and thus the larger the oversampling factor, the higher the quality of our estimate for the Markov parameter matrix.

On the other hand, a higher oversampling factor also means that we must either

  • increase the number of measurements while keeping the order of our Markov parameter matrix constant, or
  • decrease the order of our Markov parameter matrix while keeping the number of measurements.

A lower order of the Markov parameter matrix may decrease the quality of the estimate we get from the Eigensystem Realisation Algorithm. A higher number of measurements possibly increases the measurement effort and time.
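Here is a trivial helper to evaluate Equation 2 (the example numbers are arbitrary):

def okid_measurement_count(m, p, l, o):
  # n = o*m + (o*(m + p) + 1) * l, with m inputs, p outputs,
  # Markov parameter order l and oversampling factor o
  return o * m + (o * (m + p) + 1) * l

# For example: 2 inputs, 3 outputs, order 50, oversampling factor 10
print(okid_measurement_count(m=2, p=3, l=50, o=10))   # 2570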

Rank Considerations for OKID

Let’s review the central equation of the OKID with observer:

(3)   \begin{equation*}   \underbrace{\begin{bmatrix}     \mathbf{y}_{l} &     \mathbf{y}_{l+1} &     \cdots &     \mathbf{y}_{n-1}   \end{bmatrix}}_{=:\mathbf{Y}}   \approx   \mathbf{\tilde{M}}_l   \underbrace{\begin{bmatrix}     \mathbf{u}_0 & \cdots &\mathbf{u}_{n-l-1} \\     \mathbf{y}_0 & \cdots &\mathbf{y}_{n-l-1} \\     \vdots & \ddots & \vdots \\     \mathbf{u}_{l-1} & \cdots &\mathbf{u}_{n-2} \\     \mathbf{y}_{l-1} & \cdots &\mathbf{y}_{n-2} \\     \mathbf{u}_{l} & \cdots &\mathbf{u}_{n-1}   \end{bmatrix}}_{=:\mathbf{\tilde{U}}} \end{equation*}

Now have a look at the dimensions of these matrices:

  • \mathbf{Y} \in \mathbb{R}^{p \times \left(n-l\right)}
  • \mathbf{\tilde{M}} \in \mathbb{R}^{p \times \left[m+\left(m+p\right)l\right]}
  • \mathbf{\tilde{U}} \in \mathbb{R}^{\left[m+\left(m+p\right)l\right] \times \left(n-l\right)}

Thus, we have p\left[m+\left(m+p\right)l\right] unknowns and p\left(n-l\right) equations. Looking back at basic linear algebra, we know that we need

(4)   \begin{equation*} n-l = m+\left(m+p\right)l \end{equation*}

to hold for the solution to be uniquely determined. We might even need more than that, as this assumes that the measurement data is rich enough so that Equation 3 does not contain linearly dependent rows. However, in any case, the number of our measurements must be at least as large as given by Equation 4.

What happens if we have fewer measurements than this? Well, in this case the equation is not sufficiently determined, and there are arbitrarily many solutions \mathbf{\tilde{M}} to the equation. The ordinary least-squares approach will deliver a solution, but it is not clear whether that solution accurately describes the system we want to identify — although it will describe a system that will provide the same output given the inputs.

Eliminating Noise by Oversampling

However, if we follow Equation 4 exactly, i.e. if we have exactly as many measurements as specified by this equation, we will also have an exact solution. This means that we will exactly incorporate all the measurement noise into our Markov parameters. Usually, we do not want that. Instead, we want to average out that measurement noise by having multiple samples.

That means that we will need more than the number of measurements given by Equation 4 — a lot more. To quantify this number, we’ll consider an oversampling factor o. This factor gives us the number of samples per parameter we want to have. It also gives the factor by which we can diminish the influence of the measurement noise onto our estimate.

Now, if we want to oversample each parameter by factor o, we need o equations for each parameter. Thus, the following equation must hold

(5)   \begin{equation*} n-l = o\left[m+\left(m+p\right)l\right] \end{equation*}

Solving that equation for n, we get our final result, given in Equation 2.

Impact of Using an Observer

Clearly, this is the result when we use an observer for identifying the Markov parameters. If we have a system which already is sufficiently stable, we do not need the observer approach. In that case, we have a much simpler equation:

(6)   \begin{equation*}   \underbrace{\begin{bmatrix}     \mathbf{y}_{l} &     \mathbf{y}_{l+1} &     \cdots &     \mathbf{y}_{n-1}   \end{bmatrix}}_{=:\mathbf{Y}}   \approx   \mathbf{M}_l   \underbrace{\begin{bmatrix}     \mathbf{u}_0 & \cdots &\mathbf{u}_{n-l-1} \\     \vdots & \ddots & \vdots \\     \mathbf{u}_{l} & \cdots &\mathbf{u}_{n-1}   \end{bmatrix}}_{=:\mathbf{U}} \end{equation*}

Here we have m\left(l+1\right) rows in \mathbf{U}, so that the following relationship must hold for an oversampling factor o:

(7)   \begin{equation*} n = o m \left(l+1\right) + l \end{equation*}

This is smaller than the number required for the observer approach, as there are far fewer entries of the Markov parameters to determine. Thus, we may save some measurement effort if we have a stable system.

Conclusions

It is quite important that we have a sufficient number of measurements

  • to get a uniquely determined set of Markov parameters, and
  • to properly average out our measurement noise.

We can get the number of measurements we need by rank considerations of our basic equations and a simple oversampling approach. If we have a sufficiently stable system, we can avoid using the observer approach and thus reduce the number of measurements required while keeping the order of the Markov parameter set constant.

The Observer/Kalman System Identification Procedure Explained

Reading Time: 10 minutes

Previously, we have had a look at the Eigensystem Realisation Algorithm (ERA) by Juang and Pappa [1]. This algorithm allows us to find the parameters for a linear, time-invariant (LTI) system in discrete time from measurements of the impulse response.

What, however, do we do when we don’t have measurements of the impulse response? We might not want to hit our system that hard, or we might only have measurements taken with non-trivial inputs. Maybe we cannot wait for our system to return to a zero state before applying the impulse.

Again, Professor Steve Brunton came to the rescue with two videos on the Observer/Kalman System Identification procedure (or OKID for short) in his series on data-driven control:

https://youtu.be/TsdBeI6CKUM

Unfortunately, Prof. Brunton skips over the details and only gives a very short overview. Yet, I always try to understand the details of the methods I apply — if not for being able to assess when they may be used, then to learn about the principles being applied.

So I had a look into the papers by a group around Jer-Nan Juang, Minh Phan, Lucas G. Horta and Richard W. Longman, in which the OKID was introduced [2][3][4]. And again, there are some clever ideas in there that I found quite enlightening regarding handling the descriptions of dynamic systems:

  • The basic approach of the OKID uses the fact that the response of an LTI to any input is related to the output by convolution of the impulse response.
  • By reshaping the measurement data, a linear equation between said impulse response and the measurement data is established.
  • This linear equation is solved using an ordinary least-squares approach.

The basic approach only works for asymptotically stable systems which approach the zero state fast enough. However, there is an extension for unstable systems or systems that decay too slowly: Instead of identifying the system itself, the OKID uses a modified system which is made asymptotically stable by a Luenberger observer [5], and then reconstructs the impulse response of the original system from the modified system.

Finally, this observer extension also leads to the “Kalman” part in the name for this procedure: The observer that is identified “on the go” this way turns out to be an optimal Kalman filter [6] given the amount of noise seen on the measurements.

So, all in all, the whole approach is quite ingenious. However, it’s not as complicated as Professor Brunton makes it sound, and we’ll briefly go over it and its basic ideas in this article.

Review: Impulse Response

As for the ERA, we are dealing with discrete-time LTI systems. But different from previously, we’ll directly consider a Multiple Input, Multiple Output (MIMO) system.

Schematic of a Multiple-Input, Multiple-Output (MIMO) System

The dynamics of this system shall be defined by the following recurrence equation:

(1)   \begin{align*}   \mathbf{x}_{k+1} &=     \mathbf{A} \mathbf{x}_k     + \mathbf{B} \mathbf{u}_k \\   \label{eq:system-output}   \mathbf{y}_k &=     \mathbf{C} \mathbf{x}_k     + \mathbf{D} \mathbf{u}_k \end{align*}

Note that this time we also consider the case where the output is directly influenced by the input via the matrix \mathbf{D}. Now, let’s assume that we have an input sequence \mathbf{u}_k and observe the system starting at time step k_0. We can determine the state \mathbf{x}_{k_0+r} (with r \geq 0) to be

(2)   \begin{equation*} \mathbf{x}_{k_0+r} = \mathbf{A}^r \mathbf{x}_{k_0} + \sum_{j=1}^r \mathbf{A}^{r-j} \mathbf{B} \mathbf{u}_{k_0+j-1} \end{equation*}

This can be verified by complete induction. For the case r=0 we get \mathbf{x}_{k_0} = \mathbf{A}^0 \mathbf{x}_{k_0} = \mathbf{x}_{k_0}.

For r \geq 0 we get

(3)   \begin{align*} \mathbf{x}_{k_0+r+1} &= \mathbf{A} \mathbf{x}_{k_0+r} + \mathbf{B} \mathbf{u}_{k_0+r} \\ &= \mathbf{A} \mathbf{A}^r \mathbf{x}_{k_0} + \mathbf{A} \sum_{j=1}^r \mathbf{A}^{r-j} \mathbf{B} \mathbf{u}_{k_0+j-1} + \mathbf{B} \mathbf{u}_{k_0+r} \\ &= \mathbf{A}^{r+1} \mathbf{x}_{k_0} + \sum_{j=1}^r \mathbf{A}^{r+1-j} \mathbf{B} \mathbf{u}_{k_0+j-1} + \mathbf{B} \mathbf{u}_{k_0+r} \\ &= \mathbf{A}^{r+1} \mathbf{x}_{k_0} + \sum_{j=1}^{r+1} \mathbf{A}^{r+1-j} \mathbf{B} \mathbf{u}_{k_0+j-1} \end{align*}

By extension, we can give the following expression for the output \mathbf{y}_{k_0+r}:

(4)   \begin{align*} \mathbf{y}_{k_0+r}   &= \mathbf{C} \mathbf{A}^r \mathbf{x}_{k_0}    + \sum_{j=1}^r \mathbf{C} \mathbf{A}^{r-j} \mathbf{B} \mathbf{u}_{k_0+j-1}    + \mathbf{D} \mathbf{u}_{k_0+r} \\   &= \mathbf{C} \mathbf{A}^r \mathbf{x}_{k_0} +      \underbrace{\begin{bmatrix}        \mathbf{C} \mathbf{A}^{r-1} \mathbf{B} &        \cdots &        \mathbf{C} \mathbf{B} &        \mathbf{D}      \end{bmatrix}}_{=:\mathbf{M}_r}      \bar{\mathbf{u}}^{\left(r\right)}_{k_0} \end{align*}

Here, we define the column vector \bar{\mathbf{u}}^{\left(r\right)}_{k_0} by stacking the inputs at time steps k_0,\ldots,k_0+r above each other:

(5)   \begin{equation*} \bar{\mathbf{u}}^{\left(r\right)}_{k_0} =   \begin{bmatrix}     \mathbf{u}_{k_0} \\     \vdots \\     \mathbf{u}_{k_0+r}   \end{bmatrix} \end{equation*}

The matrix \mathbf{M}_r contains the so-called Markov-Parameters up to length r; read in reverse order, they form the impulse response of the system. If we had these Markov-Parameters, we could extract the system matrices with the help of the Eigensystem Realisation Algorithm (ERA).

Handling Asymptotically Stable Systems

For an asymptotically stable system, the term \mathbf{A}^r approaches zero for increasing values of r. Thus, for some sufficiently large p we may assume that \mathbf{A}^p \approx \mathbf{0}.

This helps us simplify Equation 4:

(6)   \begin{align*} \mathbf{y}_{k_0+p}   &= \mathbf{C} \underbrace{\mathbf{A}^p}_{\approx \mathbf{0}} \mathbf{x}_{k_0} +      \mathbf{M}_p      \bar{\mathbf{u}}^{\left(p\right)}_{k_0} \\   &\approx \mathbf{M}_p      \bar{\mathbf{u}}^{\left(p\right)}_{k_0} \end{align*}

Now, we can aggregate this into a larger equation:

(7)   \begin{equation*}   \begin{bmatrix}     \mathbf{y}_{k_0+p} &     \mathbf{y}_{k_0+p+1} &     \cdots &     \mathbf{y}_{k_0+n-1}   \end{bmatrix} \\   \approx \\   \mathbf{M}_p   \begin{bmatrix}     \bar{\mathbf{u}}^{\left(p\right)}_{k_0} &     \bar{\mathbf{u}}^{\left(p\right)}_{k_0+1} &     \cdots &     \bar{\mathbf{u}}^{\left(p\right)}_{k_0+n-p-1}   \end{bmatrix} \end{equation*}

With n measurements (ideally, we have n \gg p) we can thus find \mathbf{M}_p by solving the equation above. Usually, this will be done using an ordinary least-squares approach.
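A minimal sketch of this least-squares step in NumPy (the array layout, the variable names and the assumption that the data starts at k_0 = 0 are my own choices, not the authors'):

import numpy as np

def markov_parameters_stable(u, y, p):
    """Estimate M_p = [C A^(p-1) B, ..., C B, D] via Equation 7 (sketch).

    u: (n, m) array of measured inputs, y: (n, q) array of measured outputs,
    p: horizon after which A^p is assumed to be negligible."""
    n, m = u.shape
    q = y.shape[1]
    # One column per equation: the stacked inputs u_k, ..., u_{k+p}
    V = np.column_stack([u[k:k + p + 1].ravel() for k in range(n - p)])
    # Corresponding outputs y_p, ..., y_{n-1}, one per column
    Y = y[p:].T
    # Ordinary least squares for Y = M_p V
    M_p, *_ = np.linalg.lstsq(V.T, Y.T, rcond=None)
    return M_p.T   # shape (q, (p + 1) * m)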

Handling Insufficient Stability

In the previous section, we assumed that the system we measure is asymptotically stable and is sufficiently damped so that we can have our measurement count n sufficiently large relative to the length of our impulse response p. So how do we handle systems where this is not the case?

Stabilisation Using a Luenberger Observer

The idea presented by Juang, Phan, Horta and Longman is quite simple: To arbitrarily choose the eigenvalues of the observed system, they construct a Luenberger observer. In general application, such an observer aims to reconstruct the internal state of a system from knowledge about the internal dynamics, about the inputs to the system and the measurements obtained from the system. It has the following structure:

(8)   \begin{align*}   \hat{\mathbf{x}}_{k+1} &= \mathbf{A} \hat{\mathbf{x}}_k + \mathbf{B} \mathbf{u}_k - \mathbf{K} \left(\mathbf{y}_k-\hat{\mathbf{y}}_k\right) \\   \hat{\mathbf{y}}_k &= \mathbf{C} \hat{\mathbf{x}}_k + \mathbf{D} \mathbf{u}_k \end{align*}

In this case, \hat{\mathbf{x}}_k and \hat{\mathbf{y}}_k represent the state and output of the observer, which according to the construction of the observer shall follow those of the original system. The observer state is adjusted using the correction term -\mathbf{K} \left(\mathbf{y}_k-\hat{\mathbf{y}}_k\right).

Usually, when constructing a Luenberger observer, we choose \mathbf{K} given our knowledge about the original system in such a way that the observation error \mathbf{e}_k := \hat{\mathbf{x}}_k-\mathbf{x}_k is asymptotically stable, i.e. approaches zero. The dynamics of this error are described as follows:

(9)   \begin{equation*}   \mathbf{e}_{k+1} = \left(\mathbf{A} + \mathbf{K} \mathbf{C}\right) \mathbf{e}_k \end{equation*}

Given a completely observable system, we can select \mathbf{K} in such a way that the eigenvalues of these error dynamics can be set arbitrarily. However, these eigenvalues also become the eigenvalues of the observer system, allowing us to arbitrarily stabilise it.

In our case, we want the observer state to be equal to the actual state, so we set \hat{\mathbf{x}}_k=\mathbf{x}_k, and arrive at the new state equations:

(10)   \begin{align*}      \mathbf{x}_{k+1} &=     \underbrace{      \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)}_{=:\mathbf{\hat{A}}}     \mathbf{x}_k    +    \underbrace{     \begin{bmatrix}      \mathbf{B}+\mathbf{K} \mathbf{D} & -\mathbf{K}     \end{bmatrix}}_{=:\mathbf{\hat{B}}}    \underbrace{     \begin{bmatrix}      \mathbf{u}_k \\      \mathbf{y}_k     \end{bmatrix}}_{=:\mathbf{\hat{u}}_k} \\   \mathbf{y}_k &=      \mathbf{C} \mathbf{x}_k      +    \underbrace{     \begin{bmatrix}      \mathbf{D} & \mathbf{0}     \end{bmatrix}}_{=:\mathbf{\hat{D}}}    \mathbf{\hat{u}}_k \end{align*}

These equations describe the exact same system as in Equation 1, but with the current output of the system \mathbf{y}_k fed back into the system in such a way that the observed state always equals the actual system state. With this little trick, we have created a stable system to identify.

To find the Markov-Parameters \mathbf{\hat{M}} of the new system, we simply apply the algorithm we used for sufficiently stable systems. This time, we use \mathbf{\hat{u}}_k as input, which combines both the actual input and our measurement.

However, there’s one thing we need to be careful about: \mathbf{\hat{D}} is not completely unconstrained in this problem. Indeed, its second part must be \mathbf{0}. If we were to allow non-zero values there, the best fit would be \mathbf{\hat{D}} = \begin{bmatrix} \mathbf{0} & \mathbf{I} \end{bmatrix}, with all other system matrices being zero. This system would just forward the input \mathbf{y}_k into its output. This would of course be a perfect fit — but it would not properly represent our system.

So instead, we have to find the best fit for

(11)   \begin{equation*}     \begin{bmatrix}     \mathbf{y}_{k_0+p} &     \mathbf{y}_{k_0+p+1} &     \cdots &     \mathbf{y}_{k_0+n-1}   \end{bmatrix} \\   \approx \\   \mathbf{\tilde{M}}_p   \begin{bmatrix}     \bar{\mathbf{\hat{u}}}^{\left(p-1\right)}_{k_0} &     \bar{\mathbf{\hat{u}}}^{\left(p-1\right)}_{k_0+1} &     \cdots &     \bar{\mathbf{\hat{u}}}^{\left(p-1\right)}_{k_0+n-p-1} \\     {\mathbf{u}}_{k_0+p} &     {\mathbf{u}}_{k_0+p+1} &     \cdots &     {\mathbf{u}}_{k_0+n-1}   \end{bmatrix} \end{equation*}

Note how the first p block-rows of the data matrix consist of pairs of input and output, while the last block-row contains only the input. This way, the identified system cannot simply feed the measured output through to its own output; instead, the equation identifies the dynamics of the system we actually want to identify.
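As a sketch (mirroring the one for the stable case above, again with my own variable names and with k_0 = 0), the corresponding regression could be set up like this:

import numpy as np

def observer_markov_parameters(u, y, p):
    """Estimate the observer Markov parameters from Equation 11 (sketch).

    u: (n, m) array of inputs, y: (n, q) array of outputs, p: observer horizon."""
    n, m = u.shape
    q = y.shape[1]
    uy = np.hstack([u, y])   # the combined input: u_k stacked with y_k
    cols = []
    for j in range(n - p):
        # p combined samples, followed by the bare input u_{j+p} (no measured output!)
        cols.append(np.concatenate([uy[j:j + p].ravel(), u[j + p]]))
    V = np.column_stack(cols)
    Y = y[p:].T
    M_tilde, *_ = np.linalg.lstsq(V.T, Y.T, rcond=None)
    return M_tilde.T   # shape (q, p * (m + q) + m)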

The result, however, does not yet represent the Markov-Parameters of our observer. Instead, it has the form

(12)   \begin{equation*}   \mathbf{\tilde{M}}   =   \begin{bmatrix}     \mathbf{C} \mathbf{\hat{A}}^{r-1} \mathbf{\hat{B}}     &     \cdots     &     \mathbf{C} \mathbf{\hat{B}}     &     \mathbf{D}   \end{bmatrix} \end{equation*}

For recovery of the Markov-Parameters of the original system, this is quite inconsequential (we just have to be careful in the implementation). If we wanted to actually have the Markov-Parameters of the observer — which we might, as we will see when we look at the specific properties of that observer — then we would have to restructure the response as follows by appending an appropriately sized zero-matrix at the end:

(13)   \begin{equation*}   \mathbf{\hat{M}}   =   \begin{bmatrix}     \mathbf{C} \mathbf{\hat{A}}^{r-1} \mathbf{\hat{B}}     &     \cdots     &     \mathbf{C} \mathbf{\hat{B}}     &     \mathbf{D} &     \mathbf{0}   \end{bmatrix} \end{equation*}

Recovery of Original Markov-Parameters

But how do we get the Markov-Parameters of the original system? Again, Juang et al. provide us with a little trick. Let’s first look at the structure of the observer Markov-Parameters:

(14)   \begin{align*}   \mathbf{\hat{M}}     &=       \begin{bmatrix}        \mathbf{C} \mathbf{\hat{A}}^{r-1} \mathbf{\hat{B}} &        \cdots &        \mathbf{C} \mathbf{\hat{B}} &        \mathbf{\hat{D}}       \end{bmatrix} \\     &=       \begin{bmatrix}        \mathbf{\hat{M}}_{r} & \ldots & \mathbf{\hat{M}}_1 & \mathbf{\hat{M}}_{0}       \end{bmatrix} \end{align*}

with

(15)   \begin{align*}   \mathbf{\hat{M}}_{0} &=     \begin{bmatrix}       \mathbf{D} &       \mathbf{0}     \end{bmatrix} \\   \mathbf{\hat{M}}_r &=     \mathbf{C}     \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)^{r-1}     \begin{bmatrix}        \mathbf{B}+\mathbf{K} \mathbf{D} &        -\mathbf{K}     \end{bmatrix} \\     &=:       \begin{bmatrix}         \mathbf{\hat{M}}^{\left(1\right)}_r &         \mathbf{\hat{M}}^{\left(2\right)}_r       \end{bmatrix} \end{align*}

Now, it is quite straightforward to extract \mathbf{D} from \mathbf{\hat{M}}_0. For the others, we find that

(16)   \begin{equation*}   \mathbf{\hat{M}}^{\left(1\right)}_r     + \mathbf{\hat{M}}^{\left(2\right)}_r \mathbf{D}   =   \mathbf{C}   \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)^{r-1} \mathbf{B} \end{equation*}

For r=1, we can immediately recover \mathbf{M}_1:

(17)   \begin{equation*}   \mathbf{M}_1 := \mathbf{C} \mathbf{B}   =     \mathbf{\hat{M}}^{\left(1\right)}_1     + \mathbf{\hat{M}}^{\left(2\right)}_1 \mathbf{D} \end{equation*}

For r>1 we first expand the last element of the power:

(18)   \begin{align*}   \mathbf{\hat{M}}^{\left(1\right)}_r     + \mathbf{\hat{M}}^{\left(2\right)}_r \mathbf{D}   &=     \mathbf{C}     \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)^{r-1}     \mathbf{B} \\   &=      \mathbf{C}     \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)^{r-2}     \mathbf{A}     \mathbf{B} \nonumber\\     &+     \underbrace{       \mathbf{C}       \left(\mathbf{A}+\mathbf{K} \mathbf{C}\right)^{r-2}       \mathbf{K}     }_{=-\mathbf{\hat{M}}^{\left(2\right)}_{r-1}}     \underbrace{       \mathbf{C} \mathbf{B}     }_{=\mathbf{M}_1} \end{align*}

Looking further, we can identify a pattern:

(19)   \begin{equation*}      \mathbf{M}_r   =     \mathbf{\hat{M}}^{\left(1\right)}_r     + \mathbf{\hat{M}}^{\left(2\right)}_r \mathbf{D}     + \sum_{k=1}^{r-1}          \mathbf{\hat{M}}^{\left(2\right)}_{r-k}          \mathbf{M}_k \end{equation*}

The reader is encouraged to prove this identity by induction over r. Hint: First prove for s,t \geq 0 the following, more general identity:

(20)   \begin{equation*}   \mathbf{C} \left(\mathbf{A} + \mathbf{K} \mathbf{C}\right)^s \mathbf{A}^t \mathbf{B}   = \\   \mathbf{C} \mathbf{A}^{s+t} \mathbf{B}   + \sum_{k=0}^{s-1} \mathbf{C} \left(\mathbf{A} + \mathbf{K} \mathbf{C}\right)^k \mathbf{K} \mathbf{C} \mathbf{A}^{s+t-k-1} \mathbf{B} \end{equation*}

With a few more steps Equation 19 follows for t=0.

Thus, the Markov-Parameters \mathbf{M}_r:=\mathbf{C}\mathbf{A}^{r-1}\mathbf{B} can easily be retrieved recursively from Equation 19, starting with \mathbf{M}_1.
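In code, and assuming the identified blocks have already been split into the lists M1_hat and M2_hat plus the matrix D (the split itself is plain array slicing), the recursion of Equation 19 might look roughly like this — a sketch, not the authors' implementation:

import numpy as np

def recover_markov_parameters(M1_hat, M2_hat, D, r_max):
    """Recover M_r = C A^(r-1) B from the observer Markov parameters (Equation 19, sketch).

    M1_hat[r-1] holds the first block of the r-th observer Markov parameter,
    M2_hat[r-1] the second block; D is the feed-through matrix."""
    M = []   # M[r-1] will hold M_r
    for r in range(1, r_max + 1):
        M_r = M1_hat[r - 1] + M2_hat[r - 1] @ D
        for k in range(1, r):
            M_r = M_r + M2_hat[r - k - 1] @ M[k - 1]   # the term for M_k
        M.append(M_r)
    return M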

Why Kalman?

Now we know how the term “observer” got into the name. But how did “Kalman” end up there?

When we described the observer that we would use as model, we actually described a very specific observer, namely one that would immediately provide the same output as the actual system. The authors of the OKID paper [2] use the term “deadbeat observer” for this. It is the fastest possible observer for a discrete-time system: any faster, and the observer would predict the output of the system, violating causality.

Indeed, later in the paper, the authors show that this deadbeat observer is identical with the Kalman filter one would find given the variance of noise seen in the measurement. A more detailed explanation of their proof might follow in a later article.

Thus, extracting not only the original Markov Parameters, but also the parameter \mathbf{K} for the observer may be very beneficial if one wanted to design a Kalman filter for the system.

How do we get the system parameters for this Kalman filter? Well, we already have its Markov parameters, given by \mathbf{\hat{M}}, and can use the ERA on them. That gives us dynamics matrices for a Kalman filter.

Example

Let’s apply this method to our spring pendulum example from the ERA article. Remember: We had a horizontal spring pendulum with the following setup:

Spring Pendulum

Different from the case with ERA, we now use an arbitrary input signal (which is the force F applied to the mass), and determine the output signal. Again, we simulate the system and add noise to the output. The result is shown in the following figure.

Measurement of Pendulum Output (red) with Random Input (blue)

The input is a pseudorandom binary sequence (PRBS). Such sequences can be generated, e.g., using the max_len_seq function from the SciPy signal processing library.
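For instance (a small sketch; the sequence length and the force amplitude of 1 N are arbitrary choices of mine):

import numpy as np
from scipy.signal import max_len_seq

# Maximum-length sequence with 2**10 - 1 = 1023 samples, values in {0, 1}
seq, _ = max_len_seq(10)

# Map it to a +/-1 N force signal to use as the input u
u = 1.0 * (2.0 * seq - 1.0)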

As we know from the ERA experiments, our pendulum is quite stable, so we can use the method for asymptotically stable systems. The result is shown in the following figure:

Comparison of impulse responses – Actual vs. Response estimated by OKID

It seems that the estimate fits the actual impulse response (this time without noise) quite well. However, we’re in a dilemma:

  • If we increase the length of our estimated impulse response, we reduce the amount of data we have available to counter all the noise, and thereby also the quality of our estimate.
  • If we decrease the length of our estimated impulse response, we get a better estimate, but we have less data for our ERA procedure.

So we need to find a good compromise there by playing with the length of our estimated impulse response.

Conclusion

We have seen that the idea of the OKID procedure is not as involved as one might believe. It rests on a few very simple ideas:

  • The Markov Parameters determine the impulse response, and thereby define the input-output relationship — together with the initial state.
  • For asymptotically stable systems, the influence of the initial state diminishes over time, eventually becoming so small that we can ignore it.
  • Systems that are not sufficiently asymptotically stable can be made stable by considering a Luenberger observer in their place. The problem is then reduced to identifying the Markov Parameters for this observer, and retrieving the Markov Parameters for the original system from this intermediate result.
  • The observer thus identified actually is a Kalman filter for the system, given the noise seen in the measurements.

It seems useful to always use the observer-enhanced procedure by default. Even for sufficiently asymptotically stable systems, the use of the observer reduces p and thus the number of measurements required.

Now, finally, this opens up a lot of new possibilities for system identification: We can use the response to any set of inputs to determine the system model. This also allows us to identify a system that already is subject to control and where it would be impractical or impossible to remove that control. One simple example is flight tests with an already controlled multicopter in order to improve its controller; another would be to identify the progression of a disease without having to stop treatment (which of course would be unethical).

To take the design of a multicopter as an example, we could simply record in-flight data from sensors as well as the control inputs sent to our motors, and use that to improve our already existing model. We do not even have to provide specific inputs; a general recording of flight data will do, as long as it is rich enough.

[1] J. Juang and R. S. Pappa, “An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction,” Journal of Guidance, Control, and Dynamics, vol. 8, iss. 5, 1985. doi:10.2514/3.20031
[2] J. Juang, M. Phan, L. G. Horta, and R. W. Longman, “Identification of Observer/Kalman Filter Markov Parameters: Theory and Experiments,” NASA Langley Research Center, Hampton, Virginia, Technical Report 104069, 1991. doi:10.2514/6.1991-2735
[3] M. Phan, J. N. Juang, and R. W. Longman, “Identification of Linear Multivariable Systems by Identification of Observers with Assigned Real Eigenvalues,” Journal of the Astronautical Sciences, vol. 40, iss. 2, pp. 261–279, 1992. doi:10.2514/6.1991-949
[4] M. Phan, L. G. Horta, J. N. Juang, and R. W. Longman, “Linear system identification via an asymptotically stable observer,” Journal of Optimization Theory and Applications, vol. 79, iss. 1, pp. 59–86, 1993. doi:10.1007/BF00941887
[5] D. Luenberger, “Observing the State of a Linear System,” IEEE Transactions on Military Electronics, vol. MIL-8, pp. 74–80, 1964. doi:10.1109/TME.1964.4323124
[6] R. E. Kalman, “A New Approach to Linear Filtering and Prediction Problems,” Transactions of the ASME – Journal of Basic Engineering, vol. 82, pp. 35–45, 1960. doi:10.1115/1.3662552

A deeper look into the Eigensystem Realisation Algorithm

Reading Time: 12 minutes

For my quadcopter control project, I am currently in the process of system identification, trying to find a model describing the dynamics of my motor–propeller combination. Finding the theoretical model is quite simple, but then you’re left with finding the parameters — and not all of them are directly available from the motor vendors. So I have no choice but to determine these parameters from measurements.

In my endeavour I came across this series of videos on data-driven control with machine learning by Professor Steve Brunton of the University of Washington, Seattle. There, the basic principles of modern methods for system identification are explained very well. If you are new to control systems, I can also recommend his control boot camp.

In part of the series, Professor Brunton explains the Eigensystem Realisation Algorithm (ERA) by Juang and Pappa [1]. The ERA is a procedure by which a reduced-order model for a linear and time-invariant system (LTI) can be derived from data of an impulse-response measurement.

https://youtu.be/neTO1_RuLdU?list=PLMrJAkhIeNNQkv98vuPjO2X2qJO_UPeWR
https://youtu.be/QDlWRWbFvhg
https://youtu.be/7rIE7V1Xj2I?list=PLMrJAkhIeNNQkv98vuPjO2X2qJO_UPeWR

However, while the videos do a pretty solid job of explaining the concepts and give the final formulae, I wanted to know more about their derivation. For this, I went to the original paper where the approaches were developed, and tried to reproduce the steps that led to the final results.

I follow this approach quite a lot when acquiring new skills. Not only does it help with better understanding (and memorising) the final solution, but I also tend to pick up neat tricks along the way that often prove helpful elsewhere, even completely outside the domain of the original paper.

Here are some interesting ideas you may pick up by reading this article:

  • When doing system identification, we mostly work with discrete-time systems, and in some aspects they are much easier to handle than continuous-time systems.
  • The discrete-time impulse response, which uniquely characterises the behaviour of LTI systems, can be easily expressed in closed form using the system matrices.
  • By reorganising our data, we can express seemingly non-linear problems in linear form.
  • The singular value decomposition allows us to examine the composition of a matrix and simplify it.
  • There are a few tricks for transforming matrix power expressions that may reduce the order of our problem.

Our Example

Juang and Pappa worked on large, flexible space structures such as the Galileo spacecraft (not to be confused with the European Galileo satellite navigation system), and aimed to identify the oscillation modes of such structures so that these could be taken into account when controlling the spacecraft.

We’ll look at something similar, although much simpler: a spring-loaded, damped horizontal pendulum. This pendulum consists of a mass m connected to a spring and a linear damper. The mass can move horizontally, and we measure the position x of its centre of gravity, with x=0 being the position when the system is at rest.

Spring Pendulum

This system can be modelled as a Single-Input, Single-Output system: We have a single input u=F and a single output y=x.

Schematic of a Single-Input, Single-Output (SISO) System

Normally, we could easily provide at least the structure of a model for this system from first principles — namely, the balance of forces according to Newton’s second law:

(1)   \begin{equation*}   m \ddot{x} + d \dot{x} + k x = F \end{equation*}

Introducing state variables x_1 = x and x_2 = \dot{x}, we can define the dynamics of this system in state space notation:

(2)   \begin{align*}   \frac{d}{dt} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}     &=     \begin{bmatrix}       0 & 1 \\       -\frac{k}{m} & -\frac{d}{m}     \end{bmatrix}     \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}     + \begin{bmatrix} 0 \\ \frac{1}{m} \end{bmatrix} F \\   y     &=     \begin{bmatrix} 1 & 0 \end{bmatrix}     \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \end{align*}

Instead of actually measuring our system, we will be using the continuous-time dynamics to simulate the system and thereby perform “measurements”. We will add white noise to emulate measurement noise.
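A minimal sketch of such a simulated “measurement” is given below; the parameter values, sample time and noise level are assumptions of mine and not taken from any actual hardware:

import numpy as np
from scipy import signal

# Assumed example parameters
m, d, k = 1.0, 0.2, 10.0   # mass, damping coefficient, spring constant
dt = 0.1                   # sample time
sigma = 0.01               # standard deviation of the measurement noise

# Continuous-time model from Equation 2, discretised with a zero-order hold
A = np.array([[0.0, 1.0], [-k / m, -d / m]])
B = np.array([[0.0], [1.0 / m]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])
Ad, Bd, Cd, Dd, _ = signal.cont2discrete((A, B, C, D), dt)

def measure(u, seed=0):
    """Simulate the noisy output for a 1-D array u of input forces."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    y = np.zeros(len(u))
    for i, ui in enumerate(u):
        y[i] = (Cd @ x + Dd.ravel() * ui).item() + rng.normal(0.0, sigma)
        x = Ad @ x + Bd.ravel() * ui
    return y

The impulse-response “measurement” of the next section then simply corresponds to calling measure on a unit impulse, for example measure(np.r_[1.0, np.zeros(199)]).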

Impulse-Response Measurement

Now, what happens if we whack this system with a hammer? What if we give it a short kick and then measure the position of the mass? Let’s find out!

Measured Impulse Response of a Dampened Spring-Mass System

The figure shows the position of the mass over time. We see that the system oscillates around the neutral position, but the amplitude of the oscillation falls exponentially with time, until measurement noise exceeds the movement of the system. In this example, we whacked the mass with a short force of 1 Newton, and then let go. So, our input would look like this:

Discrete-Time Impulse Input – Spring Pendulum

This function is called the Kronecker Delta Function, and when used in analysis of signals and systems, it is referred to as the (discrete-time) unit impulse. It is one for exactly one tick of our (discrete) time-base and zero everywhere else. When given as an input to a discrete-time system, the output observed afterwards is called the discrete-time impulse response. Specifically, LTI systems are completely characterised by their impulse response.

As we are working on data that we would acquire by measurement, considering the continuous-time formulation is not useful, as we cannot get continuous-time measurement data. Instead, we’ll have some kind of quantisation in both time and value. This actually simplifies the maths of our problem a bit, as we will see in a moment.

We now want to have a look at how we can mathematically and generally describe the impulse response. This will be the basis of our model to which we will then fit our measurement data. In discrete time, we describe the dynamics of an LTI system using a recurrence relation in state space:

(3)   \begin{equation*}   \mathbf{x}_{k+1} = \mathbf{A} \mathbf{x}_k + \mathbf{b} u_k \end{equation*}

Let us assume that the initial state satisfies \mathbf{x}_0=\mathbf{0}. This is certainly true for dampened structures if we wait long enough for all oscillations to die down before we whack the structure. Now, if our input is 1 for time step k=0 and 0 for all other time steps, we can see the progression of our state by repeatedly applying Equation 3:

Time Step k | State \mathbf{x}_k
0 | \mathbf{0}
1 | \mathbf{A} \underbrace{\mathbf{x}_0}_{=\mathbf{0}} + \mathbf{b} \underbrace{u_0}_{=1} = \mathbf{b}
2 | \mathbf{A} \underbrace{\mathbf{x}_1}_{=\mathbf{b}} + \mathbf{b} \underbrace{u_1}_{=0} = \mathbf{A}\mathbf{b}
3 | \mathbf{A} \underbrace{\mathbf{x}_2}_{=\mathbf{A} \mathbf{b}} + \mathbf{b} \underbrace{u_2}_{=0} = \mathbf{A}^2\mathbf{b}
⋮ | ⋮
n | \mathbf{A}^{n-1}\mathbf{b}

So, in general we have \mathbf{x}_k = \mathbf{A}^{k-1} \mathbf{b} for k>0, and with the output relation

(4)   \begin{equation*}   y_k = \mathbf{c}^T \mathbf{x}_k \end{equation*}

we get

(5)   \begin{equation*}   y_k = \mathbf{c}^T \mathbf{A}^{k-1} \mathbf{b} \end{equation*}

Note that this impulse-response is completely independent of what our state variables \mathbf{x}_k represent. You can try it yourself: Substitute \mathbf{x}_k = \mathbf{T} \tilde{\mathbf{x}}_k with some invertible square matrix \mathbf{T} and determine the impulse response of that system. However, this actually works in our favour, as it allows us to arbitrarily select at least part of the parameters later on.
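Carrying out this substitution, the transformed system has the matrices \tilde{\mathbf{A}} = \mathbf{T}^{-1}\mathbf{A}\mathbf{T}, \tilde{\mathbf{b}} = \mathbf{T}^{-1}\mathbf{b} and \tilde{\mathbf{c}}^T = \mathbf{c}^T\mathbf{T}, and the intermediate factors \mathbf{T}\mathbf{T}^{-1} cancel in the impulse response:

\begin{equation*}   \tilde{\mathbf{c}}^T \tilde{\mathbf{A}}^{k-1} \tilde{\mathbf{b}}   = \mathbf{c}^T \mathbf{T} \left(\mathbf{T}^{-1} \mathbf{A} \mathbf{T}\right)^{k-1} \mathbf{T}^{-1} \mathbf{b}   = \mathbf{c}^T \mathbf{A}^{k-1} \mathbf{b} \end{equation*}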

In general, the terms \mathbf{c}^T \mathbf{A}^{k-1} \mathbf{b} — or, in their Multiple-Input, Multiple-Output form \mathbf{C} \mathbf{A}^{k-1} \mathbf{B} — are called the Markov Parameters of the system.

Finding an Exact Model of Order n

What Juang and Pappa aim to do is to find parameter matrices \mathbf{A}, \mathbf{c}^T and \mathbf{b} so that the impulse response so described fits the measured data exactly. One way of doing so would be to explicitly write down the parametrised impulse-response — as we did in the last chapter — and try to solve this for the parameters directly.

However, that approach is rather ugly, and looking at Equation 5, this is a non-linear problem due to the powers of \mathbf{A} occurring. Instead, Juang and Pappa use quite a nice trick: They restructure the data in such a way that the problem becomes linear. This allows us to use our toolkit from linear algebra to find a solution.

In a first step, Juang and Pappa construct the vectors \bar{\mathbf{y}}_k, which are the column vectors of n observations starting at time step k:

(6)   \begin{equation*}   \bar{\mathbf{y}}_k := \begin{bmatrix} y_k \\ y_{k+1} \\ \vdots \\ y_{k+n-1} \end{bmatrix} \end{equation*}

We will see later that n is the order of the system we will build. For the time being, we shall assume that we have an arbitrary amount of measurements, so there are no limits on n. From Equation 5, we can see that

(7)   \begin{equation*}   \bar{\mathbf{y}}_k = \mathcal{O} \mathbf{A}^{k-1} \mathbf{b} \end{equation*}

with the observability matrix

(8)   \begin{equation*}   \mathcal{O} = \begin{bmatrix} \mathbf{c}^T \\ \mathbf{c}^T  \mathbf{A} \\ \mathbf{c}^T  \mathbf{A}^2 \\ \vdots \\  \mathbf{c}^T  \mathbf{A} ^{n-1} \end{bmatrix} \end{equation*}

Now, remember the function of the observability matrix: If the state has dimension n and the observability matrix has full rank n, then we can use it to uniquely reconstruct the state from the outputs at time steps k,\ldots,k+n-1. Thus, if our system is observable, we can use the inverse of the observability matrix \mathcal{O} to find the state \mathbf{x}_k:

(9)   \begin{equation*}  \mathcal{O}^{-1} \bar{\mathbf{y}}_k = \mathbf{A}^{k-1} \mathbf{b} = \mathbf{x}_k \end{equation*}

From that, we can construct the state at any later time step, since

(10)   \begin{equation*}   \mathbf{x}_{k+1}=\mathbf{A} \mathbf{x}_k =  \mathbf{A}\mathcal{O}^{-1} \bar{\mathbf{y}}_k \end{equation*}

Applying this to \bar{\mathbf{y}}_{k+1} we get the recurrence

(11)   \begin{equation*}   \bar{\mathbf{y}}_{k+1} = \mathcal{O} \mathbf{x}_{k+1} = \mathcal{O} \mathbf{A}\mathcal{O}^{-1} \bar{\mathbf{y}}_k \end{equation*}

Now, let’s look at what we have done: We are able to describe the output at time steps k+1,\ldots,k+n from the output at time steps k,\ldots,k+n-1 using a linear operator! How do we find \mathcal{O} \mathbf{A}\mathcal{O}^{-1}? Equation 11 alone does not sufficiently specify its value. To increase the rank of the equation system, Juang and Pappa proceed to build the Hankel-matrices

(12)   \begin{align*}   \mathbf{H}_k   &=     \begin{bmatrix}       \bar{\mathbf{y}}_k & \bar{\mathbf{y}}_{k+1} & \cdots &\bar{\mathbf{y}}_{k+n-1}     \end{bmatrix} \\   &=     \begin{bmatrix}       y_k & y_{k+1} & \cdots & y_{k+n-1} \\       y_{k+1} & y_{k+2} & \cdots & y_{k+n} \\       \vdots & \vdots & \ddots & \vdots \\       y_{k+n-1} & y_{k+n} & \cdots & y_{k+2n-2}     \end{bmatrix} \end{align*}

These matrices are constructed by listing n measurements starting at time step k in the first column, then listing n measurements starting at time step k+1 in the second column, and so on. Using Equation 11, we can find the following recurrence for \mathbf{H}_{k+1}:

(13)   \begin{equation*}      \mathbf{H}_{k+1} = \mathcal{O} \mathbf{A} \mathcal{O}^{-1} \mathbf{H}_k \end{equation*}

Thus, if we have any matrices \mathbf{H}_k and \mathbf{H}_{k+1}, with the former being a regular, invertible matrix, we can find

(14)   \begin{equation*}   \mathcal{O} \mathbf{A} \mathcal{O}^{-1} = \mathbf{H}_{k+1} {\mathbf{H}_k}^{-1} \end{equation*}

Thus, we can express y_k using

(15)   \begin{align*}   y_k     &= \mathbf{e}^T \mathbf{H}_k \mathbf{e} \\     &= \mathbf{e}^T \left(\mathcal{O} \mathbf{A} \mathcal{O}^{-1}\right)^{k-1} \mathbf{H}_1 \mathbf{e} \\     &=        \underbrace{\mathbf{e}^T}_{=:\hat{\mathbf{c}}^T}       {\underbrace{\left(\mathbf{H}_{2} {\mathbf{H}_1}^{-1}\right)}_{=:\hat{\mathbf{A}}}}^{k-1}       \underbrace{\mathbf{H}_1 \mathbf{e}}_{=:\hat{\mathbf{b}}} \\     &= \hat{\mathbf{c}}^T \hat{\mathbf{A}}^{k-1} \hat{\mathbf{b}} \end{align*}

where \mathbf{e}^T=\begin{bmatrix} 1 & 0 & \ldots & 0 \end{bmatrix}. This exactly describes our measured system response:

Comparison of Impulse Responses: Measured vs. Estimated from Full Order System
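As a rough sketch of how Equations 12 to 15 translate to code (SISO case; y is assumed to be a 1-D array holding the noisy impulse-response samples y_1, ..., y_{2n}, for instance from the measure function above):

import numpy as np
from scipy.linalg import hankel

def full_order_model(y):
    """Exact realisation of a SISO impulse response (Equation 15, sketch)."""
    n = len(y) // 2
    H1 = hankel(y[0:n], y[n - 1:2 * n - 1])   # Hankel matrix with entries y_1 ... y_{2n-1}
    H2 = hankel(y[1:n + 1], y[n:2 * n])       # the same, shifted by one time step
    A_hat = H2 @ np.linalg.inv(H1)
    b_hat = H1[:, 0]                          # = H_1 e, the first column
    c_hat = np.zeros(n)
    c_hat[0] = 1.0                            # = e^T, picks out the first entry
    return A_hat, b_hat, c_hat

The k-th sample of the identified model's impulse response is then c_hat @ np.linalg.matrix_power(A_hat, k - 1) @ b_hat, and it reproduces the measurement, noise included.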

The Problem of Noise

However, we also see that our estimated system exactly fits all the noise from our measurement. In our case we know from first-principles considerations that the exact model would only be of order n=2, while the model we developed here is of much larger order. Besides being quite a misuse of resources, this may also lead to considerable estimation errors, as we can see when plotting the step responses:

Comparison of Step Responses: Measured vs. Estimated from Full Order System

Our estimated system clearly deviates from the system we measured. This is no wonder, as the estimated system incorporates all the noise we measured.

Further, our Hankel-matrices are pretty badly conditioned. This means that small rounding errors during calculations may become large errors in the result. We know that — in the absence of measurement error — the rank of the Hankel-matrices cannot be larger than the order of the underlying system. If they are regular at all, it is only due to measurement noise, which should ideally be much smaller in magnitude than the actual data. Thus, simple inversion is prone to large numerical errors.

In their original paper, Juang and Pappa describe a method of using only a subset of the measurements to reduce the size of the Hankel-matrices, and thereby to improve the conditioning of the matrices. However, today, the method of ERA is almost universally presented (e.g. on Wikipedia) as being based on Singular Value Decomposition (SVD), using the whole set of measurement data.

Singular Value Decomposition

Any matrix \mathbf{M} \in \mathbb{R}^{m \times n} can be represented in the form of its Singular Value Decomposition (SVD):

(16)   \begin{equation*}   \mathbf{M} = \mathbf{P} \mathbf{D} \mathbf{Q}^{T} \end{equation*}

The matrices all have special forms:

  • The matrices \mathbf{P} \in \mathbb{R}^{m \times m} and \mathbf{Q} \in \mathbb{R}^{n \times n} are square, orthogonal matrices, i.e. \mathbf{P}\mathbf{P}^{T}=\mathbf{I}_m and \mathbf{Q}\mathbf{Q}^{T}=\mathbf{I}_n. Note that the identity matrices may have different dimensions if \mathbf{M} is not square.
  • The matrix \mathbf{D} \in \mathbb{R}^{m \times n} is a diagonal matrix, with all diagonal elements being non-negative. As a common convention, the diagonal elements \sigma_k of \mathbf{D} — called the singular values of \mathbf{M} — are listed in decreasing order.

Essentially, the SVD expresses the contents of the matrix as the sum of matrices with decreasing amount of impact:

(17)   \begin{equation*}   \mathbf{M}_{ij} = \sum_{r=1}^{\min\left\{m,n\right\}} \sigma_r \mathbf{P}_{ir} \mathbf{Q}^T_{rj} \end{equation*}

As the matrices \mathbf{P} and \mathbf{Q} are orthogonal, the impact of the individual element is represented by \sigma_r. If we look at the Pareto plot of the singular values, we see that there are some dominant elements in there:

Pareto Plot of Singular Values of the Measurement Hankel Matrix

The plot shows the individual singular values in orange, ordered from largest to smallest from left to right, and the cumulative proportion of the singular values, added up from left to right. We can see that the first two singular values are the most prominent, making up about 55% of the total, while the remaining singular values are pretty small in comparison — although they still add up to 45% of the data. We also see that our matrix is awfully close to being singular, which it would be if any of the singular values were zero.

This is somewhat consistent with our original considerations, where we found out that our system is actually of second order. Thus, we may assume that the data represented by the major two singular values is our actual data, while the remaining singular values represent noise.

Now, what do we do with that information? We can use it to remove what we consider to be noise from our data. In essence, the vectors of \mathbf{P} and \mathbf{Q} form bases of vector spaces, and the singular values determine the magnitude by which the basis vectors are represented in the data in the matrix. Thus, by removing some singular values and basis vectors, we can project the data in the matrix onto the directions represented by the remaining basis vectors, and thereby approximate the original matrix:

(18)   \begin{align*}   \mathbf{M}     &= \mathbf{P} \mathbf{D} \mathbf{Q}^{T} \\     &=       \begin{bmatrix} \tilde{\mathbf{P}} & \mathbf{P}_R \end{bmatrix}       \begin{bmatrix}         \tilde{\mathbf{D}} & \mathbf{0} \\         \mathbf{0} & \mathbf{D}_R       \end{bmatrix}       \begin{bmatrix}          \tilde{\mathbf{Q}}^T \\          {\mathbf{Q}_R}^T       \end{bmatrix} \\     &\approx \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \\     &=: \tilde{\mathbf{M}} \end{align*}

By appropriately selecting \tilde{\mathbf{D}}, we can approximate \mathbf{M} arbitrarily closely. Usually, one would determine the size of \tilde{\mathbf{D}} from first-principles considerations, or by defining a cut-off value in relative error.
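A minimal sketch of this truncation with NumPy (M is any matrix, r the number of singular values we decide to keep):

import numpy as np

def truncate_svd(M, r):
    """Rank-r approximation of M by keeping only the r largest singular values."""
    P, s, QT = np.linalg.svd(M)
    P_t, s_t, Q_t = P[:, :r], s[:r], QT[:r, :].T   # retained directions and singular values
    M_t = P_t @ np.diag(s_t) @ Q_t.T               # the approximation of Equation 18
    return M_t, P_t, s_t, Q_t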

With that, we can determine the pseudo-inverse of \tilde{\mathbf{M}}:

(19)   \begin{equation*}   \tilde{\mathbf{M}}^{*} = \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{P}}^{T} \end{equation*}

I encourage you to personally verify that \tilde{\mathbf{M}}^{*} indeed acts as an inverse on the retained directions: \tilde{\mathbf{M}} \tilde{\mathbf{M}}^{*} = \tilde{\mathbf{P}} \tilde{\mathbf{P}}^{T} and \tilde{\mathbf{M}}^{*} \tilde{\mathbf{M}} = \tilde{\mathbf{Q}} \tilde{\mathbf{Q}}^{T} are orthogonal projections onto the column spaces of \tilde{\mathbf{P}} and \tilde{\mathbf{Q}}, and they reduce to identity matrices if no singular values are discarded.
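Continuing the little sketch from above, this can also be checked numerically on random data (the matrix size and retained rank below are arbitrary):

import numpy as np

M = np.random.default_rng(1).normal(size=(6, 4))
M_t, P_t, s_t, Q_t = truncate_svd(M, 2)

# Pseudo-inverse according to Equation 19
M_pinv = Q_t @ np.diag(1.0 / s_t) @ P_t.T

print(np.allclose(M_t @ M_pinv, P_t @ P_t.T))   # projection onto the columns of P~
print(np.allclose(M_pinv @ M_t, Q_t @ Q_t.T))   # projection onto the columns of Q~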

Simplified Model

With the simplified Hankel-Matrix

(20)   \begin{equation*}   \tilde{\mathbf{H}}_1 = \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \end{equation*}

and its pseudo-inverse {\tilde{\mathbf{H}}_1}^{*}, we can build a simplified, cleaned-up model:

(21)   \begin{align*}   \tilde{y}_k     &=        \underbrace{\mathbf{e}^T}_{=:\tilde{\mathbf{c}}^T}       {\underbrace{\left(\mathbf{H}_{2} {\tilde{\mathbf{H}}_1}^{*}\right)}_{=:\tilde{\mathbf{A}}}}^{k-1}       \underbrace{\tilde{\mathbf{H}}_1 \mathbf{e}}_{=:\tilde{\mathbf{b}}} \\     &:= \tilde{\mathbf{c}}^T \tilde{\mathbf{A}}^{k-1} \tilde{\mathbf{b}} \end{align*}

Determining the impulse response of that model, we get a pretty clean fit, and it seems that the noise is actually being ignored.

Comparison of Impulse Responses: Measured vs. Estimated from Simplified, Full Order System

Also, our step response fits much better:

Comparison of Step Responses: Measured vs. Estimated from Full Order, Simplified System

Reduced-Order Model

But still, our matrices \tilde{\mathbf{c}}^T, \tilde{\mathbf{A}} and \tilde{\mathbf{b}} are of order n, while they should be of second order for our system. To reduce that order, we’ll have to make use of some trickery. We’ll re-use Equation 21 and add an identity matrix in there:

(22)   \begin{align*}   \tilde{y}_k   &= \tilde{\mathbf{c}}^T \tilde{\mathbf{A}}^{k-1} \tilde{\mathbf{b}} \\   &= \underbrace{\tilde{\mathbf{c}}^T}_{=:\mathbf{e}^T} \underbrace{\mathbf{I}_n}_{=\tilde{\mathbf{H}}_1 {\tilde{\mathbf{H}}_1}^{*}} {\underbrace{\tilde{\mathbf{A}}}_{=\mathbf{H}_{2} {\tilde{\mathbf{H}}_1}^{*}}}^{k-1} \underbrace{\tilde{\mathbf{b}}}_{=:\tilde{\mathbf{H}}_1 \mathbf{e}} \\   &= \mathbf{e}^T \tilde{\mathbf{H}}_1 {\tilde{\mathbf{H}}_1}^{*} \left(\mathbf{H}_{2} {\tilde{\mathbf{H}}_1}^{*} \right)^{k-1} \tilde{\mathbf{H}}_1 \mathbf{e} \\  &= \mathbf{e}^T \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{P}}^{T} \left(\mathbf{H}_{2} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{P}}^{T} \right)^{k-1} \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \mathbf{e} \end{align*}

Keep in mind that the values in \tilde{\mathbf{D}} are non-negative and all non-zero values are on the diagonal, so that we can take the square-root of both \tilde{\mathbf{D}} and its inverse. We’ll use that to regroup a bit:

(23)   \begin{align*}   \tilde{y}_k   &= \mathbf{e}^T \tilde{\mathbf{P}} \tilde{\mathbf{D}} \underbrace{\tilde{\mathbf{Q}}^{T} \tilde{\mathbf{Q}}}_{\mathbf{I}_n} \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{P}}^{T} \left(\mathbf{H}_{2} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-1} \tilde{\mathbf{P}}^{T} \right)^{k-1} \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \mathbf{e} \\   &= \mathbf{e}^T \tilde{\mathbf{P}} \tilde{\mathbf{D}} \tilde{\mathbf{D}}^{-1} \left(\tilde{\mathbf{P}}^{T} \mathbf{H}_{2} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-1} \right)^{k-1} \underbrace{\tilde{\mathbf{P}}^{T} \tilde{\mathbf{P}}}_{=\mathbf{I}_n} \tilde{\mathbf{D}} \tilde{\mathbf{Q}}^{T} \mathbf{e} \\   &= \mathbf{e}^T \tilde{\mathbf{P}} \underbrace{\tilde{\mathbf{D}} \tilde{\mathbf{D}}^{-\frac{1}{2}}}_{=\tilde{\mathbf{D}}^{\frac{1}{2}}} \left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{P}}^{T} \mathbf{H}_{2} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \right)^{k-1} \underbrace{\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{D}}}_{=\tilde{\mathbf{D}}^{\frac{1}{2}}} \tilde{\mathbf{Q}}^{T} \mathbf{e} \\   &= \underbrace{\mathbf{e}^T \tilde{\mathbf{P}} \tilde{\mathbf{D}}^{\frac{1}{2}}}_{=:\hat{\mathbf{c}}^T}{\underbrace{\left(\tilde{\mathbf{D}}^{-\frac{1}{2}} \tilde{\mathbf{P}}^{T} \mathbf{H}_{2} \tilde{\mathbf{Q}} \tilde{\mathbf{D}}^{-\frac{1}{2}} \right)}_{=:\hat{\mathbf{A}}}}^{k-1} \underbrace{\tilde{\mathbf{D}}^{\frac{1}{2}} \tilde{\mathbf{Q}}^{T} \mathbf{e}}_{=:\hat{\mathbf{b}}} \\   &= \hat{\mathbf{c}}^T \hat{\mathbf{A}}^{k-1} \hat{\mathbf{b}} \end{align*}

Notice that in the first step, we have pushed \tilde{\mathbf{P}}^{T} from the left into the parenthesised, inner expression, and out the right side again. This does not change the value of the expression. We have then done the same with \tilde{\mathbf{D}}^{-\frac{1}{2}}, and then simplified the remaining expression. Now, if we check the dimensions of \hat{\mathbf{c}}^T, \hat{\mathbf{A}} and \hat{\mathbf{b}}, they indeed match the reduced order of our system.
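Putting Equation 23 into code (again just a sketch; H1 and H2 are the Hankel matrices from before and r is the reduced order we settle on):

import numpy as np

def reduced_order_model(H1, H2, r):
    """Reduced-order realisation according to Equation 23 (sketch)."""
    P, s, QT = np.linalg.svd(H1)
    P_t, Q_t = P[:, :r], QT[:r, :].T
    D_sq = np.diag(np.sqrt(s[:r]))            # square root of D~
    D_isq = np.diag(1.0 / np.sqrt(s[:r]))     # square root of the inverse of D~
    A_hat = D_isq @ P_t.T @ H2 @ Q_t @ D_isq  # r x r system matrix
    b_hat = D_sq @ Q_t[0, :]                  # = D~^(1/2) Q~^T e
    c_hat = P_t[0, :] @ D_sq                  # = e^T P~ D~^(1/2)
    return A_hat, b_hat, c_hat

For our pendulum, r = 2 would be the natural choice, and the impulse response of the reduced model is again c_hat @ np.linalg.matrix_power(A_hat, k - 1) @ b_hat.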

In fact, nothing should have changed except for the order of the system, so looking at our impulse response, we should not see any difference:

Comparison of Impulse Responses: Measured vs. Estimated from Reduced Order System

Similarly, our step response should still look the same:

Comparison of Step Responses: Measured vs. Estimated from Reduced Order System

Conclusion

Our expedition into the details of the ERA was quite fruitful: We have learned a few techniques for restructuring, analysis and reduction of measurement data.

Many of these will come in handy when we examine the Observer/Kalman Filter Identification (OKID) algorithm. The OKID is used to extract the impulse response from measurement data that was acquired using arbitrary control input.

Further, some of these ideas also will allow us to identify non-linear systems — or at least approximate them. As the engine/propeller combination includes possibly non-linear elements due to aerodynamic drag, this is specifically relevant for the identification of multicopter propulsion systems.

[1] J. Juang and R. S. Pappa, “An Eigensystem Realization Algorithm for Modal Parameter Identification and Model Reduction,” Journal of Guidance, Control, and Dynamics, vol. 8, iss. 5, 1985. doi:10.2514/3.20031