Coordinate systems

With the rise of AR and VR, understanding coordinate systems and the transforms between them is more important than ever. However, each 3D library has its own conventions, which makes context switching between them difficult. I wrote this series to improve my own understanding, provide a future reference, and help others in the process.

Orthonormal Basis

A basis is orthonormal when:

Axis vectors are all unit vectors.
Axes are all orthogonal to each other.

This means that the basis has no scale and no shear, so a transform into or out of it can be represented with only a rotation $\mathbf{R}$ and a translation $\mathbf{t}$.

Why is this useful? An orthonormal basis can be used to describe the motion of rigid bodies, which is particularly useful for computer vision because that's how real-world objects move.

Orthonormal transforms also have a convenient property: they are easy to invert, especially in matrix form. A general matrix needs an expensive inversion (for example, Gaussian elimination) and may not be invertible at all, whereas an orthonormal transform is always invertible and takes far fewer operations to invert.

First, for the $3 \times 3$ rotation $\mathbf{R}$, take the transpose. The transpose of a rotation matrix is its inverse.
Then rotate the translation by the transposed rotation and negate it to find the inverted translation, $-\mathbf{R}^{\top}\mathbf{t}$.

Figure: Matrix rotation translation decomposition (image credit: Miyazaki)

To write this in glm-style pseudocode:

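// Column-major, glm-style. Assumes M is a rigid transform (rotation + translation only).
mat4 rt_inverse(mat4 M) {
  // The inverse of a rotation is its transpose.
  mat3 R_transpose = transpose(mat3(M));

  mat4 M_inverse = mat4(R_transpose);
  // Inverted translation: -R^T * t, with w = 1 to keep the matrix affine.
  M_inverse[3] = vec4(-(R_transpose * vec3(M[3])), 1.0f);
  return M_inverse;
}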

This boils down to three dot-products and a transpose:

\begin{bmatrix} \color{red}{u_x} & \color{green}{v_x} & \color{blue}{w_x} & t_x \\ \color{red}{u_y} & \color{green}{v_y} & \color{blue}{w_y} & t_y \\ \color{red}{u_z} & \color{green}{v_z} & \color{blue}{w_z} & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} \color{red}{u_x} & \color{red}{u_y} & \color{red}{u_z} & -\color{red}{u} \cdot t \\ \color{green}{v_x} & \color{green}{v_y} & \color{green}{v_z} & -\color{green}{v} \cdot t \\ \color{blue}{w_x} & \color{blue}{w_y} & \color{blue}{w_z} & -\color{blue}{w} \cdot t \\ 0 & 0 & 0 & 1 \end{bmatrix}
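The dot-product form can also be written without any math library. The following is a minimal sketch, assuming column-major storage indexed as m[col][row]; the names Vec4, Mat4, dot3, rt_inverse_dot, and multiply are hypothetical helpers introduced here for illustration, not part of glm.

```cpp
#include <array>

// Column-major 4x4 matrix: m[col][row], matching the glm-style indexing.
using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;

// Dot product of the xyz parts of two columns (w is ignored).
float dot3(const Vec4& a, const Vec4& b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Inverse of a rigid transform (rotation + translation only).
// Assumes the input is orthonormal; no validation is done here.
Mat4 rt_inverse_dot(const Mat4& m) {
  const Vec4& u = m[0];
  const Vec4& v = m[1];
  const Vec4& w = m[2];
  const Vec4& t = m[3];
  Mat4 inv = {{
      {u[0], v[0], w[0], 0.0f},  // Columns of R^T are the rows of R.
      {u[1], v[1], w[1], 0.0f},
      {u[2], v[2], w[2], 0.0f},
      {-dot3(u, t), -dot3(v, t), -dot3(w, t), 1.0f},  // -R^T * t
  }};
  return inv;
}

// Column-major matrix product a * b, used only to check the result.
Mat4 multiply(const Mat4& a, const Mat4& b) {
  Mat4 r{};
  for (int col = 0; col < 4; ++col)
    for (int row = 0; row < 4; ++row)
      for (int k = 0; k < 4; ++k)
        r[col][row] += a[k][row] * b[col][k];
  return r;
}
```

Multiplying a rigid transform by the result of rt_inverse_dot should give the identity matrix, up to floating-point error.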

How to tell if a matrix is orthonormal?

While an orthonormal basis can be represented compactly with a rotation $\mathbf{R}$ and translation $\mathbf{t}$, it is often convenient to encode the transformation as a single matrix.

\begin{bmatrix} & & & \\ & \mathbf{R} & & \mathbf{t} \\ & & & \\ 0 & 0 & 0 & 1 \end{bmatrix}

Looking at an arbitrary matrix, how can we tell if it is orthonormal? There are a few simple checks:

No Scale: The magnitude (or magnitude squared) of each basis vector should be 1.
Orthogonal Bases: The dot product of two vectors is directly related to the angle between them, with $\cos(\theta)$ being 0 when the vectors are orthogonal. Use the dot product and compare it against 0 to determine if the bases are orthogonal.

$\mathbf{a}\cdot\mathbf{b}=\|\mathbf{a}\|\ \|\mathbf{b}\|\cos(\theta)$

Handedness: If the handedness is incorrect, transforms will be flipped along one axis. Use the cross product to validate that $\hat{\imath} \times \hat{\jmath}$ is $\hat{k}$ for right-handed or $-\hat{k}$ for left-handed coordinate systems. The downside of this approach is that a failed cross-product test doesn't indicate what is wrong, because scaling and non-orthogonal bases can also cause it to fail.

\hat{\imath} \times \hat{\jmath} = \begin{cases} \:\:\: \hat{k} & \text{if right-handed} \\ - \hat{k} & \text{if left-handed} \end{cases}

Affine: The last row of the matrix must be $\big[\ 0 \quad 0 \quad 0 \quad 1 \ \big]$.

// Column-major, glm-style. near_equal is an assumed tolerance-based comparison helper.
bool is_orthonormal_rh(mat4 M) {
  // The basis vectors are the xyz parts of the first three columns.
  vec3 x = vec3(M[0]);
  vec3 y = vec3(M[1]);
  vec3 z = vec3(M[2]);

  // No scale?
  if (!near_equal(length2(x), 1.0f)
      || !near_equal(length2(y), 1.0f)
      || !near_equal(length2(z), 1.0f)) {
    assert(false && "Matrix scaled");
    return false;
  }

  // Orthogonal?
  if (!near_equal(dot(x, y), 0.0f)
      || !near_equal(dot(x, z), 0.0f)
      || !near_equal(dot(y, z), 0.0f)) {
    assert(false && "Not orthogonal");
    return false;
  }

  // Right-handed?
  vec3 expected_z = cross(x, y);
  // Negate expected_z for left-handed.
  if (!near_equal(z, expected_z)) {
    assert(false && "Not right-handed");
    return false;
  }

  // Affine? M[col][3] is the last row of each column.
  if (M[0][3] != 0.0f || M[1][3] != 0.0f
      || M[2][3] != 0.0f || M[3][3] != 1.0f) {
    assert(false && "Not affine");
    return false;
  }

  return true;
}
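The check above relies on a near_equal helper that is never defined. One possible version is sketched below, assuming an absolute tolerance is good enough for values near 0 and 1; the Vec3 alias and the kEpsilon value are hypothetical and should be tuned to the precision of your data.

```cpp
#include <array>
#include <cmath>

using Vec3 = std::array<float, 3>;

// Hypothetical default tolerance; not taken from any library.
constexpr float kEpsilon = 1e-4f;

// True when two scalars differ by at most epsilon.
bool near_equal(float a, float b, float epsilon = kEpsilon) {
  return std::fabs(a - b) <= epsilon;
}

// True when every component pair differs by at most epsilon.
bool near_equal(const Vec3& a, const Vec3& b, float epsilon = kEpsilon) {
  return near_equal(a[0], b[0], epsilon)
      && near_equal(a[1], b[1], epsilon)
      && near_equal(a[2], b[2], epsilon);
}
```

An absolute tolerance works here because every value being compared is expected to be close to 0 or 1. A relative tolerance would be a better fit for values with a large range.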

Mental Math

Now, these checks are great if you're in code, but what about if you are looking at a matrix in a debugger? As a first pass, I look for the following:

For each basis vector, sum the absolute values of the components. Is it close to 1? This is only a rough check (the sum is exactly 1 only for an axis-aligned unit vector and runs somewhat higher otherwise), but it is much faster than calculating the magnitude of the vector.
Is the last row of the matrix $\big[\ 0 \quad 0 \quad 0 \quad 1 \ \big]$?
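To put a number on how rough the absolute-sum check is, the standard inequality between the sum of absolute values and the magnitude of a 3-vector $\mathbf{v}$ is:

```latex
\|\mathbf{v}\| \;\le\; |v_x| + |v_y| + |v_z| \;\le\; \sqrt{3}\,\|\mathbf{v}\|
```

So for a true unit vector the sum lands between 1 and about 1.73. For example, $(0.6,\ 0.8,\ 0)$ has magnitude 1 but an absolute sum of 1.4. A sum below 1 or well above 1.73 means the basis vector is scaled.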

Multiplication

Notation Recommendations

When operating with transformation matrices, it's common to see transform chains in the form of:

// Column-major, glm-style.
mat4 modelViewProj = proj * view * model;

This makes sense, but only because it's been ingrained in our minds by every tutorial out there. What if there were a better way?

For HoloLens we often dealt with complex transform chains, and to keep track of them we used a novel approach. While HoloLens is DirectX, I'll present OpenGL style first since that's the convention I've been focusing on. Instead of arbitrary transform names, use descriptive names that specify the source and target coordinate systems:

// Column-major, glm-style.
mat4 projFromModel = projFromView * viewFromWorld * worldFromModel;

Given no context, the source and target of each transform are immediately obvious, as are the ordering and validity of each multiplication. A product is valid only when the adjacent names match: projFromView * viewFromWorld is valid, as is viewFromWorld * worldFromModel, but projFromView * worldFromModel is not.
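The same rule can be pushed into the type system so the compiler rejects mismatched products. This goes beyond the naming convention itself and is only a sketch of one way to do it: the frame tags, the Transform template, and the translation helper are all hypothetical names, not part of glm or DirectX.

```cpp
#include <array>

// Hypothetical frame tags; they carry no data.
struct Model {};
struct World {};
struct View {};

// Column-major 4x4 storage: m[col][row].
using Mat4 = std::array<std::array<float, 4>, 4>;

// Maps points expressed in frame From into frame To (toFromFrom).
template <typename To, typename From>
struct Transform {
  Mat4 m;
};

// aFromB * bFromC = aFromC. Only compiles when the inner frames match.
template <typename A, typename B, typename C>
Transform<A, C> operator*(const Transform<A, B>& lhs,
                          const Transform<B, C>& rhs) {
  Transform<A, C> out{};  // Value-initialized, so all entries start at 0.
  for (int col = 0; col < 4; ++col)
    for (int row = 0; row < 4; ++row)
      for (int k = 0; k < 4; ++k)
        out.m[col][row] += lhs.m[k][row] * rhs.m[col][k];
  return out;
}

// Builds a pure translation between two frames, for demonstration.
template <typename To, typename From>
Transform<To, From> translation(float x, float y, float z) {
  Transform<To, From> t{};
  for (int i = 0; i < 4; ++i) t.m[i][i] = 1.0f;
  t.m[3] = {x, y, z, 1.0f};
  return t;
}
```

With these definitions, translation<View, World>(0, 2, 0) * translation<World, Model>(1, 0, 0) yields a Transform<View, Model>, while swapping the operands fails to compile because the inner frames no longer match.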

See more details in Sebastian's blog post: Naming Convention for Matrix Math.

DirectX (Row-Major)

The above approach gets even better with row-major transforms. Instead of using targetFromSource, use sourceToTarget:

// Row-major.
mat4 modelToProj = modelToWorld * worldToView * viewToProj;
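The reversed order follows from the transpose rule for products. This assumes that "row-major" here means the row-vector convention, where points multiply on the left of the matrix. Writing $\mathbf{P}$, $\mathbf{V}$, and $\mathbf{W}$ as shorthand for projFromView, viewFromWorld, and worldFromModel:

```latex
\mathbf{p}_{proj} = \mathbf{P}\,\mathbf{V}\,\mathbf{W}\,\mathbf{p}_{model}
\quad\Longleftrightarrow\quad
\mathbf{p}_{proj}^{\top} = \mathbf{p}_{model}^{\top}\,\mathbf{W}^{\top}\,\mathbf{V}^{\top}\,\mathbf{P}^{\top}
```

Here $\mathbf{W}^{\top}$, $\mathbf{V}^{\top}$, and $\mathbf{P}^{\top}$ play the roles of modelToWorld, worldToView, and viewToProj, so the chain reads left to right in the order the transforms are applied.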

Other Notations

ARCore, as well as Project Tango, use a convention of base_frame_T_target_frame to denote pose transforms. These pose transforms are composed of a rotation quaternion and translation, so the column-major or row-major classification doesn't apply.

In the documentation, transforms are applied left-to-right (like row-major transforms). Special care must be taken when converting these transforms to matrices because the multiplication order may change due to handedness.

Citations

Tango Coordinate Systems
Inverse matrix of transformation matrix (rotation and translation matrix)
CS 248: Final Solutions (dot-product simplification of RT inverse)
Naming Convention for Matrix Math
Computing the Pixel Coordinates of a 3D Point
World, View and Projection Transformation Matrices
