3Blue1Brown has an outstanding series of short videos, "Essence of linear algebra"[1], which covers vectors, matrix transformations, and the related math. What I particularly like about these videos is that the concepts are introduced first as abstract animations, without the numbers and calculations (that stuff is covered later), to emphasize the "shape" of what vectors, matrices, etc. really represent.
Open question I still have - what is the "geometric interpretation" of the transpose operation? A^T?
Considering the fact that the transpose shows up all the time, I'm very surprised that I've never seen a good explanation of how I should be visualizing it.
So the adjacency matrix is not only an index of edges, it's also a little machine that can push nodes around on a graph. You can even do it with two at a time:
Well, the adjacency matrix I used as an example is orthogonal, but they don't have to be. Any matrix with 1's and 0's can be interpreted as an unweighted adjacency matrix for an undirected graph (if the matrix is symmetric) or directed graph (if it's not symmetric). For example, here's an adjacency matrix that's not an orthogonal matrix:
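A minimal sketch of the "little machine" idea, with a made-up 3-node directed graph (the graph, and the convention that row i / column j means an edge i -> j, are my own choices here):

```python
import numpy as np

# Hypothetical directed graph on 3 nodes: 0 -> 1, 1 -> 2, 2 -> 1.
# Row i has a 1 in column j when there is an edge i -> j, so this
# matrix is not symmetric (directed) and not orthogonal.
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])

# A one-hot vector marks a node; multiplying by A^T pushes the
# marker along the edges to the node's out-neighbors.
node0 = np.array([1, 0, 0])
print(A.T @ node0)          # marker moves from node 0 to node 1

# Two at a time: just add the one-hot vectors.
both = np.array([1, 0, 1])  # markers on nodes 0 and 2
print(A.T @ both)           # both markers land on node 1
```

The entry 2 in the second result is the point: the matrix doesn't just move markers, it counts how many arrive at each node.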
It's a bit difficult to explain the visualization without pictures, but I'll give it a shot.
The transpose is really about converting the matrix to operate on a different vector space, namely, the dual space. In particular, the dual space of a vector space V is the vector space of "linear functionals", which are linear functions
\phi: V -> R
A linear functional on R^2 looks like a gradient (the "gradient fill" gradient, not a calculus gradient). These gradients are in one-to-one correspondence with vectors in R^2. In particular, given a vector w \in R^2, the direction of the gradient is along the direction of w, and the speed with which the gradient changes corresponds to the magnitude of w.
The precise mathematical correspondence is that (i) given a vector w \in R^2, the function
f_w(v) = <w, v>
is a linear function (here, <,> is the inner/dot product), and (ii) every linear function has this form. Now, note that f_w is exactly multiplication by the transpose w^T of w! In particular,
f_w(v) = w^T v
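To make the correspondence concrete, here's a small numeric sketch (the particular w and v are my own picks):

```python
import numpy as np

# The linear functional f_w is literally "multiply by the row vector w^T".
w = np.array([3.0, 4.0])
v = np.array([1.0, 2.0])

f_w = lambda v: np.dot(w, v)       # f_w(v) = <w, v>
print(f_w(v))                      # 3*1 + 4*2 = 11.0

# The same thing as an explicit 1x2 row matrix times a 2x1 column:
row = w.reshape(1, 2)              # this is w^T
print(row @ v.reshape(2, 1))       # [[11.]]
```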
More generally, for any linear map A : V -> W, the adjoint A* of A is defined to be the linear map from the dual space of W to the dual space of V that satisfies
<w, A v> = <A* w, v>
The transpose A^T is the adjoint of A when V is a finite-dimensional real vector space:
<A^T w, v> = (A^T w)^T v = w^T A v = <w, A v>
In summary, you can try to visualize A^T as a linear map "acting" on dual vectors. For example, let v \in R^2 and let w be a dual vector (i.e., a gradient), and suppose that A rotates v clockwise by 90 degrees. To preserve the inner product <w, A v>, A^T rotates w counter-clockwise by 90 degrees.
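A quick numeric check of that rotation example (the specific vectors are my own choices):

```python
import numpy as np

# A rotates clockwise by 90 degrees; A^T rotates counter-clockwise,
# and the pairing <w, A v> = <A^T w, v> is preserved.
A = np.array([[ 0.0, 1.0],
              [-1.0, 0.0]])    # clockwise 90-degree rotation

v = np.array([1.0, 2.0])       # an ordinary vector
w = np.array([3.0, -1.0])      # a dual vector (a "gradient")

lhs = np.dot(w, A @ v)         # <w, A v>
rhs = np.dot(A.T @ w, v)       # <A^T w, v>
print(lhs, rhs)                # the two pairings agree
```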
And a practical example of this is checking transformed points against a view frustum. Instead of transforming points into view space, a transposed matrix lets you transform the frustum into object space and check untransformed points against it. This only works for non-perspective transforms, of course, but the view transform shouldn't be perspective anyway.
To visualize this, take the simplest case of 2D space and non-homogeneous coordinates. A simple frustum would be the angle made by two rays from the origin. You can see that rotating the space is the same as rotating the frustum in the opposite direction (though in this case the transposed matrix is the same as the inverse), but stretching the space opens or closes the frustum depending on which direction it pulls the normals.
> Open question I still have - what is the "geometric interpretation" of the transpose operation? A^T?
AFAIK, there is not a solid geometric interpretation. Part of the trouble is that the transpose can change the shape of the matrix.
For example, transposing a (column) vector turns it into a row vector: a matrix that takes a vector and outputs a single number. These two objects seem rather incomparable geometrically.
The best intuition I have for transposition is that it represents a time reversal (but not necessarily an inverse). In the case of a vector, you have to think of it as a linear transformation that maps 1 to that vector. The transpose instead maps that vector to its length squared. I have a vaguer intuition for why it's the length squared and how it relates to projections, but it's hard to put into words.
With rotation matrices, this time reversal results in the inverse. So essentially rotation/skewing is reversed, but scaling is not.
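A small sketch of that asymmetry (the particular matrices and vector are my own examples):

```python
import numpy as np

# For a rotation R, the transpose undoes it: R^T R = I.
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rot_undone = np.allclose(R.T @ R, np.eye(2))

# For a scaling S, the transpose repeats it: S^T S = S^2, not I.
S = np.diag([2.0, 3.0])
scale_not_undone = np.allclose(S.T @ S, S @ S)

# And for a vector v, composing v^T with v gives its length squared.
v = np.array([3.0, 4.0])
len_sq = v @ v               # v^T v = |v|^2 = 25

print(rot_undone, scale_not_undone, len_sq)
```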
I was hoping this was the top comment. Linear algebra would have been much more interesting if I had watched those videos first. 3B1B is such an amazing teacher!
[1] https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...