Right, your argument works if you let the layer widths tend to infinity sequentially, one layer after another (this corresponds to finite networks in which each layer is much wider than the next). This is the argument presented by Lee et al. Note, however, that it is nontrivial to show that the limit still holds when the widths of all layers tend to infinity at the same time (arguably the more natural limit); establishing this is one of the main contributions of Matthews et al. In my paper, I also consider this simultaneous limit.
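
To make the distinction concrete, here is a rough sketch of the two limits. The notation is an assumption introduced only for illustration and is not taken from either paper: $n_1, \dots, n_L$ are the hidden-layer widths, $f_{n_1,\dots,n_L}$ is the random function computed by the network, $\mathcal{GP}(0, K)$ is the limiting Gaussian process, and $m$ is a single index that drives all widths together.

```latex
% Sequential limit: widths are sent to infinity one layer at a time,
% innermost limit first (n_1, then n_2, and so on).
\lim_{n_L \to \infty} \cdots \lim_{n_2 \to \infty} \lim_{n_1 \to \infty}
  f_{n_1, \dots, n_L} \;\overset{d}{=}\; \mathcal{GP}(0, K)

% Simultaneous limit: every width is a function of one index m,
% with n_\ell(m) \to \infty as m \to \infty for all \ell = 1, \dots, L.
f_{n_1(m), \dots, n_L(m)} \;\xrightarrow{\;d\;}\; \mathcal{GP}(0, K)
  \quad \text{as } m \to \infty
```

In the sequential case each layer can be treated as already Gaussian before the next limit is taken, which is why the layer-by-layer argument goes through. In the simultaneous case no layer is exactly Gaussian at any finite $m$, so that step needs a separate justification.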

In any case, I'll update the paper to reflect our discussion here. Thanks, Radford!