fjxl corresponds to the "JXL_Lossless/f" lines in my full_benchmarks.txt link. It's about mid-table in terms of encode speed. I clocked it at about 10x faster than libpng, not 100x faster.
If you look solely at encode speed, fjxl loses to fpnge (also listed in full_benchmarks.txt) so I'm not convinced yet that JPEG XL beats PNG in all ways.
Did you compile fjxl with or without SIMD? That makes a big difference.
And yes, fpnge is slightly faster than fjxl but also compresses significantly worse. It is likely that you could make an even faster but slightly worse version of fjxl that would beat fpnge. But I think the speed of fjxl is already good enough in practice.
The problem with PNG extensions for multi-threading is that it only works if you control the encode side too — most existing encoders will not use such an extension. If existing deployments need to be replaced anyway, you can just as well use a new format altogether.
> If existing deployments need to be replaced anyway, you can just as well use a new format altogether.
There's still an operational difference between rolling out multi-threaded PNG versus multi-threaded $OTHER_FORMAT.
At the file format level, MT PNG can be designed to be backwards compatible. Older PNG decoders simply ignore the new non-critical chunk.
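To make the "non-critical chunk" point concrete, here is a minimal sketch of how a decoder decides whether it may skip a chunk it doesn't recognize. The PNG spec encodes this in the case of the first letter of the four-byte chunk type (bit 5 of the first byte). The helper names and the `mtXx` / `MTXx` chunk types are hypothetical, made up for illustration; they are not what Apple or any library actually uses.

```python
def is_ancillary(chunk_type: bytes) -> bool:
    """Return True if a decoder may safely skip this chunk when unknown."""
    if len(chunk_type) != 4:
        raise ValueError("PNG chunk types are exactly 4 bytes")
    # Bit 5 (0x20) of the first byte is the ancillary bit:
    # lowercase first letter = ancillary, uppercase = critical.
    return bool(chunk_type[0] & 0x20)


def decoder_action(chunk_type: bytes, known: set) -> str:
    """What a spec-following decoder does with a chunk (hypothetical helper)."""
    if chunk_type in known:
        return "process"
    # Unknown ancillary chunks are ignored; unknown critical chunks are fatal.
    return "skip" if is_ancillary(chunk_type) else "error"


# An old decoder that only knows the core chunks.
OLD_DECODER = {b"IHDR", b"PLTE", b"IDAT", b"IEND"}

print(decoder_action(b"IDAT", OLD_DECODER))  # known chunk
print(decoder_action(b"mtXx", OLD_DECODER))  # hypothetical ancillary MT chunk
print(decoder_action(b"MTXx", OLD_DECODER))  # same name, but marked critical
```

So as long as the MT metadata lives in a chunk whose type starts with a lowercase letter, an older decoder skips it and decodes the image data as usual.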
At the software level, you can roll out encoders that emit MT-capable PNGs without having to also roll out new decoders everywhere (or 99% everywhere, or 'on all major browsers'). You can roll out new decoders only partially, and those upgraded places enjoy the benefits, but you don't break the decoders you don't control.
Also, in terms of additional code size, upgrading your PNG library from version N to version N+1 is probably smaller than adding a whole new $OTHER_FORMAT library.
I think you must be doing something very wrong if you measure less than 1 Mpx/s for fjxl. It should reach at least 50 Mpx/s or so, and of course more if you are using multiple threads.
Anyway, yes, we should improve decode speed — for example, currently every sample value is unnecessarily converted from int to float and then back to int, and obviously that causes some avoidable slowdowns.
I agree that MT PNG does have a gentler transition path than introducing a new format. The point remains though that you need to somehow get both the encoder side and the decoder side upgraded to benefit from the advantage, which can be hard given how many existing deployments of PNG there are. Any system that has to take arbitrary PNG as input cannot rely on MT decode being an option, etc.
Also one disadvantage of MT PNG compared to JXL is that it splits the image in stripes, not tiles. For full-image decode, that doesn't matter much, but for region-of-interest (cropped) decode, tiles are a bit more efficient (no need to unnecessarily decode the areas to the left and right of the region of interest).
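A rough back-of-the-envelope sketch of the stripes-versus-tiles difference for a cropped decode. It assumes a simplified model where a decoder has to decode every stripe or tile that overlaps the region in full; the image size, the 256-row stripe height and the 256x256 tile size are illustrative assumptions, not numbers taken from any particular encoder.

```python
def covered_span(start: int, length: int, unit: int, limit: int):
    """Snap [start, start+length) outward to multiples of `unit`, clamped to `limit`."""
    first = start // unit
    last = (start + length - 1) // unit
    return first * unit, min((last + 1) * unit, limit)


def stripe_pixels(img_w, img_h, roi, stripe_h):
    """Pixels decoded when the image is split into full-width stripes."""
    _x, y, _w, h = roi
    y0, y1 = covered_span(y, h, stripe_h, img_h)
    return img_w * (y1 - y0)  # every touched stripe spans the full width


def tile_pixels(img_w, img_h, roi, tile):
    """Pixels decoded when the image is split into square tiles."""
    x, y, w, h = roi
    x0, x1 = covered_span(x, w, tile, img_w)
    y0, y1 = covered_span(y, h, tile, img_h)
    return (x1 - x0) * (y1 - y0)


# Assumed example: 4096x4096 image, 512x512 crop at (1000, 1000).
roi = (1000, 1000, 512, 512)
stripes = stripe_pixels(4096, 4096, roi, stripe_h=256)
tiles = tile_pixels(4096, 4096, roi, tile=256)
print(stripes, tiles, stripes / tiles)
```

In this made-up case the striped layout decodes 3,145,728 pixels versus 589,824 for tiles, so roughly 5x more work for the same crop. The gap shrinks as the crop gets wider and disappears for a full-image decode.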
> The point remains though that you need to somehow get both the encoder side and the decoder side upgraded to benefit from the advantage,
Well, for PNG, Apple have shipped exactly that. It was something they could unilaterally do without e.g. having to get all of the major browser makers on board.
Apple's PNG encoders (i.e. iOS dev SDKs) produce multi-thread-friendly PNGs and their decoders (i.e. iOS devices) reap the benefits. And unlike trying to introduce a new but not-backwards-compatible-with-PNG format (JPEG XL), other PNG decoders can still decode these images. They just don't get the speed benefit.
> Any system that has to take arbitrary png as input cannot rely on MT decode being an option, etc.
It's forwards compatible too. Apple's PNG decoder will use MT decode if the PNG image was encoded with MT metadata, but Apple's PNG decoder can still decode arbitrary PNG (using its pre-existing single-thread code path).
> I think you must be doing something very wrong if you measure less than 1 Mpx/s for fjxl. It should reach at least 50 Mpx/s or so,
In case it wasn't clear, the 0.630 fjxl encode speed number in the top level README.md in that commit is after normalization such that QOIR is 1.000. After all, absolute numbers are hardware dependent.
If you look at the doc/full_benchmarks.txt change in the same commit I linked to, fjxl encode speed clocks at 106.48 megapixels per second. It's just that QOIR encode speed clocks at 168.90 megapixels per second (and fpnge at 312.59).
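For anyone who wants to check the arithmetic, here is a small sketch of the normalization using the three absolute encode speeds quoted above. The `normalize` helper is just illustrative, not code from the benchmark repository.

```python
# Absolute encode speeds in megapixels per second, as quoted above.
encode_mpx_per_s = {"QOIR": 168.90, "fjxl": 106.48, "fpnge": 312.59}


def normalize(speeds: dict, baseline: str) -> dict:
    """Divide every speed by the baseline's speed, so the baseline becomes 1.0."""
    base = speeds[baseline]
    return {name: value / base for name, value in speeds.items()}


relative = normalize(encode_mpx_per_s, "QOIR")
for name, value in relative.items():
    print(f"{name}: {value:.3f}")
# fjxl comes out as 0.630, matching the README figure.
```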
I believe that Apple have an unofficial PNG extension that facilitates multi-threaded decoding: https://github.com/w3c/PNG-spec/issues/45