bf16 has 7 mantissa bits, so consecutive representable values around |w|
are spaced by roughly |w| · 2-8. When the Adam
update is smaller than that spacing, the post-step weight rounds back to the same byte. Try it.
{indices, values} encoding (int32 + bf16 = 6 bytes per changed element), a delta payload lands at
20 to 35 MB. Crank η up and watch that number explode.