Fun to see neural nets pushed to such extremes; I really enjoyed the post.
> The smallest models had to be trained without data augmentation, as they would not converge otherwise.
Was this also the case for the 2-bit model you ended up with?