
Francois Chollet is leaving Google

(developers.googleblog.com)
377 points by xnx | 6 comments
minimaxir ◴[] No.42131340[source]
Genuine question: who is using Keras in production nowadays? I've done a few work projects in Keras/TensorFlow over the years, and they created a lot of technical debt and time lost to debugging; those issues disappeared once I switched to PyTorch.

The training loop in Keras for a simple model is indeed easier and faster to set up than with PyTorch-oriented helpers (e.g. Lightning AI, Hugging Face Accelerate), but it is much, much less flexible.

replies(4): >>42131586 #>>42131775 #>>42133251 #>>42136668 #
1. magicalhippo ◴[] No.42131775[source]
As someone who hasn't really used either, what's PyTorch doing that's so much better?
replies(3): >>42131884 #>>42131972 #>>42133260 #
2. jwjohnson314 ◴[] No.42131884[source]
PyTorch is just much more flexible. Implementing a custom loss function, for example, is straightforward in PyTorch and a hassle in Keras (or was last time I used it, which was several years ago).
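
For illustration, a loss in PyTorch is just any differentiable tensor expression; a minimal sketch (the per-sample weighting scheme here is made up):

    import torch
    import torch.nn as nn

    # Hypothetical custom loss: MSE weighted per sample.
    class WeightedMSE(nn.Module):
        def forward(self, pred, target, weight):
            return (weight * (pred - target) ** 2).mean()

    pred = torch.randn(8, 1, requires_grad=True)
    target, weight = torch.randn(8, 1), torch.rand(8, 1)
    loss = WeightedMSE()(pred, target, weight)
    loss.backward()  # gradients flow through the custom loss like any other op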
3. minimaxir ◴[] No.42131972[source]
A few things from personal experience:

- LLM support with PyTorch is better (both at a tooling level and CUDA level). Hugging Face transformers does have support for both TensorFlow and PyTorch variants of LLMs but...

- Almost all new LLMs are in PyTorch first and may or may not be ported to TensorFlow. This most notably includes embedding models, which are the most important area in my work.

- Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible. PyTorch has a DataLoader that handles CPU/GPU data movement and preprocessing (see the first sketch after this list).

- PyTorch has better implementations of modern ML training improvements such as fp16/mixed precision, multi-GPU support, and better native learning rate schedulers. PyTorch also lets you override the training loop for very specific needs (e.g. custom loss functions); implementing them in TensorFlow/Keras is a buggy pain.

- PyTorch models were faster to train than TensorFlow models on the same hardware with the same model architecture.

- Keras's serialization for model deployment is a pain in the butt (e.g. SavedModel), while PyTorch has better options with torch.jit, plus native ONNX export (see the second sketch after this list).
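
To make the DataLoader point concrete, a rough sketch of a streaming pipeline (the file name and tokenization are placeholders, not from a real project):

    import torch
    from torch.utils.data import IterableDataset, DataLoader

    # Hypothetical streaming dataset: reads and preprocesses one line at a
    # time, so the corpus never has to fit in memory.
    class StreamingTextDataset(IterableDataset):
        def __init__(self, path, max_len=16):
            self.path, self.max_len = path, max_len

        def __iter__(self):
            with open(self.path) as f:
                for line in f:
                    ids = [ord(c) % 256 for c in line.strip()][: self.max_len]
                    ids += [0] * (self.max_len - len(ids))  # pad to a fixed length
                    yield torch.tensor(ids)

    loader = DataLoader(StreamingTextDataset("corpus.txt"), batch_size=32,
                        pin_memory=True)
    # for batch in loader:
    #     batch = batch.to("cuda", non_blocking=True)  # overlap copy with compute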
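
And for the serialization point, a minimal sketch with a toy model (names are illustrative):

    import torch

    model = torch.nn.Sequential(torch.nn.Linear(16, 4)).eval()
    example = torch.randn(1, 16)

    traced = torch.jit.trace(model, example)  # TorchScript artifact for deployment
    traced.save("model.pt")

    torch.onnx.export(model, example, "model.onnx",  # native ONNX export
                      input_names=["x"], output_names=["y"])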

replies(1): >>42132254 #
4. perturbation ◴[] No.42132254[source]
I think a lot of these may have improved since your last experience with Keras. It's pretty easy to override the training loop and/or write a custom loss. The link below is for overriding the training/test step altogether; a custom loss is easier still, just a new loss function or class.

https://keras.io/examples/keras_recipes/trainer_pattern/
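
The gist of that recipe, sketched for the TensorFlow backend (not a verbatim copy of the linked example):

    import tensorflow as tf
    from tensorflow import keras

    # Subclass Model and override train_step to control the loss and the
    # update yourself; model.fit() then drives this loop.
    class CustomTrainer(keras.Model):
        def train_step(self, data):
            x, y = data
            with tf.GradientTape() as tape:
                y_pred = self(x, training=True)
                loss = tf.reduce_mean(tf.square(y - y_pred))  # any expression works here
            grads = tape.gradient(loss, self.trainable_variables)
            self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
            return {"loss": loss}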

> - Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible.

The TensorFlow backend has the excellent tf.data.Dataset API, which allows out-of-core data loading and preprocessing in a streaming fashion.
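
For example, a streaming pipeline sketch (file name and preprocessing are placeholders):

    import tensorflow as tf

    # Stream lines from disk, preprocess on the fly, batch and prefetch,
    # so the dataset is never fully materialized in memory.
    ds = (tf.data.TextLineDataset("corpus.txt")
            .map(lambda s: tf.strings.unicode_decode(s, "UTF-8")[:16],
                 num_parallel_calls=tf.data.AUTOTUNE)
            .padded_batch(32, padded_shapes=[16])
            .prefetch(tf.data.AUTOTUNE))
    # model.fit(ds, epochs=1)  # Keras consumes the dataset directly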

replies(1): >>42132363 #
5. minimaxir ◴[] No.42132363{3}[source]
That's a fair implementation of a custom loss. Hugging Face's Trainer in transformers suggests a similar approach, although theirs has less boilerplate.

https://huggingface.co/docs/transformers/main/en/trainer#cus...
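
The gist of that docs pattern, roughly (the class weighting here is a made-up example):

    import torch
    from transformers import Trainer

    class WeightedLossTrainer(Trainer):
        # Override compute_loss; everything else in the training loop stays stock.
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            weights = torch.tensor([1.0, 2.0], device=outputs.logits.device)
            loss = torch.nn.functional.cross_entropy(
                outputs.logits.view(-1, outputs.logits.size(-1)),
                labels.view(-1), weight=weights)
            return (loss, outputs) if return_outputs else loss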

6. adultSwim ◴[] No.42133260[source]
Its success is also part of why it's better: PyTorch has a thriving ecosystem of software around it and a large user base, so picking it comes with many network benefits.