
Francois Chollet is leaving Google

(developers.googleblog.com)
377 points by xnx
minimaxir ◴[] No.42131340[source]
Genuine question: who is using Keras in production nowadays? I've done a few work projects in Keras/TensorFlow over the years, and they created a lot of technical debt and lost time to debugging; those issues disappeared once I switched to PyTorch.

The training loop in Keras for a simple model is indeed easier and faster to set up than PyTorch-oriented helpers (e.g. Lightning AI, Hugging Face Accelerate), but it is much, much less flexible.
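
For reference, the "easy path" in Keras looks roughly like this (a minimal sketch; the toy data and layer sizes are arbitrary stand-ins):

    import numpy as np
    from tensorflow import keras

    # Toy data, just to illustrate the API shape.
    X = np.random.rand(1000, 20).astype("float32")
    y = np.random.randint(0, 2, size=(1000,)).astype("float32")

    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # compile() + fit() is the entire training loop.
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X, y, batch_size=32, epochs=3, validation_split=0.1)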

replies(4): >>42131586 #>>42131775 #>>42133251 #>>42136668 #
magicalhippo ◴[] No.42131775[source]
As someone who hasn't really used either, what's PyTorch doing that's so much better?
replies(3): >>42131884 #>>42131972 #>>42133260 #
minimaxir ◴[] No.42131972[source]
A few things from personal experience:

- LLM support in PyTorch is better (at both the tooling and CUDA level). Hugging Face transformers does support both TensorFlow and PyTorch variants of LLMs, but...

- Almost all new LLMs are released in PyTorch first and may or may not be ported to TensorFlow. This most notably includes embedding models, which are the most important area in my work.

- Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible. PyTorch has a DataLoader that can handle CPU/GPU data movement and on-the-fly processing (see the Dataset/DataLoader sketch after this list).

- PyTorch has better implementations of modern ML training improvements such as fp16 mixed precision, multi-GPU support, native learning rate schedulers, etc. PyTorch also lets you override the training loop for very specific needs (e.g. custom loss functions); implementing the same in TensorFlow/Keras is a buggy pain. (A bare training-loop sketch with a custom loss follows after this list.)

- PyTorch models were faster to train than TensorFlow models using the same hardware and model architecture.

- Keras's serialization for model deployment (e.g. SavedModel) is a pain in the butt, while PyTorch has better options with torch.jit (TorchScript) and native ONNX export (export sketch below).
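
To make the DataLoader point concrete, a rough sketch (the corpus is a stand-in and "bert-base-uncased" is just an arbitrary tokenizer choice): a Dataset only has to produce one example at a time, so tokenization happens lazily and the DataLoader handles batching, shuffling, worker processes, and pinned-memory transfer.

    import torch
    from torch.utils.data import Dataset, DataLoader
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    texts = ["an example sentence"] * 10_000  # stand-in corpus

    class TextDataset(Dataset):
        # Tokenizes one example at a time, so the corpus never has to
        # sit fully preprocessed in memory.
        def __init__(self, texts, tokenizer, max_length=128):
            self.texts = texts
            self.tokenizer = tokenizer
            self.max_length = max_length

        def __len__(self):
            return len(self.texts)

        def __getitem__(self, idx):
            enc = self.tokenizer(self.texts[idx], truncation=True,
                                 padding="max_length", max_length=self.max_length,
                                 return_tensors="pt")
            return {k: v.squeeze(0) for k, v in enc.items()}

    loader = DataLoader(TextDataset(texts, tokenizer), batch_size=32,
                        shuffle=True, num_workers=4, pin_memory=True)

    for batch in loader:
        batch = {k: v.to("cuda", non_blocking=True) for k, v in batch.items()}
        # ... forward/backward pass goes here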
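
And on overriding the training loop: in PyTorch the loop is just Python, so a custom loss is an ordinary function, and mixed precision plus LR scheduling slot straight in. A sketch only; the model, the penalty term, and the random data are made up for illustration:

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = nn.Linear(20, 1).to(device)                      # stand-in model
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
    scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))  # fp16

    def custom_loss(pred, target):
        # Arbitrary example: MSE plus a small penalty term.
        return nn.functional.mse_loss(pred, target) + 0.01 * pred.abs().mean()

    data = TensorDataset(torch.randn(1024, 20), torch.randn(1024, 1))
    for x, y in DataLoader(data, batch_size=32, shuffle=True):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=(device == "cuda")):
            loss = custom_loss(model(x), y)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()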
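
And the two PyTorch export paths look roughly like this (stand-in model and shapes):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
    example = torch.randn(1, 20)  # dummy input, only used to trace shapes

    # TorchScript: a self-contained, Python-free artifact for serving.
    torch.jit.trace(model, example).save("model.pt")

    # ONNX: portable to ONNX Runtime, TensorRT, etc.
    torch.onnx.export(model, example, "model.onnx",
                      input_names=["features"], output_names=["score"],
                      dynamic_axes={"features": {0: "batch"}})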

replies(1): >>42132254 #
perturbation ◴[] No.42132254{3}[source]
I think a lot of these may have improved since your last experience with Keras. It's pretty easy to override the training loop and/or write a custom loss. The link below is about overriding the training/test step altogether; a custom loss is even easier, since you just define a new loss function or class.

https://keras.io/examples/keras_recipes/trainer_pattern/
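
Roughly, the pattern is to subclass keras.Model and override train_step, which keeps fit()'s callbacks, metrics, and distribution handling. A sketch in the tf.keras style, not the exact code from that page (Keras 3 renames a couple of these hooks):

    import tensorflow as tf
    from tensorflow import keras

    class CustomModel(keras.Model):
        # Take over one optimization step while still using model.fit().
        def train_step(self, data):
            x, y = data
            with tf.GradientTape() as tape:
                y_pred = self(x, training=True)
                loss = self.compiled_loss(y, y_pred)   # or any custom loss here
            grads = tape.gradient(loss, self.trainable_variables)
            self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
            self.compiled_metrics.update_state(y, y_pred)
            return {m.name: m.result() for m in self.metrics}

    inputs = keras.Input(shape=(20,))
    outputs = keras.layers.Dense(1, activation="sigmoid")(
        keras.layers.Dense(64, activation="relu")(inputs))
    model = CustomModel(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])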

> - Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible.

The TensorFlow backend has the excellent tf.data.Dataset API, which allows for out-of-core data and preprocessing in a streaming fashion.
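
For instance (a sketch; the TFRecord path and feature spec are placeholders), a tf.data pipeline streams shards from disk, parses and shuffles in parallel, and prefetches batches, so nothing has to fit in memory:

    import tensorflow as tf

    feature_spec = {
        "x": tf.io.FixedLenFeature([20], tf.float32),
        "y": tf.io.FixedLenFeature([], tf.int64),
    }

    def parse_example(record):
        parsed = tf.io.parse_single_example(record, feature_spec)
        return parsed["x"], parsed["y"]

    # Nothing is loaded up front; records stream from disk as needed.
    files = tf.data.Dataset.list_files("data/shard-*.tfrecord")  # hypothetical path
    dataset = (
        tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
        .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
        .shuffle(10_000)
        .batch(32)
        .prefetch(tf.data.AUTOTUNE)
    )

    # model.fit(dataset, epochs=3) then consumes it batch by batch.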

replies(1): >>42132363 #
minimaxir ◴[] No.42132363{4}[source]
That's a fair implementation of a custom loss. Hugging Face's Trainer for transformers suggests a similar implementation, although theirs has less boilerplate.

https://huggingface.co/docs/transformers/main/en/trainer#cus...
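
Their pattern is to subclass Trainer and override compute_loss, roughly like this (a sketch only; the class weights and two-class setup are arbitrary examples):

    import torch
    from transformers import Trainer

    class WeightedLossTrainer(Trainer):
        # Swap in any loss; fp16, schedulers, multi-GPU handling are unchanged.
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            loss_fct = torch.nn.CrossEntropyLoss(
                weight=torch.tensor([1.0, 3.0], device=outputs.logits.device))
            loss = loss_fct(outputs.logits.view(-1, 2), labels.view(-1))
            return (loss, outputs) if return_outputs else loss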