The Keras training loop for a simple model is indeed easier and faster to set up than PyTorch-oriented helpers (e.g. Lightning AI, Hugging Face Accelerate), but much, much less flexible.
source: PyTorch Foundation news
- LLM support in PyTorch is better (both at the tooling level and the CUDA level). Hugging Face Transformers does support both TensorFlow and PyTorch variants of LLMs, but...
- Almost all new LLMs land in PyTorch first and may or may not be ported to TensorFlow. This most notably includes embedding models, which are the most important area in my work.
- Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible. PyTorch has a DataLoader that can handle CPU/GPU data movement and streaming preprocessing (see the sketch after this list).
- PyTorch has better implementations of modern ML training improvements such as fp16, multi-GPU support, and better native learning rate schedulers. PyTorch also lets you override the training loop for very specific needs (e.g. custom loss functions); implementing those in TensorFlow/Keras is a buggy pain.
- PyTorch models were faster to train than TensorFlow models on the same hardware with the same architecture.
- Keras's serialization for model deployment is a pain in the butt (e.g. SavedModel), while PyTorch has both better tooling in torch.jit and native ONNX export (sketch below).
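To make the streaming/fp16/custom-loss points concrete, a minimal sketch; the file format, model, and shapes here are made up for illustration, worker sharding is elided, and a CUDA device is assumed:

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, IterableDataset

    class StreamingDataset(IterableDataset):
        # Yields examples lazily, so the corpus never has to fit in RAM.
        def __init__(self, path):
            self.path = path
        def __iter__(self):
            with open(self.path) as f:
                for line in f:
                    # Assumed format: 128 floats then an integer label per line.
                    *feats, label = line.split()
                    yield torch.tensor([float(v) for v in feats]), int(label)

    loader = DataLoader(StreamingDataset("train.txt"), batch_size=32)

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2)).cuda()
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    scaler = torch.cuda.amp.GradScaler()  # fp16 mixed precision

    for x, y in loader:
        x, y = x.cuda(non_blocking=True), y.cuda(non_blocking=True)
        opt.zero_grad()
        with torch.cuda.amp.autocast():
            logits = model(x)
            # Any custom loss can be dropped in here.
            loss = nn.functional.cross_entropy(logits, y)
        scaler.scale(loss).backward()
        scaler.step(opt)
        scaler.update()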
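And for the deployment point, roughly (a toy stand-in model, not a real one):

    import torch
    from torch import nn

    model = nn.Sequential(nn.Linear(128, 2)).eval()  # stand-in for a trained model
    dummy = torch.randn(1, 128)

    traced = torch.jit.trace(model, dummy)  # TorchScript artifact
    traced.save("model.pt")

    torch.onnx.export(model, dummy, "model.onnx",
                      input_names=["features"], output_names=["logits"],
                      dynamic_axes={"features": {0: "batch"}})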
https://keras.io/examples/keras_recipes/trainer_pattern/
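That recipe boils down to overriding train_step on a keras.Model subclass that wraps the real model; a rough sketch of the pattern (TF backend, Keras 2-style compiled_loss):

    import tensorflow as tf
    from tensorflow import keras

    class CustomTrainer(keras.Model):
        # Wraps an inner model so the update step can be customized.
        def __init__(self, model, **kwargs):
            super().__init__(**kwargs)
            self.inner = model

        def call(self, inputs, training=False):
            return self.inner(inputs, training=training)

        def train_step(self, data):
            x, y = data
            with tf.GradientTape() as tape:
                y_pred = self(x, training=True)
                loss = self.compiled_loss(y, y_pred)  # a custom loss could go here
            grads = tape.gradient(loss, self.trainable_variables)
            self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
            self.compiled_metrics.update_state(y, y_pred)
            return {m.name: m.result() for m in self.metrics}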
> - Keras's training loop assumes you can fit all the data in memory and that the data is fully preprocessed, which in the world of LLMs and big data is infeasible.
The TensorFlow backend has the excellent tf.data.Dataset API, which allows for out-of-core data loading and preprocessing in a streaming fashion.
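For example, with nothing materialized in memory at once (the paths and feature spec below are placeholders):

    import tensorflow as tf

    def parse_fn(record):
        # Decode one serialized example; this feature spec is illustrative.
        feats = tf.io.parse_single_example(
            record,
            {"x": tf.io.FixedLenFeature([128], tf.float32),
             "y": tf.io.FixedLenFeature([], tf.int64)})
        return feats["x"], feats["y"]

    files = tf.data.Dataset.list_files("data/*.tfrecord")
    dataset = (tf.data.TFRecordDataset(files)
               .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
               .shuffle(10_000)
               .batch(32)
               .prefetch(tf.data.AUTOTUNE))
    # model.fit(dataset) then consumes batches lazily, epoch after epoch.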
https://huggingface.co/docs/transformers/main/en/trainer#cus...
I don’t need a custom loss function, so Keras is just fine.
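(For anyone who does need one: the linked approach is subclassing Trainer and overriding compute_loss. A rough sketch, with made-up class weights:)

    import torch
    from transformers import Trainer

    class WeightedLossTrainer(Trainer):
        def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
            labels = inputs.pop("labels")
            outputs = model(**inputs)
            logits = outputs.get("logits")
            weight = torch.tensor([1.0, 3.0], device=logits.device)  # illustrative
            loss = torch.nn.functional.cross_entropy(
                logits.view(-1, model.config.num_labels), labels.view(-1),
                weight=weight)
            return (loss, outputs) if return_outputs else loss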
From the article it sounds like Waymo runs on Keras. Last I checked, Waymo was doing better than the PyTorch-powered Uber effort.
In our case, we built an ensemble of several small models using Keras. Our secret sauce at the time was the specificity of our data and the labeling.
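The shape of it was roughly this; the sizes and member architecture here are stand-ins, not our actual models:

    from tensorflow import keras

    def build_member():
        # Stand-in for one small member model; the real ones differed.
        return keras.Sequential([
            keras.layers.Dense(64, activation="relu"),
            keras.layers.Dense(1, activation="sigmoid")])

    inputs = keras.Input(shape=(128,))  # made-up feature size
    members = [build_member() for _ in range(5)]
    outputs = keras.layers.Average()([m(inputs) for m in members])
    ensemble = keras.Model(inputs, outputs)
    ensemble.compile(optimizer="adam", loss="binary_crossentropy")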