(machinelearning.apple.com)

171 points pizza | 2 comments | 06 Apr 25 08:53 UTC

1. benob ◴[06 Apr 25 16:34 UTC] No.43602745

A variant I have been thinking of: each parameter matrix (or block) is the sum of a random matrix (generated from a seed) and a low-rank matrix (a LoRA). I'd like to experiment with training from scratch in that setting.
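To make the idea concrete, here is a minimal NumPy sketch of one such parameter matrix, W = R(seed) + A @ B. Everything in it is an assumption for illustration: the function name `seeded_lowrank_weight`, the Gaussian random matrix, the 1/sqrt(cols) scaling, and the shapes are all hypothetical, and it is not the SeedLM method from the linked article. Only the seed and the two low-rank factors would need to be stored or trained; the random part is regenerated on demand.

```python
import numpy as np

def seeded_lowrank_weight(seed, shape, A, B, scale=None):
    """Build W = R(seed) + A @ B (hypothetical helper, for illustration only).

    R is regenerated from the seed on every call, so it never has to be stored.
    A (rows x rank) and B (rank x cols) are the low-rank factors.
    """
    rows, cols = shape
    rng = np.random.default_rng(seed)
    if scale is None:
        # Assumed scaling so the random part has roughly unit-variance outputs.
        scale = 1.0 / np.sqrt(cols)
    R = rng.standard_normal((rows, cols)) * scale
    return R + A @ B

rows, cols, rank = 64, 32, 4
init = np.random.default_rng(0)
A = init.standard_normal((rows, rank)) * 0.01  # small random init
B = np.zeros((rank, cols))                     # zero init: W starts equal to R

W = seeded_lowrank_weight(seed=1234, shape=(rows, cols), A=A, B=B)

# Numbers that must be kept per matrix: one seed plus the two factors.
stored = 1 + A.size + B.size
dense = rows * cols
print(W.shape, stored, dense)
```

In a training setup along these lines, gradients would flow only into A and B while the seeded part stays fixed. Whether that trains well from scratch is exactly the open question in the comment above.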

replies(1): >>43602864 #

2. sadiq ◴[06 Apr 25 16:50 UTC] No.43602864

>>43602745 (TP) #

There's a related write-up here you might find interesting: https://wandb.ai/learning-at-home/LM_OWT/reports/Parameter-s...

It covers some experiments on weight tying, one of which actually combines LoRA with random weights.

SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators