My personal inclination is to make the longer jump, and go straight for a deeply rudimentary Lisp. There's a trick where you start off with Lisp macros that expand to assembly, and I once knew someone who got it working for new hardware during a 10-hour plane flight. It's a slightly longer climb than Forth, but even a primitive Lisp is nice.
However, the deciding factor here really is the 6502 and 65C02 microprocessers. You really want at least 4 general-purpose registers for a primitive Lisp dialect, and that's pushing it. And the 65C02 basically has 1, or 3 if you clap your hands and believe. Even C is a serious challenge on a 65C02.
But Forth thrives in that enviroment. You only need a stack pointer and enough registers to do exactly 1 canned operation. So: victory to Forth.
And wow, I wish I had seen Chipwits back in the day. I was a massive fan of the Rocky's Boots logic game, but Chipwits never showed up in our neck of the woods. Thank you for open sourcing it!
The trick with a macro-assembler that uses Lisp macros to generate assembly was basically folklore when I learned it, and I haven't seen it fully fleshed out anywhere in the literature. For a tiny chip, you'd run this as a cross compiler from a bigger machine. But you basically have Lisp macros that expand to other Lisp macros that eventually expand to assembly representated as s-expressions.
As for why basic Lisps are register-hungry, you usually reserve an "environment pointer register", which points to closure data or scope data associated with the currently running function. And then you might also want a "symbol table base" register, which points to interned symbols. The first symbol value (located directly where the symbol register points) should be 'nil', which is both "false" and the "empty list". This allows you to check Boolean expressions and check for the empty list with a single register-to-register comparison, and it makes checks against other built-in symbols much cheaper. So now you've sacrificed 2 registers to the Lisp gods. If you have 8 registers, this is fine. If you have 4 registers, it's going to hurt but you can do it. If you have something like the 65C02, which has an 8-bit accumulator and two sort-of-flexible index registers, you're going to have to get ridiculously clever.
Of course, working at this level is a bit like using #[no_std] in Rust. You won't have garbage collection yet, and you may not even have a memory allocator until you write one. There are a bunch of Lisp bootstrapping dialects out there with names like "pre-Scheme" if you want to get a feel for this.
Forth is a stack machine, so you basically just need a stack pointer, and a couple of registers that can be used to implement basic operations.
Anyway, Lisp in Small Pieces is fantastic, and it contains a ton of the old tricks and tradeoffs.