My personal inclination is to make the longer jump, and go straight for a deeply rudimentary Lisp. There's a trick where you start off with Lisp macros that expand to assembly, and I once knew someone who got it working for new hardware during a 10-hour plane flight. It's a slightly longer climb than Forth, but even a primitive Lisp is nice.
However, the deciding factor here really is the 6502 and 65C02 microprocessers. You really want at least 4 general-purpose registers for a primitive Lisp dialect, and that's pushing it. And the 65C02 basically has 1, or 3 if you clap your hands and believe. Even C is a serious challenge on a 65C02.
But Forth thrives in that enviroment. You only need a stack pointer and enough registers to do exactly 1 canned operation. So: victory to Forth.
And wow, I wish I had seen Chipwits back in the day. I was a massive fan of the Rocky's Boots logic game, but Chipwits never showed up in our neck of the woods. Thank you for open sourcing it!
The trick with a macro-assembler that uses Lisp macros to generate assembly was basically folklore when I learned it, and I haven't seen it fully fleshed out anywhere in the literature. For a tiny chip, you'd run this as a cross compiler from a bigger machine. But you basically have Lisp macros that expand to other Lisp macros that eventually expand to assembly representated as s-expressions.
As for why basic Lisps are register-hungry, you usually reserve an "environment pointer register", which points to closure data or scope data associated with the currently running function. And then you might also want a "symbol table base" register, which points to interned symbols. The first symbol value (located directly where the symbol register points) should be 'nil', which is both "false" and the "empty list". This allows you to check Boolean expressions and check for the empty list with a single register-to-register comparison, and it makes checks against other built-in symbols much cheaper. So now you've sacrificed 2 registers to the Lisp gods. If you have 8 registers, this is fine. If you have 4 registers, it's going to hurt but you can do it. If you have something like the 65C02, which has an 8-bit accumulator and two sort-of-flexible index registers, you're going to have to get ridiculously clever.
Of course, working at this level is a bit like using #[no_std] in Rust. You won't have garbage collection yet, and you may not even have a memory allocator until you write one. There are a bunch of Lisp bootstrapping dialects out there with names like "pre-Scheme" if you want to get a feel for this.
Forth is a stack machine, so you basically just need a stack pointer, and a couple of registers that can be used to implement basic operations.
Anyway, Lisp in Small Pieces is fantastic, and it contains a ton of the old tricks and tradeoffs.
I recently wrote an assembler in scheme; I'm in the process of adding macros. You need very few primitives to implement what amounts to a lisp compiler. A larger issue is bootstrapping garbage collection from manual memory allocation—while there are a few tricks to do this simply, if inefficiently, high-performance garbage collection needs to be tightly integrated with the compiler in order to implement a) scanning for roots, b) rewriting pointers when you move allocations around, and c) likely pointer tagging. None of this is easy to design around or to bolt on to a macro-oriented compiler-assembler.
And of course, writing the really fancy bits of lisp—escaping closures, continuations, conditions—take a lot more effort and thought and care.
Curiously, garbage collection is far from necessary to achieve bootstrapping. It's just rather odd to use a lisp with manual memory allocation. I've found stealing from Rust's memory model has been very fruitful in this regard (not the borrow checker). RAII is also very easy to implement with careful attention to how scoping and moving is implemented.