That depends on how naive the assembly programmer is. On modern hardware I would expect it to happen rarely, if ever, because the many subroutine calls in Forth code defeat the benefits of branch prediction. Also, on a lot of old 8-bit hardware, defaulting to 16-bit integers hurts performance relative to native assembly wherever 8-bit integers would suffice.
(Of course, you can fairly easily replace hot loops with assembly or, with more difficulty, change the Forth compiler to compile parts to native code, fuse words, etc.)
Every Forth that uses conventional threaded-code interpretation pays a considerable performance penalty: execution times are likely to be very roughly four times those of the equivalent assembly. [0]
Forth's runtime performance can be competitive with C if 'proper' compilation is performed, though. [1]
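To make the threaded-code overhead a bit more concrete, here is a rough sketch in C of a toy inner interpreter. Everything in it (the `VM` struct, the `do_*` primitives, `run_threaded`, the `square_plus_one` word) is made up for illustration and is not taken from any real Forth. It also dispatches through C function pointers (sometimes called call threading), which is only a stand-in for the indirect or direct threading a real Forth would typically use. The point is just that every word costs a fetch, an indirect call, and stack traffic, whereas the natively compiled version is a couple of arithmetic instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy VM for illustration; not any real Forth's internals. */
enum { STACK_SIZE = 16 };

typedef struct VM VM;
typedef void (*Prim)(VM *);          /* a primitive word */

typedef union Cell {                 /* one slot of threaded code */
    Prim    prim;                    /* address of a primitive, NULL = stop */
    int32_t literal;                 /* inline operand consumed by do_lit */
} Cell;

struct VM {
    int32_t     stack[STACK_SIZE];   /* data stack */
    size_t      sp;                  /* number of items on the stack */
    const Cell *ip;                  /* next cell to execute */
};

static void push(VM *vm, int32_t v) {
    assert(vm->sp < STACK_SIZE);
    vm->stack[vm->sp++] = v;
}

static int32_t pop(VM *vm) {
    assert(vm->sp > 0);
    return vm->stack[--vm->sp];
}

/* Primitives: each one does very little work per dispatch. */
static void do_lit(VM *vm) { push(vm, vm->ip->literal); vm->ip++; }
static void do_dup(VM *vm) { int32_t a = pop(vm); push(vm, a); push(vm, a); }
static void do_mul(VM *vm) { int32_t b = pop(vm); int32_t a = pop(vm); push(vm, a * b); }
static void do_add(VM *vm) { int32_t b = pop(vm); int32_t a = pop(vm); push(vm, a + b); }

/* Inner interpreter: fetch a cell, advance, make an indirect call.
   This loop is the per-word overhead that native code does not pay. */
static int32_t run_threaded(const Cell *code, int32_t arg) {
    VM vm = { .sp = 0, .ip = code };
    push(&vm, arg);
    for (;;) {
        Prim p = vm.ip->prim;
        vm.ip++;
        if (p == NULL) break;
        p(&vm);
    }
    return pop(&vm);
}

/* Roughly the Forth definition  : SQUARE-PLUS-ONE  DUP * 1 + ;  */
static const Cell square_plus_one[] = {
    { .prim = do_dup }, { .prim = do_mul },
    { .prim = do_lit }, { .literal = 1 },
    { .prim = do_add }, { .prim = NULL },
};

/* What a native-code compiler could reduce the same word to. */
static int32_t square_plus_one_native(int32_t x) {
    return x * x + 1;
}
```

Calling `run_threaded(square_plus_one, 5)` and `square_plus_one_native(5)` gives the same result, but the threaded version goes through four indirect calls and about a dozen stack operations to get there. How large the real-world gap is depends on the Forth, the CPU, and the workload, so treat this as an illustration of where the cost comes from rather than a benchmark.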
[0] https://benhoyt.com/writings/count-words/
[1] (.fth file with their results in comments) http://www.mpeforth.com/arena/benchmrk.fth