I am building a Hardware Design Language for FPGA accelerators.
The big trick of the language is that it doesn't hide the pipelining you have to do to raise your FMax. Instead, you manually add register stages in the places where they matter, and the compiler synchronizes the other paths.
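To make the "compiler synchronizes the other paths" idea concrete, here is a minimal Python sketch of latency balancing on a tiny dataflow graph. It is not the language's actual algorithm or syntax. The function name `balance_latencies`, the graph encoding, and the assumption that all inputs arrive at cycle 0 are made up for illustration.

```python
def balance_latencies(nodes, edges):
    """Compute arrival cycles and the compensating registers per edge.

    nodes: node names in topological order (assumption of this sketch).
    edges: list of (src, dst, regs), where regs is the number of register
           stages the programmer placed on that wire by hand.
    """
    # Assumption: every node without inputs produces its value at cycle 0.
    arrival = {name: 0 for name in nodes}
    for node in nodes:
        for src, dst, regs in edges:
            if dst == node:
                # A node can only fire once its slowest input has arrived.
                arrival[node] = max(arrival[node], arrival[src] + regs)

    # Faster paths get extra registers so that all inputs line up.
    padding = {
        (src, dst): arrival[dst] - (arrival[src] + regs)
        for src, dst, regs in edges
    }
    return arrival, padding


# result = a * b + a, with two hand-placed register stages after the multiplier.
nodes = ["a", "b", "mul", "add"]
edges = [
    ("a", "mul", 0),
    ("b", "mul", 0),
    ("mul", "add", 2),  # manual pipelining for FMax
    ("a", "add", 0),    # this path must be delayed to match
]
arrival, padding = balance_latencies(nodes, edges)
print(padding[("a", "add")])  # 2 compensating registers on the bypass path
```

The programmer only wrote the two stages after the multiplier. The two registers on the `a -> add` bypass are what a compiler would have to insert on its own.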
A really neat trick with this pipelining system is that submodules can respond to the amount of pipelining around them (through inferred template parameters). This way the programmer really doesn't have to think about the pipelining they add. Examples are inferring a FIFO's almost_full threshold, inferring how many simultaneous states a pipelined loop needs, inferring the depth of BRAM shift registers, etc.
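As a rough illustration of the FIFO case, the Python sketch below derives an almost_full threshold from the latency around the FIFO. The function name, the parameter names, and the exact handshake model are assumptions for this example, not the language's real inference rule. The precise off-by-one depends on how the handshake is defined.

```python
def almost_full_threshold(depth, forward_latency, backpressure_latency):
    """Return the occupancy at which almost_full should assert.

    Assumed model: the producer pushes one item per cycle, items take
    forward_latency cycles to reach the FIFO, and the producer sees
    almost_full backpressure_latency cycles after it asserts. In the worst
    case (no reads) that many more items still arrive after the flag goes high.
    """
    in_flight = forward_latency + backpressure_latency
    if in_flight >= depth:
        raise ValueError("FIFO too shallow for the pipelining around it")
    return depth - in_flight


# A 32-deep FIFO behind a 5-stage pipeline, with 1 cycle of backpressure delay.
print(almost_full_threshold(32, 5, 1))  # 26

# Adding 3 more register stages upstream only changes the inferred value.
print(almost_full_threshold(32, 8, 1))  # 23
```

The point is that the threshold is a function of the surrounding latency. If that latency is inferred as a template parameter, adding or removing register stages never requires touching the FIFO by hand.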