The README says "zero heap allocations", but the code uses std::list, std::unordered_map, and moves. Did you mean "zero allocations after the state tree has been built"?
Also, for embedded use it would be useful to split all input/output, DOT export, etc. into a second library that can be omitted on small targets.
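To make the first point checkable, here is a rough sketch of how "zero allocations after building" could be verified: replace the global operator new with a counting version and compare the count before and after the code under test. ToyMachine, allocations_during, and g_allocations are hypothetical names made up for this illustration; they are not part of the library, and the real types would have to be substituted in.

```cpp
#include <cstddef>
#include <cstdlib>
#include <list>
#include <new>
#include <unordered_map>

// Counter bumped by every call to the replaced global operator new.
static std::size_t g_allocations = 0;

void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}

void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Runs f and returns how many heap allocations happened while it ran.
template <typename F>
std::size_t allocations_during(F&& f) {
    const std::size_t before = g_allocations;
    f();
    return g_allocations - before;
}

// Hypothetical stand-in for a state machine (NOT the library's API):
// the containers allocate while the machine is built, stepping only reads.
struct ToyMachine {
    std::unordered_map<int, int> transitions;  // state -> next state
    std::list<int> states;
    int current = 0;

    ToyMachine() {
        for (int i = 0; i < 4; ++i) {
            states.push_back(i);
            transitions[i] = (i + 1) % 4;
        }
    }

    // Lookup only; find() does not insert, so nothing is allocated here.
    void step() { current = transitions.find(current)->second; }
};
```

Wrapping construction in allocations_during should report a non-zero count, while wrapping a loop of step() calls on an already built machine should report zero. If the library behaves the same way, the README claim would hold only for the phase after the state tree is built.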