The question here is about all the other branching the interpreter does. Even if you have an unpredictable `if (a+b < 0)`, there is still the dispatch to the "load-variable", "add", "load-constant", "less-than" and "do-branch" opcodes, and that dispatch still benefits from being predicted. It very well could be if the code is repeated in a loop (despite still containing a single unpredictable branch), or potentially even if there is just a common pattern in the language (e.g. comparison opcodes being followed by a branch opcode).
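To make that concrete, here is a rough simulation, not a real interpreter or a real CPU model. The bytecode, the opcode names and the "predictor" (a table keyed on the last two opcodes dispatched) are all made up for illustration. It runs `if (a + b < 0)` in a loop on random inputs and counts how often the next opcode could have been guessed versus how often the data-dependent branch could.

```python
import random

# Hypothetical bytecode for:  loop { if (a + b < 0) count += 1 }
PROGRAM = [
    ("LOAD_VAR", "a"),
    ("LOAD_VAR", "b"),
    ("ADD", None),
    ("LOAD_CONST", 0),
    ("LESS_THAN", None),
    ("BRANCH_IF_FALSE", 7),   # skip the increment when the test fails
    ("INC_COUNT", None),
    ("JUMP", 0),              # back to the top of the loop
]


def run(iterations, seed=1):
    rng = random.Random(seed)
    env = {"a": rng.randint(-100, 100), "b": rng.randint(-100, 100), "count": 0}
    stack = []
    pc = 0

    # Toy dispatch predictor (an assumption, not any real design):
    # remember which opcode last followed each pair of previous opcodes.
    history = ("JUMP", "JUMP")
    next_op_table = {}
    dispatch_hits = dispatch_total = 0

    # Toy predictor for the data-dependent branch: same outcome as last time.
    last_taken = False
    branch_hits = branch_total = 0

    done = 0
    while done < iterations:
        op, arg = PROGRAM[pc]

        # Score the dispatch to this opcode's handler, then train the table.
        dispatch_total += 1
        if next_op_table.get(history) == op:
            dispatch_hits += 1
        next_op_table[history] = op
        history = (history[1], op)

        if op == "LOAD_VAR":
            stack.append(env[arg])
            pc += 1
        elif op == "LOAD_CONST":
            stack.append(arg)
            pc += 1
        elif op == "ADD":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs + rhs)
            pc += 1
        elif op == "LESS_THAN":
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(lhs < rhs)
            pc += 1
        elif op == "BRANCH_IF_FALSE":
            taken = not stack.pop()
            branch_total += 1
            if taken == last_taken:
                branch_hits += 1
            last_taken = taken
            pc = arg if taken else pc + 1
        elif op == "INC_COUNT":
            env["count"] += 1
            pc += 1
        elif op == "JUMP":
            done += 1
            # Fresh random inputs so the comparison stays unpredictable.
            env["a"] = rng.randint(-100, 100)
            env["b"] = rng.randint(-100, 100)
            pc = arg

    return {
        "dispatch_accuracy": dispatch_hits / dispatch_total,
        "branch_accuracy": branch_hits / branch_total,
        "count": env["count"],
    }


if __name__ == "__main__":
    print(run(10_000))
```

The only dispatch that is hard to guess is the one right after the branch opcode, since it inherits the branch's randomness. Every other dispatch in the loop follows a fixed pattern, so the dispatch accuracy should come out far above the roughly coin-flip accuracy of the `a + b < 0` branch itself.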
Tail call is a different matter…
https://arcb.csc.ncsu.edu/~mueller/cluster/ps3/SDK3.0/docs/a...
The cost of a branch miss is a pipeline stall while the incorrectly predicted work is flushed. If there were resources available to speculatively execute the other branch in parallel, it might take less wall time.
Very few architectures have conditional indirect branches and they don't get used all that much:
- subroutine return: better predicted with a stack
- virtual method dispatch: needs a predictor (for the destination, not the 'taken' - a different thing with multiple destinations chosen by the history than a normal branch destination which typically has a single destination and a history choosing whether taken or not)
- dense case statements: similar to virtual method dispatch but maybe with a need for far more destinations
All these cases often involve a memory load prior to the branch. In essence, what you are predicting is what is being loaded, and you want to keep feeding the pipe while you wait for the load to complete.
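A toy sketch of the "destination chosen by the history" idea, for a single virtual call site. Everything here is an assumption for illustration: the method names are invented, and the history is used directly as a dictionary key, where real hardware would hash the branch address with the history into a fixed-size table.

```python
def last_target_accuracy(targets):
    """Predict the same destination as the previous execution of the branch."""
    hits = 0
    last = None
    for target in targets:
        hits += int(target == last)
        last = target
    return hits / len(targets)


def history_target_accuracy(targets, history_len=2):
    """Predict the destination that last followed the same recent targets."""
    table = {}      # recent-target history -> destination seen after it
    history = ()    # most recent destinations, oldest first
    hits = 0
    for target in targets:
        hits += int(table.get(history) == target)
        table[history] = target
        history = (history + (target,))[-history_len:]
    return hits / len(targets)


# Hypothetical call site iterating over a repeating mix of object types.
targets = ["Circle.draw", "Square.draw", "Circle.draw", "Triangle.draw"] * 250

if __name__ == "__main__":
    print("last target:", last_target_accuracy(targets))
    print("history    :", history_target_accuracy(targets))
```

In this pattern the destination never repeats twice in a row, so remembering only the last destination never gets it right, while two entries of history are enough to tell the cases apart once the table has warmed up. With `history_len=1` the key "Circle.draw" is followed by two different destinations, so it would keep missing on those.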
https://www.ece.ucdavis.edu/~akella/270W05/mcfarling93combin...
It's old, but very clear. I tried to read the ITTAGE paper but it assumes you know all that already. Also it doesn't actually fully specify a branch predictor because there are various hashes you need to calculate and it simply doesn't say what they use.