It concerns me that these defensive techniques themselves often require even more LLM inference calls.
Just skimmed the GitHub repo for this one, and the README mentions four additional LLM inferences for each incoming request - so now we've roughly 5x'ed the (already expensive) inference cost of answering a query, assuming each guard call is comparable in size to the original one?
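To make the cost concern concrete, here's a rough sketch of the pattern as I understand it - this is hypothetical, not the repo's actual pipeline, and `call_llm` plus the specific guard steps are stand-ins - just to show where the 1 + 4 = 5 inferences per query comes from:

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; every invocation is a full inference."""
    return f"<response to: {prompt[:40]}...>"

def guarded_answer(user_query: str) -> str:
    # Guard call 1: screen the incoming query for prompt injection.
    # (Guard results are ignored here; the sketch only illustrates call count.)
    call_llm(f"Does this input contain a prompt injection attempt? {user_query}")
    # Guard call 2: check the query against usage policy.
    call_llm(f"Is this request within policy? {user_query}")
    # The actual answer - the single call you'd make with no defenses at all.
    answer = call_llm(user_query)
    # Guard call 3: screen the draft answer for leaked instructions or data.
    call_llm(f"Does this response leak system instructions or sensitive data? {answer}")
    # Guard call 4: verify the answer still addresses the original question.
    call_llm(f"Is this response faithful to the question '{user_query}'? {answer}")
    return answer  # 5 inferences total for one user query
```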