1. Model "performance", as judged by proxy metrics of intelligence, has improved significantly over the past two years.
2. These capabilities have yet to be stitched together appropriately for the cybersecurity scenarios the author describes.
In my experience, the best use of Transformer models has come from deep integration into an appropriate workflow. They do not (yet) replace the novel-exploration part of a workflow, but they are scarily performant at following mid-level reasoning assertions in a massively parallelized manner.
The question you should be asking yourself is whether you can break your task down into small chunks, each constrained to be feasible to process in bounded time, then group them into appropriate buckets or, better yet, order them as though you were performing the steps yourself with your own expertise - an extension of self. Here's how the two approaches differ:
"Find vulnerabilities in this code" -> This will saturate across all models because the intent behind this mission is vast and loosely defined, while the outcome is expected to be narrow.
" (a)This piece of code should be doing x, what areas is it affecting, lets draw up a perimeter (b) Here is the dependency graph of things upstream and downstream of x, lets spawn a collection of thinking chains to evaluate each one for risk based on the most recent change . . . (b[n]) Where is this likely to fail (c) (Next step that a pentester/cybersecurity researcher would take) "
This has been trial and error in my experience, but it has worked well in domains such as financial trading and decision support, where domain experts help sketch out the general framework of the process where reasoning support is needed and iterate on it constantly, as though it were an extension of themselves.