LLMs are pre-trained to minimize perplexity (PPL), which essentially means they're trained to model the probability distribution over the next token in a sequence.
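To make that concrete, here's a minimal sketch of what perplexity is: the exponentiated average negative log-likelihood the model assigns to each next token. The log-probability values below are made up for illustration.

```python
import math

def perplexity(token_log_probs):
    # Perplexity = exp of the mean negative log-likelihood
    # the model assigned to each token in the sequence.
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Hypothetical per-token log-probs for a short sequence;
# values closer to 0 mean the model was less "surprised".
log_probs = [-0.9, -0.2, -1.5, -0.4]
print(perplexity(log_probs))  # ~2.12
```

Lower perplexity means the model is, on average, assigning higher probability to the tokens that actually come next.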
The amazing thing was that minimizing PPL turned out to be enough to let you steer the LLM's output, and if you steered it in the right direction (asked it questions), it would answer them pretty well. Thus, LLMs started to get measured on how well they answered questions.
LLMs aren't trained from the beginning to answer questions or solve problems. They're trained to model word/token sequences.
If you want an LLM that's REALLY good at something specific like solving math problems or finding security bugs, you probably have to fine-tune.
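In practice, fine-tuning usually means continuing the same next-token training objective on task-specific data. Here's a minimal sketch using PyTorch and Hugging Face transformers; the model name, examples, and hyperparameters are placeholders, not a recipe.

```python
# Minimal causal-LM fine-tuning sketch (assumptions: any Hugging Face
# causal-LM checkpoint; toy examples stand in for a real dataset).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical task-specific examples (e.g. math Q&A pairs).
examples = ["Q: What is 7 * 8? A: 56", "Q: What is 12 + 5? A: 17"]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt")
        # With labels == input_ids, the model computes the standard
        # next-token cross-entropy loss, now over task-specific data.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

The key point: the objective doesn't change, only the data does, which shifts the distribution the model learns to predict toward the task you care about.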