I made extremely minor changes to the way the question was phrased and it failed badly: not only did it get the answer wrong, it fell into incoherence, claiming that T was a vowel or that 3 was an even number.
The sheer size of its training set can give a misleading impression of its reasoning capabilities. It can apply simple logic to situations, even ones it hasn't seen before, but once it can't lean on its training data, that logic falls apart not far beyond the first couple of lectures of an introductory first-order logic course.
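To be concrete about the level I mean, it's roughly universal instantiation plus modus ponens, something like the sketch below (the Vowel/Letter predicates are just made-up illustrations, not prompts I actually used):

    -- minimal sketch in Lean of "week one" first-order reasoning:
    -- from "every vowel is a letter" and "a is a vowel", conclude "a is a letter"
    example (α : Type) (Vowel Letter : α → Prop) (a : α)
        (h₁ : ∀ x, Vowel x → Letter x) (h₂ : Vowel a) : Letter a :=
      h₁ a h₂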
The fact that it can do logic at all is impressive to me, though, and I'm interested to see how much deeper its genuine capability goes as more advanced models arrive.