IQ is a construct representing intelligence. The aim is to refine this construct so that it is a) externally consistent with our understanding of human intelligence and b) internally consistent, so that the things we deem high or low IQ agree with other things we deem high or low IQ. The "things" here are items, which are question:answer pairs.
IQ is an application of factor analysis [0]: an unobserved "general intelligence factor", g, is inferred from the observed items. The items are chosen so that responses to them correlate with each other and with g (roughly, if the same person took two IQ tests with different questions, the scores should be about the same). There may also be intermediate factors such as verbal ability or spatial ability. Items belonging to one of these factors are chosen so that they correlate most strongly with items of the same factor: verbal items correlate more with other verbal items than with spatial items, although items from different factors still correlate somewhat because they all share a dependence on g.
A raw score on the test is not meaningful on its own; an individual is compared against the population of test takers to determine their rank in that population. First, the population's raw scores are rescaled so that the mean is 100 and the standard deviation is 15 (these values are arbitrary; they are just the conventional scale). An individual test taker can then be placed on this scale relative to the population. Because scores on this scale are treated as normally distributed, the percentile rank can be obtained directly from the IQ score and vice versa.
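A minimal sketch of that conversion, assuming raw scores in the norming group are roughly normal so that a linear z-score rescaling is reasonable (real tests may use other norming procedures). The norming-group mean of 40 and standard deviation of 8 are hypothetical numbers, not from any real test. Percentiles come from Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

# The IQ scale: mean 100, standard deviation 15, treated as a normal distribution.
IQ_SCALE = NormalDist(mu=100, sigma=15)


def raw_to_iq(raw, norm_mean, norm_sd):
    """Convert a raw score to the IQ scale via its z-score in the norming group."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z


def iq_to_percentile(iq):
    """Percentage of the population expected to score below `iq`."""
    return 100 * IQ_SCALE.cdf(iq)


def percentile_to_iq(percentile):
    """Inverse of iq_to_percentile; percentile must be strictly between 0 and 100."""
    return IQ_SCALE.inv_cdf(percentile / 100)


# Hypothetical norming group: raw-score mean 40, raw-score SD 8.
iq = raw_to_iq(52, norm_mean=40, norm_sd=8)
print(iq)                               # 122.5
print(round(iq_to_percentile(130), 1))  # 97.7
```

An IQ of 130 is two standard deviations above the mean, which is why it sits at roughly the 97.7th percentile, and `percentile_to_iq` goes the other way.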
You had some questions about the shifting mean of 100. In practice, the norms are computed once from a norming group (the sample tested when the test is standardized) rather than being recomputed after every test. A particular population can drift away from the norming group, either over time or because it differs in some characteristic, and that drift is where things like the Flynn Effect come from. So a lot depends on the norming group.
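To show why the norming group matters, the sketch below scores the same raw result against two hypothetical sets of norms: an older group and a later group whose average raw score has drifted upward. The numbers are invented for illustration, and the linear z-score conversion is an assumption (roughly normal raw scores).

```python
def raw_to_iq(raw, norm_mean, norm_sd):
    """Map a raw score onto the mean-100, SD-15 scale relative to a norming group."""
    return 100 + 15 * (raw - norm_mean) / norm_sd


# Hypothetical norming groups as (raw-score mean, raw-score SD); not real test data.
OLD_NORMS = (40.0, 8.0)  # group tested when the test was first standardized
NEW_NORMS = (43.0, 8.0)  # later group whose average raw score has drifted upward

raw = 48
old_iq = raw_to_iq(raw, *OLD_NORMS)
new_iq = raw_to_iq(raw, *NEW_NORMS)
print(f"old norms: {old_iq:.1f}, new norms: {new_iq:.1f}")  # old norms: 115.0, new norms: 109.4
```

The performance is identical, but it earns a higher IQ against the outdated norms, which is the sense in which a population "shifts" relative to its norming group.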
I hope that answered some of your questions.
[0] https://en.wikipedia.org/wiki/Factor_analysis