To try to expand on the information measure part from a more abstract starting point: consider a probability distribution, i.e. some set of probabilities p. We can read it as indicating our degree of certainty about what will happen. In an equiprobable distribution, e.g. a fair coin flip (1/2, 1/2), there is no skew either way; we are admitting that we have no reason to favor any particular outcome. By contrast, in a split like (1/4, 3/4) we are stating that we are more certain that one particular outcome will happen.
If you wanted to come up with a number to represent the amount of uncertainty, it's clear that the number should be larger the closer the distribution is to completely equiprobable, (1/2, 1/2), which is a complete lack of certainty about the result, and smallest when we are 100% certain, (0, 1).
A natural way to build such a number is to first assign an information value I(p) to each individual outcome, based only on its probability p. This per-outcome function has to be an order inversion on the probability values: the more probable an outcome, the less surprising (and less informative) its occurrence, with I(1) = 0 (no uncertainty, no information). The logarithm of the reciprocal, I(p) = log(1/p) = -log(p), to an arbitrary base (selecting a base is just a change of units), has this property, with the convention that I(0) = inf; that is, a totally improbable event carries infinite information. After all, an impossibility occurring would in fact be the ultimate surprise.
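To make that concrete, here is a small numeric sketch (my own illustration in Python, not from the above; base 2 is chosen so the unit is bits, and the name surprisal is just a label for this per-outcome function):

    import math

    def surprisal(p):
        # Per-outcome information I(p) = log2(1/p): decreasing in p,
        # with I(1) = 0 and I(p) -> infinity as p -> 0.
        if p == 0:
            return math.inf  # an impossible outcome would be infinitely surprising
        return math.log2(1 / p)

    print(surprisal(1.0))   # 0.0 -- a certain outcome tells us nothing
    print(surprisal(0.5))   # 1.0 -- one bit, e.g. one fair coin flip
    print(surprisal(0.25))  # 2.0 -- a 1-in-4 outcome carries two bits
    print(surprisal(0.0))   # inf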
Entropy is just the average of this function taken over the distribution: multiply each probability by the log of its reciprocal and sum the results, H = sum_i p_i * log(1/p_i). In info theory you also want the information of independent events to add, so the further condition I(pq) = I(p) + I(q) is stipulated; the logarithm satisfies this too, and (up to the choice of base) it is essentially the only well-behaved function that does.
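As a rough sketch of the same calculation in code (same assumptions as above: Python, base 2; zero-probability terms are skipped, since p * log(1/p) tends to 0 as p tends to 0):

    import math

    def entropy(dist):
        # Shannon entropy H = sum of p * log2(1/p) over the distribution, in bits.
        return sum(p * math.log2(1 / p) for p in dist if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0    -- maximal uncertainty for two outcomes
    print(entropy([0.25, 0.75]))  # ~0.811 -- somewhat less uncertain
    print(entropy([0.0, 1.0]))    # 0.0    -- complete certainty

The numbers line up with the earlier intuition: the even split scores highest, the skewed split lower, and complete certainty scores zero.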