Lecture 7: Information and Entropy

Note

My greatest concern was what to call it. I thought of calling it ‘information,’ but the word was overly used, so I decided to call it ‘uncertainty.’ When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, ‘You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.’ (Claude Shannon, 1971)

Warning

This lecture corresponds to Chapter 15 of the textbook.

Summary

Attention

In this lecture, we look at how one can quantify information. We all have a fairly good intuitive sense of the amount of information contained in a claim and can easily tell, of two claims, which one carries more information. For example, take the two statements “I live on Earth” and “I live in NY State”. It is clear that the two do not convey the same amount of new knowledge. Claude Shannon realized that the amount of information grows with the inverse of the probability of the claim. In other words, if a less likely event takes place, you gain more information when someone tells you about that event than you would from learning something about the more likely one.

Formally, this leads to the definition of the information Q (in units of bits) carried by a statement that has a probability P of being true:

Q=-k \log P.

We can now understand the need for the logarithmic function: if you are given two independent statements, the probability that both hold is the product of their individual probabilities, so the corresponding amounts of information should add.
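To make this explicit (a short check that follows directly from the definition above): if two independent statements have probabilities P_{1} and P_{2}, the probability that both hold is P_{1} P_{2}, so

Q_{12}=-k \log \left(P_{1} P_{2}\right)=-k \log P_{1}-k \log P_{2}=Q_{1}+Q_{2},

and the two amounts of information simply add.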

This leads to the notion of average information, or Shannon Entropy:

S=\langle Q\rangle=\sum_{i} Q_{i} P_{i}=-k \sum_{i} P_{i} \log P_{i}.
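As a quick numerical illustration (a minimal Python sketch, not part of the lecture material; it takes k=1 and uses base-2 logarithms so the entropy comes out in bits):

  import math

  def shannon_entropy(probs, k=1.0):
      """Shannon entropy S = -k * sum(p * log2(p)); in bits for k = 1."""
      # Outcomes with p = 0 contribute nothing (the limit p*log p -> 0).
      return -k * sum(p * math.log2(p) for p in probs if p > 0)

  print(shannon_entropy([0.5, 0.5]))    # fair coin: 1.0 bit per toss
  print(shannon_entropy([0.9, 0.1]))    # biased coin: ~0.47 bit (more predictable)
  print(shannon_entropy([0.25] * 4))    # four equally likely outcomes: 2.0 bits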

This definition of the Shannon entropy is reminiscent of Gibbs’ definition of entropy that we saw in Lecture 6: Entropy (the difference being that k is no longer the Boltzmann constant).

The big leap is that information, since it carries entropy, can be considered a physical quantity (Rolf Landauer). After all, this is not surprising: in thermodynamics, we defined entropy as a measure of the number of microstates a system can be in while realizing a given macrostate. This uncertainty (that is, lack of knowledge) is certainly related to information!
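Indeed (a special case worth spelling out, even though it is not written explicitly above): if the \Omega microstates compatible with a macrostate are all equally likely, P_{i}=1/\Omega, the Shannon entropy reduces to the Boltzmann form we already know:

S=-k \sum_{i=1}^{\Omega} \frac{1}{\Omega} \log \frac{1}{\Omega}=k \log \Omega.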

Interestingly, this allows us to resolve the issue with Maxwell’s demon related to the irreversibility of the Joule expansion we saw in the previous lecture. The demon must eventually erase the information it gathers while sorting the gas molecules, and this loss of information increases the entropy!
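This can be made quantitative with Landauer’s principle (a supplementary sketch; the specific numbers below are chosen for illustration and are not from the lecture): erasing one bit of information dissipates at least k_{B} T \ln 2 of heat, i.e. it increases the entropy of the surroundings by at least k_{B} \ln 2 per erased bit.

  import math

  k_B = 1.380649e-23   # Boltzmann constant, J/K
  T = 300.0            # assumed room temperature, K

  # Landauer bound: minimum heat dissipated to erase a single bit.
  q_min = k_B * T * math.log(2)
  print(f"Minimum heat to erase one bit at {T} K: {q_min:.2e} J")   # ~2.87e-21 J

  # A demon sorting N molecules must record (and eventually erase) at least
  # one bit per molecule, so the entropy increase is at least N * k_B * ln 2.
  N = 6.022e23
  print(f"Entropy cost of erasing {N:.3e} bits: {N * k_B * math.log(2):.2f} J/K")   # ~5.76 J/K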

Finally, in this lecture, we look into the issue of data compression and discuss a couple of examples of the application of Bayes’ theorem for conditional probabilities:

P(A \mid B)=\frac{P(B \mid A) \cdot P(A)}{P(B)}.

where P(A) and P(B) are the independent probabilities of A and B, and P(A \mid B) is the probability of A given that B is true. Likewise, P(B \mid A) is the probability of B given that A is true.
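As a worked example of how the formula is used (the numbers are hypothetical and chosen purely for illustration, not taken from the lecture), take A = “a rare condition is present” and B = “a diagnostic test comes back positive”:

  # Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), with illustrative numbers only.
  p_A = 0.01              # prior probability of the condition
  p_B_given_A = 0.95      # P(positive test | condition present)
  p_B_given_notA = 0.05   # P(positive test | condition absent), i.e. false positives

  # Total probability of a positive test (law of total probability):
  p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

  # Posterior probability of the condition given a positive test:
  p_A_given_B = p_B_given_A * p_A / p_B
  print(f"P(A|B) = {p_A_given_B:.3f}")   # ~0.161

Even after a positive test, the condition remains unlikely because the prior P(A) is small; the same kind of conditional-probability reasoning appears in the “Test your knowledge” questions below.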

Learning Material

Copy of Slides

The slides for Lecture 7 are available in pdf format here: pdf

Screencast

Test your knowledge

  1. Consider the following two statements: (a) “Students who graduate with a bachelor’s in physics do so by passing IQM” and (b) “Students who graduate with a bachelor’s in applied physics do so by passing IQM”. Statement (a) occurs with probability P=1 and statement (b) occurs with probability P=1/4. What is the Shannon information of each statement, in bits (we use a \log_2 basis and take k=1)?

    1. Q_a=1 and Q_b=2 bits. There is more information in statement b.

    2. Q_a=0 and Q_b=2 bits. There is more information in statement b.

    3. Q_a=0 and Q_b=-2 bits. There is more information in statement a.

    4. Q_a=0 and Q_b=-2 bits. There is more information in statement b.

  2. Mrs. Bonnie T. has three kittens. Two of them are male. What is the probability that the third one is a female?

    1. 75%

    2. 50%

    3. 37.5%

    4. 25%

  3. Mrs. Bonnie T. has three kittens. The two tallest ones are male. What is the probability that the third one is a female?

    1. 75%

    2. 50%

    3. 37.5%

    4. 25%

  4. The less information you have about a system, the larger its entropy.

    1. True

    2. False

    3. It depends

Hint

Find the answer keys on this page: Answers to selected test your knowledge questions. Don’t cheat! Try solving the problems on your own first!

Homework Assignment

Solve the following problems from the textbook: