.. _lecture7:

Lecture 7: Information and Entropy
++++++++++++++++++++++++++++++++++

.. note::
   *My greatest concern was what to call it. I thought of calling it
   'information,' but the word was overly used, so I decided to call it
   'uncertainty.' When I discussed it with John von Neumann, he had a better
   idea. Von Neumann told me, 'You should call it entropy, for two reasons.
   In the first place your uncertainty function has been used in statistical
   mechanics under that name, so it already has a name. In the second place,
   and more important, no one really knows what entropy really is, so in a
   debate you will always have the advantage'.*

   -- Claude Shannon, 1971

.. warning::
   This lecture corresponds to Chapter 15 of the textbook.

Summary
-------

.. attention::
   In this lecture, we look at how one can quantify information. We all have a
   fairly intuitive understanding of the amount of information contained in a
   claim and can easily tell, between two claims, which one carries more
   information. For example, take the two statements "I live on Earth." and
   "I live in NY State." It is clear that they do not convey the same amount
   of new knowledge. Claude Shannon realized that the amount of information
   grows with the inverse of the probability of the claim: the less likely an
   event is, the more information you gain when someone tells you it has
   occurred. Formally, this leads to the definition of the information
   :math:`Q` (in units of *bits*) of a claim that has a probability :math:`P`
   of being true:

   .. math:: Q=-k \log P.

   The logarithm is needed because information should be additive: for two
   independent statements, the joint probability is the product of the
   individual probabilities, and the logarithm turns this product into a sum
   of information contents. This leads to the notion of
   :index:`average information`, or :index:`Shannon Entropy`:

   .. math:: S=\langle Q\rangle=\sum_{i} Q_{i} P_{i}=-k \sum_{i} P_{i} \log P_{i}.

   This definition is reminiscent of Gibbs' definition of entropy we saw in
   :ref:`lecture6` (the difference is that :math:`k` is no longer the
   Boltzmann constant). The big leap is that information, since it carries
   entropy, can be considered a physical quantity (Rolf Landauer). After all,
   this is not surprising: in thermodynamics, we defined entropy as a measure
   of the number of microstates a system can be in to realize a given
   macrostate. This uncertainty (that is, lack of knowledge) is certainly
   related to information! Interestingly, this allows us to resolve the issue
   of Maxwell's demon and the irreversibility of the Joule expansion we saw in
   the previous lecture: the demon must lose information, and thus increase
   entropy, while sorting the gas molecules!

   Finally, in this lecture, we look into data compression and discuss a few
   applications of Bayes' theorem for conditional probabilities:

   .. math:: P(A \mid B)=\frac{P(B \mid A) \cdot P(A)}{P(B)},

   where :math:`P(A)` and :math:`P(B)` are the independent probabilities of
   :math:`A` and :math:`B`, :math:`P(A \mid B)` is the probability of
   :math:`A` given that :math:`B` is true, and likewise :math:`P(B \mid A)`
   is the probability of :math:`B` given that :math:`A` is true.
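To make these formulas concrete, here is a minimal Python sketch that takes
:math:`k=1` and base-2 logarithms, so that information is measured in bits.
The helper names (``information``, ``entropy``, ``bayes_posterior``) and the
probabilities in the Bayes' theorem example are chosen purely for
illustration.

.. code-block:: python

   import math

   def information(p, base=2):
       """Shannon information Q = k log(1/p) of a claim with probability p (k = 1)."""
       return math.log(1.0 / p, base)

   def entropy(probs, base=2):
       """Shannon entropy S = -sum_i p_i log(p_i) of a discrete distribution."""
       return -sum(p * math.log(p, base) for p in probs if p > 0)

   def bayes_posterior(p_b_given_a, p_a, p_b):
       """Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B)."""
       return p_b_given_a * p_a / p_b

   # "I live on Earth" is certain (P = 1) and carries no information;
   # a claim with P = 1/4 carries 2 bits.
   print(information(1.0))     # 0.0 bits
   print(information(0.25))    # 2.0 bits

   # A fair coin toss has 1 bit of entropy; a biased coin has less.
   print(entropy([0.5, 0.5]))  # 1.0 bit
   print(entropy([0.9, 0.1]))  # about 0.47 bits

   # Arbitrary illustrative numbers: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.5.
   print(bayes_posterior(0.8, 0.3, 0.5))  # ~0.48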
Learning Material
-----------------

Copy of Slides
~~~~~~~~~~~~~~

The slides for Lecture 7 are available in pdf format here:
:download:`pdf <_pdfs/slides/lecture7.pdf>`

Screencast
~~~~~~~~~~

This lecture is available as a YouTube recording:
`chapter 15 <https://www.youtube.com/embed/LJhwFw4fRkA>`_.

Test your knowledge
-------------------

1. Consider the following two statements: (a) students who graduate with a
   bachelor in physics do so by passing IQM, and (b) students who graduate
   with a bachelor in applied physics do so by passing IQM. Statement (a)
   occurs with probability :math:`P=1` and statement (b) occurs with a
   probability :math:`P=1/4`. What is the Shannon information of each
   statement, in bits (we use the :math:`\log_2` basis and suppose
   :math:`k=1`)?

   A. :math:`Q_a=1` and :math:`Q_b=2` bits. There is more information in statement (b).

   B. :math:`Q_a=0` and :math:`Q_b=2` bits. There is more information in statement (b).

   C. :math:`Q_a=0` and :math:`Q_b=-2` bits. There is more information in statement (a).

   D. :math:`Q_a=0` and :math:`Q_b=-2` bits. There is more information in statement (b).

2. Mrs. Bonnie T. has three kittens. Two of them are male. What is the
   probability that the third one is a female? Assume each kitten's sex is
   independent and equally likely.

   A. 75\%.

   B. 50\%.

   C. 37.5\%.

   D. 25\%.

3. Mrs. Bonnie T. has three kittens. The two tallest ones are male. What is
   the probability that the third one is a female? Assume each kitten's sex is
   independent and equally likely.

   A. 75\%.

   B. 50\%.

   C. 37.5\%.

   D. 25\%.

4. The less you know about a system, the greater its entropy.

   A. True.

   B. False.

   C. It depends.

.. hint::
   Find the answer keys on this page: :ref:`answerkeys`. Don't cheat! Try
   solving the problems on your own first!