L01. Basics of Information

Download slides [PDF]

Check yourself: If the amount of information received is proportional to the amount uncertainty resolved, which piece of data conveys the most information? The least?
The card is a heart The card is not the Ace of spades The card is a face card (J, Q, K) The card is the Queen of Hearts The card is a heart The card is not the Ace of spades The card is a face card (J, Q, K) The card is the Queen of Hearts
Check yourself: An unfair coin with p(TAILS)=1/8 is tossed. How much information is received upon learning the outcome is... ± 0.001. You can use log2() ± 0.001. You can use log2()
Check yourself: What is the entropy of a fair 6-sided die roll? ± 0.001. You can use log2() What is the entropy of the unfair coin (p(TAILS)=1/8)? ± 0.001. You can use log2()
Check yourself: When our sheep moved to the city, it started bleating "10011110". We think it might be using the video's encoding:
  • B->0,
  • A->11,
  • C->100,
  • D->101.
Check yourself: We map the integers 1 through 6 to the outcome of a fair six-sided die by using a 3-bit fixed-length encoding. ± 0.001. You can use log2() How does it compare to the entropy of the die? Is that expected? Length greater than entropy Length same as entropy Length smaller than entropy Ben Bitdiddle finds the die and rolls 5, 2, then 6. What would Bitdiddle's message be? 0b01101001...
Check yourself: Using two's complement, what are the largest and smallest 5-bit numbers? You can use binary (0b1010), hex (0xA) or decimal (10). You can use binary (0b1010), hex (0xA) or decimal (10).
Check yourself: Ben Bitdiddle just heard of variable-length encodings and uses the following to encode the outcome of rolling a fair six-sided die:
  • 1->1,
  • 2->01,
  • 3->001,
  • 4->0001,
  • 5->00001,
  • 6->000001.
± 0.001. 1/8*3+...
Check yourself: Alyssa P. Hacker, unconvinced by Ben's efforts in the previous problem, uses Huffman's algorithm and gets:
  • 1->000,
  • 2->001,
  • 3->010,
  • 4->011,
  • 5->10,
  • 6->11.
Check yourself:Consider messages built from four symbols with the following probabilities: A:1/6, B:1/6, C:2/6, D:2/6. The Huffman algorithm sometimes involves making arbitrary choices that can lead to trees with different shapes. However, their expected length should all be identical. Find two trees for the above probabilities, and compute their expected length. ±0.001 The two trees have a different height, what is the height of the tallest? Sequoia?
Check yourself: In order to more efficiently communicate unfair coin tosses (p(TAILS)=1/8), we want to encode them in pairs. Hint: you've already computed the entropy of one toss, how is it related to that of two tosses? ± 0.001 Using 1-bit per coin toss, what is the expected length of a message communicating the outcome of a pair of tosses? ± 0.001 Knowing the probabilities are:
  • HH: 49/64,
  • HT: 7/64,
  • TH: 7/64,
  • TT: 1/64.
Build a Huffman encoding of the pairs of outcomes. ± 0.001 How much shorter is it? ± 0.001
To detect errors in transmission, we add an even parity bit to our 3-bit fixed length encoding of the of a fair six-sided die. That is, we add a bit to make the total number of 1s even. What is the encoding for each roll? 0b001 + parity? 0b010 + parity? 0b011 + parity? 0b100 + parity? 0b101 + parity? 0b110 + parity?
We add more parity bits to allow the receiver to correct transmission errors: we write 3 rolls in a 3x3 matrix and add an even parity bit for every row and column.
For example, Ben's role of 5, 2 then 6 would be:
5: 101 0 <
2: 010 1 <- row parity bits
6: 110 0 <
   001 <- column parity bits

First transmission:
0b... 0b... 0b...
Second transmission:
0b... 0b... 0b...
Third transmission:
0b... 0b... 0b...