[P&N] Chapter 7: Information
Citation: Edward A. Lee, 2017: Plato and the Nerd: The Creative Partnership of Humans and Technology. MIT Press, Cambridge, MA.
From now on, we step into the yin part of this book, in which Professor Lee introduces some fundamental limits of digital technology.
Pessimism Becomes Optimism
The author has emphasized the importance of keeping the model and the thing being modeled distinctly separate in our minds. This requires great care, because the physical world is much messier than the models we are used to dealing with.
Unfortunately, this is really hard to do. Because so much of our thought process is structured around models, we have an enormous backdrop of unknown knowns. But a failure to make this separation inevitably leads us to invalid conclusions.
In other words, failing to achieve this separation confuses the "map" with the "territory."
What is information?
In short, we have the following statement:
Information is the resolution of alternatives.
Shannon measured information in bits. However, this measure only works well when the alternatives are discrete and finite, or when attempting to communicate over an imperfect channel.
Now suppose that we have an unfair coin: when we toss it, the probability of heads coming up is 0.9, while the probability of tails is 0.1.
According to Shannon, if our unfair coin comes up heads, then we get the following amount of information:
\[ -\log_2(0.9) \approx 0.15 \text{ bits} \]
The logarithm base 2 was used by Shannon so that the information measure would be in units of bits. If you use the natural logarithm instead, then the information measure has units of “nats”. If you use base 10, then it has units of decimal digits. In all cases, however, it measures information.
Furthermore, Shannon says that the average information given to us by a single toss, which he called the entropy of the coin toss, is:
\[ -0.9 \log_2(0.9) - 0.1 \log_2(0.1) \approx 0.47 \text{ bits} \]
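As a quick sanity check, here is a minimal Python sketch (my own, not from the book) that reproduces both numbers, the per-outcome information and the entropy:

```python
import math

# Probabilities of the unfair coin from the text.
p_heads, p_tails = 0.9, 0.1

# Self-information of each outcome: -log2(probability).
info_heads = -math.log2(p_heads)   # about 0.15 bits
info_tails = -math.log2(p_tails)   # about 3.32 bits

# Entropy: the average information delivered by one toss.
entropy = p_heads * info_heads + p_tails * info_tails   # about 0.47 bits

print(f"information if heads: {info_heads:.2f} bits")
print(f"information if tails: {info_tails:.2f} bits")
print(f"entropy of one toss:  {entropy:.2f} bits")
```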
Note that in thermodynamics, entropy is a measure of randomness or disorder in a physical system, but thermodynamics assigns little meaning to the actual numbers, only to their relative magnitudes.
So, on average, an optimal code needs only \( 10 \times 0.47 = 4.7 \) bits to encode the results of 10 tosses of this coin, and no encoding can do better on average.
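To make this claim concrete, here is a small Python sketch (my own illustration, not from the book) that builds Huffman codes over blocks of tosses; as the block length grows, the average code length per toss approaches the 0.47-bit entropy from above but never drops below it:

```python
import heapq
import itertools
import math

p_heads = 0.9  # probability of heads for the unfair coin

def block_probs(n):
    """Probability of each length-n sequence of tosses."""
    probs = {}
    for seq in itertools.product("HT", repeat=n):
        p = 1.0
        for s in seq:
            p *= p_heads if s == "H" else (1 - p_heads)
        probs["".join(seq)] = p
    return probs

def huffman_lengths(probs):
    """Code length (in bits) that a Huffman code assigns to each symbol."""
    heap = [(p, [sym]) for sym, p in probs.items()]
    lengths = {sym: 0 for sym in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, group1 = heapq.heappop(heap)
        p2, group2 = heapq.heappop(heap)
        for sym in group1 + group2:
            lengths[sym] += 1          # each merge adds one bit to every member
        heapq.heappush(heap, (p1 + p2, group1 + group2))
    return lengths

entropy = -(p_heads * math.log2(p_heads) + (1 - p_heads) * math.log2(1 - p_heads))
for n in (1, 2, 4, 8):
    probs = block_probs(n)
    lengths = huffman_lengths(probs)
    bits_per_toss = sum(probs[s] * lengths[s] for s in probs) / n
    print(f"block size {n}: {bits_per_toss:.3f} bits/toss (entropy = {entropy:.3f})")
```

Each merge step in `huffman_lengths` lengthens the codewords of the merged group by one bit, which is enough to compute average code lengths without constructing the codewords themselves.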
Continuous Information
For a continuous random experiment, the entropy is given by the formula:
\[ H(X) = - \int_{\Omega} f(x) \log_2(f(x)) \, dx \]
where \( f(x) \) is called the probability density function.
But in the continuous case, the integral \( H(X) \) has little of the clear meaning that the discrete measure has: it can even be negative, which makes little sense empirically, and we cannot even use bits to encode this kind of information.
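As a concrete illustration (my own example, not from the book), take \( X \) uniformly distributed on an interval of width \( a \):

\[ f(x) = \frac{1}{a} \ \text{on} \ [0, a] \quad \Rightarrow \quad H(X) = - \int_0^a \frac{1}{a} \log_2\!\left(\frac{1}{a}\right) dx = \log_2(a), \]

which is negative whenever \( a < 1 \); for example, \( H(X) = -1 \) for \( a = 1/2 \).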
Because any communication channel is imperfect, noise cannot be avoided in any measurement of the physical world. While continuous information cannot be represented digitally, a noisy observation of a continuous-valued experiment yields much less information than an ideal one, and that information can be represented with a finite number of bits. This is called the channel capacity theorem.
But exactly how much information does it give us?
We can come up with a probability density function that represents the relative likelihoods of values \( x \) that could have yielded the measurement \( y \).
This new probability density is called a conditional probability density because it is a valid probability density for \( x \) only if we have a measurement \( y \).
Shannon’s channel capacity theorem tells us that the information yielded by a measurement is, on average,
\[ C = H(X) - H(X \vert Y) \]
where \( H(X) \) is the information we would gain with a perfect observation, and \( H(X \vert Y) \) is the information that is not revealed by the experiment.
Note that the result \( C \) can be represented in bits even when \( H(X) \) and \( H(X \vert Y) \) individually cannot, which is the truly remarkable point of this theorem.
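To attach concrete numbers to \( C \), \( H(X) \), and \( H(X \vert Y) \), here is a minimal Python sketch of the standard case of a Gaussian quantity observed through additive Gaussian noise (my own illustration; the variances are assumed values, not numbers from the book):

```python
import math

def gaussian_entropy(variance):
    """Differential entropy, in bits, of a Gaussian with the given variance."""
    return 0.5 * math.log2(2 * math.pi * math.e * variance)

# Assumed setup: X ~ N(0, var_x) is the quantity being measured,
# and the measurement is Y = X + noise with noise ~ N(0, var_n).
var_x = 4.0   # variance of the measured quantity (assumption)
var_n = 1.0   # variance of the measurement noise (assumption)

H_X = gaussian_entropy(var_x)                                     # perfect-observation information
H_X_given_Y = gaussian_entropy(var_x * var_n / (var_x + var_n))   # information not revealed by Y
C = H_X - H_X_given_Y

print(f"H(X)   = {H_X:.3f} bits")
print(f"H(X|Y) = {H_X_given_Y:.3f} bits")
print(f"C      = {C:.3f} bits per measurement")
print(f"check: 0.5 * log2(1 + SNR) = {0.5 * math.log2(1 + var_x / var_n):.3f} bits")
```

The final line checks that the difference matches the familiar \( \frac{1}{2} \log_2(1 + \mathrm{SNR}) \) capacity of a Gaussian channel; unlike the two differential entropies, \( C \) does not change if we rescale the units of \( x \).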