Information content quantifies how much information is conveyed by observing a specific outcome. A rare event provides more information than a common one, much as unexpected news is more informative than something you already anticipated.

For a specific outcome x with probability P(x), the information content I(x) is defined as:

I(x) = -log₂ P(x)

This formula shows that information content grows as an event becomes less probable: halving the probability of an outcome adds exactly one bit. Very rare events (P(x) approaching 0) carry very high information content, while a certain event (P(x) = 1) provides zero information.
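
As a quick illustration, here is a minimal sketch in Python (standard library only) that evaluates the formula for a few probabilities; the function name `information_content` is just an illustrative choice, not a standard API.

```python
import math

def information_content(p: float) -> float:
    """Information content (surprisal) in bits of an outcome with probability p."""
    return -math.log2(p)

# Rare outcomes carry more information than common ones.
print(information_content(0.5))    # 1.0 bit   (fair coin flip)
print(information_content(0.01))   # ~6.64 bits (rare event)
print(information_content(1.0))    # 0.0 bits  (certain event, no surprise)
```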

Information content connects directly to entropy—entropy is simply the expected (average) information content across all possible outcomes of a random variable. This relationship means entropy can be expressed as:

H(X) = E[I(X)] = E[-log₂ P(X)] = -Σₓ P(x) log₂ P(x)
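
The sketch below makes this averaging concrete for a discrete distribution given as a list of probabilities; the helper name `entropy` is illustrative, and terms with zero probability are skipped since they contribute nothing to the sum.

```python
import math

def entropy(probs: list[float]) -> float:
    """Entropy in bits: the probability-weighted average of -log2 P(x)."""
    return sum(p * -math.log2(p) for p in probs if p > 0)

# A fair coin maximizes uncertainty; a biased coin is more predictable.
print(entropy([0.5, 0.5]))   # 1.0 bit
print(entropy([0.9, 0.1]))   # ~0.47 bits
```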

In machine learning applications, information content helps assess how surprising individual observations are, guides feature selection, and underlies many information-theoretic approaches to model evaluation and comparison.
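
For example, the per-example negative log-likelihood used to evaluate a classifier can be read as the information content of the observed label under the model's predicted probability. The sketch below shows that reading; the function name `surprisal_bits` is illustrative, not a standard library call.

```python
import math

def surprisal_bits(predicted_prob: float) -> float:
    """Information content of an observed label, given the model's predicted probability for it."""
    return -math.log2(predicted_prob)

# A model that assigned 0.95 to the true class was barely surprised;
# one that assigned only 0.05 was very surprised (high loss, in information terms).
print(surprisal_bits(0.95))  # ~0.07 bits
print(surprisal_bits(0.05))  # ~4.32 bits
```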