Entropy represents the average unpredictability or uncertainty in a random variable. Intuitively, it measures how 'surprising' outcomes are on average—a high-entropy system is highly unpredictable, while a low-entropy system is more ordered and predictable.

For a discrete random variable X with possible values {x₁, x₂, ..., xₙ} and probability mass function P(X), the entropy H(X) is defined as:

H(X) = -∑ᵢ P(xᵢ) log₂ P(xᵢ)

The logarithm base determines the units—base 2 gives entropy in bits, while natural logarithm (base e) gives entropy in nats. This formula captures several intuitive properties:
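The definition above translates directly into code. The sketch below is illustrative (the `entropy` helper and its `base` parameter are not from the text); it sums −p·log(p) over the distribution, skipping zero-probability terms since p log p → 0 as p → 0:

```python
import math

def entropy(probs, base=2):
    """Shannon entropy of a discrete distribution given as a list of probabilities.

    base=2 yields bits; base=math.e yields nats.
    Zero-probability terms are skipped: lim p→0 of p·log p is 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# A fair coin: maximum uncertainty for two outcomes, ≈ 1 bit
print(entropy([0.5, 0.5]))
# A biased coin is less surprising on average, ≈ 0.469 bits
print(entropy([0.9, 0.1]))
# The same fair coin measured in nats: 1 bit = ln 2 ≈ 0.693 nats
print(entropy([0.5, 0.5], base=math.e))
```

Note that changing the base only rescales the result by a constant factor (log₂ e), which is why the choice of units is a convention rather than a substantive modeling decision.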

  • Events with probability 1 (certainty) contribute zero entropy
  • Maximum entropy occurs with the uniform distribution over n outcomes (maximum uncertainty), where H(X) = log₂ n
  • Entropy is always non-negative
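These three properties can be checked numerically. The sketch below (helper name `entropy` is illustrative) verifies that a degenerate distribution has zero entropy, that the uniform distribution attains the maximum log₂ n, and that randomly sampled distributions never fall below zero or above that bound:

```python
import math
import random

def entropy(probs):
    # Shannon entropy in bits; zero-probability terms contribute nothing
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Property 1: a certain outcome carries no surprise
assert entropy([1.0, 0.0, 0.0]) == 0.0

# Property 2: the uniform distribution over n outcomes attains log2(n)
n = 8
assert abs(entropy([1 / n] * n) - math.log2(n)) < 1e-12

# Property 3: entropy is non-negative (and bounded above by log2(n))
random.seed(0)
for _ in range(1000):
    raw = [random.random() for _ in range(n)]
    total = sum(raw)
    probs = [x / total for x in raw]
    assert 0.0 <= entropy(probs) <= math.log2(n) + 1e-12
```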

Entropy provides the foundation for information theory, connecting directly to information content by quantifying the average number of bits needed to encode messages from a given source. This relationship makes entropy essential for data compression, communication systems, and machine learning algorithms that must identify patterns amid noise.
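The coding connection can be made concrete with a small worked example. The source and prefix code below are hypothetical, chosen so the probabilities are powers of two; in that special case an optimal prefix code's average length matches the entropy exactly, illustrating entropy as the lower bound on bits per symbol:

```python
import math

# A hypothetical 4-symbol source with dyadic (power-of-two) probabilities
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Entropy of the source: 0.5·1 + 0.25·2 + 0.125·3 + 0.125·3 = 1.75 bits
H = -sum(p * math.log2(p) for p in probs.values())
print(H)  # 1.75 bits per symbol

# A fixed-length code would spend 2 bits on every symbol.
# A prefix code matched to the distribution does better:
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(avg_len)  # 1.75 bits per symbol, meeting the entropy bound exactly
```

For general (non-dyadic) distributions the bound cannot always be met exactly per symbol, but Shannon's source coding theorem guarantees that the average code length can be driven arbitrarily close to H by encoding long blocks of symbols.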