Cross-entropy measures how many bits (on average) are needed to encode events from distribution P using a code optimized for distribution Q:

H(P,Q) = -∑ P(x) log₂ Q(x)

When P represents the true data distribution and Q the model's predicted distribution, cross-entropy quantifies the inefficiency of encoding with the wrong distribution: the cost is never less than H(P) bits and grows as Q drifts away from P. Lower values indicate closer agreement between the predicted and true distributions.
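
To make the definition concrete, here is a minimal sketch in Python that evaluates the formula for two small discrete distributions. The probabilities are illustrative assumptions, not values from the text; encoding P with its own optimal code yields H(P), while using Q's code yields a larger number of bits on average.

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum_x P(x) * log2 Q(x), in bits."""
    return -sum(px * math.log2(qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution P (assumed for the example)
q = [0.7, 0.2, 0.1]     # model's predicted distribution Q (assumed)

print(cross_entropy(p, p))  # H(P) = 1.5 bits: P encoded with its own optimal code
print(cross_entropy(p, q))  # H(P, Q) ≈ 1.67 bits: extra cost of using Q's code
```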

Applications in Machine Learning:

  • Classification Loss: Cross-entropy loss trains neural networks to output probability distributions matching true class labels (see the sketch after this list)
  • Natural Language Processing: Measuring model performance in next-token prediction tasks
  • Information Retrieval: Evaluating relevance rankings in search algorithms
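
As a sketch of the classification-loss use case, the snippet below computes the loss from raw model logits and integer class labels using plain NumPy rather than any specific deep-learning framework. The logits and labels are hypothetical, and the natural logarithm is used, as is conventional for training losses; it differs from the log₂ form above only by a constant factor.

```python
import numpy as np

def softmax(logits):
    """Convert logits to probability distributions (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_loss(logits, labels):
    """Mean negative log-probability assigned to the true class."""
    probs = softmax(logits)
    true_class_probs = probs[np.arange(len(labels)), labels]
    return -np.mean(np.log(true_class_probs))

logits = np.array([[2.0, 0.5, -1.0],   # batch of 2 examples, 3 classes (assumed values)
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])              # true class indices (assumed)
print(cross_entropy_loss(logits, labels))  # lower means predictions match labels better
```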