Cross-Entropy
Cross-entropy measures how many bits (on average) are needed to encode events from distribution P using a code optimized for distribution Q:
H(P,Q) = -∑ P(x) log₂ Q(x)
When P represents the true data distribution and Q the model's predicted distribution, cross-entropy quantifies the inefficiency of encoding with the wrong distribution: it is always at least the entropy H(P), and the excess is exactly the KL divergence between P and Q. Lower values indicate better alignment between the true and predicted distributions, with the minimum reached when Q = P.
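As a concrete check of the formula, here is a minimal NumPy sketch (the function name `cross_entropy` and the example distributions are illustrative, not taken from any particular library) that evaluates H(P,Q) in bits and shows that it bottoms out at the entropy H(P) when Q = P:

```python
import numpy as np

def cross_entropy(p, q):
    """Cross-entropy H(P, Q) in bits: -sum_x P(x) * log2 Q(x).

    p and q are probability vectors over the same set of events;
    events with P(x) = 0 contribute nothing to the sum.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # treat 0 * log 0 as 0
    return -np.sum(p[mask] * np.log2(q[mask]))

# True distribution P and a model's estimate Q over three events.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

print(cross_entropy(p, p))  # 1.5 bits: the entropy H(P), the best achievable
print(cross_entropy(p, q))  # ~1.57 bits: the extra cost of coding with Q instead of P
```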
Applications in Machine Learning:
- Classification Loss: Cross-entropy loss trains neural networks to output probability distributions matching true class labels (see the sketch after this list)
- Natural Language Processing: Measuring model performance in next-token prediction tasks
- Information Retrieval: Evaluating relevance rankings in search algorithms
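As a rough illustration of the classification-loss case, the sketch below (function name, batch values, and the `eps` guard are illustrative assumptions, not a specific framework's API) computes the mean cross-entropy for a small batch of predictions. With one-hot targets, the sum -∑ P(x) log Q(x) collapses to -log Q(true class) for each example; note that ML libraries conventionally use the natural logarithm (nats) rather than log₂ (bits):

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """Mean cross-entropy classification loss in nats.

    probs:  (batch, num_classes) predicted probabilities, rows summing to 1
    labels: (batch,) integer indices of the true classes
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    eps = 1e-12                                   # guard against log(0)
    picked = probs[np.arange(len(labels)), labels]  # Q(true class) per example
    return -np.mean(np.log(picked + eps))

# Two examples, three classes; the true classes are 0 and 2.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])
print(cross_entropy_loss(probs, labels))  # ~0.434 nats: mean of -log(0.7) and -log(0.6)
```

The same computation underlies next-token prediction in language models: each token position is a classification over the vocabulary, and the reported loss is the average -log Q(actual next token).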