Expectation & Moments

Expectation and moments quantify the center, spread, shape, and other properties of probability distributions. These statistical measures are essential for evaluating model performance, quantifying uncertainty in predictions, and understanding the tradeoffs in different learning approaches.

Expectation (Mean):

  • Definition: The probability-weighted average of all possible values: E[X] = Σ xᵢ P(xᵢ) for a discrete variable, or E[X] = ∫ x f(x) dx for a continuous variable with density f(x).
  • Properties: Linearity (E[aX + bY] = aE[X] + bE[Y], which holds even when X and Y are dependent); expectations define common loss functions (MSE, cross-entropy) and form the basis of model optimization through expected risk minimization (see the sketch below).
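
A minimal numerical sketch of both points, assuming NumPy; the values and probabilities below are an illustrative discrete distribution, not taken from the text:

```python
import numpy as np

# A hypothetical discrete random variable: its values and probabilities.
values = np.array([1.0, 2.0, 3.0, 4.0])
probs = np.array([0.1, 0.2, 0.3, 0.4])  # probabilities must sum to 1

# E[X] = sum_i x_i * P(x_i): the probability-weighted average.
e_x = np.sum(values * probs)
print(e_x)  # ≈ 3.0

# Linearity check: E[aX + b] = a*E[X] + b for constants a, b.
a, b = 2.0, 5.0
e_ax_b = np.sum((a * values + b) * probs)
print(np.isclose(e_ax_b, a * e_x + b))  # True
```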

Variance:

  • Definition: Measures spread, or dispersion, around the mean: Var(X) = E[(X - E[X])²], equivalently E[X²] - (E[X])².
  • Applications: Quantifies prediction uncertainty, appears in the bias-variance decomposition, guides the choice of regularization strength, and underlies confidence intervals and hypothesis testing (see the sketch below).
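
As a quick sketch, assuming NumPy and a synthetic normal sample, the definition above can be checked directly against NumPy's built-in estimators, including the unbiased (ddof=1) variant commonly used for sample statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=2.0, size=10_000)  # true variance is 4

# Var(X) = E[(X - E[X])^2], estimated by the sample average of squared deviations.
var_manual = np.mean((sample - sample.mean()) ** 2)

print(np.isclose(var_manual, np.var(sample)))  # True: np.var uses the same formula
print(np.var(sample, ddof=1))  # unbiased sample variance, close to 4.0
print(np.std(sample, ddof=1))  # standard deviation, close to 2.0
```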

Related Concepts:

  • Covariance: Measures the linear relationship between two variables: Cov(X,Y) = E[(X - E[X])(Y - E[Y])].
  • Standard Deviation: Square root of variance, used for interpretability.
  • Moments: Higher-order moments describe distribution shape: the third standardized moment gives skewness, the fourth gives kurtosis. Under suitable conditions, the full sequence of moments characterizes a distribution (both are computed in the sketch below).
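
A short sketch tying these together, assuming NumPy, SciPy, and synthetic correlated data: it checks the covariance definition against np.cov and computes the third and fourth standardized moments (skewness and excess kurtosis):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.8 * x + rng.normal(scale=0.5, size=5_000)  # y is positively correlated with x

# Cov(X,Y) = E[(X - E[X])(Y - E[Y])], estimated from the sample.
cov_manual = np.mean((x - x.mean()) * (y - y.mean()))
print(np.isclose(cov_manual, np.cov(x, y, bias=True)[0, 1]))  # True

# Higher-order standardized moments: both are near 0 for a normal sample.
print(stats.skew(x))      # third standardized moment (asymmetry)
print(stats.kurtosis(x))  # fourth standardized moment (excess/Fisher kurtosis)
```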

Model Evaluation Connection: When building models, variance helps diagnose overfitting: if training error is low but validation error is high, the model has high variance and is fitting noise rather than signal. Ensemble methods such as Random Forests exploit the variance side of this tradeoff deliberately: randomization (bootstrap sampling, random feature subsets) increases the variance of the individual trees, while averaging their predictions reduces the variance of the ensemble as a whole, making it more robust for real-world deployment. Understanding this bias-variance tradeoff helps you choose appropriate regularization techniques and model complexity for your application; the simulation below illustrates the averaging effect.
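
The averaging effect can be seen in a small simulation. This is an idealized sketch, not a Random Forest: it assumes each model's prediction error is independent with equal variance, in which case averaging B predictions divides the variance by B (real trees have correlated errors, so the reduction is smaller in practice):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate n_models noisy "models": each prediction = true value + independent noise.
true_value, noise_sd, n_models, n_trials = 10.0, 3.0, 50, 20_000
preds = true_value + rng.normal(scale=noise_sd, size=(n_trials, n_models))

single_var = preds[:, 0].var()           # variance of one model's prediction, ~9
ensemble_var = preds.mean(axis=1).var()  # variance of the averaged prediction, ~9/50

print(single_var, ensemble_var)
```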