Random Variables
Random variables are the mathematical foundation for quantifying and analyzing uncertain outcomes, bridging real-world phenomena and probability theory.
A random variable is a function that assigns a numerical value to each outcome in a probability experiment. It converts qualitative events into numbers we can analyze.
Example: When rolling two dice, define a random variable X as the sum of the dice. Instead of tracking complex outcomes like 'first die shows 3, second die shows 4,' we work with X = 7.
Formally, X is a function X: Ω → ℝ, mapping each outcome in the sample space Ω to a real number.
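The dice example above can be made concrete in code: a minimal sketch that enumerates the sample space Ω of two dice, defines X as the sum, and computes the distribution P(X = k) under the uniform probability on Ω.

```python
from itertools import product
from collections import Counter

# Sample space Ω: all 36 ordered outcomes of rolling two dice
omega = list(product(range(1, 7), repeat=2))

# Random variable X: a function from outcomes to real numbers (here, the sum)
def X(outcome):
    return outcome[0] + outcome[1]

# Distribution of X: P(X = k), assuming each outcome is equally likely
counts = Counter(X(w) for w in omega)
p = {k: counts[k] / len(omega) for k in sorted(counts)}

print(p[7])  # P(X = 7) = 6/36 ≈ 0.1667
```

Note how X collapses six distinct outcomes, (1,6), (2,5), ..., (6,1), into the single value 7, which is exactly the simplification the definition promises.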
Applications in Machine Learning:
- Feature representation: Random variables serve as inputs to ML models, representing measurable attributes of data points like pixel values, user demographics, or sensor readings.
- Target variables: The outcomes we aim to predict, such as class labels in classification, numerical values in regression, or generated content in generative models.
- Model parameters: Weights and biases in neural networks are treated as random variables in Bayesian approaches, capturing uncertainty in model specification.
- Latent variables: Unobserved factors in unsupervised learning that explain patterns in data, like topics in topic modeling or hidden states in dimensionality reduction.
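The first two roles above (features and targets) can be sketched as a tiny synthetic regression dataset, where each data point is a realization of random variables. The specific distributions and the coefficients 2 and 3 are illustrative assumptions, not from the text.

```python
import random

random.seed(0)

def sample_point():
    """One draw from the joint distribution of (feature, target)."""
    x = random.uniform(0, 10)    # feature: a continuous random variable
    noise = random.gauss(0, 1)   # unobserved noise, itself a random variable
    y = 2 * x + 3 + noise        # target: a function of the feature plus noise
    return x, y

# A dataset is a collection of realizations of these random variables
dataset = [sample_point() for _ in range(100)]
```

Framing data this way is what lets ML methods reason probabilistically: the model's job is to recover the relationship between the feature and target random variables from finitely many realizations.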
Model Selection Insight: When building models, match your model type to the characteristics of your random variables. For discrete targets (like click/no-click), choose classification models; for continuous targets (like house prices), select regression models. Remember that transforming variables (e.g., log-transforming skewed data) often improves model performance by better aligning the data with the model's distributional assumptions.
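The log-transform advice can be demonstrated with a sketch. Here, synthetic "house prices" are drawn lognormally (an assumption for illustration), so the raw variable is heavily right-skewed while its logarithm is roughly symmetric, and thus a better fit for models that assume approximately normal inputs or residuals.

```python
import math
import random

random.seed(1)

# Skewed positive data: if log(price) ~ Normal(12, 0.8), price is lognormal
prices = [math.exp(random.gauss(12, 0.8)) for _ in range(1000)]
log_prices = [math.log(p) for p in prices]

def skewness(xs):
    """Sample skewness: third standardized central moment."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return sum((x - mean) ** 3 for x in xs) / (n * var ** 1.5)

# The raw prices are strongly right-skewed; the log-prices are nearly symmetric
print(skewness(prices), skewness(log_prices))
```

The transformed variable is still the same underlying random variable viewed through a monotone function, so no information is lost, only re-expressed in a form that matches the model's assumptions.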