Data Science Introduction

Probability Distributions

A probability distribution is a mathematical model of how a random variable behaves: it assigns a probability (or, for continuous variables, a probability density) to each possible outcome. The normal (Gaussian) distribution is perhaps the most fundamental, describing phenomena in which many small, independent effects combine additively. Its bell curve appears, at least approximately, across natural and social systems, from measurement errors and human heights to economic indicators and test scores. Two parameters, the mean and the standard deviation, completely characterize it, and its mathematical tractability makes it the foundation for many statistical methods.
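To make the role of the two parameters concrete, here is a minimal sketch using Python's standard-library statistics.NormalDist. The height figures (mean 170 cm, standard deviation 10 cm) are invented for illustration and are not taken from any dataset.

```python
from statistics import NormalDist

# Hypothetical example: heights in cm, assumed normal with made-up parameters.
heights = NormalDist(mu=170.0, sigma=10.0)

# Probability of landing within k standard deviations of the mean, for k = 1, 2, 3.
within = {
    k: heights.cdf(heights.mean + k * heights.stdev)
    - heights.cdf(heights.mean - k * heights.stdev)
    for k in (1, 2, 3)
}

for k, p in within.items():
    print(f"within {k} sd of the mean: {p:.4f}")

# The standard normal (mean 0, sd 1) gives the same proportions, because the
# mean only shifts the curve and the standard deviation only rescales it.
standard = NormalDist()
standard_within_1 = standard.cdf(1) - standard.cdf(-1)
print(f"standard normal, within 1 sd: {standard_within_1:.4f}")
```

The proportions come out near 68%, 95%, and 99.7% whatever mean and standard deviation are chosen, which is one practical sense in which those two numbers fully describe the distribution.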

Beyond the normal, other distributions model different data-generating processes. Binomial distributions describe the number of successes in a fixed number of independent trials, each with the same success probability. Poisson distributions describe the number of events in a fixed interval when events occur independently at a constant average rate (such as manufacturing defects per batch or requests to a website per minute). Exponential distributions describe the waiting times between such events. Understanding which distribution naturally models your data guides the selection of appropriate statistical tests and modeling approaches.

The central limit theorem, one of statistics' most profound results, explains why so many real-world measurements are approximately normal even when their individual components are not. It states that the average of a large number of independent, identically distributed random variables with finite variance is approximately normally distributed, whatever the shape of the underlying distribution. This provides the theoretical justification for many statistical methods applied to aggregated data.
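The central limit theorem can be checked empirically with a small simulation. The sketch below averages draws from an exponential distribution, which is strongly skewed, and compares the averages with the normal distribution the theorem predicts. The rate, sample size, number of repetitions, and seed are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(0)  # fixed seed so repeated runs give the same numbers

RATE = 1.0          # exponential rate; its mean and sd both equal 1 / RATE
SAMPLE_SIZE = 50    # number of draws averaged together
N_AVERAGES = 5_000  # number of averages to simulate


def sample_mean(n: int) -> float:
    """Average of n independent exponential draws."""
    return fmean(random.expovariate(RATE) for _ in range(n))


averages = [sample_mean(SAMPLE_SIZE) for _ in range(N_AVERAGES)]

# CLT prediction: averages are roughly normal with the same mean as a single
# draw and a standard deviation shrunk by sqrt(SAMPLE_SIZE).
predicted = NormalDist(mu=1 / RATE, sigma=(1 / RATE) / SAMPLE_SIZE ** 0.5)

observed_mean = fmean(averages)
observed_sd = stdev(averages)

# Share of averages within one predicted sd of the predicted mean;
# for a normal distribution this is about 0.68.
share_within_1_sd = sum(
    abs(a - predicted.mean) <= predicted.stdev for a in averages
) / N_AVERAGES

print(f"predicted mean {predicted.mean:.3f}, observed {observed_mean:.3f}")
print(f"predicted sd   {predicted.stdev:.3f}, observed {observed_sd:.3f}")
print(f"share within 1 sd: {share_within_1_sd:.3f}")
```

Lowering SAMPLE_SIZE to 2 or 3 makes the averages visibly skewed again, a reminder that the normal approximation improves as more observations are averaged.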