Probability Rules

Probability rules govern how we combine and manipulate probabilities to derive new insights. These rules form the backbone of probabilistic reasoning, enabling us to calculate the likelihood of complex events based on simpler components. Understanding these rules is essential for building intuition about how probabilities interact and for applying them effectively in machine learning contexts.

Conditional probability measures the likelihood of an event occurring given that another event has already occurred. It helps us update probabilities when we have partial information about an outcome.

Example: If 25% of students play sports and study music, and 50% of students play sports, then the probability that a student who plays sports also studies music is 25% ÷ 50% = 50%.

The conditional probability of event A given that event B has occurred is defined as:

P(A|B) = P(A ∩ B)/P(B) for P(B) > 0.

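This formula measures what fraction of B's probability also belongs to A: once B is known to have occurred, B effectively becomes the new sample space, and we ask how much of it lies in A.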
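The definition translates directly into code. Below is a minimal Python sketch that reproduces the student example with exact `Fraction` arithmetic to avoid floating-point rounding; the helper name `conditional` is hypothetical and not taken from any library.

```python
from fractions import Fraction

def conditional(p_a_and_b: Fraction, p_b: Fraction) -> Fraction:
    """Return P(A|B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive for P(A|B) to be defined")
    return p_a_and_b / p_b

# Student example: P(sports ∩ music) = 25%, P(sports) = 50%
p_music_given_sports = conditional(Fraction(25, 100), Fraction(50, 100))
print(p_music_given_sports)  # 1/2
```

Raising an error when P(B) = 0 mirrors the condition P(B) > 0 attached to the definition.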

Two events are independent if the occurrence of one does not affect the probability of the other.

Example: The outcome of a coin flip doesn't affect the outcome of a die roll, so these events are independent.

Formally, events A and B are independent if and only if: P(A ∩ B) = P(A)P(B) or equivalently,

P(A|B) = P(A), provided P(B) > 0.
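One way to check independence is to enumerate a small, equally likely sample space and compare P(A ∩ B) with P(A)P(B). The sketch below assumes a fair coin and a fair six-sided die; the names `omega` and `prob` are illustrative, and events are represented as predicates on outcomes. For contrast, it also tests two events defined on the same die roll, which turn out to be dependent.

```python
from fractions import Fraction
from itertools import product

# Sample space: every (coin, die) pair, all 12 equally likely
omega = list(product("HT", range(1, 7)))

def prob(event) -> Fraction:
    """P(event) under the uniform measure; `event` is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def heads(w):
    return w[0] == "H"

def six(w):
    return w[1] == 6

def even(w):
    return w[1] % 2 == 0

# Coin and die: P(A ∩ B) equals P(A)P(B), so the events are independent
p_heads_and_six = prob(lambda w: heads(w) and six(w))
print(p_heads_and_six, prob(heads) * prob(six))    # 1/12 1/12
print(p_heads_and_six / prob(six) == prob(heads))  # True: P(A|B) = P(A)

# Two events on the same die: knowing "six" forces "even", so they are dependent
p_even_and_six = prob(lambda w: even(w) and six(w))
print(p_even_and_six, prob(even) * prob(six))      # 1/6 1/12
```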

The multiplication rule determines the probability of two events happening together (the intersection). It multiplies the probability of one event by the conditional probability of the second event, given that the first has occurred.

Example: If 5% of people have a certain disease, and the test returns a positive result for 90% of people who have the disease, then the probability of having the disease and testing positive is 5% × 90% = 4.5%.

For any two events A and B, the probability of their intersection is given by:

P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).

For independent events, this simplifies to P(A ∩ B) = P(A)P(B).
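The disease example is a single application of P(A ∩ B) = P(B|A)P(A). A minimal Python sketch, with D standing for "has the disease" and + for "tests positive"; the helper name `joint` is hypothetical.

```python
from fractions import Fraction

def joint(p_b_given_a: Fraction, p_a: Fraction) -> Fraction:
    """Multiplication rule: P(A ∩ B) = P(B|A) P(A)."""
    return p_b_given_a * p_a

p_disease = Fraction(5, 100)              # P(D)
p_pos_given_disease = Fraction(90, 100)   # P(+|D)
p_disease_and_pos = joint(p_pos_given_disease, p_disease)
print(float(p_disease_and_pos))  # 0.045
```

Note that this is P(D ∩ +), not P(D|+); recovering the latter would also require the probability of a positive test among people without the disease.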

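The chain rule extends this to multiple events by applying the multiplication rule repeatedly: P(A₁ ∩ A₂ ∩ … ∩ Aₙ) = P(A₁)P(A₂|A₁)P(A₃|A₁ ∩ A₂)⋯P(Aₙ|A₁ ∩ … ∩ Aₙ₋₁), provided each conditioning event has positive probability.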
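As an illustration (our own example, not from the text above), consider drawing three cards without replacement from a standard 52-card deck and asking for the probability that all three are aces. The chain rule gives (4/52)(3/51)(2/50). The sketch below checks this against a brute-force count over all ordered draws, treating cards as distinct by position.

```python
from fractions import Fraction
from itertools import permutations

# A 52-card deck reduced to what matters here: 4 aces and 48 other cards
deck = ["A"] * 4 + ["x"] * 48

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) P(A2|A1) P(A3|A1 ∩ A2)
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Brute force: count ordered draws of 3 distinct positions that are all aces
total = hits = 0
for draw in permutations(range(52), 3):
    total += 1
    hits += all(deck[i] == "A" for i in draw)
brute = Fraction(hits, total)

print(chain, brute)  # 1/5525 1/5525
```

Each factor in `chain` shrinks both the number of aces and the deck size, which is exactly what conditioning on the earlier draws does.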

The addition rule helps us calculate the probability that at least one of two events occurs. When calculating the probability of 'A or B' happening, we add their individual probabilities and subtract the probability of their overlap (to avoid counting the overlap twice).

Example: If there's a 30% chance of rain and 20% chance of wind, with a 10% chance of both occurring together, then the chance of either rain or wind is 30% + 20% - 10% = 40%.

Formally, for any two events A and B from the same sample space, the probability of their union is: P(A ∪ B) = P(A) + P(B) - P(A ∩ B). For disjoint events where A ∩ B = ∅, this simplifies to P(A ∪ B) = P(A) + P(B). For more than two events, the same idea generalizes to the inclusion-exclusion principle.
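The rain/wind arithmetic and its generalization can both be sketched in a few lines of Python. The general version below assumes a finite sample space with equally likely outcomes (one roll of a fair die) and events represented as sets; the helper names `prob` and `union_prob` are hypothetical. It sums P(⋂S) over every nonempty subset S of the events, with sign +1 for odd-sized subsets and -1 for even-sized ones, and compares the result with the probability of the union computed directly.

```python
from fractions import Fraction
from itertools import combinations

# Rain/wind example: P(R ∪ W) = P(R) + P(W) - P(R ∩ W)
p_rain, p_wind, p_both = Fraction(30, 100), Fraction(20, 100), Fraction(10, 100)
p_rain_or_wind = p_rain + p_wind - p_both
print(float(p_rain_or_wind))  # 0.4

# General inclusion-exclusion on a finite, equally likely sample space
omega = set(range(1, 7))  # one roll of a fair die

def prob(event: set) -> Fraction:
    return Fraction(len(event), len(omega))

def union_prob(events: list) -> Fraction:
    """Sum over nonempty subsets S of (-1)^(|S|+1) * P(intersection of S)."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        for subset in combinations(events, k):
            total += (-1) ** (k + 1) * prob(set.intersection(*subset))
    return total

evens, high, primes = {2, 4, 6}, {4, 5, 6}, {2, 3, 5}
print(union_prob([evens, high, primes]), prob(evens | high | primes))  # 5/6 5/6
```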

The Law of Total Probability allows us to calculate the total probability of an event by breaking it down into different scenarios or partitions.

Example: To find the probability of being late to work, consider the probability of lateness in various weather conditions and combine these based on the probabilities of each condition.

Formally, if B₁, B₂, …, Bₙ form a partition of the sample space (with each P(Bᵢ) > 0), then for any event A: P(A) = ∑ᵢ₌₁ⁿ P(A|Bᵢ)P(Bᵢ).
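To make the lateness example concrete, the sketch below plugs made-up numbers into the formula. Both the weather probabilities and the conditional lateness rates are hypothetical, chosen only for illustration; the one real constraint is that the weather conditions form a partition, so their probabilities must sum to 1.

```python
from fractions import Fraction

# Hypothetical partition of weather conditions: P(B_i)
p_weather = {"clear": Fraction(6, 10), "rain": Fraction(3, 10), "snow": Fraction(1, 10)}
# Hypothetical lateness rates in each condition: P(late | B_i)
p_late_given = {"clear": Fraction(5, 100), "rain": Fraction(20, 100), "snow": Fraction(50, 100)}

# The B_i must partition the sample space, so their probabilities sum to 1
assert sum(p_weather.values()) == 1

# Law of total probability: P(late) = sum over i of P(late | B_i) P(B_i)
p_late = sum(p_late_given[w] * p_weather[w] for w in p_weather)
print(p_late, float(p_late))  # 7/50 0.14
```

Each term weights the lateness rate in one scenario by how likely that scenario is, so rare but bad conditions (snow here) still contribute in proportion to their probability.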