[ML] Bayesian Methods Overview

Chain Rule for Probability

\[P(X,Y) = P(X|Y)P(Y)\] \[P(X,Y,Z) = P(X|Y,Z)P(Y|Z)P(Z)\] \[P(X_1, ... ,X_N) = \prod_{i=1}^{N} P(X_i|X_1, ... , X_{i-1})\]
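As a minimal sketch, the two-variable chain rule with assumed numbers (both probabilities here are illustrative, not from any dataset):

```python
# Chain rule sketch: P(X, Y) = P(X|Y) * P(Y), with hypothetical numbers.
p_y = 0.3            # assumed P(Y = 1)
p_x_given_y = 0.8    # assumed P(X = 1 | Y = 1)

p_xy = p_x_given_y * p_y   # joint P(X = 1, Y = 1) = 0.8 * 0.3
print(p_xy)
```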

Sum Rule

\[p(X) = \int_{-\infty}^{\infty}p(X,Y)dY\]
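For discrete variables the integral becomes a sum over the other variable's values. A sketch with an assumed joint table (all numbers illustrative):

```python
# Sum rule (discrete analogue): p(X = x) = sum over y of p(X = x, Y = y).
# Hypothetical joint table over X in {0, 1} and Y in {0, 1}; entries sum to 1.
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}

# Marginalize Y out by summing over all entries that share the same x.
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
print(p_x)  # marginal distribution of X
```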

Bayes Theorem

Given \( \theta \) denotes parameters and \( X \) denotes observations:

\[P(\theta|X) = \frac{P(X,\theta)}{P(X)} = \frac{P(X|\theta)P(\theta)}{P(X)}\]

The terms are: \( P(\theta) \) is the prior, \( P(X \vert \theta) \) is the likelihood, \( P(\theta \vert X) \) is the posterior, and \( P(X) \) is the evidence.

For a discrete parameter, the evidence \( P(X) \) can be calculated by \( P(X) = \sum_i P(X \vert \theta_i) \times P(\theta_i) \)
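A sketch of this computation for a discrete parameter with two hypothetical values (the priors and likelihoods are assumed numbers):

```python
# Bayes' theorem for a discrete parameter: posterior = likelihood * prior / P(X),
# where P(X) = sum_i P(X | theta_i) * P(theta_i). All numbers are illustrative.
priors = {"theta1": 0.5, "theta2": 0.5}
likelihood = {"theta1": 0.9, "theta2": 0.3}   # assumed P(X | theta_i)

p_x = sum(likelihood[t] * priors[t] for t in priors)        # the evidence
posterior = {t: likelihood[t] * priors[t] / p_x for t in priors}
print(posterior)  # theta1 ends up around 0.75
```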

Components of Bayes' rule

A Bayesian approach always requires a prior probability, which is assumed rather than tested. Instead of confirming or falsifying a hypothesis outright, a Bayesian adjusts the prior probability as new evidence arrives.

This theorem is important because it allows prior knowledge about an event to update the probability of that event. Bayesian inference is a game of degrees of belief.
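The updating process above can be sketched as repeated application of Bayes' rule, where each posterior becomes the next prior (the evidence strengths here are assumed numbers):

```python
# Sequential Bayesian updating sketch for a binary hypothesis H.
def update(prior, p_e_given_h, p_e_given_not_h):
    # Bayes' rule: P(H | E) = P(E | H) P(H) / P(E).
    num = p_e_given_h * prior
    return num / (num + p_e_given_not_h * (1 - prior))

belief = 0.5                      # initial degree of belief in H
for _ in range(3):                # three identical pieces of supporting evidence
    belief = update(belief, 0.8, 0.4)
print(belief)  # belief rises with each update
```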

Bayesian Networks

A model is not set in stone; it is a representation of how we believe the world works.

A Bayesian network is a directed acyclic graph (DAG) \( G \) whose nodes are random variables and whose edges represent direct probabilistic dependencies; each node carries a conditional probability distribution (CPD) given its parents.

Chain Rule for BNs

The BN represents a joint distribution via the chain rule for Bayesian networks:

\[ P(X_1, …, X_n) = \prod_{i=1}^n P(X_i \vert Par_G(X_i)) \]

which we can also say: “P factorizes over G”.
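A sketch of this factorization on a toy network \( A \to B \to C \) with binary variables; all CPD numbers below are assumed for illustration:

```python
# Chain rule for BNs on A -> B -> C: P(A, B, C) = P(A) * P(B | A) * P(C | B).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # (c, b)

def joint(a, b, c):
    # Each factor is P(X_i | parents of X_i) in the graph A -> B -> C.
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

# A valid joint distribution must sum to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(total)
```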

Flow of Probabilistic Influence

  1. With no variables observed, a trail \( X_1 - … - X_k \) is active if it has no V-structure \( X_{i-1} \to X_i \gets X_{i+1} \).

  2. \( X_1 - … - X_k \) is active given \( Z \) if:

    • For any V-structure \( X_{i-1} \to X_i \gets X_{i+1} \), we have that \( X_i \) or one of its descendants \( \in Z \).
    • No other \( X_i \) is in \( Z \).

If an observed variable in the trail stops influence from flowing along it, we say that it “blocks the trail”.

D-separation

Definition: \( X \) and \( Y \) are d-separated in \( G \) given \( Z \) if there is no active trail in \( G \) between \( X \) and \( Y \) given \( Z \), denoted as follows:

\[ \text{d-sep} _ G(X, Y \vert Z) \]

Any node is d-separated from its non-descendants given its parents.

I-maps

\[ I(G) = \{ (X \perp Y \vert Z) : \text{d-sep} _ G(X, Y \vert Z) \} \]

If \( P \) satisfies all the independencies in \( I(G) \), we say that \( G \) is an I-map (independence map) of \( P \).

2 Theorems

  1. If P factorizes over G, and \( \text{d-sep} _ G(X, Y \vert Z) \), then \( P \models (X \perp Y \vert Z) \).

  2. If G is an I-map for P, then P factorizes over G.
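Theorem 1 can be checked by brute force on the toy chain \( A \to B \to C \): \( B \) blocks the only trail between \( A \) and \( C \), so \( \text{d-sep} _ G(A, C \vert B) \) should imply \( A \perp C \vert B \) in \( P \). The CPDs below are assumed numbers, chosen only so that \( P \) factorizes over the chain:

```python
# Verify numerically that A and C are independent given B when P factorizes
# over A -> B -> C. All CPD values are illustrative.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # (b, a)
p_c_given_b = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # (c, b)

def joint(a, b, c):
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

ok = True
for b in (0, 1):
    p_b = sum(joint(a, b, c) for a in (0, 1) for c in (0, 1))
    for a in (0, 1):
        for c in (0, 1):
            p_ab = sum(joint(a, b, cc) for cc in (0, 1))
            p_cb = sum(joint(aa, b, c) for aa in (0, 1))
            # Conditional independence: P(a, c | b) == P(a | b) * P(c | b).
            ok &= abs(joint(a, b, c) / p_b - (p_ab / p_b) * (p_cb / p_b)) < 1e-9
print(ok)  # True
```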

Factorization & I-Map Summary

There are 2 equivalent ways of viewing the graph:

  1. Factorization: G allows P to be represented compactly as a product of CPDs.
  2. I-map: Independencies encoded by G hold in P.
· Machine Learning