[ML] Naive Bayes

Difference between “Naive Bayes” and “Bayesian Networks”

According to Omkar Lad:

Naive Bayes assumes that all the features are conditionally independent of each other given the class. This lets the likelihood factor into a product of per-feature terms, which makes Bayes' rule easy to apply. The independence assumption usually works well in practice, even when the features are not really independent.
A Bayesian network makes no such assumption: every dependence in the network has to be modeled explicitly. The network (a graph) can be learned from data by the machine itself, or designed in advance by a developer who has sufficient knowledge of the dependencies.

That is to say, “Naive” Bayes is actually a special case of a Bayesian network: one in which the class is the only parent of every feature.
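
To make the special-case claim concrete, here is a sketch of the two factorizations. The symbols are assumptions introduced for illustration and do not come from the quoted answer: \( C \) is the class, \( x_1, \dots, x_n \) are the features, and \( \mathrm{pa}(v_j) \) denotes the parents of node \( v_j \) in the graph. A general Bayesian network factors the joint distribution as

\[P(v_1, \dots, v_m) = \prod_{j=1}^{m} P(v_j \mid \mathrm{pa}(v_j))\]

Naive Bayes is the network in which \( C \) has no parent and every feature has \( C \) as its only parent, so the same formula reduces to

\[P(C, x_1, \dots, x_n) = P(C) \prod_{i=1}^{n} P(x_i \mid C)\]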

Bayesian Networks for Naive Bayes' rule

Details

As introduced in the post titled Bayesian Methods Overview, the following equation is at the heart of the method:

\[P(\theta|X) = \frac{P(X,\theta)}{P(X)} = \frac{P(X|\theta)P(\theta)}{P(X)}\]

This equation does not say how to compute \( P(X \vert \theta) \) from the training data. There are two main ways to do it, depending on the type of feature we have:

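  1. Discrete feature: we can estimate the likelihood from a frequency table, i.e. the count of each feature value within a class divided by the size of that class.

  2. Continuous feature: we compute the feature's mean and standard deviation within each class and plug them into the PDF we assume for it (typically a Gaussian).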
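
The two cases above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names, the toy data, and the choice of a Gaussian PDF are all hypothetical, and no smoothing is applied (so unseen values get no entry, and a class with zero variance would break the PDF).

```python
import math
from collections import Counter, defaultdict


def discrete_likelihoods(values, labels):
    """Estimate P(value | label) from a frequency table of counts."""
    pair_counts = Counter(zip(labels, values))
    label_counts = Counter(labels)
    return {
        (label, value): count / label_counts[label]
        for (label, value), count in pair_counts.items()
    }


def gaussian_params(values, labels):
    """Return {label: (mean, std)} for one continuous feature."""
    grouped = defaultdict(list)
    for value, label in zip(values, labels):
        grouped[label].append(value)
    params = {}
    for label, vs in grouped.items():
        mean = sum(vs) / len(vs)
        # Population variance (divide by N), i.e. the maximum-likelihood estimate.
        var = sum((v - mean) ** 2 for v in vs) / len(vs)
        params[label] = (mean, math.sqrt(var))
    return params


def gaussian_pdf(x, mean, std):
    """Density of the assumed Gaussian at x; requires std > 0."""
    coeff = 1.0 / (std * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mean) ** 2) / (2.0 * std ** 2))


# Hypothetical toy data: one discrete and one continuous feature.
labels = ["yes", "yes", "no", "no"]
weather = ["sunny", "rainy", "rainy", "rainy"]
temperature = [25.0, 27.0, 15.0, 17.0]

table = discrete_likelihoods(weather, labels)
params = gaussian_params(temperature, labels)

# Likelihood of observing 26 degrees under each class.
for label, (mean, std) in params.items():
    print(label, gaussian_pdf(26.0, mean, std))
```

Under the naive assumption, the per-feature likelihoods computed this way are multiplied together (or their logarithms summed) to obtain \( P(X \vert \theta) \).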
