Nirav Pandey

4 April 2026

Mixed Naïve Bayes Classifier

What I have learnt about the Naïve Bayes Classifier

Machine Learning

A mixed Naïve Bayes classifier handles two types of features: continuous and categorical.

  • For continuous features, estimate the mean and variance of each feature within each class, i.e., \mu_{cj} and \sigma^2_{cj}.

The conditional probabilities are given by

P(x_j | c) = \frac{1}{\sqrt{2\pi}\,\sigma_{cj}} \exp\left\{-\frac{(x_j - \mu_{cj})^2}{2\sigma^2_{cj}}\right\}
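The Gaussian likelihood above can be sketched in Python. This is a minimal example, not a full implementation; the toy arrays `X` and `y` are made up for illustration:

```python
import numpy as np

def fit_gaussian(X, y, c):
    """Estimate mu_cj and sigma^2_cj for each continuous feature j of class c."""
    Xc = X[y == c]
    return Xc.mean(axis=0), Xc.var(axis=0)

def gaussian_likelihood(x, mu, var):
    """P(x_j | c) for each feature j under the Gaussian assumption."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Toy data: two continuous features, binary class labels.
X = np.array([[25.0, 40.0], [30.0, 45.0], [50.0, 60.0], [55.0, 65.0]])
y = np.array([0, 0, 1, 1])

mu, var = fit_gaussian(X, y, c=1)
print(gaussian_likelihood(np.array([52.0, 62.0]), mu, var))
```

Each feature contributes one likelihood value; the per-feature likelihoods are later multiplied together under the "naïve" independence assumption.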

For categorical features

Compute the following

  • \text{count}(c,j,v) is the number of training instances of class c (e.g., income >= 50K) whose feature j takes value v (e.g., Occupation = Pilot).
  • \alpha is a small smoothing parameter.
  • N_c is the number of instances of class c.
  • K_j is the number of distinct values of feature j (e.g., the number of occupations in the Occupation column).

Then, the smoothed conditional probability is

P(x_j = v | c) = \frac{\text{count}(c,j,v) + \alpha}{N_c + \alpha K_j}
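The smoothed estimate can be sketched as follows; the occupation list is hypothetical data for the income >= 50K class, and `K` plays the role of K_j:

```python
from collections import Counter

def smoothed_prob(values, v, K, alpha=1.0):
    """P(x_j = v | c) with additive (Laplace) smoothing.

    values: feature-j values of all training instances of class c;
    K: number of distinct values feature j can take (K_j)."""
    counts = Counter(values)   # count(c, j, v) for every observed v
    Nc = len(values)           # N_c
    return (counts[v] + alpha) / (Nc + alpha * K)

# Hypothetical occupation column for the income >= 50K class.
occupations = ["Pilot", "Doctor", "Pilot", "Engineer"]
p = smoothed_prob(occupations, "Pilot", K=3, alpha=1.0)
print(p)  # (2 + 1) / (4 + 1*3) = 3/7 ≈ 0.4286
```

Note that smoothing gives a value never seen with class c a small nonzero probability instead of zeroing out the whole product of likelihoods.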

The prior probability of each class is P(c) = N_c / N, where N is the total number of training instances.

To classify an instance, we compute the posterior for each class

P(c|x) \propto P(c) \times \prod_j P(x_j|c)

and select the class with the highest posterior.
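Putting the pieces together, the decision step can be sketched like this. In practice the product of likelihoods is replaced by a sum of logs to avoid numerical underflow; the priors and per-feature log-likelihoods below are hypothetical numbers, not fitted values:

```python
import math

def log_posterior(priors, log_liks):
    """Unnormalized log P(c|x) = log P(c) + sum_j log P(x_j|c) for each class c."""
    return {c: math.log(priors[c]) + sum(log_liks[c]) for c in priors}

def classify(priors, log_liks):
    scores = log_posterior(priors, log_liks)
    return max(scores, key=scores.get)  # class with highest posterior

# Hypothetical priors and per-feature log P(x_j|c) for one instance.
priors = {"<50K": 0.75, ">=50K": 0.25}
log_liks = {"<50K": [-2.1, -3.0], ">=50K": [-1.2, -0.9]}
print(classify(priors, log_liks))  # prints ">=50K"
```

Because the normalizing constant P(x) is the same for every class, comparing unnormalized log posteriors is enough to pick the winner.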