Nirav Pandey

4 April 2026

Mixed Naïve Bayes Classifier

What I have learnt about the Naïve Bayes Classifier

Machine Learning

A mixed Naïve Bayes classifier handles two types of features: continuous and categorical.

  • For continuous features, estimate the mean and variance of each feature within each class, i.e., \mu_{cj} and \sigma^2_{cj}.

The conditional probabilities are given by

P(x_j | c) = \frac{1}{\sqrt{2\pi}\,\sigma_{cj}} \exp\left\{-\frac{(x_j - \mu_{cj})^2}{2\sigma^2_{cj}}\right\}
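The Gaussian likelihood above can be sketched in Python. This is a minimal example, not a full implementation; the toy arrays `X` and `y` are made up for illustration:

```python
import numpy as np

def fit_gaussian(X, y, c):
    """Estimate mu_cj and sigma^2_cj for each continuous feature j of class c."""
    Xc = X[y == c]
    return Xc.mean(axis=0), Xc.var(axis=0)

def gaussian_likelihood(x, mu, var):
    """P(x_j | c) for each feature j under the Gaussian assumption."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Toy data: two continuous features, binary class labels.
X = np.array([[25.0, 40.0], [30.0, 45.0], [50.0, 60.0], [55.0, 65.0]])
y = np.array([0, 0, 1, 1])

mu, var = fit_gaussian(X, y, c=1)
print(gaussian_likelihood(np.array([52.0, 62.0]), mu, var))
```

Each feature contributes one likelihood value; the per-feature likelihoods are later multiplied together under the "naïve" independence assumption.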

For categorical features

Compute the following

  • \text{count}(c,j,v) is the number of training instances of class c (e.g., income >= 50K) whose feature j takes value v (e.g., Occupation = Pilot).
  • \alpha is a small smoothing parameter.
  • N_c is the number of instances of class c.
  • K_j is the number of distinct values of feature j (e.g., the number of occupations in the Occupation column).

Then, the smoothed conditional probability is

P(x_j = v | c) = \frac{\text{count}(c,j,v) + \alpha}{N_c + \alpha K_j}
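The smoothed estimate can be sketched as follows; the occupation list is hypothetical data for the income >= 50K class, and `K` plays the role of K_j:

```python
from collections import Counter

def smoothed_prob(values, v, K, alpha=1.0):
    """P(x_j = v | c) with additive (Laplace) smoothing.

    values: feature-j values of all training instances of class c;
    K: number of distinct values feature j can take (K_j)."""
    counts = Counter(values)   # count(c, j, v) for every observed v
    Nc = len(values)           # N_c
    return (counts[v] + alpha) / (Nc + alpha * K)

# Hypothetical occupation column for the income >= 50K class.
occupations = ["Pilot", "Doctor", "Pilot", "Engineer"]
p = smoothed_prob(occupations, "Pilot", K=3, alpha=1.0)
print(p)  # (2 + 1) / (4 + 1*3) = 3/7 ≈ 0.4286
```

Note that smoothing gives a value never seen with class c a small nonzero probability instead of zeroing out the whole product of likelihoods.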

The prior probability of each class is P(c) = N_c / N, where N is the total number of training instances.

To classify an instance, we compute the posterior for each class

P(c|x) \propto P(c) \times \prod_j P(x_j|c)

and select the class with the highest posterior.
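Putting the pieces together, the decision step can be sketched like this. In practice the product of likelihoods is replaced by a sum of logs to avoid numerical underflow; the priors and per-feature log-likelihoods below are hypothetical numbers, not fitted values:

```python
import math

def log_posterior(priors, log_liks):
    """Unnormalized log P(c|x) = log P(c) + sum_j log P(x_j|c) for each class c."""
    return {c: math.log(priors[c]) + sum(log_liks[c]) for c in priors}

def classify(priors, log_liks):
    scores = log_posterior(priors, log_liks)
    return max(scores, key=scores.get)  # class with highest posterior

# Hypothetical priors and per-feature log P(x_j|c) for one instance.
priors = {"<50K": 0.75, ">=50K": 0.25}
log_liks = {"<50K": [-2.1, -3.0], ">=50K": [-1.2, -0.9]}
print(classify(priors, log_liks))  # prints ">=50K"
```

Because the normalizing constant P(x) is the same for every class, comparing unnormalized log posteriors is enough to pick the winner.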