Less-Naive Bayesian

In my latest blog posting, I discuss making the PP Bayesian learner less naive by pairing up fingerprint features. The posting was inspired by an experiment I did a couple of years ago and recently re-discovered. Since I haven't had time to explore it further, I figured I'd compute a few results on some new data and then just post the protocol for anyone who might find it useful.

The question is: by pairing up fingerprint features to account for pairwise interaction effects, can we get a better model than with a conventional fingerprint that only considers the features in isolation? The answer appears to be "Yes." But while the improvement is consistent, it is small. So it might not be worth the effort in any given case.
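To make the idea concrete, here is a minimal sketch of what "pairing up" features means. The function name `with_interactions` and the choice to represent each pair feature as a sorted tuple are my own illustration, not the actual Pipeline Pilot implementation:

```python
from itertools import combinations

def with_interactions(features):
    """Augment a set of fingerprint features with all pairwise
    interaction terms, each represented as a sorted feature pair."""
    base = set(features)
    # Every unordered pair of co-occurring base features becomes
    # a new feature in its own right.
    pairs = set(combinations(sorted(base), 2))
    return base | pairs

# A molecule with 3 base features gains C(3, 2) = 3 pair features,
# for 6 features in total.
fp = with_interactions([3, 7, 11])
```

A molecule with n base features thus gets n + n(n-1)/2 features, which is why the augmented fingerprints are noticeably more expensive to compute and store.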

To test the approach, I used the same Ames mutagenicity data set that I used for some previous model comparison calculations. I created a Pipeline Pilot component to generate the fingerprints-with-interaction-terms as described in the blog posting. The component appends an "X" to the base fingerprint name to distinguish it from the original fingerprint. (Note that this component is designed so that if you save it in your PP component database, the properties FCFP_2X, ECFP_2X, etc. become automatically calculable.) Finally, I used fingerprints with and without interactions to build and cross-validate a set of Bayesian models.
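The key point is that the Bayesian learner itself never changes; only the feature sets fed into it do. As a rough sketch, here is a generic Laplace-smoothed naive Bayes scorer in the same spirit as the PP learner (this is my own simplified version, not Pipeline Pilot's exact Laplacian-corrected estimator):

```python
import math
from collections import Counter

def bayes_weights(actives, inactives, k=1.0):
    """Per-feature log-likelihood weights for a naive Bayesian
    classifier, with additive (Laplace) smoothing of strength k.

    actives / inactives: lists of feature sets, one per molecule."""
    act = Counter(f for fp in actives for f in fp)
    ina = Counter(f for fp in inactives for f in fp)
    n_act, n_ina = len(actives), len(inactives)
    weights = {}
    for f in set(act) | set(ina):
        p_act = (act[f] + k) / (n_act + 2 * k)  # P(feature | active)
        p_ina = (ina[f] + k) / (n_ina + 2 * k)  # P(feature | inactive)
        weights[f] = math.log(p_act / p_ina)
    return weights

def score(fp, weights):
    # Sum of per-feature weights; features unseen in training add 0.
    return sum(weights.get(f, 0.0) for f in fp)

# Toy demo: feature 1 is enriched in actives, feature 4 in inactives.
w = bayes_weights([{1, 2}, {1, 3}], [{4}, {4, 5}])
```

Swapping in the interaction-augmented fingerprints just means the feature sets passed to `bayes_weights` and `score` contain pair features as well as base features.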

In each case, the cross-validated ROC score for the fingerprint-with-interactions is slightly better than for the original fingerprint. Here are the results. (The standard error for the ROC scores is about 0.005.)

Fingerprint   ROC(original)   ROC(interactions)
FCFP_2            0.747            0.773
ECFP_2            0.795            0.819
FCFP_4            0.809            0.826
ECFP_4            0.824            0.836
FCFP_6            0.825            0.835
ECFP_6            0.828            0.833

The difference in the ROC scores is significant at the p < 0.01 level, but the mean improvement is only 0.016. An MAO inhibitor data set shows similar results.
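You can check the significance claim from the table alone with a paired t-test, using only the standard library. (The critical value 4.032 is the two-tailed t value for p = 0.01 with 5 degrees of freedom, taken from a standard t table.)

```python
import math

orig  = [0.747, 0.795, 0.809, 0.824, 0.825, 0.828]
inter = [0.773, 0.819, 0.826, 0.836, 0.835, 0.833]

diffs = [b - a for a, b in zip(orig, inter)]
n = len(diffs)
mean = sum(diffs) / n
sd = math.sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
t = mean / (sd / math.sqrt(n))  # paired t-statistic, df = n - 1

# t exceeds the two-tailed critical value 4.032 (p = 0.01, 5 df),
# so the improvement is significant at p < 0.01.
print(round(mean, 3), round(t, 2))  # mean ≈ 0.016, t ≈ 4.7
```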

To sum up: The no-interactions assumption of the naive Bayesian classifier holds up surprisingly well. Inclusion of pairwise interactions does improve the model, but not by much.

See the attached protocol if you want to try the calculations yourself. (Running all the calculations takes about 2 hours on a reasonably fast server, so you may want to reduce the list of fingerprints in the "Delimited Text Reader" component. You will also need to download the Ames data and change the Source and Destination parameters of the readers and writers.)