Do you trust the leave-one-out cross-validation in MultiCat models?

Hello,

I had an issue with the leave-one-out cross-validation when building some multi-category Bayesian models, and did not really find a way around it (the best I could do was to create a collection of separate models, which is not handy).

The problem I had was that, looking at the statistics, the number of models with wrong data is huge (by wrong I mean a Std Deviation of 0 for the score, even with 100 compounds).

As some of you may have seen, Willem published a paper (it is an ASAP article, but already available): Article.

He builds a multi-category model with (in the available example) more than 4000 categories/clusters. The paper then states:
quote:
However, building a model automatically performs a fast leave-one-out cross-validation, and results of that process can be used to create absolute estimators that are comparable: “Enrichment” and “EstPGood”.


I ran the protocol, and out of 2305 models/categories with more than 10 compounds, only 74 of them have valid stats.
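To make it clear what I am counting, here is a minimal sketch of the check (this is not the actual protocol component; the data layout, function name, and the 10-compound cutoff as an exclusive threshold are all my assumptions, standing in for whatever the protocol exports per category):

```python
# Hypothetical check: given per-category lists of leave-one-out
# scores, count how many categories above the size cutoff have
# usable stats, i.e. a non-zero Std Deviation of the score.
from statistics import pstdev

def count_valid_categories(scores_by_category, min_compounds=10):
    """Return (eligible, valid): categories with more than
    `min_compounds` compounds, and those among them whose LOO
    score standard deviation is non-zero."""
    eligible = valid = 0
    for scores in scores_by_category.values():
        if len(scores) <= min_compounds:
            continue
        eligible += 1
        if pstdev(scores) > 0.0:
            valid += 1
    return eligible, valid

# Toy data: one category with real spread, one stuck at a constant
# score (the "Std Deviation of 0" case), one below the cutoff.
demo = {
    "cluster_A": [0.2, 0.5, 0.4] * 4,   # 12 compounds, non-zero spread
    "cluster_B": [0.0] * 100,           # 100 compounds, Std Dev of 0
    "cluster_C": [0.1, 0.3],            # too few compounds, ignored
}
print(count_valid_categories(demo))  # -> (2, 1)
```

With the real protocol output I get 2305 eligible categories but only 74 valid ones.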

My question is: if the displayed stats are wrong, do you still believe in the value of Enrichment and EstPGood? (And if so, why?)

A side question for Accelrys: why does it behave like this?
The protocol available online with Willem's article can be used to test it.

Cheers,

Jérémy