How Are Molecular Descriptors Combined with ECFP6 Fingerprints in Bernoulli Naïve Bayes Models in Pipeline Pilot?

  1. Are the molecular descriptors binarized (e.g., binned into deciles or other intervals) before being included in the model?
  2. Does Pipeline Pilot use the same BernoulliNB node for both ECFP6 bits and descriptor features?
  3. How are continuous descriptors converted into the binary features that a Bernoulli Naïve Bayes model expects?
  4. Is the model built using one combined feature matrix, or are fingerprints and descriptors treated separately and merged later?
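
To make questions 1 and 4 concrete, here is a minimal sketch of the kind of setup I have in mind: each continuous descriptor is cut into quantile bins, each bin becomes one indicator bit, and those bits are appended to the fingerprint bits to form a single binary feature matrix. This is only my assumption about how it could work, not a description of what Pipeline Pilot actually does. The data, the helper names (quantile_edges, one_hot_bin), and the choice of two bins are all hypothetical.

```python
from bisect import bisect_right

def quantile_edges(values, n_bins):
    """Interior cut points splitting the sorted values into n_bins roughly equal groups."""
    s = sorted(values)
    return [s[(len(s) * k) // n_bins] for k in range(1, n_bins)]

def one_hot_bin(value, edges):
    """Indicator vector with exactly one bit set for the bin that contains value."""
    idx = bisect_right(edges, value)
    return [1 if i == idx else 0 for i in range(len(edges) + 1)]

# Hypothetical toy data: 4 molecules, 6 folded fingerprint bits, one descriptor.
fp_bits = [
    [1, 0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 0, 1, 1, 1],
]
logp = [0.5, 2.1, 3.7, 5.2]

# Assumed binning scheme: two quantile bins (deciles would use n_bins=10).
edges = quantile_edges(logp, n_bins=2)

# One combined binary matrix: fingerprint bits followed by descriptor-bin bits.
X = [bits + one_hot_bin(value, edges) for bits, value in zip(fp_bits, logp)]

for row in X:
    print(row)
```

If the real workflow looks like this, every descriptor bin would be treated as just another "feature present/absent" column alongside the ECFP6 bits, which is the part I would like to confirm.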

My goal is to better understand how descriptors contribute to the final posterior probability, especially in the context of fragment-based interpretation (e.g., NP scores) and feature importance.
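
For reference, this is how I would compute per-feature contributions in a textbook Bernoulli Naïve Bayes model with Laplace smoothing. Whether Pipeline Pilot's estimator matches this formulation is part of what I am asking, so please treat it as an illustration under that assumption. The toy matrix is hypothetical: columns 0 and 1 stand in for fingerprint bits, columns 2 and 3 for descriptor-bin indicators.

```python
import math

def fit_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(x_j = 1 | class) for every feature, with Laplace smoothing alpha."""
    probs = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        n = len(rows)
        probs[c] = [(sum(col) + alpha) / (n + 2 * alpha) for col in zip(*rows)]
    return probs

def feature_contributions(x, probs, pos=1, neg=0):
    """Per-feature log-likelihood ratio: log P(x_j | pos) - log P(x_j | neg)."""
    out = []
    for xj, p, q in zip(x, probs[pos], probs[neg]):
        log_pos = math.log(p if xj else 1 - p)
        log_neg = math.log(q if xj else 1 - q)
        out.append(log_pos - log_neg)
    return out

# Hypothetical toy data: 2 actives (label 1) and 2 inactives (label 0).
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
]
y = [1, 1, 0, 0]

probs = fit_bernoulli_nb(X, y)
contrib = feature_contributions([1, 0, 1, 0], probs)

# With equal class priors, the posterior log-odds is the sum of the contributions.
log_odds = sum(contrib)
print(contrib, log_odds)
```

In this formulation an absent bit also contributes to the log-odds (column 3 above), so descriptor bins and fingerprint bits would be directly comparable as additive terms. I would like to know whether the same holds for the Pipeline Pilot model.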

If anyone has insights, examples, or internal documentation explaining this process, I’d greatly appreciate it!