Bayesian categorization model based on document text

YX 2015-02-04

hi...

currently, i am trying to figure out how to accurately predict the relevancy of an unknown document using the "Bayesian categorization model based on document text" component.

i have created a test protocol to try to understand how the Bayesian categorization model works and how accurate it is.

i have attached the test protocol (Forum-Test_Title.xml) here.

in this protocol, the good list consists of entries related to "electric vehicle", while the bad list consists of entries related to "HIV" (the list is "patent list.xml").

a model is created by analysing the titles of the entries in the list.

i was wondering: after creating the model and re-applying it to the source list, why do i get entries from the good list marked as bad and entries from the bad list marked as good?
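
for reference, here is a rough sketch of how i understand a text-based Bayesian model to work. it is a plain naive Bayes classifier with add-one smoothing written in python, not the component's actual implementation (which i have not seen), and the titles, function names and labels are all made up for illustration. even in this toy version, a good-list title that shares most of its words with the bad list gets marked bad when the model is re-applied to its own training titles, which looks similar to what i am seeing.

```python
import math
from collections import Counter

# Hypothetical titles, invented for illustration (not taken from "patent list.xml").
GOOD_TITLES = [
    "electric vehicle battery charger",
    "electric vehicle motor control",
    "method and composition for coating",
]
BAD_TITLES = [
    "method and composition for treating hiv infection",
    "composition and method for hiv vaccine",
    "method for hiv detection",
]


def tokenize(title):
    """Split a title into lowercase words."""
    return title.lower().split()


def train(good_titles, bad_titles):
    """Count words per class; returns (word counts, document counts, vocabulary)."""
    counts = {"good": Counter(), "bad": Counter()}
    docs = {"good": len(good_titles), "bad": len(bad_titles)}
    for label, titles in (("good", good_titles), ("bad", bad_titles)):
        for title in titles:
            counts[label].update(tokenize(title))
    vocab = set(counts["good"]) | set(counts["bad"])
    return counts, docs, vocab


def classify(title, model):
    """Return the class with the higher log posterior under naive Bayes."""
    counts, docs, vocab = model
    total_docs = sum(docs.values())
    scores = {}
    for label in ("good", "bad"):
        total_words = sum(counts[label].values())
        # Class prior: fraction of training titles with this label.
        score = math.log(docs[label] / total_docs)
        for word in tokenize(title):
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids zero probabilities.
                score += math.log(
                    (counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(GOOD_TITLES, BAD_TITLES)

# Re-apply the model to the titles it was trained on.
for label, titles in (("good", GOOD_TITLES), ("bad", BAD_TITLES)):
    for title in titles:
        print(f"{label:>4} list -> marked {classify(title, model):<4} : {title}")
```

in this sketch the title "method and composition for coating" comes from the good list but is marked bad, because words like "method", "composition" and "for" occur more often in the bad titles. the model scores word frequencies and does not remember which list each title came from.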

thanks.