Simpson's Paradox

Has anyone investigated how well the various Pipeline Pilot / R statistical learners deal with
data sets in which Simpson's Paradox plays a significant role?

For example, has anyone tried a fake data set composed of two separate subsets whose trends each
run in the opposite direction to the trend of the entire data set?  This can be recast as a
"global" model vs. "local" model problem (see attached image, Figure 1, from Levick, S.R. and
Rogers, K.H., Landscape Ecol., 26 (2011) 515).  Will a decision tree model properly pull these two
sets (and models) apart?  What about other machine learning algorithms?
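
To make the kind of fake data set I have in mind concrete, here is a minimal sketch in Python
with NumPy.  All of the numbers (group sizes, slopes, offsets, noise level) are made up purely
for illustration, and ols_slope is just a hypothetical helper name.  Each subset has a negative
trend, but the second subset is shifted up and to the right, so a single "global" line fitted to
the pooled data comes out with a positive slope.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical subsets: both have a within-group slope of -1,
# but subset B sits at larger x and larger y than subset A.
n = 200
x_a = rng.uniform(0.0, 4.0, n)
x_b = rng.uniform(6.0, 10.0, n)
y_a = 5.0 - 1.0 * x_a + rng.normal(0.0, 0.5, n)
y_b = 17.0 - 1.0 * x_b + rng.normal(0.0, 0.5, n)

# Pool the two subsets and keep a label so they can be separated again.
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
group = np.array([0] * n + [1] * n)


def ols_slope(x_vals, y_vals):
    """Return the slope of an ordinary least-squares straight-line fit."""
    slope, _intercept = np.polyfit(x_vals, y_vals, 1)
    return slope


# "Global" model: one line through everything.
global_slope = ols_slope(x, y)

# "Local" models: one line per subset.
local_slopes = {g: ols_slope(x[group == g], y[group == g]) for g in (0, 1)}

print("global slope:", round(global_slope, 3))
for g, s in local_slopes.items():
    print(f"local slope, subset {g}:", round(s, 3))
```

In this construction the two subsets do not overlap in x, which is the easy case.  The harder
and more interesting case is when the subsets overlap in the descriptors and the subset label
is not available to the learner.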

I would greatly appreciate hearing from those of you who have wrestled with this issue.  Which
analysis techniques did you use?

Thank you.

Regards,

Jim Metz

James.Metz@AbVie.com