Pipeline Pilot / R stats / PCA experts,
I have read in a set of more than 10,000 primary and secondary amines and calculated various continuous
physical properties: MW, AlogP, polar surface area, etc. (10 descriptors in total). I am using the R
stats package to calculate the principal components. If I plot the PCA scores (the per-compound
coordinates on each component) for each compound, I obtain the crescent-shaped plot below.
I do not have much experience with PCA calculations, but the overall appearance of this
plot seems odd compared to other PC plots I have viewed in various cheminformatics
publications. I typically observe more of a buckshot pattern in 2D.
Also, I am surprised that PC1 alone captures approximately 95% of the variability in the
data; that also seems unusual to me.
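
For reference, a minimal sketch of this kind of calculation with prcomp() from the R stats
package is given here. The data are randomly generated placeholders with three hypothetical
descriptor columns, not the actual amine set, and the use of prcomp() with its default
settings (centering, no scaling) is an assumption, since the exact call is not stated above.

```r
# Placeholder data only: the real descriptor table is not reproduced here.
# Column names and distributions are hypothetical.
set.seed(1)
n <- 200
descriptors <- data.frame(
  MW    = rnorm(n, mean = 350, sd = 80),
  AlogP = rnorm(n, mean = 2.5, sd = 1.2),
  PSA   = rnorm(n, mean = 60,  sd = 20)
)

# prcomp() lives in the stats package; center = TRUE and scale. = FALSE
# are its defaults and are written out here only for clarity.
pca <- prcomp(descriptors, center = TRUE, scale. = FALSE)

# Fraction of the total variance captured by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Scores: one row per compound, one column per component.
scores <- pca$x

# Loadings: one row per descriptor, one column per component.
loadings <- pca$rotation

print(round(var_explained, 3))
```

Calling plot(scores[, 1], scores[, 2]) on the result gives the 2D scatter of compounds, and
rerunning with scale. = TRUE shows how the variance fractions depend on descriptor scaling.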
Can someone please explain what is going on? Are my calculations correct? Or
am I over- or underfitting the data set, or ...?
Thank you.
Regards,
Jim Metz