Parallel Scaling in Forcite Plus

Hello all.

I am running an NVT MD simulation with Forcite Plus on 16 processors (64-bit Linux machines). I am using a custom forcefield (created with the forcefield manager) and group-based cutoffs (15 Å). The system has about 100k atoms and sits in a cubic periodic cell with edge length ~200 Å. I have seen excellent scaling from 1 to 8 CPUs, but when I tried 16 CPUs the run time actually increased slightly compared with the 8-CPU run. Could anyone give me a tip about this? Is this somehow expected (although an 8-CPU scaling limit seems rather low to me for a system of this size), or is there possibly a problem with my setup?
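To make the comparison concrete, here is a minimal Python sketch of how the scaling could be quantified as speedup and parallel efficiency from the wall-clock time of each run. The function name `scaling_table` is my own, and the timings in the example are hypothetical placeholders, not my measured values.

```python
def scaling_table(wall_times):
    """Return (n_cpus, speedup, efficiency) rows.

    wall_times maps a CPU count to the wall-clock time of the same job.
    Speedup and efficiency are measured relative to the smallest CPU count.
    """
    base_n = min(wall_times)
    base_t = wall_times[base_n]
    rows = []
    for n in sorted(wall_times):
        speedup = base_t / wall_times[n]
        # Ideal speedup going from base_n to n CPUs is n / base_n.
        efficiency = speedup * base_n / n
        rows.append((n, speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Hypothetical placeholder timings in seconds (NOT measured values).
    example_times = {1: 800.0, 8: 110.0, 16: 120.0}
    for n, s, e in scaling_table(example_times):
        print(f"{n:>3} CPUs  speedup {s:6.2f}  efficiency {e:5.2f}")
```

An efficiency close to 1 means near-ideal scaling. A speedup that drops when going from 8 to 16 CPUs is the behaviour I am describing.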

Thank you for your attention.

Giulio