Parallel computation with DMol3

I have a general question about parallel computation using the DMol3 package.

I run surface relaxation simulations on our university's supercomputer (Laval University, Quebec). My surface has 46 atoms forming 5 layers. I have noticed that the calculations run at the same speed whether I request one node or four. Should requesting more nodes make the parallelization more efficient, or is the problem somewhere else, for example in my choice of calculation parameters? I have consistently observed the same behavior with other surfaces in the past.
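
To make the observation concrete, here is a minimal Python sketch of how the scaling could be quantified from the wall-clock times of two runs. The function name `parallel_scaling` and the example times are hypothetical placeholders, not measured values and not part of DMol3.

```python
def parallel_scaling(t_base, t_parallel, n_base, n_parallel):
    """Return (speedup, efficiency) of a run on n_parallel nodes
    relative to a reference run on n_base nodes.

    t_base, t_parallel: wall-clock times of the two runs (same units).
    """
    if min(t_base, t_parallel) <= 0 or min(n_base, n_parallel) <= 0:
        raise ValueError("times and node counts must be positive")
    speedup = t_base / t_parallel
    # Ideal speedup equals the ratio of node counts; efficiency is the
    # fraction of that ideal actually achieved.
    efficiency = speedup / (n_parallel / n_base)
    return speedup, efficiency


# Hypothetical numbers: identical wall time on 1 node and on 4 nodes.
speedup, efficiency = parallel_scaling(3600.0, 3600.0, 1, 4)
print(f"speedup = {speedup:.2f}, efficiency = {efficiency:.0%}")
```

A speedup close to 1 with four nodes, as in my case, corresponds to an efficiency of about 25%, meaning the extra nodes contribute essentially nothing.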

Thank you for your kind response.