Clarification on using GPU

H2 2021-04-02

I read the NVIDIA white paper on using GPUs with the standard solver, but I am still a bit confused. It would be great if one of you could help me out here.

1) Is MPI parallel the same as distributed-memory parallel processing?

2) When using a GPU, is MPI parallel advised over thread-based parallelization?

3) Would DMP split + GPU give the best acceleration? I have access to a 24-CPU single-socket system with a Tesla V100S 32GB GPU. Does it make sense to use a DMP split here? Is it even possible? I thought I would need at least a 2-socket system.

I can't find much information about GPGPU in the manual. Is there any other recent document that consolidates this information?

Also, a specific hardware question: has anyone seen a performance boost with the NVIDIA A100 40GB GPU similar to that of the Tesla V100?