Hello Community,
I want to run CASTEP on a cluster of two machines. I followed all the instructions in the installation guide, but the gateway machine still completely ignores the second machine.
CASTEP is installed on both machines, and both can reach the licence server. Each machine has 16 CPU cores.
If I start a job and set the number of processors above 16, the gateway does not run 16 processes on itself and the rest on the other machine; it runs all of the processes on itself.
I also watched the second machine to see whether anyone connects over ssh, but nothing happens when I start a job. The gateway just starts as many processes as I requested on itself.
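To make the expected behaviour concrete, here is a small sketch of the placement I was expecting: hosts are filled in the order they are listed, up to their core count. This is only my assumption about how the processes should be distributed, not a description of what the MPI launcher actually does; "gateway" and "node" are placeholder hostnames.

```python
def expected_placement(nprocs, hosts):
    """Fill hosts in listed order, up to each host's core count.

    hosts: list of (hostname, cores) tuples.
    Returns a dict mapping hostname -> number of processes.
    """
    placement = {}
    remaining = nprocs
    for name, cores in hosts:
        take = min(remaining, cores)
        placement[name] = take
        remaining -= take
    if remaining > 0:
        raise ValueError(f"{remaining} processes do not fit on the listed hosts")
    return placement


if __name__ == "__main__":
    # Placeholder hostnames; 16 cores each, as in my setup.
    hosts = [("gateway", 16), ("node", 16)]
    print(expected_placement(24, hosts))  # {'gateway': 16, 'node': 8}
```

What I actually observe for a 24-process job is all 24 processes on the gateway and none on the node.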
What I have done so far:
- set up passwordless ssh from each machine to the other and to itself
- edited the "machines.LINUX" file in ../castep/share/data/ and added both machines with 16 processors each (do I have to move this file somewhere else?)
- restarted the machine
- restarted the gateway multiple times
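The first two checks in the list above can be scripted roughly as follows. This is a minimal sketch under assumptions: it assumes the machines file uses one `hostname:ncpus` entry per line (the MPICH-style layout; I have not confirmed that the gateway reads it this way), and it only tests that `ssh` to each listed host works without a password prompt.

```python
import subprocess
import sys


def parse_machines(text):
    """Parse 'host:ncpus' lines; blank lines and '#' comments are skipped.

    A line without ':ncpus' is treated as one CPU (an assumption).
    """
    hosts = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, _, count = line.partition(":")
        hosts.append((name.strip(), int(count) if count.strip() else 1))
    return hosts


def ssh_ok(host, timeout=10):
    """Return True if 'ssh host true' succeeds without any prompt."""
    cmd = ["ssh", "-o", "BatchMode=yes",
           "-o", f"ConnectTimeout={timeout}", host, "true"]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout + 5)
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):
        return False


if __name__ == "__main__":
    # Path to the machines file; defaults to the current directory.
    path = sys.argv[1] if len(sys.argv) > 1 else "machines.LINUX"
    with open(path) as fh:
        for name, cores in parse_machines(fh.read()):
            status = "ok" if ssh_ok(name) else "FAILED"
            print(f"{name}: {cores} cores, ssh {status}")
```

It should be run on both machines, as the same user that the gateway runs jobs under, since passwordless ssh is set up per user.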
The MPI command in use is the default one: "/opt/hpmpi/bin/mpirun -e MPI_REMSH=/usr/bin/ssh -prot -np". The paths to mpirun and ssh are both correct.
Both hosts I defined in the machines.LINUX file are reachable.
The gateway and the second machine (the node) are on an internal network, and only the gateway is connected to the external network from which I send the jobs via Materials Studio.
Does the gateway manage the node, or does the node also need access to the external network?
Please help me!
Best regards, Konrad Zesewitz
