Hello,
I am encountering an error with the jobs that I have submitted to the cluster: the jobs die unexpectedly. We also get the error "MatServer terminated with an unknown error status". According to our IT support desk, the jobs cannot be submitted to the queuing system and instead run on the head node; this makes the login node unavailable, and our job is then stopped. Is there a log file we can check, or are there any special commands for using a queuing system?
So far, they have tried the following configuration:
[root@lufer ~]# su - accelrys
-bash-3.2$ qhost # THE SGE VARIABLES WERE SET
HOSTNAME     ARCH       NCPU  LOAD MEMTOT  MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global       -             -     -      -       -      -      -
compute-0-0  lx26-amd64   16  4.54  47.1G  677.5M 996.2M    0.0
compute-0-1  lx26-amd64   16  0.00  47.1G  995.9M 996.2M    0.0
compute-0-10 lx26-amd64   16  1.35  47.1G 1007.5M 996.2M    0.0
compute-0-11 lx26-amd64   16  1.34  47.1G 1003.6M 996.2M    0.0
compute-0-12 lx26-amd64   16 15.46  47.1G    4.9G 996.2M    0.0
compute-0-13 lx26-amd64   16  4.69  47.1G    2.2G 996.2M    0.0
compute-0-14 lx26-amd64   16 17.22  47.1G    5.1G 996.2M    0.0
compute-0-15 lx26-amd64   16 11.30  47.1G    3.2G 996.2M    0.0
compute-0-2  lx26-amd64   16 14.09  47.1G    1.2G 996.2M    0.0
compute-0-3  lx26-amd64   16 25.92  47.1G    1.1G 996.2M    0.0
compute-0-4  lx26-amd64   16 15.29  47.1G    1.2G 996.2M    0.0
compute-0-5  lx26-amd64   16 13.21  47.1G    1.0G 996.2M    0.0
compute-0-6  lx26-amd64   16 16.55  47.1G    1.2G 996.2M    0.0
compute-0-7  lx26-amd64   16     -  47.1G       - 996.2M      -
compute-0-8  lx26-amd64   16  7.17  47.1G  949.9M 996.2M    0.0
compute-0-9  lx26-amd64   16  6.61  47.1G    1.0G 996.2M    0.0
compute-1-0  lx26-amd64   16     -  47.2G       - 996.2M      -
compute-1-1  lx26-amd64   24  6.87  47.2G    2.3G 996.2M  63.5M
compute-1-10 lx26-amd64   16 24.75  47.2G    7.9G 996.2M  16.8M
compute-1-11 lx26-amd64   16  8.10  47.2G    4.3G 996.2M  17.8M
compute-1-2  lx26-amd64   16  3.92  47.2G    3.1G 996.2M  14.7M
compute-1-3  lx26-amd64   16     -  47.2G       - 996.2M      -
compute-1-5  lx26-amd64   40  8.04  47.1G   12.2G 996.2M  17.1M
compute-1-8  lx26-amd64   40 13.38  47.1G    1.6G 996.2M  16.0K
compute-1-9  lx26-amd64   40  5.05  47.1G   30.6G 996.2M 498.1M
lufer        lx26-amd64   16     -  47.1G       - 996.2M      -
-bash-3.2$ cd /share/apps/accelrys/MaterialsStudio2017.1/MaterialsStudio17.1/etc/Gateway
-bash-3.2$ config/configure queue SGE
Could not detect queuingsystem dsd_sge.
Please check your environment!
-bash-3.2$
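In case it helps with the diagnosis, a check along the following lines could be run in the same shell before calling config/configure. It is only a sketch: it assumes the queue detection depends on the standard SGE environment (the SGE_ROOT and SGE_CELL variables, and the qsub, qstat and qconf commands being on PATH). That is a guess, since we do not know what config/configure actually looks for. The function name check_sge_env is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the usual SGE environment is visible in this shell.
# Assumption: the configure script needs these variables and commands;
# this is not confirmed by any Materials Studio documentation we have seen.
check_sge_env() {
    status=0
    # SGE_ROOT and SGE_CELL are normally set by the SGE settings script.
    for var in SGE_ROOT SGE_CELL; do
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            echo "$var=$value"
        else
            echo "$var is not set"
            status=1
        fi
    done
    # The SGE client commands should be reachable through PATH.
    for cmd in qsub qstat qconf; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd found at $(command -v "$cmd")"
        else
            echo "$cmd not found on PATH"
            status=1
        fi
    done
    return "$status"
}

check_sge_env || echo "SGE environment looks incomplete"
```

If everything is reported as set and found for the accelrys user, the environment itself is probably not the cause, and the detection must be failing for some other reason.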
Thank you, kind regards.
Cigdem.