.. _submitting_jobs:

Submitting Slurm jobs
=====================

Slurm allows for interactive and non-interactive work. Most users will use
``sbatch`` or ``srun`` to submit non-interactive scripts or commands. In each
case, specifying a ``partition`` and a ``qos`` is mandatory.

.. warning::
   Without a valid ``--partition`` and ``--qos`` specified for each job, Slurm
   will not accept the job. Please check your associations using
   ``entropy_account_info``.
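If you are unsure which partition and QOS combinations your account may use,
the same association information can also be listed with standard Slurm
tooling. This is only a minimal sketch, assuming ``sacctmgr`` is readable by
regular users on this cluster:

.. code-block:: shell

   # List the account, partition and QOS associations of the current user.
   $ sacctmgr show associations user=$USER format=Account,Partition,QOS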
Time limit
----------

One of the most important parameters is ``--time``. Without setting this flag
explicitly, the system will assign the default value for the specified QOS,
which will most probably be suboptimal for your job.

.. note::
   Slurm's scheduling algorithm is quite complex. Properly estimating and
   setting your job's running time will result in faster access to the
   resources and better overall system utilization.

The accepted time formats are presented in the table below.

.. csv-table::
   :header: "Format", "Example", "Description"

   "minutes", "30", "30 minutes"
   "minutes:seconds", "20:20", "20 minutes and 20 seconds"
   "hours:minutes:seconds", "1:30:45", "1 hour 30 minutes and 45 seconds"
   "days-hours", "2-0", "2 days"
   "days-hours", "2-6", "2 days and 6 hours"
   "days-hours:minutes:seconds", "1-12:30:00", "1 day 12 hours and 30 minutes"
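For instance, assuming your QOS allows it, the three invocations below all
request the same 36-hour limit, expressed in three of the formats above. The
partition, QOS and command are placeholders reused from the examples later in
this page:

.. code-block:: shell

   # 2160 minutes, 36 hours, and 1 day + 12 hours all denote the same limit.
   $ srun --partition=common --qos=16gpu14d --time=2160     --gres=gpu:1 a_command_to_run
   $ srun --partition=common --qos=16gpu14d --time=36:00:00 --gres=gpu:1 a_command_to_run
   $ srun --partition=common --qos=16gpu14d --time=1-12     --gres=gpu:1 a_command_to_run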
Using srun
----------

The ``srun`` command allows for running a single job on the cluster. A valid
``partition`` and ``qos`` need to be specified each time. If not redirected,
all program output will be printed to the standard output.

1. Running a single command and printing the results to the standard output.

   .. code-block:: shell
      :linenos:

      $ srun --partition=common --qos=1gpu1h --time=10 --gres=gpu:1 nvidia-smi -L
      GPU 0: TITAN V (UUID: GPU-6426f3d6-4cec-9167-5035-4e9129551d9b)
      GPU 1: TITAN V (UUID: GPU-bcaaee86-bd21-4735-edc2-d18b5fed40a7)
      GPU 2: TITAN V (UUID: GPU-109e5f3c-c2e8-3a9d-486a-0df29fb6c905)
      GPU 3: TITAN V (UUID: GPU-e3d1f883-02b2-1da6-80e1-32efd4ab7453)

2. Running a single command with a specific node selected and printing the
   results to the standard output.

   .. code-block:: shell
      :linenos:

      $ srun --nodelist arnold --partition=common --qos=1gpu1h --time=20 --gres=gpu:1 nvidia-smi -L
      GPU 0: TITAN V (UUID: GPU-6426f3d6-4cec-9167-5035-4e9129551d9b)
      GPU 1: TITAN V (UUID: GPU-bcaaee86-bd21-4735-edc2-d18b5fed40a7)
      GPU 2: TITAN V (UUID: GPU-109e5f3c-c2e8-3a9d-486a-0df29fb6c905)
      GPU 3: TITAN V (UUID: GPU-e3d1f883-02b2-1da6-80e1-32efd4ab7453)

3. Running a single command with a specific node and card selected and saving
   the output to a file.

   .. code-block:: shell
      :linenos:

      $ srun --nodelist=arnold --partition=common --qos=1gpu1h --output=username_out.txt --time=1:00 --gres=gpu:titanv:1 nvidia-smi -L
      $
      $ cat /results/username_out.txt
      GPU 0: TITAN V (UUID: GPU-6426f3d6-4cec-9167-5035-4e9129551d9b)
      GPU 1: TITAN V (UUID: GPU-bcaaee86-bd21-4735-edc2-d18b5fed40a7)
      GPU 2: TITAN V (UUID: GPU-109e5f3c-c2e8-3a9d-486a-0df29fb6c905)
      GPU 3: TITAN V (UUID: GPU-e3d1f883-02b2-1da6-80e1-32efd4ab7453)

Default Time, CPU and Memory values
-----------------------------------

Each partition has predefined memory, CPU and time values for a submitted job.
These are set to fill the nodes optimally -- please **do not** change them
without a reason. To fully allocate memory and time within the assigned
**qos**, please use the flags specified in the table below.

.. csv-table::
   :header: "Parameter", "Flag"

   "Memory (RAM)", "``--mem``"
   "Time", "``--time``"

.. warning::
   The ``--mem`` flag should be used with caution! The default values of
   ``DefCpuPerGPU`` and ``DefMemPerCPU`` already allocate an optimal amount of
   resources, so it is recommended not to tinker with the ``--mem`` flag
   without a very specific reason.

For example:

.. code-block:: shell
   :linenos:

   $ srun --partition=common --qos=16gpu14d --output=username_out.txt --time=1-0 --gres=gpu:titanv:1 a_command_to_run
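If you want to see which defaults a partition actually applies before
overriding them, the partition configuration can be queried directly. This is
a sketch only; ``common`` is just an example partition name:

.. code-block:: shell

   # Show the configuration of the "common" partition; the default memory,
   # CPU and time settings (e.g. DefMemPerCPU, DefaultTime, MaxTime) are
   # listed in the output.
   $ scontrol show partition common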
Using sbatch
------------

Using ``sbatch`` involves writing a script with all the details needed for job
submission. Required parameters are passed in comment lines resembling the
``#define`` preprocessor directives known from the C language; Slurm uses the
``#SBATCH`` prefix. In batch mode, defining the ``--output`` file is mandatory.

1. Running a single command.

   .. code-block:: shell
      :linenos:

      $ cat job.sh
      #!/bin/bash
      #
      #SBATCH --job-name=test_job_username
      #SBATCH --partition=common
      #SBATCH --qos=1gpu1d
      #SBATCH --gres=gpu:1
      #SBATCH --time=1-0
      #SBATCH --output=test_job.txt
      nvidia-smi -L

      $ sbatch job.sh

2. Running a single command with a specific node and GPU type selected.

   .. code-block:: shell
      :linenos:

      $ cat job.sh
      #!/bin/bash
      #
      #SBATCH --job-name=test_job_n
      #SBATCH --partition=research
      #SBATCH --qos=lecturer
      #SBATCH --gres=gpu:rtx2080ti:8
      #SBATCH --output=test_job_n.txt
      #SBATCH --time=3-0
      #SBATCH --nodelist=asusgpu2
      nvidia-smi -L

      $ sbatch job.sh

Environment variables
---------------------

By default, Slurm copies all environment variables from the submission node to
the compute nodes, overwriting the environment that would otherwise be set up
there. As a consequence, binaries that are not on the propagated ``PATH`` must
be called with their full paths, or the ``PATH`` variable has to be adjusted
explicitly. For example, let us try to run ``nvcc`` without any ``PATH`` or
script modifications:

.. code-block:: shell
   :linenos:

   $ cat job.sh
   #!/bin/bash
   #
   #SBATCH --job-name=test_job_n
   #SBATCH --partition=common
   #SBATCH --qos=student
   #SBATCH --gres=gpu:rtx2080ti:1
   #SBATCH --time=30
   #SBATCH --output=/results/test_job_n.txt
   nvcc --version

   $ sbatch job.sh
   $
   $ cat /results/test_job_n.txt
   /var/spool/slurm/d/job00124/slurm_script: line 9: nvcc: command not found

Specifying the full path would work:

.. code-block:: shell
   :linenos:

   /usr/local/cuda/bin/nvcc --version

We could also add the ``--export`` option to the batch script:

.. code-block:: shell
   :linenos:

   $ cat job.sh
   #!/bin/bash
   #
   #SBATCH --job-name=test_job_n
   #SBATCH --partition=common
   #SBATCH --qos=student
   #SBATCH --gres=gpu:rtx2080ti:1
   #SBATCH --output=/results/test_job_n.txt
   #SBATCH --time=0-8
   #SBATCH --export=ALL,PATH="/usr/local/cuda/bin:${PATH}"
   nvcc --version

   $ sbatch job.sh
   $
   $ cat /results/test_job_n.txt
   nvcc: NVIDIA (R) Cuda compiler driver
   Copyright (c) 2005-2019 NVIDIA Corporation
   Built on Sun_Jul_28_19:07:16_PDT_2019
   Cuda compilation tools, release 10.1, V10.1.243

Please read the ``--export`` explanation in the manual:
https://slurm.schedmd.com/sbatch.html.
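Finally, before hunting for missing binaries it can help to print the
environment that actually reaches your job. The script below is only a sketch
and reuses the partition and QOS names from the examples above:

.. code-block:: shell

   $ cat env_check.sh
   #!/bin/bash
   #
   #SBATCH --job-name=env_check
   #SBATCH --partition=common
   #SBATCH --qos=student
   #SBATCH --time=5
   #SBATCH --output=env_check.txt
   # Print the PATH seen by the job and the variables injected by Slurm itself.
   echo "PATH=$PATH"
   env | grep ^SLURM_ | sort

   $ sbatch env_check.sh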