Queues and resources
====================

Each *entropy* user can reserve and use a specific amount of resources, determined by the two most important cluster elements: **partitions**, also called **queues**, and the **QOS** (quality of service) assigned to each user on account creation.

Queues (partitions)
-------------------

In Slurm terminology, a **queue (partition)** is a logical grouping of the available machines into named sets (a machine can belong to more than one partition). Each queue may serve a different purpose, and each user is assigned to at least one queue, called ``common``. A partition may define specific restrictions, for example a limit on the maximum number of GPUs available to each user.

The defined queues can be listed with the ``sinfo`` command:

.. code-block:: shell
    :linenos:

    $ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
    common*      up 14-00:00:0      8   idle asusgpu[1-6],steven,sylvester
    a6000        up 14-00:00:0      1   idle bruce
    a100         up 14-00:00:0      1   idle a100a
    h100         up 14-00:00:0      1   idle h100a

The ``NODELIST`` column shows the servers assigned to each queue. This is the basic view of the cluster and, as one can see, the partitions impose only a basic limit on job length. This is because most limits are defined using the **QOS (quality of service)**.

Quality of service (QOS)
------------------------

A **QOS** defines a set of limits imposed on each user, complementing the partition limits in a certain hierarchy. Each user is assigned at least one **QOS**, which defines the user's capabilities regarding available resources. Each **QOS** can be used in the context of a specific queue (partition).

A **QOS** and a **queue**, together with two other, immutable parameters, form an **association**. Associations define the ways a user can use the cluster: they list all available combinations of queues and ``qos`` values.

To display the associations available to a user, use the ``entropy_account_info`` command.

.. code-block:: shell
    :linenos:

    $ entropy_account_info
     ______________
    < Slurm limits >
     --------------
            \    ,-^-.
             \   !oYo!
              \ /./=\.\______
                   ##        )\/\
                    ||-----w||
                    ||      ||

    +---------------+-------------------+
    |   Partition   |   Available QoS   |
    +---------------+-------------------+
    |    common     |   kmwil_common    |
    +---------------+-------------------+

    +------------------+----------+----------------------+----------+----------------------+------------+------------------+
    |       QoS        |   GPUs   |   Used GPU Minutes   |   CPUs   |   Used CPU Minutes   |   Memory   |   Maximum Wall   |
    +------------------+----------+----------------------+----------+----------------------+------------+------------------+
    |   kmwil_common   |    8     |    0 out of 10000    |    --    |     1 out of --      |     --     |    1-00:00:00    |
    +------------------+----------+----------------------+----------+----------------------+------------+------------------+

GPUMinutes
----------

Each user has a number of ``GPUMinutes`` available for use on the cluster. Once this resource is depleted, new jobs will not be accepted. The limit is visible in the ``entropy_account_info`` command output as ``GrpTRESMins``. A double dash (``--``) means that there is currently no limit on a resource.
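
As a rough way to plan against the GPU minutes budget, the sketch below estimates how many GPU minutes a job would consume. It assumes that usage is charged as the number of allocated GPUs multiplied by the wall-clock minutes, which is how Slurm normally accounts TRES minutes; the actual accounting on the cluster may differ. The helper names ``gpu_minutes`` and ``gpu_minutes_left`` are hypothetical and not part of the cluster tooling.

.. code-block:: shell

    # Hypothetical helper: GPU minutes consumed by a job,
    # assuming usage = number of GPUs x wall-clock minutes.
    # Arguments: <gpus> <wall_minutes>
    gpu_minutes() {
        echo $(( $1 * $2 ))
    }

    # Hypothetical helper: GPU minutes left after a planned job.
    # Arguments: <limit> <used> <gpus> <wall_minutes>
    # <limit> and <used> are the values from "Used GPU Minutes"
    # in the entropy_account_info output.
    gpu_minutes_left() {
        echo $(( $1 - $2 - $3 * $4 ))
    }

    gpu_minutes 2 90                   # 2 GPUs for 1.5 hours, prints 180
    gpu_minutes_left 10000 0 8 1440    # 8 GPUs for a full day, prints -1520

A negative result from ``gpu_minutes_left`` suggests that the planned job would not fit into the remaining budget under this assumption.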