dc.contributor.author
Linnert, Barry
dc.contributor.author
De Rose, Cesar Augusto F.
dc.contributor.author
Heiss, Hans-Ulrich
dc.date.accessioned
2025-01-06T10:08:45Z
dc.date.available
2025-01-06T10:08:45Z
dc.identifier.uri
https://refubium.fu-berlin.de/handle/fub188/45468
dc.identifier.uri
http://dx.doi.org/10.17169/refubium-45180
dc.description.abstract
As high-performance computing (HPC) becomes a tool used in many different workflows, quality of service (QoS) becomes increasingly important. In many cases, this includes the reliable execution of an HPC job and the generation of its results by a certain deadline. The resource and job management system (RJMS), or simply RMS, is responsible for receiving job requests and executing the jobs under a deadline-oriented policy to support these workflows. In this article, we evaluate how well static resource management policies cope with deadline-constrained HPC jobs and explore two variations of a dynamic policy in this context. Because the Hilbert curve-based approach used by the SLURM workload manager represents the state of the art in production environments, it was selected as one of the static allocation strategies. The Manhattan median approach, the second static allocation strategy, comes from research work that aims to minimize the communication overhead of parallel programs by providing more compact partitions than the Hilbert curve approach. In contrast to the static partitions provided by the Hilbert curve and Manhattan median approaches, the leak approach focuses on supporting the dynamic runtime behavior of jobs by assigning nodes of the HPC system on demand at runtime. Whereas the contiguous leak version also relies on a compact set of nodes, the noncontiguous leak version can provide additional nodes at a greater distance from the nodes already used by the job. Our preliminary results clearly show that a dynamic policy is needed to meet the requirements of a modern deadline-oriented RMS scenario.
en
dc.format.extent
16 pages
dc.rights.uri
https://creativecommons.org/licenses/by/4.0/
dc.subject
application runtime behavior
en
dc.subject
deadline-oriented policies
en
dc.subject
dynamic resource allocation
en
dc.subject
high-performance computing (HPC)
en
dc.subject
quality of service (QoS)
en
dc.subject
resource and job management system (RJMS)
en
dc.subject.ddc
000 Computer science, information and general works::000 Computer science, knowledge and systems::004 Data processing and computer science
dc.title
Toward a Dynamic Allocation Strategy for Deadline-Oriented Resource and Job Management in HPC Systems
dc.type
Scientific article
dcterms.bibliographicCitation.articlenumber
e8310
dcterms.bibliographicCitation.doi
10.1002/cpe.8310
dcterms.bibliographicCitation.journaltitle
Concurrency and Computation: Practice and Experience
dcterms.bibliographicCitation.number
1
dcterms.bibliographicCitation.volume
37
dcterms.bibliographicCitation.url
https://doi.org/10.1002/cpe.8310
refubium.affiliation
Mathematics and Computer Science
refubium.affiliation.other
Institute of Computer Science
refubium.funding
DEAL Wiley
refubium.note.author
The publication was funded by the Open Access publication funds of Freie Universität Berlin.
refubium.resourceType.isindependentpub
no
dcterms.accessRights.openaire
open access
dcterms.isPartOf.eissn
1532-0634