Scholars Bulletin (SB)
Volume-4 | Issue-12 | 936-944
Review Article
Intelligent GPU Scheduling and Fairness Mechanisms for Multi-Tenant AI Workloads in Kubernetes–OpenStack Environments
Lova Gautham Pemmadi, Hema Sree Chunduri, Praveen Veeramachaneni
Published: Dec. 30, 2018
Abstract
The proliferation of artificial intelligence (AI) and deep learning workloads has intensified demand for Graphics Processing Unit (GPU) resources in cloud computing environments. Multi-tenant infrastructures, particularly those leveraging Kubernetes orchestration within OpenStack platforms, face critical challenges in efficiently sharing GPU resources while maintaining fairness across diverse tenants and workloads. This paper investigates intelligent GPU scheduling and fairness mechanisms tailored for multi-tenant AI workloads in Kubernetes–OpenStack environments. Building upon recent advances in container orchestration and GPU virtualization, this study examines the architectural integration of Kubernetes device plugins with OpenStack Nova and Ironic GPU management components. The analysis explores fairness versus performance trade-offs, evaluating how priority-based queuing, workload-aware preemption, and policy-driven scheduling impact training latency, inference throughput, and cost efficiency. Through comprehensive examination of existing GPU sharing techniques, virtualization approaches, and scheduling algorithms, this research identifies critical design considerations for achieving balanced resource allocation. The findings reveal that hybrid scheduling approaches combining time-slicing with spatial partitioning, coupled with adaptive fairness policies, offer superior performance isolation and tenant satisfaction compared to static allocation schemes. Furthermore, the integration of capacity-based resource models with dynamic workload profiling enables fine-grained quality-of-service (QoS) guarantees essential for latency-sensitive inference tasks while maximizing utilization for batch training workloads. This work contributes to the growing body of knowledge on GPU resource management in containerized cloud environments and provides practical insights for deploying fair and efficient multi-tenant AI infrastructures.
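The fairness policies surveyed above can be illustrated with a minimal sketch. The snippet below implements weighted max-min fairness for whole-GPU allocation across tenants, one plausible form of the "priority-based queuing" and "adaptive fairness policies" the abstract describes; the tenant names, weights, and demands are hypothetical inputs, not values from the paper, and a production scheduler (e.g., a Kubernetes scheduler plugin) would add preemption, bin-packing, and topology awareness.

```python
from dataclasses import dataclass

@dataclass
class Tenant:
    name: str
    weight: float  # fairness weight set by policy (hypothetical input)
    demand: int    # number of whole GPUs requested

def weighted_max_min_allocate(tenants: list[Tenant], total_gpus: int) -> dict[str, int]:
    """Allocate whole GPUs by weighted max-min fairness: repeatedly grant
    one GPU to the unsatisfied tenant with the lowest allocation-to-weight
    ratio, so higher-weight tenants converge to proportionally larger shares."""
    alloc = {t.name: 0 for t in tenants}
    for _ in range(total_gpus):
        # Only tenants whose demand is not yet met compete for the next GPU.
        unsatisfied = [t for t in tenants if alloc[t.name] < t.demand]
        if not unsatisfied:
            break  # all demands met; remaining GPUs stay idle
        neediest = min(unsatisfied, key=lambda t: alloc[t.name] / t.weight)
        alloc[neediest.name] += 1
    return alloc

# Example: tenant A carries twice the weight of tenant B.
tenants = [Tenant("A", weight=2.0, demand=4), Tenant("B", weight=1.0, demand=4)]
print(weighted_max_min_allocate(tenants, total_gpus=6))  # → {'A': 4, 'B': 2}
```

Under contention the allocation tracks the weight ratio (here 2:1), while slack capacity flows to whichever tenant still has unmet demand, which is the behavior a static quota scheme cannot provide.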