AI Glossary
A Complete Dictionary of Artificial Intelligence
GPU Passthrough
Technique that gives a virtual machine direct, exclusive access to a physical GPU, typically through the host's IOMMU, with no virtualization layer in between. This delivers near-native performance but prevents the GPU from being shared among multiple VMs.
Virtual GPU (vGPU)
Virtualization technology that divides a physical GPU into multiple virtual instances shared between different virtual machines or containers. Each vGPU functions as an independent GPU with its own allocated resources.
Multi-Instance GPU (MIG)
NVIDIA technology, introduced with the Ampere-based A100, that partitions a supported GPU into up to seven isolated instances, each with dedicated compute, memory, and cache resources. Because this isolation is enforced in hardware, MIG can guarantee quality of service for each instance.
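To make the partitioning concrete, the Python sketch below checks whether a requested set of MIG instances fits within one GPU's compute and memory slices. The profile names follow NVIDIA's `<compute>g.<memory>gb` naming, but the per-profile slice counts are assumptions for an A100 40GB, and `fits_slice_budget` is a hypothetical helper, not part of any NVIDIA tool.

```python
# Assumed (compute slices, memory slices) per MIG profile on an A100 40GB.
# These figures are illustrative; confirm them in NVIDIA's MIG User Guide.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}
TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_SLICES = 8


def fits_slice_budget(requested: list[str]) -> bool:
    """Return True if the requested instances fit within the GPU's slice totals.

    This is only a necessary condition: real MIG placement also restricts
    where each profile may start, so some combinations that pass this check
    can still be rejected by the driver.
    """
    compute = sum(A100_40GB_PROFILES[p][0] for p in requested)
    memory = sum(A100_40GB_PROFILES[p][1] for p in requested)
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_SLICES


print(fits_slice_budget(["4g.20gb", "3g.20gb"]))  # True
print(fits_slice_budget(["7g.40gb", "1g.5gb"]))   # False
```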
Time-Sliced Sharing
GPU sharing method in which multiple workloads take turns using the entire GPU in successive time slices. This maximizes utilization, but each workload's latency varies with how many others are competing for the GPU.
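A minimal Python simulation of time slicing, assuming round-robin turns of a fixed length and no context-switch cost, shows why latency depends on load: the same job finishes later as more jobs share the GPU. The job names and the `round_robin_finish_times` function are illustrative only.

```python
from collections import deque


def round_robin_finish_times(work: dict[str, int], time_slice: int) -> dict[str, int]:
    """Simulate time-sliced sharing of a single GPU.

    `work` maps a job name to the GPU time it needs. Jobs take turns running
    for at most `time_slice` units; context-switch cost is ignored.
    Returns the time at which each job completes.
    """
    queue = deque(work.items())
    clock = 0
    finished: dict[str, int] = {}
    while queue:
        name, remaining = queue.popleft()
        run = min(time_slice, remaining)
        clock += run
        remaining -= run
        if remaining > 0:
            queue.append((name, remaining))  # wait in line for the next slice
        else:
            finished[name] = clock
    return finished


print(round_robin_finish_times({"a": 4}, time_slice=2))                    # {'a': 4}
print(round_robin_finish_times({"a": 4, "b": 4}, time_slice=2))            # {'a': 6, 'b': 8}
print(round_robin_finish_times({"a": 4, "b": 4, "c": 4}, time_slice=2))    # {'a': 8, 'b': 10, 'c': 12}
```

Job `a` needs 4 units of GPU time in every run, yet it completes at time 4, 6, and 8 as one, two, and three jobs compete.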
CUDA Virtualization
Virtualization of the CUDA API that lets GPU applications run efficiently in virtualized environments. It works by intercepting CUDA calls and routing them to the appropriate GPU resources.
API Forwarding
Mechanism that intercepts graphics or compute API calls made inside VMs and redirects them to the host's physical GPU. This lets existing applications run without code modification.
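The forwarding pattern can be sketched in Python as a guest-side stub that serializes each call and hands it to a host-side backend. Everything here is hypothetical: the `alloc`/`free` operations stand in for real API calls, and the in-process `transport` replaces the VM-to-host channel a real system would use.

```python
import json
from typing import Any, Callable


class HostBackend:
    """Stands in for the host-side service that owns the physical GPU.
    Its operations are hypothetical placeholders, not a real GPU API."""

    def __init__(self) -> None:
        self.next_handle = 1
        self.buffers: dict[int, int] = {}  # handle -> size in bytes

    def handle(self, message: str) -> str:
        request = json.loads(message)
        if request["op"] == "alloc":
            handle = self.next_handle
            self.next_handle += 1
            self.buffers[handle] = request["args"]["size"]
            return json.dumps({"ok": True, "result": handle})
        if request["op"] == "free":
            self.buffers.pop(request["args"]["handle"], None)
            return json.dumps({"ok": True, "result": None})
        return json.dumps({"ok": False, "error": f"unsupported op {request['op']}"})


class ForwardingStub:
    """Guest-side stub: the application calls it like a local API, but each
    call is serialized and forwarded instead of touching hardware."""

    def __init__(self, transport: Callable[[str], str]) -> None:
        self.transport = transport

    def _call(self, op: str, **args: Any) -> Any:
        reply = json.loads(self.transport(json.dumps({"op": op, "args": args})))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]

    def alloc(self, size: int) -> int:
        return self._call("alloc", size=size)

    def free(self, handle: int) -> None:
        self._call("free", handle=handle)


host = HostBackend()
api = ForwardingStub(transport=host.handle)  # in-process stand-in for a VM-to-host channel
buf = api.alloc(1024)
print(buf, host.buffers)  # 1 {1: 1024}
api.free(buf)
print(host.buffers)       # {}
```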
Profile-based Allocation
GPU allocation strategy based on predefined resource profiles, each fixing an amount of memory, compute, and bandwidth. Each workload is assigned the profile that best matches its needs, so GPU resources can be sized precisely for different workloads.
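A sketch of profile selection in Python, assuming a hypothetical three-entry catalog and a "smallest profile that satisfies every requirement" policy (other policies, such as cheapest fit, are equally plausible):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    name: str
    memory_gb: int
    compute_units: int
    bandwidth_gbps: int


# Hypothetical profile catalog, ordered from smallest to largest.
PROFILES = [
    Profile("small", 4, 8, 100),
    Profile("medium", 8, 16, 200),
    Profile("large", 16, 32, 400),
]


def select_profile(memory_gb: int, compute_units: int, bandwidth_gbps: int) -> Optional[Profile]:
    """Return the smallest profile that meets every requirement, or None."""
    for profile in PROFILES:
        if (profile.memory_gb >= memory_gb
                and profile.compute_units >= compute_units
                and profile.bandwidth_gbps >= bandwidth_gbps):
            return profile
    return None  # no profile is large enough


print(select_profile(6, 8, 150).name)  # medium
print(select_profile(32, 8, 100))      # None
```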
GPU Partitioning
Process of logical or physical division of GPU resources into smaller segments assignable to different applications or VMs. Includes partitioning of memory, compute units, and memory controllers.
Mediated Passthrough
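Hybrid between direct passthrough and full virtualization: performance-critical operations reach the GPU directly, while a thin mediation layer on the host handles privileged operations such as resource allocation. It combines near-native performance with the resource management and isolation needed to share one GPU among several VMs.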
GPU Scheduler
Component that schedules and allocates GPU resources among multiple concurrent requests. It maximizes GPU utilization while respecting priorities and quality-of-service constraints.
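As one illustration, the Python sketch below implements strict priority scheduling with FIFO tie-breaking using `heapq`. Real GPU schedulers also weigh fairness, preemption, and quality-of-service targets; the class and request names here are hypothetical.

```python
import heapq
import itertools
from typing import Optional


class PriorityGpuScheduler:
    """Strict priority scheduling: lower `priority` values get the GPU first,
    and requests with equal priority are served in arrival (FIFO) order."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def submit(self, request: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def next_request(self) -> Optional[str]:
        """Remove and return the request that should run next, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


sched = PriorityGpuScheduler()
sched.submit("batch-training", priority=2)
sched.submit("interactive-inference", priority=0)
sched.submit("nightly-eval", priority=2)
print([sched.next_request() for _ in range(4)])
# ['interactive-inference', 'batch-training', 'nightly-eval', None]
```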
Direct GPU Access
Architecture allowing virtualized applications to directly access GPU resources without going through software emulation layers. Reduces latency and maximizes computational performance.
Virtual GPU Manager
Centralized administration software that manages the lifecycle of vGPU instances, their allocation and monitoring. Coordinates available GPU resources according to policies defined by the administrator.
GPU Memory Virtualization
Technique for abstracting physical GPU memory allowing multiple VMs to share VRAM while maintaining the illusion of dedicated memory. Includes paging, dynamic allocation and memory isolation.
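The following Python toy model shows two of these ideas, per-VM address translation and isolation, over a shared pool of physical frames allocated on demand. The 4 KiB page size and all names are assumptions, and the model does not page memory out to host RAM.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class GpuMemoryVirtualizer:
    """Toy per-VM page tables over a shared pool of physical VRAM frames."""

    def __init__(self, physical_frames: int) -> None:
        self.free_frames = list(range(physical_frames))
        self.page_tables: dict[str, dict[int, int]] = {}  # vm -> {virtual page: frame}

    def allocate(self, vm: str, num_pages: int) -> None:
        """Map `num_pages` more virtual pages for `vm`, taking frames from the pool."""
        if num_pages > len(self.free_frames):
            raise MemoryError("physical VRAM pool exhausted")
        table = self.page_tables.setdefault(vm, {})
        start = len(table)  # each VM's virtual pages are numbered from 0
        for vpage in range(start, start + num_pages):
            table[vpage] = self.free_frames.pop(0)

    def translate(self, vm: str, virtual_addr: int) -> int:
        """Translate a VM-virtual address into a physical VRAM address."""
        vpage, offset = divmod(virtual_addr, PAGE_SIZE)
        table = self.page_tables.get(vm, {})
        if vpage not in table:
            # Isolation: a VM can reach only the frames mapped in its own table.
            raise PermissionError(f"{vm}: virtual page {vpage} is not mapped")
        return table[vpage] * PAGE_SIZE + offset


mem = GpuMemoryVirtualizer(physical_frames=8)
mem.allocate("vm-a", 2)  # receives physical frames 0 and 1
mem.allocate("vm-b", 2)  # receives physical frames 2 and 3
print(mem.translate("vm-a", 100))  # 100
print(mem.translate("vm-b", 100))  # 8292
```

Both VMs use virtual address 100, but it resolves to different physical frames, and a VM that touches an unmapped page gets an error instead of another VM's data.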
SR-IOV for GPUs
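Application of the PCI Express Single Root I/O Virtualization (SR-IOV) standard to GPUs: the GPU's physical function exposes multiple virtual functions (VFs), each of which can be assigned to a VM with a direct hardware access path. This offers hardware-level isolation and near-bare-metal performance.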
GPU Containerization
Exposing GPU resources to lightweight containers, typically by making the host's GPU driver available inside the container while the CUDA libraries are packaged in the container image. This enables rapid deployment of GPU applications with less overhead than VMs.
Remote GPU Virtualization
Architecture allowing access to remote GPU resources over the network as if they were local. Uses optimized protocols to minimize latency and preserve computational performance.
Dynamic GPU Allocation
Ability to dynamically allocate and deallocate GPU resources according to the immediate needs of applications. Optimizes GPU usage by adjusting resource quotas in real-time.
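A minimal Python sketch of runtime quota adjustment, assuming GPU memory in whole gigabytes is the only resource being tracked; `DynamicGpuAllocator` and the application names are hypothetical.

```python
class DynamicGpuAllocator:
    """Toy allocator that grows and shrinks per-application GPU memory quotas."""

    def __init__(self, capacity_gb: int) -> None:
        self.capacity_gb = capacity_gb
        self.grants: dict[str, int] = {}  # app -> granted GB

    def available_gb(self) -> int:
        return self.capacity_gb - sum(self.grants.values())

    def request(self, app: str, gb: int) -> bool:
        """Grow `app`'s quota by `gb` if enough capacity is free; report success."""
        if gb > self.available_gb():
            return False
        self.grants[app] = self.grants.get(app, 0) + gb
        return True

    def release(self, app: str, gb: int) -> None:
        """Shrink `app`'s quota by `gb`, returning that memory to the pool."""
        remaining = max(self.grants.get(app, 0) - gb, 0)
        if remaining:
            self.grants[app] = remaining
        else:
            self.grants.pop(app, None)  # quota fully released


alloc = DynamicGpuAllocator(capacity_gb=40)
print(alloc.request("training", 30))   # True
print(alloc.request("inference", 20))  # False: only 10 GB free
alloc.release("training", 15)          # training's demand drops
print(alloc.request("inference", 20))  # True: 25 GB were free
print(alloc.available_gb())            # 5
```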
GPU Pooling
Aggregation of multiple physical GPUs into a unified resource pool that can be distributed on demand. Enables load balancing and elasticity of GPU computational resources at the datacenter scale.
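One simple way to balance load across a pool is greedy least-loaded placement, sketched below in Python with a min-heap. This is a standard list-scheduling heuristic chosen for illustration, not any specific product's algorithm; the GPU IDs and job loads are hypothetical.

```python
import heapq


def place_jobs(gpu_ids: list[str], job_loads: list[int]) -> dict[str, list[int]]:
    """Greedily assign each job to the GPU with the least accumulated load.

    Returns the indices of the jobs placed on each GPU. Ties are broken by
    GPU ID because the heap compares (load, gpu_id) tuples.
    """
    heap = [(0, gpu) for gpu in gpu_ids]
    heapq.heapify(heap)
    placement: dict[str, list[int]] = {gpu: [] for gpu in gpu_ids}
    for job_index, load in enumerate(job_loads):
        current, gpu = heapq.heappop(heap)  # least-loaded GPU in the pool
        placement[gpu].append(job_index)
        heapq.heappush(heap, (current + load, gpu))
    return placement


print(place_jobs(["gpu0", "gpu1", "gpu2"], [5, 3, 2, 4, 1]))
# {'gpu0': [0], 'gpu1': [1, 4], 'gpu2': [2, 3]}
```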