AI Glossary
A Comprehensive Dictionary of Artificial Intelligence
TVM (Tensor Virtual Machine)
An open-source compiler framework that optimizes deep learning models and lowers their tensor computations from high-level graph representations to efficient code for a range of hardware architectures.
Just-In-Time (JIT) Compilation
A compilation technique that translates bytecode or intermediate code into native machine code at runtime, enabling optimizations based on information available only during execution, such as actual input shapes or the detected hardware.
Ahead-of-Time (AOT) Compilation
The process of compiling source code into native machine code before execution, reducing startup latency and enabling aggressive optimizations independent of the runtime environment.
Graph IR (Intermediate Representation)
An abstract representation of an AI model's computation graph, used by compilers to analyze dependencies and apply optimization transformations before code generation.
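As a minimal sketch of the dependency analysis described above, the snippet below stores a tiny computation graph as a dictionary and derives a valid execution order with Python's standard `graphlib` module. The graph for `relu(x @ w + b)` and its node names are hypothetical, chosen only for illustration; real compilers attach far more information (shapes, types, attributes) to each node.

```python
from graphlib import TopologicalSorter

# Hypothetical graph for y = relu(x @ w + b).
# Each node maps to the list of nodes it depends on.
graph = {
    "x": [],
    "w": [],
    "b": [],
    "matmul": ["x", "w"],
    "add": ["matmul", "b"],
    "relu": ["add"],
}

# A valid execution order places every node after all of its inputs.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

The exact order among independent inputs may vary, but `matmul` always precedes `add`, which always precedes `relu`.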
Operator Fusion
An optimization technique that combines multiple elementary operations from the computation graph into a single computation kernel, reducing memory overhead and improving data locality.
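The idea can be sketched in plain Python by treating each list comprehension as one "kernel". The function names are hypothetical, and a real compiler fuses generated loops rather than Python code, but the structure is the same: the fused version makes a single pass and never materializes the intermediate result.

```python
def scale_then_shift_unfused(xs, scale, shift):
    # Two separate "kernels": the first writes a full intermediate list
    # that the second must read back.
    scaled = [x * scale for x in xs]
    return [s + shift for s in scaled]


def scale_then_shift_fused(xs, scale, shift):
    # One fused "kernel": each element is loaded once and no
    # intermediate buffer is allocated.
    return [x * scale + shift for x in xs]


data = [1.0, 2.0, 3.0]
unfused = scale_then_shift_unfused(data, 2.0, 1.0)
fused = scale_then_shift_fused(data, 2.0, 1.0)
print(fused)  # [3.0, 5.0, 7.0]
```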
Auto-scheduling
An automated process of searching for the best execution configuration (tiling, vectorization, parallelization) for a computation kernel on a given target hardware architecture.
Target-specific Optimization
A set of compilation techniques that adapt the generated code to the unique characteristics of a hardware architecture (CPU, GPU, TPU, ASIC) to maximize performance.
Relay IR
A high-level functional intermediate representation in TVM, supporting computation graphs with control flow and enabling complex semantic optimizations.
Tensor Expression (TE)
A domain-specific language in TVM for describing tensor computations at a high level of abstraction, facilitating automatic generation of optimized code for various targets.
Kernel Auto-tuning
The process of systematically exploring the optimization parameter space of a computational kernel to identify the configuration that performs best on specific hardware.
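A minimal sketch of such a search, assuming a toy kernel with a single tunable parameter (the block size) and exhaustive timing of every candidate. The names `blocked_sum` and `tune` are hypothetical; production tuners typically explore much larger spaces and often use cost models or guided search instead of brute force.

```python
import time


def blocked_sum(values, block):
    # Toy kernel: sum a list block by block; `block` is the tunable parameter.
    total = 0.0
    for start in range(0, len(values), block):
        total += sum(values[start:start + block])
    return total


def tune(kernel, values, candidates, repeats=3):
    # Exhaustive search: time every candidate and keep the fastest.
    best_block, best_time = None, float("inf")
    for block in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(values, block)
            # Keep the best of several runs to reduce timing noise.
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block, best_time


values = [1.0] * 100_000
candidates = [16, 256, 4096]
best_block, best_time = tune(blocked_sum, values, candidates)
print(f"best block size: {best_block} ({best_time:.6f} s)")
```

The winning block size depends on the machine the search runs on, which is exactly why tuning is performed per target.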
HLO (High-Level Operations) IR
The intermediate representation used by XLA, describing computations as high-level tensor operations that are optimized before code generation for accelerators.
Codegen (Code Generation)
The final phase of compilation, in which the optimized intermediate representation is translated into executable machine code for the specific target architecture.
Polyhedral Model
A mathematical model that represents the iterations of nested loops as integer points inside a polyhedron defined by affine bounds, enabling complex transformations such as tiling and automatic parallelization.
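A small illustration, assuming a triangular loop nest whose iteration domain is the set of integer points {(i, j) | 0 <= j <= i < N}. Loop interchange is one of the simplest transformations expressible in this model: the bounds of the new nest are re-derived from the same inequalities, so the set of visited points is unchanged and only the order differs.

```python
N = 4

# Original nest: i is the outer loop, j runs from 0 to i.
original = [(i, j) for i in range(N) for j in range(i + 1)]

# Interchanged nest: j is the outer loop, and the bounds of i are
# re-derived from the same constraints (j <= i < N).
interchanged = [(i, j) for j in range(N) for i in range(j, N)]

# Both nests visit exactly the same integer points...
same_points = set(original) == set(interchanged)
# ...but in a different order.
same_order = original == interchanged
print(same_points, same_order)  # True False
```

Whether a reordering like this is legal for a given program additionally depends on its data dependences, which this sketch does not model.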
LLVM (originally Low Level Virtual Machine)
A modular compiler infrastructure used by many AI compilers as a backend to generate optimized machine code for a wide range of CPU and GPU architectures.
Memory Layout Optimization
The technique of reorganizing data in memory to improve spatial and temporal locality, reducing access latency and increasing computational throughput.
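As an illustration, the sketch below compares the flat memory offsets of the same tensor element under two common image layouts, NCHW and NHWC. The helper names and the tiny shape are assumptions made for the example. Reading all channels of one pixel is a strided access in NCHW but a contiguous one in NHWC, which is why the preferred layout depends on the access pattern of the kernel.

```python
def offset_nchw(n, c, h, w, C, H, W):
    # Row-major offset when dimensions are ordered (batch, channel, height, width).
    return ((n * C + c) * H + h) * W + w


def offset_nhwc(n, c, h, w, C, H, W):
    # Row-major offset when dimensions are ordered (batch, height, width, channel).
    return ((n * H + h) * W + w) * C + c


C, H, W = 3, 2, 2  # hypothetical tensor shape: 3 channels, 2x2 pixels

# Offsets of the three channel values of pixel (h=0, w=0) in image n=0.
nchw = [offset_nchw(0, c, 0, 0, C, H, W) for c in range(C)]
nhwc = [offset_nhwc(0, c, 0, 0, C, H, W) for c in range(C)]
print(nchw)  # [0, 4, 8]  -> stride of H * W elements
print(nhwc)  # [0, 1, 2]  -> contiguous
```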
Hardware Abstraction Layer (HAL)
A software interface that hides the specific details of the underlying hardware, allowing compilers to generate portable code while still leveraging native optimizations.
Vectorization
An optimization technique that transforms scalar operations into vector (SIMD) operations, so that a single instruction processes several data elements at once on the vector units of modern processors.
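Python does not emit SIMD instructions itself, so the following is only a structural sketch of the loop shape a vectorizing compiler produces: a main loop that handles `LANES` elements per step, followed by a scalar remainder loop for the leftover elements. The vector width of 4 and the function names are assumptions made for illustration.

```python
LANES = 4  # assumed vector width, e.g. four 32-bit floats in a 128-bit register


def add_scalar(a, b):
    # Reference version: one element per iteration.
    return [x + y for x, y in zip(a, b)]


def add_vectorized(a, b):
    out = []
    n = len(a)
    main = n - n % LANES  # largest multiple of LANES that fits in n
    for i in range(0, main, LANES):
        # Stands in for one vector instruction operating on LANES elements.
        out.extend(x + y for x, y in zip(a[i:i + LANES], b[i:i + LANES]))
    for i in range(main, n):
        # Scalar remainder loop for the elements that do not fill a vector.
        out.append(a[i] + b[i])
    return out


a = list(range(10))
b = [10] * 10
result = add_vectorized(a, b)
print(result)  # [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
```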
Tiling
A strategy that partitions loops and their data into blocks (tiles) so that each block fits in cache, improving data reuse and parallelization efficiency in tensor computations.
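A minimal sketch of a tiled matrix multiplication in plain Python, compared against a naive version. The tile size of 2 and the function names are hypothetical. In pure Python the cache benefit is not observable, so the example only shows the loop structure and that the result is unchanged.

```python
def matmul_naive(A, B):
    n, k, m = len(A), len(B), len(B[0])
    return [[sum(A[i][p] * B[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(A, B, tile=2):
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    # Outer loops step over tiles; inner loops stay inside one tile,
    # so the same small blocks of A, B and C are reused while they are hot.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = C[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += A[i][p] * B[p][j]
                        C[i][j] = acc
    return C


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
tiled = matmul_tiled(A, B)
print(tiled[0])  # [30, 24, 18]
```

The `min(...)` bounds handle matrices whose dimensions are not a multiple of the tile size, as is the case for the 3x3 inputs here.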
Graph Rewriting
The systematic transformation of a computation graph by applying rewrite rules that replace subgraphs with more efficient equivalents.
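A minimal sketch of rule-based rewriting, assuming expressions are encoded as nested tuples `(op, lhs, rhs)` with names or numbers as leaves. This encoding and the two algebraic rules (`x * 1 -> x` and `x + 0 -> x`) are chosen only for illustration; real compilers match patterns on richer graph structures.

```python
def rewrite(expr):
    # Leaves (variable names or constants) are returned unchanged.
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    # Rewrite children first so that rules can fire bottom-up.
    lhs, rhs = rewrite(lhs), rewrite(rhs)
    if op == "mul" and rhs == 1:
        return lhs  # x * 1 -> x
    if op == "mul" and lhs == 1:
        return rhs  # 1 * x -> x
    if op == "add" and rhs == 0:
        return lhs  # x + 0 -> x
    if op == "add" and lhs == 0:
        return rhs  # 0 + x -> x
    return (op, lhs, rhs)


simplified = rewrite(("add", ("mul", "x", 1), 0))
print(simplified)  # x

partial = rewrite(("mul", ("add", "x", 0), "y"))
print(partial)  # ('mul', 'x', 'y')
```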