AI Glossary
The Complete Dictionary of Artificial Intelligence
Artifact-Aware Quantization (AQ)
Advanced quantization method that identifies and preserves the layers or neurons most sensitive to precision reduction, thus minimizing model performance degradation while optimizing its size and speed.
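A minimal sketch of the idea, not a reference implementation: each layer is quantized to INT8, the resulting weight error is used as a stand-in for "sensitivity", and the most sensitive layers stay in full precision. The layer names, the weight values, and the `keep_fraction` parameter are illustrative assumptions; practical methods usually measure sensitivity through the effect on task accuracy or loss rather than through raw weight error.

```python
def quantize_int8(weights):
    """Symmetric per-tensor INT8 quantization followed by dequantization."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 127.0
    return [max(-127, min(127, round(w / scale))) * scale for w in weights]


def quantization_error(weights):
    """Mean squared error introduced by quantizing one layer (sensitivity proxy)."""
    dequantized = quantize_int8(weights)
    return sum((w - q) ** 2 for w, q in zip(weights, dequantized)) / len(weights)


def select_precision(layers, keep_fraction=0.25):
    """Keep the most sensitive layers in FP32 and quantize the rest to INT8."""
    ranked = sorted(layers, key=lambda name: quantization_error(layers[name]),
                    reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    protected = set(ranked[:n_keep])
    return {name: ("fp32" if name in protected else "int8") for name in layers}


# Hypothetical layers; "conv2" has one outlier that stretches the INT8 scale
# so that its small weights all collapse to zero.
layers = {
    "conv1": [0.5, -0.4, 0.3, -0.2],
    "conv2": [8.0, 0.01, -0.02, 0.03],
    "fc": [0.1, 0.1, -0.1, 0.1],
    "head": [1.0, -1.0, 0.5, -0.5],
}
plan = select_precision(layers)
```

In this toy setup `conv2` is the layer kept in FP32, because its outlier weight makes the quantization grid too coarse for the remaining values.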
Neural Network Pruning
Process of systematically removing weights, neurons, or entire layers deemed non-essential in a neural network, aiming to reduce its computational complexity and memory footprint for efficient deployment on edge devices.
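One common way to decide which weights are "non-essential" is magnitude pruning, where the weights with the smallest absolute values are set to zero. The sketch below assumes this criterion and a flat list of weights; the function name and the `sparsity` parameter are illustrative.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
# pruned == [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

Zeroed weights reduce memory and compute only when the runtime stores or executes the result sparsely, or when whole neurons or channels are removed (structured pruning).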
MobileNetV3 Architecture
Family of convolutional neural network architectures optimized for mobile and embedded applications, using neural architecture search (NAS) and inverted residual blocks to balance accuracy and latency on low-resource hardware.
Deployment with TensorRT
NVIDIA's inference optimizer and runtime, which converts trained AI models into a highly optimized inference engine for NVIDIA GPUs. It applies techniques such as layer fusion, reduced-precision (FP16/INT8) execution with calibration, and kernel auto-tuning to maximize throughput.
OpenVINO Toolkit
Intel's toolkit designed to accelerate the deployment of computer vision and AI models across a wide range of Intel hardware, optimizing models via an intermediate representation (IR) and leveraging specific vector instructions (AVX, VNNI).
Mixed Precision Inference
AI model execution technique in which calculations use a combination of numeric precisions, such as FP16 for weights and activations and FP32 for accumulation, to accelerate computation and reduce memory footprint on compatible hardware.
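The reason accumulation is usually kept in a wider type can be shown with a small emulation. The sketch below uses Python's `struct` module to round values to half precision (`"e"`) or single precision (`"f"`); it only imitates the rounding behavior and says nothing about speed. The helper names are assumptions for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot(a, b, accumulate):
    """Dot product of FP16 inputs; the running sum is rounded by `accumulate`."""
    total = 0.0
    for x, y in zip(a, b):
        product = to_fp16(x) * to_fp16(y)  # exact in double for FP16 operands
        total = accumulate(total + product)
    return total


ones = [1.0] * 4096
acc_fp16 = dot(ones, ones, to_fp16)  # accumulator rounded to FP16 each step
acc_fp32 = dot(ones, ones, to_fp32)  # accumulator rounded to FP32 each step
```

With an FP16 accumulator the sum stalls at 2048, because 2049 is not representable in half precision and rounds back to 2048; the FP32 accumulator reaches the exact result of 4096.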
ONNX Runtime
Cross-platform inference engine that allows running models in the Open Neural Network Exchange (ONNX) format, optimizing operations for the target hardware (CPU, GPU, NPU) and providing a unified API for deploying AI applications on various edge devices.
TinyML (AI on Microcontrollers)
Machine learning domain aimed at running ultra-lightweight AI models on very low-power microcontrollers (with RAM measured in kilobytes and CPU clock speeds in the megahertz range), requiring extreme optimization techniques like binary quantization and aggressive pruning.
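As an illustration of binary quantization, the sketch below stores each weight as a single sign plus one shared floating-point scale, taken here as the mean absolute value of the weights (an assumption; other scale choices exist). The function names and sample values are hypothetical.

```python
def binarize(weights):
    """Return (signs, scale): one sign per weight and a single shared scale."""
    scale = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, scale


def binary_dot(signs, scale, activations):
    """Dot product using only additions/subtractions plus one final multiply."""
    total = sum(a if s > 0 else -a for s, a in zip(signs, activations))
    return scale * total


weights = [0.5, -0.25, 0.75, -0.5]
activations = [2.0, 1.0, 2.0, 1.0]

signs, scale = binarize(weights)                 # signs [1, -1, 1, -1], scale 0.5
approx = binary_dot(signs, scale, activations)   # 1.0
exact = sum(w * a for w, a in zip(weights, activations))  # 1.75
```

The gap between `approx` and `exact` shows the accuracy cost of one-bit weights, which is why such models are normally trained with the binarization in the loop rather than converted afterwards.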
Neural Processing Unit (NPU)
Specialized processing unit (ASIC or accelerator) designed to accelerate neural network operations, such as matrix multiplications and activation functions, with significantly higher energy efficiency than general-purpose CPUs and GPUs for AI workloads.
Layer Fusion
Compilation optimization technique that combines multiple successive layers of a neural network (for example, a convolution followed by batch normalization and an activation function) into a single operation, thereby reducing memory overhead and the number of data passes.
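The convolution, batch normalization, and activation example can be sketched for a single output channel. At inference time batch normalization is an affine transform with fixed statistics, so it can be folded into the preceding layer's weights and bias. The sketch uses a dot product in place of a full convolution; all names and values are illustrative.

```python
import math


def linear(x, weight, bias):
    """Stand-in for one output channel of a convolution."""
    return sum(xi * wi for xi, wi in zip(x, weight)) + bias


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def relu(y):
    return max(0.0, y)


def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN parameters into the preceding layer's weight and bias."""
    factor = gamma / math.sqrt(var + eps)
    return [w * factor for w in weight], (bias - mean) * factor + beta


x = [1.0, -2.0, 0.5]
w, b = [0.2, -0.1, 0.4], 0.05
gamma, beta, mean, var = 1.5, 0.1, 0.3, 0.04

# Three separate operations, each reading and writing an intermediate value.
unfused = relu(batchnorm(linear(x, w, b), gamma, beta, mean, var))

# One fused operation: folded weights, with the activation applied in place.
fused_w, fused_b = fold_batchnorm(w, b, gamma, beta, mean, var)
fused = relu(linear(x, fused_w, fused_b))
```

Both paths give the same result up to floating-point rounding, while the fused path avoids materializing the intermediate outputs.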
Accelerator Compilation
Process of translating a computational graph of an AI model into a set of instructions executable by a specific hardware accelerator (NPU, TPU, FPGA), mapping the model's operations to the optimized primitives of the target hardware.
Latency Optimization
Set of techniques aimed at minimizing the response time of an embedded vision system, including reducing model complexity, optimizing the processing pipeline, and using dedicated hardware to ensure real-time processing.
On-Chip Memory Management
Strategy for allocating and using the fast but limited SRAM available on a processor or accelerator. It is crucial for minimizing accesses to slower, more energy-hungry off-chip DRAM, a key factor in edge computing performance.
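Loop tiling is one widely used way to keep working data inside on-chip memory. The sketch below picks the largest square tile for which three tiles (one each for the two inputs and the output) fit in a given SRAM budget, then multiplies matrices tile by tile. The budget, element size, and function names are assumptions; in pure Python the "SRAM" is only conceptual.

```python
import math


def max_square_tile(sram_bytes, bytes_per_element=4):
    """Largest T such that three T x T tiles fit in the SRAM budget."""
    return math.isqrt(sram_bytes // (3 * bytes_per_element))


def tiled_matmul(a, b, tile):
    """Matrix product computed block by block with the given tile size."""
    if tile < 1:
        raise ValueError("tile must be at least 1")
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                # On real hardware, the three tiles touched inside this block
                # would be loaded into SRAM once and reused many times.
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


tile = max_square_tile(sram_bytes=48)  # 48 bytes of FP32 -> 2 x 2 tiles
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]]
c = tiled_matmul(a, b, tile)
# c == [[58.0, 64.0], [139.0, 154.0]]
```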
Neural Architecture Search (NAS) for Edge
Automated process of designing neural network architectures optimized for specific constraints such as latency, energy consumption, and model size, typical of edge devices, to find the best performance-efficiency trade-off.
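The constrained-search idea can be sketched with a deliberately simple strategy: exhaustive search over a tiny space, keeping the best-scoring candidate that meets a latency budget. Practical NAS systems use more scalable strategies (for example evolutionary or gradient-based search) and measured or learned cost models. The two estimator functions below are toy stand-ins, not measurements, and every name here is hypothetical.

```python
from itertools import product


def search(space, estimate_latency, estimate_score, latency_budget_ms):
    """Return the best-scoring candidate whose latency fits the budget."""
    best = None
    for values in product(*space.values()):
        candidate = dict(zip(space.keys(), values))
        if estimate_latency(candidate) > latency_budget_ms:
            continue  # violates the hardware constraint
        if best is None or estimate_score(candidate) > estimate_score(best):
            best = candidate
    return best


# Toy stand-ins (assumptions, NOT measured): latency grows with depth * width,
# the quality proxy grows with depth and more slowly with width.
def toy_latency_ms(c):
    return 0.01 * c["depth"] * c["width"]


def toy_score(c):
    return c["depth"] * c["width"] ** 0.5


space = {"depth": [4, 8, 12], "width": [32, 64, 128]}
best = search(space, toy_latency_ms, toy_score, latency_budget_ms=6.0)
# best == {"depth": 12, "width": 32}
```

On a real project, `estimate_latency` would typically come from measurements on the target device and `estimate_score` from training or a trained predictor.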
Real-Time Object Detection on Edge
Computer vision application where highly optimized models like YOLO or SSD are deployed on edge devices to identify and locate objects in a video stream with latency on the order of tens of milliseconds, enabling near-instant reactions.
Lightweight Semantic Segmentation
Pixel-by-pixel classification task performed by lightweight architectures (e.g., BiSeNet, Fast-SCNN) designed to run on resource-constrained devices, balancing segmentation accuracy with real-time requirements.
Performance Profiling on Edge
Detailed analysis of AI model execution on a target device to identify computational bottlenecks and to measure energy consumption and resource usage, guiding optimization efforts to achieve performance objectives.
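A minimal latency-profiling sketch, assuming the model can be invoked as a plain Python callable: it runs a few warm-up iterations, times repeated calls, and reports the mean and tail latency. `dummy_inference` is a hypothetical stand-in for a real forward pass; measuring energy or per-layer cost requires device-specific tools and is not covered here.

```python
import statistics
import time


def profile_latency(fn, warmup=5, runs=50):
    """Return latency statistics in milliseconds for repeated calls to fn()."""
    for _ in range(warmup):
        fn()  # warm up caches and lazy initialization before measuring
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "mean_ms": statistics.fmean(samples),
        "p50_ms": samples[len(samples) // 2],
        "p95_ms": samples[min(len(samples) - 1, int(len(samples) * 0.95))],
        "max_ms": samples[-1],
    }


def dummy_inference():
    # Hypothetical stand-in for a model's forward pass.
    return sum(i * i for i in range(10_000))


stats = profile_latency(dummy_inference)
```

Tail values such as `p95_ms` matter for real-time systems, because a deadline is missed by the slow runs, not by the average one.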